feat: Return chromsizes tileset info #158
Merged
7 commits
cfd7968 feat: Return chromsizes tilest info (pkerpedjiev)
1e31b7a Updated the CHANGELOG (pkerpedjiev)
55e2327 Fix linting and add smart_open dependency (pkerpedjiev)
2276c65 Add docstring to tileset_info function (pkerpedjiev)
624fd7f Test using file-like object (pkerpedjiev)
3133dde Load file as binary (pkerpedjiev)
249de68 Removed TODO line (pkerpedjiev)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,10 @@ |
| | | +from typing import List, Optional |
| | | + |
| | | +from pydantic import BaseModel |
| | | + |
| | | + |
| | | +class TilesetInfo(BaseModel): |
| | | +    max_width: int |
| | | +    min_pos: List[int] |
| | | +    max_pos: List[int] |
| | | +    chromsizes: Optional[List] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -1,35 +1,61 @@ |
| | | -import csv |
| | |  import logging |
| | | +from smart_open import open |
| | | |
| | |  logger = logging.getLogger(__name__) |
| | | |
| | | |
| | | -def get_tsv_chromsizes(filename): |
| | | +def tileset_info(filename: str) -> dict: |
| | | +    """Return a standard higlass tileset info object that contains |
| | | +    chromsizes as an element. |
| | | + |
| | | +    The chromsizes in the returned object will be a list of [name, size] |
| | | +    tuples. |
| | | + |
| | | +    [ |
| | | +        ['chr1', 1000], |
| | | +        ['chr2', 2000] |
| | | +    ] |
| | | +    """ |
| | | +    chromsizes = get_tsv_chromsizes(filename) |
| | | + |
| | | +    max_width = sum([int(c[1]) for c in chromsizes]) |
| | | +    return { |
| | | +        "max_width": max_width, |
| | | +        "chromsizes": [[c[0], int(c[1])] for c in chromsizes], |
| | | +        "min_pos": [0], |
| | | +        "max_pos": [max_width], |
| | | +    } |
| | | + |
| | | + |
| | | +def get_tsv_chromsizes(file): |
| | |      """ |
| | |      Get a list of chromosome sizes from this [presumably] tsv |
| | | -    chromsizes file file. |
| | | +    chromsizes file. |
| | | |
| | |      Parameters: |
| | |      ----------- |
| | | -    filename: string |
| | | -        The filename of the tsv file |
| | | +    file: string or file-like object |
| | | +        A file-like object |
| | | |
| | |      Returns |
| | |      ------- |
| | |      chromsizes: [(name:string, size:int), ...] |
| | |          An ordered list of chromosome names and sizes |
| | |      """ |
| | | +    if isinstance(file, str): |
| | | +        file = open(file, "rb") |
| | | + |
| | |      try: |
| | | -        with open(filename, "r") as f: |
| | | -            reader = csv.reader(f, delimiter="\t") |
| | | +        file.seek(0) |
| | | +        binary_data = file.read() |
| | | +        text_data = binary_data.decode("utf-8") |
| | | |
| | | -            data = [] |
| | | -            for row in reader: |
| | | -                data.append(row) |
| | | +        lines = text_data.split("\n") |
| | | +        data = [line.strip().split("\t") for line in lines if line.strip()] |
| | |          return data |
| | |      except Exception as ex: |
| | |          logger.error(ex) |
| | | |
| | | -        err_msg = "WHAT?! Could not load file %s. 😤 (%s)" % (filename, ex) |
| | | +        err_msg = "WHAT?! Could not load file %s." % (ex) |
| | | |
| | |          raise Exception(err_msg) |
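To make the data flow in this diff easier to follow, here is a minimal, self-contained sketch of the same idea: read tab-separated `[name, size]` rows from either a path or a binary file-like object, then derive the tileset info dict. It is not the merged code. The names `read_chromsizes` and `build_tileset_info` are hypothetical, and the built-in `open` stands in for `smart_open.open` so the example needs only the standard library.

```python
import io
from typing import BinaryIO, List, Union


def read_chromsizes(file: Union[str, BinaryIO]) -> List[List[str]]:
    """Parse tab-separated rows from a path or a binary file-like object."""
    opened_here = isinstance(file, str)
    # Assumption: built-in open() replaces smart_open.open in this sketch.
    handle = open(file, "rb") if opened_here else file
    try:
        handle.seek(0)
        text = handle.read().decode("utf-8")
    finally:
        # Close only the handles this function opened itself.
        if opened_here:
            handle.close()
    # Skip blank lines and split each remaining line on tabs.
    return [line.strip().split("\t") for line in text.splitlines() if line.strip()]


def build_tileset_info(file: Union[str, BinaryIO]) -> dict:
    """Build a dict with the same four keys as the diff's tileset_info."""
    rows = read_chromsizes(file)
    chromsizes = [[row[0], int(row[1])] for row in rows]
    max_width = sum(size for _, size in chromsizes)
    return {
        "max_width": max_width,
        "chromsizes": chromsizes,
        "min_pos": [0],
        "max_pos": [max_width],
    }


# Usage with an in-memory file-like object instead of a file on disk.
example = io.BytesIO(b"chr1\t1000\nchr2\t2000\n")
info = build_tileset_info(example)
```

One design difference from the diff as shown: the sketch closes the handle when it was opened from a string path, while a caller-supplied file object is left open for the caller to manage.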
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,21 @@ |
| | | +import os.path as op |
| | | + |
| | | +import clodius.tiles.chromsizes as ctcs |
| | | +from clodius.models.tileset_info import TilesetInfo |
| | | + |
| | | + |
| | | +def test_get_tileset_info(): |
| | | +    filename = op.join("data", "chromSizes.tsv") |
| | | + |
| | | +    # Test loading tileset info using a filename |
| | | +    tsinfo = TilesetInfo(**ctcs.tileset_info(filename)) |
| | | + |
| | | +    assert tsinfo.max_width > 100 |
| | | +    assert len(tsinfo.chromsizes) > 2 |
| | | + |
| | | +    with open(filename, "rb") as f: |
| | | +        # Test loading using a file-like object |
| | | +        tsinfo = TilesetInfo(**ctcs.tileset_info(f)) |
| | | + |
| | | +        assert tsinfo.max_width > 100 |
| | | +        assert len(tsinfo.chromsizes) > 2 |
I see this is only used in the test. Should that be in the test module only or is this intended as a part of the public API?
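Assuming the comment refers to the `TilesetInfo` model, the first option it raises (keeping the check in the test module only) could look roughly like the sketch below. This is an illustration, not code from the PR: `assert_tileset_info_shape` is a hypothetical test-local helper that uses plain assertions instead of pydantic, and the sample dict is made up to match the docstring example.

```python
def assert_tileset_info_shape(info: dict) -> None:
    """Test-local check that a dict carries the tileset-info fields from this PR."""
    assert isinstance(info["max_width"], int)
    for key in ("min_pos", "max_pos"):
        assert isinstance(info[key], list)
        assert all(isinstance(value, int) for value in info[key])
    # chromsizes is Optional in the TilesetInfo model, so a missing value is accepted.
    chromsizes = info.get("chromsizes")
    if chromsizes is not None:
        for name, size in chromsizes:
            assert isinstance(name, str)
            assert isinstance(size, int)


# Hypothetical sample matching the [name, size] layout described in the docstring.
sample = {
    "max_width": 3000,
    "chromsizes": [["chr1", 1000], ["chr2", 2000]],
    "min_pos": [0],
    "max_pos": [3000],
}
assert_tileset_info_shape(sample)
```

The trade-off is the one the reviewer points at: a helper like this stays private to the tests, whereas a model under `clodius.models` becomes something downstream code can import and depend on.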