feat: allow merge argument for DatasetCollection.add_adatas #198
ilan-gold wants to merge 8 commits
Conversation
if var_subset is not None:
    adata = adata[:, adata.var.index.isin(var_subset)]
I also slipped this in - I don't think there's much point in outer joining genes if we're just gonna subset them anyway. It seems like we should just subset first and then outer join. I think this is just a distributive law of sets.
See #198 (comment) for a bit more info on a small implication of this, which I think is actually more "correct" anyway
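The "distributive law" mentioned above can be checked on plain Python sets. This is only a sketch: the gene names and `var_subset` values are made up for illustration, and it covers only which var names survive, not what happens to the `var` columns under a `merge` strategy.

```python
# Hypothetical var names for two datasets and a gene subset (illustrative only).
genes_a = {"1", "2"}
genes_b = {"1", "3"}
var_subset = {"1", "3", "4"}

# Outer join first, then subset: (A | B) & S
join_then_subset = (genes_a | genes_b) & var_subset

# Subset each dataset first, then outer join: (A & S) | (B & S)
subset_then_join = (genes_a & var_subset) | (genes_b & var_subset)

# Intersection distributes over union, so both orders keep the same genes.
assert join_then_subset == subset_then_join
print(sorted(join_then_subset))  # ['1', '3']
```

Both orders keep the same set of genes, which is why subsetting first looks like a pure optimization as far as the var index is concerned.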
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #198      +/-   ##
==========================================
- Coverage   93.48%   91.68%   -1.81%
==========================================
  Files          14       14
  Lines        1121     1118       -3
==========================================
- Hits         1048     1025      -23
- Misses         73       93      +20
Title changed from "merge argument for add_adatas" to "merge argument for DatasetCollection.add_adatas"
…oaders into ig/subset_before_concat
Ok so here's an interesting problem: subsetting the gene space first and then concatenating with an outer join does not yield the same `var` columns as concatenating first and subsetting afterwards:

# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "anndata",
# ]
# ///
from __future__ import annotations

import numpy as np
import pandas as pd
import anndata as ad

adatas = [
    ad.AnnData(
        X=np.ones((1, 2)),
        obs=pd.DataFrame({"1": ["a"], "2": ["b"]}),
        var=pd.DataFrame(index=["1", "2"], data={"column_1": [1, 2]}),
    ),
    ad.AnnData(
        X=np.ones((1, 2)),
        obs=pd.DataFrame({"1": ["a"], "3": ["b"]}),
        var=pd.DataFrame(index=["1", "3"], data={"column_1": [1, 2]}),
    ),
]
# Concatenate first, then subset to gene "1"
print(ad.concat(adatas, join="outer", merge="unique")[:, ["1"]].var.columns)  # empty
# Subset each AnnData to gene "1" first, then concatenate
print(
    ad.concat(
        [adata[:, ["1"]] for adata in adatas], join="outer", merge="unique"
    ).var.columns
)  # has column_1

So the question is whether or not we want this. It's a subtle behavioral difference from
To do this we need to disallow `Dataset2D` in `anndata.concat` because it does not respect `merge` arguments... Luckily, these df's are generally small.

It might be time to start thinking about a settings object.
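One way to read the difference shown by the script earlier in this thread is with a toy model in plain Python. It rests on an assumption that the thread does not confirm: that `merge="unique"` keeps a `var` column only when, after every input is aligned to the union of genes, the column is identical in all inputs. The helpers `align`, `merge_unique_columns`, and `subset` are hypothetical stand-ins and not anndata functions.

```python
def align(column, index, union_index):
    """Reindex a column onto union_index, filling missing genes with None."""
    lookup = dict(zip(index, column))
    return [lookup.get(gene) for gene in union_index]


def merge_unique_columns(frames):
    """Toy stand-in for merge="unique" (assumed semantics, not anndata code):
    keep a column only if its aligned values match across all inputs."""
    union_index = sorted({gene for index, _ in frames for gene in index})
    names = sorted({name for _, columns in frames for name in columns})
    kept = {}
    for name in names:
        aligned = [
            align(columns[name], index, union_index)
            for index, columns in frames
            if name in columns
        ]
        if all(values == aligned[0] for values in aligned):
            kept[name] = aligned[0]
    return union_index, kept


def subset(frame, genes):
    """Restrict one (index, columns) pair to the given genes."""
    index, columns = frame
    keep = [i for i, gene in enumerate(index) if gene in genes]
    return (
        [index[i] for i in keep],
        {name: [col[i] for i in keep] for name, col in columns.items()},
    )


# Same var tables as in the script: genes "1","2" and "1","3", one column each.
frames = [
    (["1", "2"], {"column_1": [1, 2]}),
    (["1", "3"], {"column_1": [1, 2]}),
]

_, concat_first = merge_unique_columns(frames)
_, subset_first = merge_unique_columns([subset(f, {"1"}) for f in frames])
print(list(concat_first))  # []
print(list(subset_first))  # ['column_1']
```

Under this reading, concatenating first drops `column_1` because the two inputs disagree once padded to the union of genes, while subsetting first leaves both inputs with the single agreeing value for gene "1".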