Input Formats
stangene supports two input formats. In both cases, only feature metadata is extracted — the expression matrix is never loaded into memory.
h5ad (AnnData)
The primary format. stangene reads adata.var and adata.var_names to extract feature metadata.
Recognized adata.var columns:
| Column | Maps to |
|---|---|
| | |
| | |
When writing results, harmonization columns are added to adata.var in a new *_harmonized.h5ad file. The original var_names are never overwritten.
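The output naming can be pictured with a short sketch. The helper name `harmonized_path` is hypothetical and not part of stangene; it assumes the suffix is appended to the input file's stem and that the file is written next to the input, neither of which is spelled out above.

```python
from pathlib import Path


def harmonized_path(input_path: str) -> Path:
    """Hypothetical helper: derive '<stem>_harmonized.h5ad' next to the input file."""
    src = Path(input_path)
    # Assumption: the suffix is attached to the stem, e.g. pbmc.h5ad -> pbmc_harmonized.h5ad
    return src.with_name(f"{src.stem}_harmonized.h5ad")


out = harmonized_path("data/pbmc.h5ad")
print(out.name)
```

Because the result goes to a separate file, the input h5ad is left untouched under this naming scheme.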
TSV / CSV
stangene auto-detects common column names:
| Detected column name | Maps to |
|---|---|
| | |
| | |
| | |
If your columns have different names, pass an explicit column_map:
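```python
import stangene

ft = stangene.load_features(
    "features.tsv",
    species="human",
    # Keys are the column names in your file; values are the names stangene expects.
    column_map={
        "my_gene_col": "original_feature_name",
        "my_id_col": "original_feature_id",
    },
)
```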
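To illustrate what a column_map does conceptually, here is a minimal sketch using only the standard library. It is not stangene's implementation: `apply_column_map` is a hypothetical helper, and passing unmapped columns through unchanged is an assumption.

```python
import csv
import io


def apply_column_map(rows, column_map):
    """Hypothetical helper: rename row keys via column_map; unmapped keys pass through."""
    return [{column_map.get(key, key): value for key, value in row.items()} for row in rows]


# A tiny in-memory TSV standing in for features.tsv
sample = "my_gene_col\tmy_id_col\nTP53\tENSG00000141510\n"
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))

mapped = apply_column_map(
    rows,
    {
        "my_gene_col": "original_feature_name",
        "my_id_col": "original_feature_id",
    },
)
print(mapped[0])
```

Each row ends up keyed by the target names, regardless of how the source file labelled its columns.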
File extension determines the delimiter:
- .tsv, .txt → tab-separated
- .csv → comma-separated
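The extension rule above can be sketched as a simple lookup. The function name `delimiter_for` is hypothetical, and both the case-insensitive matching and the error raised for other extensions are assumptions rather than documented stangene behavior.

```python
from pathlib import Path

# Extension-to-delimiter table taken from the list above
DELIMITERS = {".tsv": "\t", ".txt": "\t", ".csv": ","}


def delimiter_for(path: str) -> str:
    """Hypothetical helper: choose a delimiter from the file extension."""
    suffix = Path(path).suffix.lower()  # assumption: matching ignores case
    if suffix not in DELIMITERS:
        # Assumption: unknown extensions are rejected rather than guessed
        raise ValueError(f"unsupported extension: {suffix!r}")
    return DELIMITERS[suffix]


print(repr(delimiter_for("features.tsv")))
print(repr(delimiter_for("features.csv")))
```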