Preprocessing (sjanpy.pp)
Gene Filtering
- sjanpy.pp.genecraft.filter_human_sc_genes(adata, remove_predicted=True, remove_non_coding=True, remove_antisense=True, remove_ig_var=True, remove_hb=True, remove_metallothionein=True, remove_histone=False, remove_mt_encoded=False, remove_ribo=False, mask_hvg_only=True)
Comprehensive filtering of uninformative genes for human scRNA-seq data.
If mask_hvg_only is True, sc.pp.highly_variable_genes must already have been run on the adata object.
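The sketch below illustrates the HVG precondition on a plain pandas table that stands in for adata.var. It is not sjanpy's implementation. It assumes one plausible reading of mask_hvg_only, namely that flagged genes stay in the matrix and only lose their highly_variable flag; the source does not confirm this. The helper name mask_flagged_hvgs_sketch and the two regular expressions are hypothetical and only approximate the hemoglobin and metallothionein categories.

```python
import pandas as pd

# Stand-in for adata.var after sc.pp.highly_variable_genes, which writes a
# boolean "highly_variable" column. Gene symbols are the index.
var = pd.DataFrame(
    {"highly_variable": [True, True, True, True, True]},
    index=["CD3E", "HBB", "MT2A", "MS4A1", "MT-CO1"],
)


def mask_flagged_hvgs_sketch(var, patterns):
    """Hypothetical helper: unmark genes matching any pattern as HVGs."""
    if "highly_variable" not in var.columns:
        # Mirrors the documented requirement that HVG selection runs first.
        raise ValueError("run sc.pp.highly_variable_genes before masking")
    flagged = var.index.str.contains("|".join(patterns), regex=True)
    out = var.copy()  # leave the caller's table untouched
    out.loc[flagged, "highly_variable"] = False
    return out


# Illustrative patterns only (assumptions, not sjanpy's gene lists):
# hemoglobin genes such as HBB, and metallothioneins such as MT2A.
# Mitochondrially encoded genes (MT-...) are left alone, mirroring the
# default remove_mt_encoded=False.
masked = mask_flagged_hvgs_sketch(var, [r"^HB[ABDEGMQZ]\d*$", r"^MT[1-4]"])
print(masked)
```

Masking rather than dropping keeps the expression matrix intact, so the flagged genes remain available for plotting or marker lookups while being excluded from HVG-based steps.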
- sjanpy.pp.genecraft.filter_mouse_sc_genes(adata, remove_predicted=True, remove_non_coding=True, remove_antisense=True, remove_ig_var=True, remove_hb=True, remove_metallothionein=True, remove_histone=False, remove_mt_encoded=False, remove_ribo=False, mask_hvg_only=True)
Filtering for Mouse (Mus musculus) scRNA-seq data.
- sjanpy.pp.genecraft.filter_rat_sc_genes(adata, remove_predicted=True, remove_non_coding=True, remove_antisense=True, remove_ig_var=True, remove_hb=True, remove_metallothionein=True, remove_histone=False, remove_mt_encoded=False, remove_ribo=False, mask_hvg_only=True)
Filtering for Rat (Rattus norvegicus) scRNA-seq data.
HVG Selection
Stratified Splitting
Stratified train / val / test splitting for single-cell obs DataFrames.
- sjanpy.pp.split.stratified_split(obs: DataFrame, stratify_col: str, val_ratio: float = 0.05, test_ratio: float = 0.05, seed: int = 42) -> DataFrame
Two-stage stratified split into train / val / test.
- Parameters:
obs – Cell-level metadata (one row per cell).
stratify_col – Column in obs used for stratification (e.g. "cell_type").
val_ratio – Fraction of total cells assigned to the validation set.
test_ratio – Fraction of total cells assigned to the test set.
seed – Random seed for reproducibility.
- Returns:
Two columns: cell_index (int position) and split (one of "train", "val", "test").
- Return type:
pd.DataFrame
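The following is a minimal sketch of how a two-stage stratified split could work. It is not sjanpy's code. It assumes that "two-stage" means the test cells are drawn first within each stratum, and the validation cells are then drawn from the remainder with the ratio rescaled to val_ratio / (1 - test_ratio), so that both ratios stay fractions of the total. The function name two_stage_stratified_split_sketch and the toy obs table are hypothetical.

```python
import numpy as np
import pandas as pd


def two_stage_stratified_split_sketch(obs, stratify_col,
                                      val_ratio=0.05, test_ratio=0.05, seed=42):
    """Hypothetical re-implementation for illustration only."""
    rng = np.random.default_rng(seed)
    labels = obs[stratify_col].to_numpy()
    split = np.full(len(obs), "train", dtype=object)

    # Stage 2 draws from what is left after stage 1, so rescale the ratio
    # to keep val_ratio a fraction of the *total* cell count.
    val_of_rest = val_ratio / (1.0 - test_ratio)

    for label in pd.unique(labels):
        # Shuffle the integer positions of the cells in this stratum.
        positions = rng.permutation(np.flatnonzero(labels == label))

        # Stage 1: carve out the test cells.
        n_test = int(round(len(positions) * test_ratio))
        rest = positions[n_test:]

        # Stage 2: carve the validation cells out of the remainder.
        n_val = int(round(len(rest) * val_of_rest))

        split[positions[:n_test]] = "test"
        split[rest[:n_val]] = "val"

    return pd.DataFrame({"cell_index": np.arange(len(obs)), "split": split})


# Toy metadata: 200 cells of one type and 100 of another.
obs = pd.DataFrame({"cell_type": ["T cell"] * 200 + ["B cell"] * 100})
result = two_stage_stratified_split_sketch(obs, "cell_type")
print(result["split"].value_counts())
```

Because the draw happens per stratum, each cell type contributes to the val and test sets in proportion to its size. Very small strata may round to zero held-out cells, which this sketch does not handle specially.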