Output Schema
Harmonization table
The harmonization table (harmonization_table.tsv) contains one row per original feature with all mapping metadata.
| Column | Description | Always present |
|---|---|---|
| | Raw feature name from input | Yes |
| | Raw feature ID from input (Ensembl ID if available) | No |
| | Classified feature type ( | Yes |
| | Ensembl ID with version suffix stripped | No |
| | Species identifier | Yes |
| | Dataset name | Yes |
| | Inferred annotation source of input (e.g., Ensembl/GENCODE) | No |
| | Timestamp of reference data used for harmonization | Yes |
| | Canonical Ensembl gene ID (or source_id fallback) | Mapped only |
| | Official approved symbol | Mapped only |
| | Which tier resolved the mapping | Yes |
| | | Yes |
| | Lookup that resolved this feature | Mapped only |
| | Warnings, candidates, version mismatches | When applicable |
| | Version of stangene that produced this mapping | Yes |
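Because several columns are filled only for mapped features, it can be useful to check which columns are populated on every row of a particular file. The sketch below does this with Python's standard csv module. It assumes the file is tab-separated with a header row and that absent values are written as empty cells; this page confirms neither, and the column names in the sample (col_a, col_b) are placeholders rather than real stangene columns.

```python
import csv
import io


def always_present_columns(handle):
    """Return the header names whose cells are non-empty on every data row.

    Assumes tab-separated input with a header row, and that a missing
    value is written as an empty (or whitespace-only) cell.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    columns = list(reader.fieldnames or [])
    complete = {name: True for name in columns}
    for row in reader:
        for name in columns:
            # DictReader yields None for cells missing from a short row.
            if not (row.get(name) or "").strip():
                complete[name] = False
    return [name for name in columns if complete[name]]


# Placeholder data: col_a is filled on both rows, col_b only on the first.
SAMPLE = "col_a\tcol_b\nx\t1\ny\t\n"
print(always_present_columns(io.StringIO(SAMPLE)))
```

For a real run, pass an open file instead of the in-memory sample, for example `open("harmonization_table.tsv", newline="")`. Note that a file with no data rows reports every column as always present.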
Mapping status values
| Status | Tier | Meaning |
|---|---|---|
| | 1 | Ensembl gene ID matched exactly |
| | 2 | Ensembl ID matched after stripping version suffix |
| | 3 | Official approved gene symbol matched |
| | 4 | Matched via an alias (alternative) name |
| | 4 | Matched via a previous (old) name |
| | - | Multiple candidate genes; not resolved |
| | 5 | No confident match found |
| | - | Not a gene; excluded from matching |
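One way to work with statuses and tiers together is a small lookup table, sketched below in Python. The status names are the keys that appear under status_counts in summary.json; pairing each name with a tier is an inference from the meanings in the table, not something this page states outright. The tier 2 and non-gene statuses are left out because their names are not given here.

```python
from collections import Counter

# Assumed pairing of status names (keys seen in summary.json) with tiers.
# None stands for the ambiguous status, which the table lists with "-".
STATUS_TIER = {
    "exact_id": 1,
    "exact_symbol": 3,
    "alias_symbol": 4,
    "previous_symbol": 4,
    "ambiguous": None,
    "unmapped": 5,
}


def counts_by_tier(status_counts):
    """Add up per-status feature counts into per-tier totals."""
    totals = Counter()
    for status, count in status_counts.items():
        if status not in STATUS_TIER:
            # Fail loudly rather than guess a tier for an unlisted status.
            raise KeyError(f"unknown status: {status}")
        totals[STATUS_TIER[status]] += count
    return dict(totals)


# Counts taken from the summary.json example on this page.
example = {
    "exact_id": 24260,
    "unmapped": 7859,
    "exact_symbol": 411,
    "previous_symbol": 172,
    "alias_symbol": 33,
    "ambiguous": 3,
}
print(counts_by_tier(example))
```

Raising on an unknown status is a deliberate choice in this sketch: it surfaces statuses the lookup does not cover instead of silently dropping them.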
Confidence levels
| Level | When assigned |
|---|---|
| | Tier 1, 2, or 3 match (exact ID or approved symbol) |
| | Tier 4 match (alias/previous symbol), or Tier 3 match to a withdrawn gene |
| | Ambiguous: multiple candidates found |
| null | Unmapped or non-gene feature |
Summary JSON
summary.json contains:
{
  "total_features": 32738,
  "gene_features": 32738,
  "non_gene_features": 0,
  "status_counts": {
    "exact_id": 24260,
    "unmapped": 7859,
    "exact_symbol": 411,
    "previous_symbol": 172,
    "alias_symbol": 33,
    "ambiguous": 3
  },
  "duplicate_harmonized_ids": 165,
  "duplicate_harmonized_symbols": 165
}
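A quick sanity check on this file is to confirm that the per-status counts account for every feature and to compute the share of features that were resolved. The Python sketch below does both using the example values above. Treating every status other than unmapped and ambiguous as resolved is an assumption made for illustration, and whether the counts still add up to total_features when non-gene features are present is not something the example shows.

```python
import json

# Payload copied from the summary.json example above.
SUMMARY_TEXT = """
{
  "total_features": 32738,
  "gene_features": 32738,
  "non_gene_features": 0,
  "status_counts": {
    "exact_id": 24260,
    "unmapped": 7859,
    "exact_symbol": 411,
    "previous_symbol": 172,
    "alias_symbol": 33,
    "ambiguous": 3
  },
  "duplicate_harmonized_ids": 165,
  "duplicate_harmonized_symbols": 165
}
"""

# Assumption: these two statuses mean the feature got no harmonized gene.
UNRESOLVED = {"unmapped", "ambiguous"}


def mapping_overview(summary):
    """Return the summed status counts and the resolved share of features."""
    counts = summary["status_counts"]
    total = summary["total_features"]
    resolved = sum(n for status, n in counts.items() if status not in UNRESOLVED)
    return {
        "counted": sum(counts.values()),
        "resolved": resolved,
        # Guard against an empty dataset to avoid dividing by zero.
        "resolved_fraction": resolved / total if total else 0.0,
    }


overview = mapping_overview(json.loads(SUMMARY_TEXT))
print(overview)
```

To run it against a real output directory, replace the embedded string with the parsed contents of summary.json, for example by calling `json.load` on the open file.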
Conflict report
conflicts.tsv lists issues that may require manual review:
| Conflict type | Description |
|---|---|
| | Multiple original features map to the same canonical gene |
| | Feature could not be matched |
| | Feature resolved via a previous symbol (gene was renamed) |
| | Feature matched multiple candidate genes |
Markdown report
report.md is a human-readable summary containing all of the above in formatted tables, plus warnings about Excel-corrupted gene names and detailed conflict listings.