Sijie Chen

Stanford, CA

I build machine-learning systems for biological discovery. My work centers on representation learning, generative latent-variable modeling, and cross-domain alignment. It draws on discrete optimal transport, variational inference, and geometric and equivariant deep learning to turn high-dimensional, irregularly structured biological signals into representations that modern architectures can exploit.

Recent directions:

TransMap (NeurIPS 2026 submission): reorganizes sparse gene-expression vectors into 2D feature images via Gromov–Wasserstein optimal transport, so that CNN encoders can read gene–gene structure as spatial locality. A shared multi-species pixel grid further aligns cells across species without ortholog conversion.
Dynode: SE(3)-equivariant, transformer-based neural ODEs that learn 3D organogenesis trajectories from spatiotemporally resolved single-cell data, supporting in silico perturbation studies of congenital heart disease.
scMulan: co-author of a 368M-parameter multitask generative pre-trained foundation model for single-cell analysis.

Previously, I completed my PhD in the Department of Automation at Tsinghua University, advised by Prof. Xuegong Zhang and Prof. Michael S. Waterman. My doctoral work spanned cell-atlas assembly (hECA), velocity-informed cross-batch integration, and information-theoretic statistics for detecting repeats in genomes ($D_2^R$).

I am open to industry research and engineering positions in machine learning, foundation models for science, and AI for medicine.

CV (PDF) · GitHub · Google Scholar · LinkedIn

selected publications

NeurIPS

TransMap: Image-Native Representations for Single-Cell Genomics

Sijie Chen and others

NeurIPS (under review), 2026

Single-cell RNA-seq profiles are sparse, high-dimensional gene-expression vectors whose gene order carries no biological geometry. We introduce TransMap, a representation framework that converts each cell into a 2D transcriptomic image by arranging genes on a fixed grid via Gromov–Wasserstein optimal transport, so that nearby pixels correspond to genes with similar relational profiles. On five scRNA-seq benchmarks, CNN encoders on TransMap images match or outperform parameter-matched MLPs on scIB embedding metrics. A shared pixel grid further aligns cell states on a human–mouse pancreas benchmark without explicit ortholog conversion.

iScience

hECA: The Cell-Centric Assembly of a Cell Atlas

Sijie Chen*, Yanting Luo*, Haoxiang Gao*, and 8 more authors

iScience, 2022

Bioinformatics

A New Statistic for Efficient Detection of Repetitive Sequences

Sijie Chen, Yixin Chen, Fengzhu Sun, and 2 more authors

Bioinformatics, 2019

RECOMB/ISMB

scMulan: A Multitask Generative Pre-trained Language Model for Single-Cell Analysis

Haiyang Bian, Yixin Chen, Xiaomin Dong, and 7 more authors

In RECOMB / ISMB, 2024