Sijie Chen
Postdoctoral scholar, Stanford University. Machine learning for biological discovery.
Stanford, CA
I build machine-learning systems for biological discovery. My work centers on representation learning, generative latent-variable modeling, and cross-domain alignment. It draws on discrete optimal transport, variational inference, and geometric / equivariant deep learning to turn high-dimensional, irregularly structured biological signals into representations that modern architectures can exploit.
Recent directions:
- TransMap (NeurIPS 2026 submission): reorganizes sparse gene-expression vectors as 2D feature images via Gromov–Wasserstein optimal transport, so that CNN encoders can read off gene–gene structure as spatial locality. A shared multi-species grid further aligns cells across organisms without ortholog conversion.
- Dynode: SE(3)-equivariant transformer-based neural ODEs that learn 3D organogenesis trajectories from spatiotemporally resolved single-cell data, supporting in silico perturbation for congenital heart disease.
- scMulan: contributing author on a 368M-parameter multitask generative pre-trained foundation model for single-cell analysis.
Previously, I completed my PhD at Tsinghua University, Department of Automation, advised by Prof. Xuegong Zhang and Prof. Michael S. Waterman. My doctoral work spanned cell-atlas assembly (hECA), velocity-informed cross-batch integration, and information-theoretic statistics for repeat detection in genomes ($D_2^R$).
I am open to industry research / engineering positions in machine learning, foundation models for science, and AI for medicine. CV (PDF) · GitHub · Google Scholar · LinkedIn
selected publications
- NeurIPS
- iScience
- Bioinformatics
- RECOMB/ISMB
scMulan: A Multitask Generative Pre-trained Language Model for Single-Cell Analysis
In RECOMB / ISMB, 2024