Sijie Chen

Postdoctoral scholar, Stanford University. Machine learning for biological discovery.

Stanford, CA

chensj16@stanford.edu

I build machine-learning systems for biological discovery. My work centers on representation learning, generative latent-variable modeling, and cross-domain alignment. It draws on discrete optimal transport, variational inference, and geometric / equivariant deep learning to turn high-dimensional, irregularly structured biological signals into representations that modern architectures can exploit.

Recent directions:

  • TransMap (NeurIPS 2026 submission): reorganizes sparse gene-expression vectors as 2D feature images via Gromov–Wasserstein optimal transport, so that CNN encoders can read gene–gene structure as spatial locality. A shared multi-species grid further aligns cells across organisms without ortholog conversion.
  • Dynode: SE(3)-equivariant transformer-based neural ODEs that learn 3D organogenesis trajectories from spatiotemporally resolved single-cell data, supporting in silico perturbation for congenital heart disease.
  • scMulan: contributing author on a 368M-parameter multitask generative pre-trained foundation model for single-cell analysis.

Previously, I completed my PhD at Tsinghua University, Department of Automation, advised by Prof. Xuegong Zhang and Prof. Michael S. Waterman. My doctoral work spanned cell-atlas assembly (hECA), velocity-informed cross-batch integration, and information-theoretic statistics for repeat detection in genomes ($D_2^R$).

I am open to industry research / engineering positions in machine learning, foundation models for science, and AI for medicine. CV (PDF) · GitHub · Google Scholar · LinkedIn

selected publications

  1. NeurIPS
    TransMap: Image-Native Representations for Single-Cell Genomics
    Sijie Chen and others
    NeurIPS (under review), 2026
  2. iScience
    hECA: The Cell-Centric Assembly of a Cell Atlas
    Sijie Chen*, Yanting Luo*, Haoxiang Gao*, and 8 more authors
    iScience, 2022
  3. Bioinformatics
    A New Statistic for Efficient Detection of Repetitive Sequences
    Sijie Chen, Yixin Chen, Fengzhu Sun, and 2 more authors
    Bioinformatics, 2019
  4. RECOMB/ISMB
    scMulan: A Multitask Generative Pre-trained Language Model for Single-Cell Analysis
    Haiyang Bian, Yixin Chen, Xiaomin Dong, and 7 more authors
    In RECOMB / ISMB, 2024