STRAND: Sequence-Conditioned Transport for Single-Cell Perturbations
Boyang Fu, George Dasoulas, Sameer Gabbita, Xiang Lin, Shanghua Gao, Xiaorui Su, Soumya Ghosh, Marinka Zitnik
TL;DR
STRAND addresses the locus-resolution gap in single-cell perturbation prediction by conditioning on the regulatory DNA sequence at the perturbed locus and modeling the response as a sequence-conditioned transport from control to perturbed cell states. It combines a DNA-based perturbation module with a control-anchored latent diffusion bridge (I2SB) and a CLIP-style alignment objective, within a modular framework that supports OT-based pairing and swappable DNA/RNA encoders. The approach yields state-of-the-art performance in low-sample and zero-shot settings, enables sequence-resolved in silico perturbation profiling, and recovers functionally relevant regulatory elements such as alternative transcription start sites. By expanding inference coverage to ~95% of the genome and enabling locus-level perturbation predictions, STRAND has the potential to guide functional genomics studies and the design of genome-scale perturbations.
Abstract
Predicting how genetic perturbations change cellular state is a core problem for building controllable models of gene regulation. Perturbations targeting the same gene can produce different transcriptional responses depending on their genomic locus, including different transcription start sites and regulatory elements. Gene-level perturbation models collapse these distinct interventions into the same representation. We introduce STRAND, a generative model that predicts single-cell transcriptional responses by conditioning on regulatory DNA sequence. STRAND represents a perturbation by encoding the sequence at its genomic locus and uses this representation to parameterize a conditional transport process from control to perturbed cell states. Representing perturbations by sequence, rather than by a fixed set of gene identifiers, supports zero-shot inference at loci not seen during training and expands inference-time genomic coverage from ~1.5% for gene-level single-cell foundation models to ~95% of the genome. We evaluate STRAND on CRISPR perturbation datasets in K562, Jurkat, and RPE1 cells. STRAND improves discrimination scores by up to 33% in low-sample regimes, achieves the best average rank on unseen gene perturbation benchmarks, and improves transfer to novel cell lines by up to 0.14 in Pearson correlation. Ablations isolate the gains to sequence conditioning and transport, and case studies show that STRAND resolves functionally distinct alternative transcription start sites missed by gene-level models.
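To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a one-hot encoding as a stand-in for the DNA encoder and a plain Brownian bridge with constant noise scale as a stand-in for the control-anchored latent bridge. The function names `one_hot_dna` and `bridge_sample`, the `sigma` parameter, and the latent dimension are hypothetical choices made for illustration only.

```python
import numpy as np


def one_hot_dna(seq: str) -> np.ndarray:
    """One-hot encode a DNA string over A, C, G, T.

    Symbols outside the alphabet (e.g. N) become all-zero rows.
    Assumption: a learned DNA encoder would replace this in practice.
    """
    alphabet = "ACGT"
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = alphabet.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out


def bridge_sample(z_ctrl, z_pert, t, sigma=1.0, rng=None):
    """Draw an intermediate state from a Brownian bridge.

    The bridge is pinned at the control latent (t = 0) and the perturbed
    latent (t = 1). Its mean interpolates linearly between the endpoints
    and its variance is sigma^2 * t * (1 - t), which vanishes at both ends.
    Assumption: the noise schedule actually used by STRAND may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = (1.0 - t) * z_ctrl + t * z_pert
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(z_ctrl.shape)


# Illustrative usage with made-up inputs.
rng = np.random.default_rng(0)
locus = one_hot_dna("ACGTN")          # (5, 4) sequence features
z_ctrl = rng.standard_normal(8)       # hypothetical control-cell latent
z_pert = rng.standard_normal(8)       # hypothetical perturbed-cell latent
z_mid = bridge_sample(z_ctrl, z_pert, t=0.5, rng=rng)
```

In a setup of this kind, a network would typically receive the intermediate state, the time `t`, and the sequence features, and be trained to recover the perturbed endpoint; that training loop is omitted here because the source does not specify it.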
