Unsupervised Evolutionary Cell Type Matching via Entropy-Minimized Optimal Transport
Mu Qiao
TL;DR
This work tackles the problem of identifying evolutionary correspondences between cell types across species without supervision. The authors introduce OT-MESH, which combines entropy-regularized optimal transport with the Minimize Entropy of Sinkhorn (MESH) refinement to produce sparse, interpretable cross-species cell-type mappings. Each cell type is represented by a centroid over genes selected by signal-to-noise ratio (SNR), and the transport cost between species is cosine-based. OT-MESH achieves near-optimal matching accuracy at low computational cost, matching or outperforming baselines in synthetic scalability tests and on retinal bipolar cell (BC) and retinal ganglion cell (RGC) datasets from mouse and macaque. The method is robust to noise and recovers both known and novel cross-species homologies, including a prediction that was validated experimentally, which makes it practical for large-scale comparative genomics and evolutionary cell biology.
Abstract
Identifying evolutionary correspondences between cell types across species is a fundamental challenge in comparative genomics and evolutionary biology. Existing approaches often rely on either reference-based matching, which imposes asymmetry by designating one species as the reference, or projection-based matching, which may increase computational complexity and obscure biological interpretability at the cell-type level. Here, we present OT-MESH, an unsupervised computational framework that uses entropy-regularized optimal transport (OT) to systematically determine cross-species cell type homologies. Our method integrates the Minimize Entropy of Sinkhorn (MESH) technique to refine the OT plan, transforming diffuse transport matrices into sparse, interpretable correspondences. Through systematic evaluation on synthetic datasets, we demonstrate that OT-MESH achieves near-optimal matching accuracy efficiently while remaining robust to noise. Compared to other OT-based methods such as RefCM, OT-MESH provides a computational speedup while achieving comparable accuracy. Applied to retinal bipolar cells (BCs) and retinal ganglion cells (RGCs) from mouse and macaque, OT-MESH accurately recovers known evolutionary relationships and uncovers novel correspondences, one of which was independently validated experimentally. Thus, our framework offers a principled, scalable, and interpretable solution for evolutionary cell type mapping, facilitating deeper insights into cellular specialization and conservation across species.
