You Only Train Once: Differentiable Subset Selection for Omics Data

Daphné Chopard, Jorge da Silva Gonçalves, Irene Cannistraci, Thomas M. Sutter, Julia E. Vogt

TL;DR

YOTO introduces an end-to-end differentiable framework for selecting discrete gene subsets in single-cell omics, coupling subset selection and prediction through a closed feedback loop. It uses a differentiable ranking mechanism (Gumbel-Softmax and Plackett-Luce) to produce a sparse top-k gene mask, guided by multi-task prediction within a shared encoder. Across COVID-PBMC and VISp datasets, YOTO achieves competitive or superior performance with fewer training steps, and ablations confirm the critical role of its sparse selection module. The approach offers robust, interpretable gene panels suitable for biomarker discovery and is extendable to other high-dimensional omics domains and targeted profiling settings.
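To make the sparse top-k gene mask concrete, the following is a minimal sketch of the Gumbel top-k trick, which draws k items without replacement from a Plackett-Luce distribution. It covers only the forward sampling step. The function names, the `log_scores` parameter, and the toy values are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def gumbel_topk_mask(log_scores, k, rng):
    """Sample a k-hot mask from per-gene log-scores.

    Adding independent Gumbel(0, 1) noise to each log-score and keeping the
    k largest perturbed values is equivalent to sampling k genes without
    replacement from a Plackett-Luce distribution with weights exp(log_scores).
    """
    perturbed = []
    for score in log_scores:
        # Inverse-transform sampling of Gumbel noise; clamp u away from 0.
        u = max(rng.random(), 1e-12)
        gumbel = -math.log(-math.log(u))
        perturbed.append(score + gumbel)

    # Indices of the k largest perturbed scores.
    ranked = sorted(range(len(log_scores)), key=lambda i: perturbed[i], reverse=True)
    mask = [0.0] * len(log_scores)
    for i in ranked[:k]:
        mask[i] = 1.0
    return mask


def apply_mask(expression, mask):
    """Zero out every gene that was not selected."""
    return [x * m for x, m in zip(expression, mask)]


rng = random.Random(0)
log_scores = [2.0, 0.1, -1.0, 1.5, 0.0, -0.5]  # hypothetical learnable gene scores
mask = gumbel_topk_mask(log_scores, k=3, rng=rng)
selected = apply_mask([5.0, 3.0, 0.0, 7.0, 1.0, 2.0], mask)
```

The hard mask here is not differentiable on its own. In a trainable model it would typically be paired with a continuous relaxation or a straight-through gradient estimator, which this sketch leaves out.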

Abstract

Selecting compact and informative gene subsets from single-cell transcriptomic data is essential for biomarker discovery, improving interpretability, and cost-effective profiling. However, most existing feature selection approaches either operate as multi-stage pipelines or rely on post hoc feature attribution, making selection and prediction weakly coupled. In this work, we present YOTO (you only train once), an end-to-end framework that jointly identifies discrete gene subsets and performs prediction within a single differentiable architecture. In our model, the prediction task directly guides which genes are selected, while the learned subsets, in turn, shape the predictive representation. This closed feedback loop enables the model to iteratively refine both what it selects and how it predicts during training. Unlike existing approaches, YOTO enforces sparsity so that only the selected genes contribute to inference, eliminating the need to train additional downstream classifiers. Through a multi-task learning design, the model learns shared representations across related objectives, allowing partially labeled datasets to inform one another, and discovering gene subsets that generalize across tasks without additional training steps. We evaluate YOTO on two representative single-cell RNA-seq datasets, showing that it consistently outperforms state-of-the-art baselines. These results demonstrate that sparse, end-to-end, multi-task gene subset selection improves predictive performance and yields compact and meaningful gene subsets, advancing biomarker discovery and single-cell analysis.
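One common way to let partially labeled datasets inform one another is to compute the loss only over the tasks for which a given cell has a label. The sketch below illustrates that idea under stated assumptions: the task names are hypothetical, the tasks are averaged without weights, and nothing here is taken from the paper's actual objective.

```python
import math


def cross_entropy(logits, label):
    """Negative log-likelihood of `label` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[label]


def multitask_loss(task_logits, task_labels):
    """Average cross-entropy over the tasks that have a label for this cell.

    `task_labels[name]` is None when the cell is unlabeled for that task,
    so the task contributes nothing to the loss.
    """
    losses = [
        cross_entropy(task_logits[name], label)
        for name, label in task_labels.items()
        if label is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical cell with a label for task_a but not for task_b.
logits = {"task_a": [0.0, 0.0], "task_b": [1.0, -1.0, 0.5]}
labels = {"task_a": 0, "task_b": None}
loss = multitask_loss(logits, labels)
```

Because every task head would read from the same shared representation, a cell labeled for only one task can still shape the features, and hence the gene subset, used by the others.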

Paper Structure

This paper contains 33 sections, 6 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Overview of YOTO. Our method, YOTO, consists of three main building blocks: (1) a subset selection block (\ref{sec:method_subset_selection}), (2) a shared encoder for all tasks, and (3) a multi-task learning block (\ref{sec:method_mtl}). The subset selection block produces a binary mask that selects the $k$ most informative genes for a given set of tasks. The shared encoder encodes the $k$ gene values into a latent representation, which is fed to the task-specific heads.
  • Figure 2: Downstream F1-score across different numbers of selected genes. F1-scores for the cell_types_25 task on the VISp dataset using gene subsets of sizes $k \in \{16,32,64,128,256\}$. The Seurat and mRMR-f baselines are omitted due to consistently low performance.
  • Figure 3: Comprehensive evaluation using multiple classification metrics (F1-score, Accuracy, AUROC, AUPRC) for the celltype5 task on the COVID-PBMC dataset using a gene subset of size $k=64$.
  • Figure 4: Label distributions for all tasks in the VISp and COVID-PBMC datasets. The dashed vertical line, shown only in the VISp panels, denotes the cumulative threshold separating the first 90% of observations from the remaining tail.
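
The cumulative threshold in the Figure 4 caption can be read as the point where the most frequent labels together cover 90% of observations. The sketch below shows that reading only. The function name, the descending-frequency ordering, and the toy counts are assumptions, and the paper may define the threshold differently.

```python
def cumulative_head(label_counts, fraction=0.9):
    """Return the most frequent labels needed to cover `fraction` of observations."""
    total = sum(label_counts.values())
    ordered = sorted(label_counts.items(), key=lambda kv: kv[1], reverse=True)

    head, running = [], 0
    for label, count in ordered:
        if running >= fraction * total:
            break  # the labels collected so far already reach the threshold
        head.append(label)
        running += count
    return head


# Hypothetical label counts for a single task.
counts = {"type_a": 50, "type_b": 45, "type_c": 3, "type_d": 2}
head = cumulative_head(counts)
tail = [label for label in counts if label not in head]
```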