Gene Incremental Learning for Single-Cell Transcriptomics
Jiaxin Qi, Yan Cui, Jianqiang Huang, Gaogang Xie
TL;DR
Gene Incremental Learning (GIL) treats genes in single-cell transcriptomics as tokens to be learned incrementally. It addresses the growth of gene sets by defining a set of base genes and stage-specific gene partitions. The approach adapts ideas from Class Incremental Learning (CIL) with a dedicated GIL objective $\mathcal{L}_{\text{GIL},s_k}$ and evaluation protocols, including gene-wise regression and gene-based classification, that quantify forgetting and knowledge transfer. Comparing baselines (a vanilla baseline and replay) with a knowledge-preserving strategy (distillation), the authors show that forgetting occurs in the vanilla setting and that both replay and distillation mitigate it, though with trade-offs on downstream classification. The work provides a scalable benchmark on CELLxGENE with six downstream tasks, validating the framework and suggesting extensions to other token-learning domains in biology and beyond.
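To make the base-gene and stage-partition setup in the TL;DR concrete, here is a minimal Python sketch under stated assumptions. The helper names `partition_genes` and `genes_seen_up_to`, the seeded random shuffle, and the near-equal stage sizes are hypothetical choices for illustration; the paper's actual partitioning protocol on CELLxGENE may differ. Index 0 is the base set and index k is the k-th incremental stage, which we assume loosely corresponds to the $s_k$ subscript in $\mathcal{L}_{\text{GIL},s_k}$.

```python
import random
from typing import List, Sequence


def partition_genes(genes: Sequence[str], n_base: int, n_stages: int,
                    seed: int = 0) -> List[List[str]]:
    """Split a gene vocabulary into a base set plus n_stages disjoint partitions.

    Hypothetical helper: the random shuffle and near-equal stage sizes are
    illustrative assumptions, not the paper's protocol.
    """
    if not 0 < n_base < len(genes):
        raise ValueError("n_base must leave some genes for the incremental stages")
    if n_stages < 1:
        raise ValueError("need at least one incremental stage")
    rng = random.Random(seed)          # seeded so the split is reproducible
    shuffled = list(genes)
    rng.shuffle(shuffled)
    base, rest = shuffled[:n_base], shuffled[n_base:]
    # Distribute the remaining genes into n_stages near-equal chunks;
    # the first `extra` stages each receive one additional gene.
    size, extra = divmod(len(rest), n_stages)
    stages, start = [], 0
    for k in range(n_stages):
        end = start + size + (1 if k < extra else 0)
        stages.append(rest[start:end])
        start = end
    return [base] + stages


def genes_seen_up_to(partitions: List[List[str]], stage: int) -> List[str]:
    """Genes the model has encountered after training stage `stage` (0 = base)."""
    return [g for part in partitions[: stage + 1] for g in part]
```

For example, splitting ten genes into four base genes and three stages yields partitions of sizes 4, 2, 2 and 2, and a model that has finished stage 1 has seen six genes. Replay or distillation would operate on top of such a schedule; neither is shown here.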
Abstract
Classes, as fundamental elements of computer vision, have been extensively studied within incremental learning frameworks. In contrast, tokens, which play essential roles in many research fields, exhibit similar growth, yet their incremental learning remains largely unexplored. This gap stems primarily from the holistic nature of tokens in language, which makes it difficult to design incremental learning frameworks for them. To overcome this obstacle, we turn to a particular type of token, the gene, in a large-scale biological dataset (single-cell transcriptomics) to formulate a pipeline for gene incremental learning and establish corresponding evaluations. We found that the forgetting problem also exists in gene incremental learning, so we adapted existing class incremental learning methods to mitigate the forgetting of genes. Through extensive experiments, we demonstrated the soundness of our framework design and evaluations, as well as the effectiveness of our method adaptations. Finally, we provide a complete benchmark for gene incremental learning in single-cell transcriptomics.
