A Hybrid Computational Intelligence Framework for scRNA-seq Imputation: Integrating scRecover and Random Forests
Ali Anaissi, Deshao Liu, Yuanzhe Jia, Weidong Huang, Widad Alyassine, Junaid Akram
TL;DR
This work tackles dropout in scRNA-seq data with SCR-MF, a modular two-stage framework. It first detects likely technical zeros (dropouts) with scRecover and then imputes only those entries with missForest. Because dropout identification is kept separate from value recovery, SCR-MF preserves true biological zeros while still capturing nonlinear gene-gene dependencies. This yields robust, interpretable improvements in clustering quality on public and simulated datasets. The authors evaluate clustering with the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), examine practical choices for parameters and stratified modeling, and discuss runtime trade-offs against more scalable alternatives. Overall, SCR-MF is a practical approach for improving downstream analyses of mid-scale single-cell datasets, and it leaves room for future integration with deep models and uncertainty quantification.
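The TL;DR reports clustering quality with ARI and NMI. The following dependency-free sketch shows what those two scores measure when predicted cluster labels are compared with reference labels. It is an illustration, not the authors' evaluation code. ARI follows the standard Hubert-Arabie adjusted-for-chance formula computed from the contingency table. NMI normalizes mutual information by the arithmetic mean of the two label entropies. That is one of several conventions, and we assume it here because this section does not state which convention the paper uses. In practice, scikit-learn's `adjusted_rand_score` and `normalized_mutual_info_score` compute the same quantities.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index from the contingency table of two labelings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have equal length")
    n = len(labels_true)
    total = comb(n, 2)
    if total == 0:
        return 1.0  # fewer than two cells: no pairs to disagree on
    # Number of same-cluster pairs inside each contingency cell and each marginal.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / total  # pair agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all cells in one cluster
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization of the two entropies (natural log)."""
    n = len(labels_true)
    rows, cols = Counter(labels_true), Counter(labels_pred)
    cells = Counter(zip(labels_true, labels_pred))
    mi = sum((c / n) * log(n * c / (rows[a] * cols[b])) for (a, b), c in cells.items())
    h_true = -sum((c / n) * log(c / n) for c in rows.values())
    h_pred = -sum((c / n) * log(c / n) for c in cols.values())
    denom = (h_true + h_pred) / 2
    if denom == 0:
        return 1.0  # both labelings are a single cluster
    return mi / denom


# Usage: identical partitions under different label names score 1.0 on both metrics;
# splitting one reference cluster in two lowers both scores.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
nmi_same = normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0])
ari_split = adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1])
nmi_split = normalized_mutual_info([0, 0, 1, 1], [0, 0, 1, 2])
```

Both scores are invariant to how clusters are named. ARI is additionally corrected for chance, so random labelings score near 0. This is one reason the two metrics are usually reported together.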
Abstract
Single-cell RNA sequencing (scRNA-seq) enables transcriptomic profiling at cellular resolution, but pervasive dropout events, in which expressed transcripts are recorded as zeros, obscure biological signals. We present SCR-MF, a modular two-stage workflow that combines principled dropout detection using scRecover with robust non-parametric imputation via missForest. Across public and simulated datasets, SCR-MF performs comparably to or better than existing imputation methods in most cases, while preserving biological fidelity and remaining transparent and interpretable. Runtime analysis shows that SCR-MF strikes a competitive balance between accuracy and computational efficiency, which makes it well suited to mid-scale single-cell datasets.
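The detect-then-impute structure of the workflow can be illustrated with a minimal sketch under explicit assumptions. It assumes that a per-entry dropout probability matrix is already available. In SCR-MF that role is played by scRecover, but the `dropout_prob` input, the 0.5 threshold, and both function names here are hypothetical. The sketch also swaps missForest for a simple k-nearest-neighbour imputer to stay short and dependency-free. What it is meant to show is the masking step: only zeros flagged as technical dropouts are overwritten. Non-zero values and low-probability zeros, which are treated as biological, pass through unchanged.

```python
from math import sqrt


def flag_dropouts(counts, dropout_prob, threshold=0.5):
    """Stage 1 (hypothetical stand-in for a dropout detector such as scRecover):
    flag a zero entry as a technical dropout when its dropout probability >= threshold."""
    return [[c == 0 and p >= threshold for c, p in zip(crow, prow)]
            for crow, prow in zip(counts, dropout_prob)]


def knn_impute_flagged(counts, mask, k=2):
    """Stage 2 (swapped-in kNN imputer, NOT missForest): fill only masked entries
    with the mean of the k nearest cells that observe that gene."""
    n_cells, n_genes = len(counts), len(counts[0])
    imputed = [row[:] for row in counts]  # copy so the input matrix is not mutated
    for i in range(n_cells):
        flagged = [j for j in range(n_genes) if mask[i][j]]
        if not flagged:
            continue
        # RMS distance to every other cell over genes observed (unmasked) in both cells.
        dists = []
        for r in range(n_cells):
            if r == i:
                continue
            shared = [j for j in range(n_genes) if not mask[i][j] and not mask[r][j]]
            if shared:
                d = sqrt(sum((counts[i][j] - counts[r][j]) ** 2 for j in shared) / len(shared))
                dists.append((d, r))
        dists.sort()  # nearest first; ties broken by cell index
        for j in flagged:
            donors = [counts[r][j] for _, r in dists if not mask[r][j]][:k]
            if donors:
                imputed[i][j] = sum(donors) / len(donors)
    return imputed


# Usage on a toy cells x genes matrix: cell 0's zero in gene 1 is a likely dropout,
# while cell 3's zeros have low dropout probability and are kept as biological zeros.
counts = [[5, 0, 3], [5, 4, 3], [6, 6, 2], [0, 0, 9]]
dropout_prob = [[0.0, 0.9, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.1, 0.2, 0.0]]
mask = flag_dropouts(counts, dropout_prob)
imputed = knn_impute_flagged(counts, mask, k=2)
```

Replacing `knn_impute_flagged` with a random-forest-based imputer would keep the same mask-then-impute structure. That separation is what allows true biological zeros to survive imputation.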
