Domain Adaptive and Fine-grained Anomaly Detection for Single-cell Sequencing Data and Beyond

Kaichen Xu, Yueyang Ding, Suyang Hou, Weiqiang Zhan, Nisang Chen, Jun Wang, Xiaobo Sun

TL;DR

ACSleuth tackles fine-grained anomalous cell (AC) detection under multi-sample domain shifts (DS) by integrating AC detection, multi-sample domain adaptation, and fine-grained annotation into a single GAN-based workflow. It introduces a memory-augmented GAN for reconstruction-deviation–guided anomaly scoring, a second GAN that learns domain-shift matrices across samples, and a cross-attention–driven clustering stage that yields interpretable anomaly subtypes. Theoretical results establish the existence of a continuous anomaly-score mapping and bound the loss variation across domains, supporting robustness to DS. Empirical results on scRNA-seq, scATAC-seq, and cyber-intrusion datasets show that ACSleuth outperforms state-of-the-art methods in both detection and subtype annotation, and that it generalizes to tabular data beyond single-cell sequencing.

Abstract

Fine-grained anomalous cell detection from affected tissues is critical for clinical diagnosis and pathological research. Single-cell sequencing data provide unprecedented opportunities for this task. However, current anomaly detection methods struggle to handle the domain shifts prevalent in multi-sample and multi-domain single-cell sequencing data, leading to suboptimal performance. Moreover, these methods cannot separate anomalous cells into pathologically distinct subtypes. In response, we propose ACSleuth, a novel, reconstruction deviation-guided generative framework that integrates the detection, domain adaptation, and fine-grained annotation of anomalous cells into a methodologically cohesive workflow. Notably, we present the first theoretical analysis of using reconstruction deviations output by generative models for anomaly detection in the presence of domain shifts. This analysis motivates a novel and superior maximum mean discrepancy-based anomaly scorer in ACSleuth. Extensive benchmarks on various single-cell data and other types of tabular data demonstrate ACSleuth's superiority over state-of-the-art methods in identifying and subtyping anomalies in multi-sample and multi-domain contexts. Our code is available at https://github.com/Catchxu/ACsleuth.
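
The abstract refers to a maximum mean discrepancy (MMD)-based anomaly scorer without spelling out what MMD measures. The snippet below is a minimal sketch of the standard biased estimate of squared MMD with a Gaussian RBF kernel, applied to two sets of synthetic "reconstruction deviation" vectors. It is not ACSleuth's scorer: the function names, the bandwidth of 1.0, and the Gaussian toy data are all assumptions made for illustration.

```python
import math
import random


def rbf_kernel(u, v, bandwidth):
    """Gaussian RBF kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs, y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def sample(rng, center, n, dim, scale=0.1):
    """Hypothetical toy data: n Gaussian vectors around `center`."""
    return [[rng.gauss(center, scale) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
inlier_dev = sample(rng, 0.0, 100, 5)       # small deviations (assumed inliers)
more_inlier_dev = sample(rng, 0.0, 100, 5)  # same distribution, fresh draw
anomaly_dev = sample(rng, 1.0, 100, 5)      # shifted deviations (assumed anomalies)

same = mmd2_biased(inlier_dev, more_inlier_dev)
diff = mmd2_biased(inlier_dev, anomaly_dev)
print(f"inlier vs inlier:  {same:.4f}")
print(f"inlier vs anomaly: {diff:.4f}")
```

Two samples drawn from the same distribution give an estimate near zero, while a sample whose deviations are shifted gives a clearly larger value. This is the property that makes MMD usable as a distribution-level discrepancy between deviation sets.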

Paper Structure (33 sections, 6 theorems, 54 equations, 2 figures, 6 tables)

This paper contains 33 sections, 6 theorems, 54 equations, 2 figures, and 6 tables.

Key Result

Theorem 3.1

Let $m$ and $n$ denote the actual numbers of inliers and anomalies in the target dataset, respectively. Let $\bm{\delta}_i \in \bm{\delta}_m^x \cup \bm{\delta}_n^\xi$, as defined in Eq. (eq:def), and define $s_i \coloneqq f_s(\bm{\delta}_i) \in \{0,1\}$. Then Eq. (eq:def) can be rewritten as: [equation not preserved in this extract] where [definition not preserved in this extract]. If $s_i = 1$, instance $i$ is annotated as anomalous; otherwise it is annotated as normal.

Figures (2)

  • Figure 1: Three error types in multi-sample and multi-domain FACD analysis due to domain shift and sample-specific AC types. Domain shift represents non-biological variations caused by technical differences among samples, while content shift represents true biological variations.
  • Figure 2: The workflow of ACSleuth.

Theorems & Definitions (10)

  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • proof
  • proof
  • proof
  • Lemma A.1: $k$-dimensional Ramanujan's master theorem [bradshaw2023operational, amdeberhan2012ramanujan]
  • Lemma A.2: Asymptotic convergence of MMD [gretton2012kernel]
  • Lemma A.3
  • proof