Adversarial Batch Representation Augmentation for Batch Correction in High-Content Cellular Screening

Lei Tong, Xujing Yao, Adam Corrigan, Long Chen, Navin Rathna Kumar, Kerry Hallbrook, Jonathan Orme, Yinhai Wang, Huiyu Zhou

TL;DR

This work frames bio-batch mitigation as a Domain Generalization (DG) problem and proposes Adversarial Batch Representation Augmentation (ABRA), which actively synthesizes worst-case bio-batch perturbations in the representation space and sets a new state of the art for siRNA perturbation classification on the RxRx1 and RxRx1-WILDS benchmarks.

Abstract

High-Content Screening routinely generates massive volumes of cell painting images for phenotypic profiling. However, technical variations across experimental executions inevitably induce biological batch (bio-batch) effects. These cause covariate shifts and degrade the generalization of deep learning models on unseen data. Existing batch correction methods typically rely on additional prior knowledge (e.g., treatment or cell culture information) or struggle to generalize to unseen bio-batches. In this work, we frame bio-batch mitigation as a Domain Generalization (DG) problem and propose Adversarial Batch Representation Augmentation (ABRA). ABRA explicitly models batch-wise statistical fluctuations by parameterizing feature statistics as structured uncertainties. Through a min-max optimization framework, it actively synthesizes worst-case bio-batch perturbations in the representation space, guided by a strict angular geometric margin to preserve fine-grained class discriminability. To prevent representation collapse during this adversarial exploration, we introduce a synergistic distribution alignment objective. Extensive evaluations on the large-scale RxRx1 and RxRx1-WILDS benchmarks demonstrate that ABRA establishes a new state-of-the-art for siRNA perturbation classification.
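The idea of treating batch feature statistics as uncertain quantities can be illustrated with a small sketch. The exact perturbation rule is not stated here, so the additive form below (each statistic plus standard-normal noise scaled by a learnable strength, written $\mathcal{K}_{\mu}$ and $\mathcal{K}_{\sigma}$ in the Figure 2 caption) is an assumption, as are the hypothetical function names `channel_stats` and `perturb_batch`. Features are flattened to an N x C list of rows for brevity, and plain Python stands in for a tensor library.

```python
import math
import random


def channel_stats(feats, eps=1e-6):
    """Per-channel mean and std over a batch of feature rows (N x C)."""
    n, c = len(feats), len(feats[0])
    mu = [sum(row[j] for row in feats) / n for j in range(c)]
    sigma = [
        math.sqrt(sum((row[j] - mu[j]) ** 2 for row in feats) / n + eps)
        for j in range(c)
    ]
    return mu, sigma


def perturb_batch(feats, k_mu, k_sigma, rng):
    """Re-style a batch with perturbed statistics (assumed additive form).

    k_mu and k_sigma are per-channel perturbation strengths; in the paper
    they are learnable, here they are plain lists of floats.
    """
    mu, sigma = channel_stats(feats)
    c = len(mu)
    # Gaussian reparameterization: noise is sampled, strengths stay explicit.
    new_mu = [mu[j] + rng.gauss(0.0, 1.0) * k_mu[j] for j in range(c)]
    new_sigma = [sigma[j] + rng.gauss(0.0, 1.0) * k_sigma[j] for j in range(c)]
    # Normalize with the clean statistics, then rescale with the perturbed ones.
    return [
        [new_sigma[j] * (row[j] - mu[j]) / sigma[j] + new_mu[j] for j in range(c)]
        for row in feats
    ]


feats = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
perturbed = perturb_batch(feats, [0.5, 0.5], [0.1, 0.1], random.Random(0))
```

With both strengths set to zero the batch is returned unchanged, so the strengths alone control how far the synthetic bio-batch drifts from the observed one. In the adversarial setting described above, they would be updated to increase the loss rather than fixed by hand.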

Paper Structure (14 sections, 14 equations, 10 figures, 6 tables, 1 algorithm)

Figures (10)

  • Figure 1: Motivational comparison of AdaBN, AdvStyle and ABRA. (a) AdaBN standardizes features by mapping them to historical bio-batch statistics (i.e., mean and standard deviation). (b) AdvStyle introduces instance-wise perturbations into the representation space, utilizing adversarial signals driven solely by classification likelihood. (c) ABRA explicitly perturbs the bio-batch representation space while jointly enforcing the classification objective and an angular geometry constraint to ensure robust and discriminative feature learning.
  • Figure 2: Framework of Adversarial Batch Representation Augmentation (ABRA). Given a bio-batch of cell painting images, a CNN encoder produces the corresponding representation $\mathcal{X}$. The ABRA module computes batch channel-wise statistics $\mu_{c}(\mathcal{X})$ and $\sigma_{c}(\mathcal{X})$, and models uncertainty in the statistics space using learnable parameters $\{\mathcal{K}_{\mu}, \mathcal{K}_{\sigma}\}$. Gaussian reparameterization instantiates the perturbations to transform the clean representation $\mathcal{X}$ into the perturbed representation $\mathcal{X}_{t}$. A hybrid objective optimizes the framework: the cross-entropy loss $\mathcal{L}_{CE}$ promotes inter-class separability, and the ArcFace loss $\mathcal{L}_{arc}$ enforces intra-class compactness and inter-class separation through an additive angular margin. The stability terms $\mathcal{R}_{JS}$ further align the discriminative distributions of clean and perturbed representations to reduce representation collapse. Optimization alternates between two phases: (1) maximize $\mathcal{L}_{adv}$ with respect to $\mathcal{K}$ while freezing $\theta$; (2) minimize $\mathcal{L}_{rob}$ with respect to $\theta$ to obtain a robust model.
  • Figure 3: Representative images from the RxRx1 dataset, depicting phenotypic variations in HEPG2 and HUVEC cell lines resulting from three siRNA perturbations across 5 experimental batches. Each perturbation elicits distinct changes in cell morphology, count, and spatial distribution.
  • Figure 4: Sample images from the RxRx1-WILDS dataset, showcasing the phenotypic diversity elicited by siRNA perturbations in HEPG2 and HUVEC cell lines across five experimental batches.
  • Figure 5: Investigating the optimal location within the network architecture to apply ABRA. Designations such as 'res34' indicate that ABRA is inserted after both the 3rd and 4th residual blocks of the ResNet-50 backbone.
  • ...and 5 more figures
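
Two terms named in the Figure 2 caption, the additive angular margin of $\mathcal{L}_{arc}$ and the Jensen-Shannon stability term $\mathcal{R}_{JS}$, can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: the function names, the margin of 0.5 and the scale of 30.0 are hypothetical choices, and the inputs are taken to be precomputed cosine similarities and probability vectors.

```python
import math


def arcface_logits(cosines, target, margin=0.5, scale=30.0):
    """Add an angular margin to the target class, then scale all logits.

    cosines: cosine similarity between a normalized embedding and each
    normalized class weight. margin and scale are illustrative values.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if j == target:
            c = math.cos(math.acos(c) + margin)  # cos(theta + m)
        logits.append(scale * c)
    return logits


def kl(p, q):
    """KL(p || q) for discrete distributions; zero-mass terms contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two class-probability vectors."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Margin lowers the target logit, so the model must separate classes by more.
logits = arcface_logits([0.8, 0.1], target=0)

# Stability term: small when clean and perturbed predictions agree.
drift = js_divergence([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
```

The Jensen-Shannon divergence is symmetric and bounded by ln 2, which makes it a convenient penalty for keeping the predictions on clean and perturbed representations close while the perturbation strengths are pushed adversarially.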