Perturb-and-Restore: Simulation-driven Structural Augmentation Framework for Imbalance Chromosomal Anomaly Detection

Yilan Zhang, Hanbiao Chen, Changchun Yang, Yuetan Chu, Siyuan Chen, Jing Wu, Jingdong Hu, Na Li, Junkai Su, Yuxuan Chen, Ao Xu, Xin Gao, Aihua Yin

Abstract

Detecting structural chromosomal abnormalities is crucial for accurate diagnosis and management of genetic disorders. However, collecting sufficient structural abnormality data is extremely challenging and costly in clinical practice, and not all abnormal types can be readily collected. As a result, deep learning approaches suffer significant performance degradation due to the severe imbalance and scarcity of abnormal chromosome data. To address this challenge, we propose Perturb-and-Restore (P&R), a simulation-driven structural augmentation framework that alleviates data imbalance in chromosome anomaly detection. The P&R framework comprises two key components: (1) Structure Perturbation and Restoration Simulation, which generates synthetic abnormal chromosomes by perturbing the banding patterns of normal chromosomes and then applying a restoration diffusion network that reconstructs continuous chromosome content and edges, thus eliminating reliance on rare abnormal samples; and (2) Energy-guided Adaptive Sampling, an energy score-based online selection strategy that dynamically prioritizes high-quality synthetic samples by referencing the energy distribution of real samples. To evaluate our method, we construct a comprehensive structural anomaly dataset consisting of over 260,000 chromosome images, including 4,242 abnormal samples spanning 24 categories. Experimental results demonstrate that the P&R framework achieves state-of-the-art (SOTA) performance, surpassing existing methods with an average improvement of 8.92% in sensitivity, 8.89% in precision, and 13.79% in F1-score across all categories.
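
To make the idea of energy-guided selection concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not define the energy score or the selection rule, so two assumptions are made here: the energy is taken to be the common free-energy form E(x) = -T * logsumexp(logits / T), and a synthetic sample is kept when its energy lies within k standard deviations of the mean energy of real abnormal samples. The function names `energy_score` and `select_synthetic` and the parameters `temperature` and `k` are hypothetical.

```python
import numpy as np

def energy_score(logits, temperature=1.0):
    """Assumed energy: E(x) = -T * logsumexp(logits / T), computed stably."""
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max(axis=-1, keepdims=True)           # subtract the row max for stability
    lse = m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1))
    return -temperature * lse

def select_synthetic(synth_logits, real_abnormal_logits, k=1.0):
    """Return indices of synthetic samples whose energy falls inside
    mean +/- k * std of the real abnormal energies (assumed rule)."""
    ref = energy_score(real_abnormal_logits)
    low = ref.mean() - k * ref.std()
    high = ref.mean() + k * ref.std()
    e = energy_score(synth_logits)
    return np.flatnonzero((e >= low) & (e <= high))
```

In an online setting, such a filter would be re-evaluated as the detector's logits change during training, so the retained subset of synthetic samples can differ from step to step.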

Paper Structure

This paper contains 26 sections, 10 equations, 6 figures, 7 tables, 1 algorithm.

Figures (6)

  • Figure 1: Core idea of Perturb-and-Restore (P&R). P&R mitigates the impact of long-tailed chromosomal distributions by simulating diverse structural anomalies from normal chromosomes. Guided by prior knowledge, structural perturbations are introduced to specific regions and refined via diffusion-based restoration. An energy-guided sampling strategy then selects high-quality synthetic anomalies to enhance the training of robust anomaly detectors.
  • Figure 2: The flowchart of the Structure Perturbation and Restoration Simulation (SPRS) module: (a) Chromosome skeletons are extracted and segmented into rectangular patches. (b) Structural perturbation operations are applied to rearranged chromosomes using prior knowledge. (c) The rearranged chromosomes and their original counterparts are paired to train a diffusion-based restoration model that treats rearrangement as image degradation. (d) The trained restoration model refines these synthetic anomalies by restoring realistic edges and content, generating highly realistic abnormal samples.
  • Figure 3: Framework of the proposed Perturb-and-Restore (P&R) method. (a) The SPRS module generates synthetic abnormal chromosomes by simulating structural anomalies based on prior abnormality knowledge. (b) The EAS module performs dynamic online sampling of synthetic data during training. By referencing the energy distributions of real normal and abnormal samples, the model selects high-quality synthetic anomalies to enhance detection performance.
  • Figure 4: Radar chart of chromosomal anomaly detection on 24 categories (sensitivity, precision of abnormality, and F1 score).
  • Figure 5: Performance comparisons under different imbalance ratio levels.
  • ...and 1 more figure
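
The patch-level perturbation described in the Figure 2 caption (steps a and b) can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the paper's procedure: the chromosome is assumed to be already straightened into an H x W array whose rows run along its long axis, the patches are equal-height row bands, and the two operations shown (inversion and deletion) are example rearrangements chosen for illustration. All function names are hypothetical, and the diffusion-based restoration of steps c and d is not modeled.

```python
import numpy as np

def split_patches(image, patch_height):
    """Split a straightened chromosome image (H x W) into row bands."""
    return [image[r:r + patch_height]
            for r in range(0, image.shape[0], patch_height)]

def invert_segment(patches, start, stop):
    """Simulated inversion: reverse the order of patches in [start, stop)
    and rotate each of them by 180 degrees."""
    rotated = [p[::-1, ::-1] for p in reversed(patches[start:stop])]
    return patches[:start] + rotated + patches[stop:]

def delete_segment(patches, start, stop):
    """Simulated deletion: drop the patches in [start, stop)."""
    return patches[:start] + patches[stop:]

def assemble(patches):
    """Stack patches back into a single image along the long axis."""
    return np.concatenate(patches, axis=0)
```

Reassembled images of this kind have visible seams at the patch boundaries, which is the degradation the restoration model in the caption is described as repairing.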