Ctrl-GenAug: Controllable Generative Augmentation for Medical Sequence Classification

Xinrui Zhou; Yuhao Huang; Haoran Dou; Shijing Chen; Ao Chang; Jia Liu; Weiran Long; Jian Zheng; Erjiao Xu; Jie Ren; Ruobing Huang; Jun Cheng; Wufeng Xue; Dong Ni

TL;DR

The paper tackles data scarcity, class imbalance, and out-domain generalization in medical sequence classification. It introduces Ctrl-GenAug, a diffusion-based framework that provides joint semantic and sequential control when generating medical sequences, then applies a noisy synthetic data filter to suppress unreliable samples. The framework combines a multimodal conditions-guided sequence generator, a sequential augmentation module, and the filter. It is validated on 3 medical datasets using 11 networks trained on 3 paradigms, with notable gains in underrepresented high-risk populations and out-domain conditions. The work makes synthetic data more reliable and practical for medical sequence classification and outlines directions for faster sampling and broader applications.

Abstract

In the medical field, the limited availability of large-scale datasets and labor-intensive annotation processes hinder the performance of deep models. Diffusion-based generative augmentation approaches present a promising solution to this issue, having been proven effective in advancing downstream medical recognition tasks. Nevertheless, existing works lack sufficient semantic and sequential steerability for challenging video/3D sequence generation, and neglect quality control of noisy synthesized samples, resulting in unreliable synthetic databases and severely limiting the performance of downstream tasks. In this work, we present Ctrl-GenAug, a novel and general generative augmentation framework that enables highly semantic- and sequential-customized sequence synthesis and suppresses incorrectly synthesized samples, to aid medical sequence classification. Specifically, we first design a multimodal conditions-guided sequence generator for controllably synthesizing diagnosis-promotive samples. A sequential augmentation module is integrated to enhance the temporal/stereoscopic coherence of generated samples. Then, we propose a noisy synthetic data filter to suppress unreliable cases at semantic and sequential levels. Extensive experiments on 3 medical datasets, using 11 networks trained on 3 paradigms, comprehensively analyze the effectiveness and generality of Ctrl-GenAug, particularly in underrepresented high-risk populations and out-domain conditions.
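
The two-level filtering idea in the abstract (suppressing unreliable synthetic cases at the semantic and sequential levels) can be illustrated with a minimal sketch. This is not the paper's filter: the criteria used here (a classifier-confidence check for the semantic level, mean cosine similarity between adjacent frame/slice features for the sequential level), the thresholds, and all names such as `SyntheticSample`, `keep_sample`, and `filter_synthetic` are assumptions chosen for illustration only.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SyntheticSample:
    # All fields are hypothetical stand-ins, not the paper's data structures.
    label: int                                  # class the generator was asked to produce
    class_probs: Sequence[float]                # assumed classifier output for the sequence
    frame_features: Sequence[Sequence[float]]   # assumed per-frame/slice feature vectors


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def sequential_consistency(features: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity between adjacent frames/slices (assumed proxy)."""
    if len(features) < 2:
        return 1.0
    sims = [cosine(features[i], features[i + 1]) for i in range(len(features) - 1)]
    return sum(sims) / len(sims)


def keep_sample(sample: SyntheticSample,
                semantic_thresh: float = 0.5,
                sequential_thresh: float = 0.8) -> bool:
    """Keep a sample only if it passes both assumed checks."""
    # Semantic level: the classifier should agree with the intended label.
    semantic_ok = sample.class_probs[sample.label] >= semantic_thresh
    # Sequential level: adjacent frames/slices should look coherent.
    sequential_ok = sequential_consistency(sample.frame_features) >= sequential_thresh
    return semantic_ok and sequential_ok


def filter_synthetic(samples: List[SyntheticSample], **thresholds) -> List[SyntheticSample]:
    """Return the subset of synthetic samples that pass both checks."""
    return [s for s in samples if keep_sample(s, **thresholds)]
```

In this sketch a sample is dropped if either check fails, so a sequence that matches its intended label but flickers between unrelated frames is rejected, as is a smooth sequence that the classifier assigns to the wrong class. The thresholds are arbitrary placeholders and would need tuning per dataset.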

Paper Structure (30 sections, 5 equations, 9 figures, 7 tables, 1 algorithm)

Introduction
Related Works
Controllable Video Synthesis with Diffusion Models
Generative Augmentation with Diffusion Models
Method
Model Overview
Basic Architecture of Sequence Generator
Preliminaries of Video DDPMs
Factorized Learning of Visual Features and Sequential Patterns
2D-to-3D Model Inflation
Multimodal Conditions Guidance
Semantic Conditions for Visual Appearance Control
Sequential Condition for Serial Guidance
Sequential Augmentation Module
Key-frame/slice Attention
...and 15 more sections

Figures (9)

Figure 1: Datasets description: (a) Carotid US videos with various stenosis gradings. Red and yellow boxes represent plaques and residual lumens, respectively. (b) Thyroid US videos with different TI-RADS levels, where blue boxes indicate nodules. (c) Cardiac MRI volumes with distinct diseases. Three key anatomical structures associated with diagnosis are highlighted with masks, including the left ventricle (blue), myocardium (green), and right ventricle (red).
Figure 2: t-SNE visualization of features of synthetic and real training data, extracted by a pre-trained I3D model (Carreira and Zisserman, 2017), on our three datasets.
Figure 3: Pipeline of using our proposed framework to facilitate medical sequence recognition, which can work with a variety of classifiers. Here, we use the carotid plaque US video sequence as an example to demonstrate the overall process.
Figure 4: Pipeline of our proposed sequence generator. The MedSAM image encoder (Huang et al., 2024) is used for domain-specific image prior feature extraction.
Figure 5: Schematics of three attention mechanisms for sequential modeling (a-c) and our proposed sequential augmentation module in this study (d).
...and 4 more figures
