Mask-Guided Attention Regulation for Anatomically Consistent Counterfactual CXR Synthesis

Zichun Zhang, Weizhi Nie, Honglin Guo, Yuting Su

TL;DR

This paper presents an inference-time attention regulation framework for reliable counterfactual CXR synthesis. Compared with standard diffusion editing, it improves anatomical consistency and yields more precise, controllable pathological edits, supporting localized counterfactual analysis and data augmentation for downstream tasks.

Abstract

Counterfactual generation for chest X-rays (CXR) aims to simulate plausible pathological changes while preserving patient-specific anatomy. However, diffusion-based editing methods often suffer from structural drift, where stable anatomical semantics propagate globally through attention and distort non-target regions, and unstable pathology expression, since subtle and localized lesions induce weak and noisy conditioning signals. We present an inference-time attention regulation framework for reliable counterfactual CXR synthesis. An anatomy-aware attention regularization module gates self-attention and anatomy-token cross-attention with organ masks, confining structural interactions to anatomical ROIs and reducing unintended distortions. A pathology-guided module enhances pathology-token cross-attention within target lung regions during early denoising and performs lightweight latent corrections driven by an attention-concentration energy, enabling controllable lesion localization and extent. Extensive evaluations on CXR datasets show improved anatomical consistency and more precise, controllable pathological edits compared with standard diffusion editing, supporting localized counterfactual analysis and data augmentation for downstream tasks.
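
The anatomy-aware gating described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function name `gated_self_attention`, the per-token `region_ids` labels, and the rule "a token may only attend to tokens in the same organ region" are assumptions chosen to show how a mask can confine self-attention to anatomical ROIs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; -inf logits become exactly zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_self_attention(q, k, v, region_ids):
    """Self-attention restricted by an organ mask (hypothetical gating rule).

    q, k, v:     (N, d) arrays for N spatial tokens.
    region_ids:  (N,) integer organ label per token, e.g. derived from a
                 downsampled segmentation mask (an assumption of this sketch).
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    # gate[i, j] is True only when tokens i and j lie in the same region.
    gate = region_ids[:, None] == region_ids[None, :]
    logits = np.where(gate, logits, -np.inf)
    attn = softmax(logits, axis=-1)
    return attn @ v, attn

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 4)) for _ in range(3))
region_ids = np.array([0, 0, 0, 1, 1, 1])  # two toy "organs"
out, attn = gated_self_attention(q, k, v, region_ids)
```

Because every token shares a region with itself, each row keeps at least one finite logit, so the softmax stays well defined. A softer variant could add a finite penalty instead of `-inf` to cross-region logits.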

Paper Structure (10 sections, 9 equations, 2 figures, 2 tables)

This paper contains 10 sections, 9 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overview of our inference-time attention regulation framework for counterfactual CXR generation. (a) The input image is encoded by a VAE, noised, and denoised by a conditional diffusion model to generate a counterfactual image. (b) Anatomy-aware self-attention gating with $M_{\text{anat}}$ preserves structural consistency, while (c) pathology-guided cross-attention reweighting with a mask-derived prior $\Omega$ localizes pathological edits.
  • Figure 2: Qualitative comparison with state-of-the-art counterfactual CXR generation methods. Given the target prompt and the original image, we compare our results with SD-inpainting, PIE, BiomedJourney, and ProgEmu on representative cases.
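
Panel (c) of Figure 1 mentions cross-attention reweighting with a mask-derived prior $\Omega$, and the abstract mentions an attention-concentration energy. The sketch below shows one plausible reading of both ideas. The multiplicative `boost` rule, the energy defined as "one minus the fraction of pathology-token attention inside $\Omega$", and all function names are assumptions for illustration, not the paper's equations.

```python
import numpy as np

def reweight_cross_attention(attn, token_idx, omega, boost=2.0):
    """Amplify one text token's attention inside a spatial prior.

    attn:      (N, T) row-stochastic cross-attention (N pixels, T text tokens).
    token_idx: column of the pathology token (hypothetical index).
    omega:     (N,) prior in [0, 1], e.g. a downsampled lung mask.
    """
    out = attn.copy()
    # Scale by `boost` where omega == 1, leave unchanged where omega == 0.
    out[:, token_idx] *= 1.0 + (boost - 1.0) * omega
    # Renormalize so each pixel's attention over tokens sums to 1 again.
    return out / out.sum(axis=1, keepdims=True)

def concentration_energy(attn, token_idx, omega, eps=1e-8):
    """Assumed energy: low when the token's attention mass lies inside omega."""
    a = attn[:, token_idx]
    return 1.0 - float((a * omega).sum() / (a.sum() + eps))

attn = np.full((4, 2), 0.5)              # uniform toy attention map
omega = np.array([1.0, 1.0, 0.0, 0.0])   # first two pixels are the target region
before = concentration_energy(attn, 1, omega)
boosted = reweight_cross_attention(attn, 1, omega, boost=2.0)
after = concentration_energy(boosted, 1, omega)
```

In this toy case the energy drops after reweighting, since more of the pathology token's attention sits inside the prior. In a diffusion sampler, the gradient of such an energy with respect to the latent could drive the "lightweight latent corrections" the abstract describes; that step is omitted here.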