AFiRe: Anatomy-Driven Self-Supervised Learning for Fine-Grained Representation in Radiographic Images

Yihang Liu; Lianghua He; Ying Wen; Longzhen Yang; Hongzhou Chen

TL;DR

AFiRe addresses the lack of fine-grained, anatomy-aware representations in self-supervised learning (SSL) for radiographs. It combines anatomy-aware token-level contrastive learning with pixel-level anomaly restoration, guided by synthetic lesion augmentation. By aligning Vision Transformer (ViT) token distributions with spatially aware anatomical prototypes and selectively restoring abnormal tokens, it learns cohesive fine-grained representations that generalize well when labels are scarce. The approach outperforms the compared methods on multi-label classification and anomaly detection across chest X-ray datasets. Grad-CAM visualizations provide qualitative evidence of localization, and ablations validate each component. In practice, this anatomy-driven framework supports data-efficient radiographic analysis and precise lesion localization while requiring only image-level annotations during training.
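The idea of aligning ViT token distributions with anatomical prototypes can be illustrated with a generic prototype-based, InfoNCE-style token loss. This is a minimal NumPy sketch under stated assumptions, not AFiRe's published objective. It assumes that every token already carries an integer anatomical-region label (for example, from a hypothetical anatomy map over the patch grid) and that each prototype is the normalized mean of its region's tokens. The helper names, the temperature `0.1`, and the synthetic embeddings are illustrative only.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit L2 norm (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def region_prototypes(tokens, labels, num_regions):
    """Hypothetical prototypes: L2-normalized mean token embedding per region label."""
    protos = np.zeros((num_regions, tokens.shape[1]))
    for r in range(num_regions):
        members = tokens[labels == r]
        if len(members) > 0:
            protos[r] = members.mean(axis=0)
    return l2_normalize(protos)

def token_prototype_loss(tokens, labels, prototypes, temperature=0.1):
    """InfoNCE-style loss (an assumption, not the paper's exact formula):
    pull each token toward its own region's prototype and away from the
    other regions' prototypes, using cosine similarity / temperature."""
    z = l2_normalize(tokens)
    logits = z @ prototypes.T / temperature               # (num_tokens, num_regions)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# Toy comparison: 4 regions x 49 tokens, 32-dim embeddings (all synthetic).
rng = np.random.default_rng(0)
num_regions, per_region, dim = 4, 49, 32
labels = np.repeat(np.arange(num_regions), per_region)
centers = 3.0 * rng.normal(size=(num_regions, dim))
clustered = centers[labels] + 0.1 * rng.normal(size=(labels.size, dim))  # anatomy-consistent tokens
scrambled = rng.normal(size=clustered.shape)                             # no anatomical structure
for name, toks in [("clustered", clustered), ("scrambled", scrambled)]:
    protos = region_prototypes(toks, labels, num_regions)
    print(name, round(token_prototype_loss(toks, labels, protos), 4))
```

Tokens that cluster tightly by region receive a much lower loss than unstructured tokens, which matches the intuition behind "cohesive" anatomical representations. In real training, the embeddings would come from a ViT encoder and the loss would be minimized by gradient descent, which this NumPy sketch does not model.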

Abstract

Current self-supervised methods, such as contrastive learning, predominantly focus on global discrimination and neglect the fine-grained anatomical details that accurate radiographic analysis requires. To address this challenge, we propose an Anatomy-driven self-supervised framework for enhancing Fine-grained Representation in radiographic image analysis (AFiRe). The core idea of AFiRe is to align anatomical consistency with the token-processing characteristics of the Vision Transformer (ViT). Specifically, AFiRe jointly performs two self-supervised schemes: (i) token-wise anatomy-guided contrastive learning, which aligns image tokens based on structural and categorical consistency, thereby enhancing fine-grained spatial-anatomical discrimination; and (ii) pixel-level anomaly-removal restoration, which focuses on local anomalies and refines the learned discrimination with detailed geometric information. Additionally, we propose a Synthetic Lesion Mask that increases anatomical diversity while preserving the intra-image anatomical consistency that traditional data augmentations, such as cropping and affine transformations, typically corrupt. Experimental results show that AFiRe: (i) provides robust anatomical discrimination, producing more cohesive feature clusters than state-of-the-art contrastive learning methods; (ii) generalizes well, surpassing seven radiography-specific self-supervised methods in multi-label classification with limited labels; and (iii) integrates fine-grained information, enabling precise anomaly detection with only image-level annotations.
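The Synthetic Lesion Mask and the anomaly-removal restoration can be sketched together. The idea is to insert a synthetic lesion into an image, mark the ViT tokens that the lesion touches as abnormal, and then score a restoration only on those tokens against the lesion-free original. The abstract does not specify the lesion generator, intensity model, patch size, or loss. Everything below is therefore an assumption made for illustration: the elliptical mask, the fixed additive brightening of `0.3`, the 16 x 16 patches, and the per-token mean squared error. The helper names `synthetic_lesion_mask`, `insert_lesion`, `abnormal_token_mask`, and `restoration_loss` are all hypothetical.

```python
import numpy as np

def synthetic_lesion_mask(h, w, rng, max_radius=20):
    """Hypothetical lesion shape: one random ellipse as a boolean (h, w) mask.
    The centre pixel always satisfies the inequality, so the mask is never empty."""
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    ry, rx = rng.integers(4, max_radius, size=2)
    yy, xx = np.mgrid[0:h, 0:w]
    return ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0

def insert_lesion(image, mask, intensity=0.3):
    """Brighten pixels inside the mask (assumed opacity model), clipped to [0, 1].
    Pixels outside the mask, and hence the surrounding anatomy, are left untouched."""
    corrupted = image.copy()
    corrupted[mask] = np.clip(corrupted[mask] + intensity, 0.0, 1.0)
    return corrupted

def abnormal_token_mask(mask, patch=16):
    """A token is 'abnormal' if any pixel of its patch lies inside the lesion.
    Assumes the image height and width are divisible by the patch size."""
    h, w = mask.shape
    return mask.reshape(h // patch, patch, w // patch, patch).any(axis=(1, 3))

def restoration_loss(pred, clean, token_mask, patch=16):
    """Per-token MSE between the prediction and the lesion-free image,
    averaged over abnormal tokens only (0.0 if there are none)."""
    h, w = clean.shape
    err = ((pred - clean) ** 2).reshape(h // patch, patch, w // patch, patch).mean(axis=(1, 3))
    if not token_mask.any():
        return 0.0
    return float(err[token_mask].mean())

rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 0.7, size=(224, 224))   # stand-in for a normalized radiograph
lesion = synthetic_lesion_mask(224, 224, rng)
corrupted = insert_lesion(clean, lesion)         # model input
token_mask = abnormal_token_mask(lesion)         # 14 x 14 ViT patch grid
print(token_mask.shape, int(token_mask.sum()), "abnormal tokens")
print(restoration_loss(clean, clean, token_mask))      # perfect restoration: 0.0
print(restoration_loss(corrupted, clean, token_mask))  # leaving the lesion in place costs > 0
```

Because this synthetic lesion changes pixel intensities only inside the mask, the anatomy elsewhere keeps its position and shape. This is one way to read the abstract's contrast with cropping and affine transformations, which move or distort anatomy.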


Figures (11)