ACE: Anatomically Consistent Embeddings in Composition and Decomposition

Ziyu Zhou, Haozhe Luo, Mohammad Reza Hosseinzadeh Taher, Jiaxuan Pang, Xiaowei Ding, Michael Gotway, Jianming Liang

TL;DR

ACE proposes anatomically aware self-supervised learning for medical images by enforcing global consistency and local composition/decomposition through grid-based patch matching. The two-branch framework learns global macro-structures and fine-grained local tissue details, yielding embeddings that support accurate patch-level retrieval, cross-patient anatomical correspondence, and symmetry, with strong transfer to classification and segmentation tasks across chest X-ray and fundus images. The approach demonstrates data-efficient few-shot performance, competitive fine-tuning results, and generalization to other modalities, highlighting the practical impact of incorporating anatomical priors into SSL for medical imaging. Overall, ACE advances annotation-efficient, anatomically grounded representations that improve robustness, interpretability, and transferability in clinical image analysis.

Abstract

Medical images acquired from standardized protocols show consistent macroscopic or microscopic anatomical structures, and these structures consist of composable/decomposable organs and tissues. Existing self-supervised learning (SSL) methods, however, do not exploit these composable/decomposable structural attributes inherent to medical images. To overcome this limitation, this paper introduces a novel SSL approach called ACE to learn anatomically consistent embeddings via composition and decomposition with two key branches: (1) global consistency, capturing discriminative macro-structures via extracting global features; (2) local consistency, learning fine-grained anatomical details from composable/decomposable patch features via corresponding matrix matching. Experimental results across 6 datasets and 2 backbones, evaluated in few-shot learning, fine-tuning, and property analysis, show ACE's superior robustness, transferability, and clinical potential. The innovations of ACE lie in grid-wise image cropping, leveraging the intrinsic properties of compositionality and decompositionality of medical images, bridging the semantic gap from high-level pathologies to low-level tissue anomalies, and providing a new SSL method for medical imaging.

Paper Structure

This paper contains 23 sections, 8 equations, 13 figures, 1 table, 1 algorithm.

Figures (13)

  • Figure 1: (a) Chest X-rays contain various large (global) and small (local) anatomical patterns, including the right/left lung, heart, spinous processes, clavicle, mainstem bronchus, and the osseous structures of the thorax, which can be utilized for learning global and local embeddings in anatomy. (b) The hierarchical nature of anatomy (e.g., the left lung has two lobes, the superior lobe $x$ and the inferior lobe $y$) calls for anatomical representation with compositionality, where the embedding of the whole patch should be the sum of the embeddings of each part.
  • Figure 2: ACE learns anatomically consistent embedding with two key branches: (1) global consistency and (2) local consistency via composition and decomposition. Using our proposed grid-wise image cropping strategy (detailed in Sec. \ref{subsec:grid}), an input image is divided into a grid (see white grids), and two random crops, $C_1$ (yellow grids) and $C_2$ (green grids), are extracted. In the overlap region between $C_1$ and $C_2$, four patches in $C_1$ (denoted as $q_1, q_2, q_3, q_4$) correspond to one patch in $C_2$ (denoted as $p = \{q_1, q_2, q_3, q_4\}$). The global consistency branch (detailed in Sec. \ref{subsec:global}) enforces consistency between the embeddings of the overlapping regions in $C_1$ and $C_2$ to learn coarse-grained semantic features. The local consistency branch (detailed in Sec. \ref{subsec:local_consistency}) enforces the model to learn fine-grained anatomical structure details via composition and decomposition. The local composition branch maximizes the similarity of paired patch embeddings and minimizes the similarity of unpaired ones to learn fine-grained anatomies in a part-to-whole manner. In a symmetrical process, the local decomposition branch enforces the model to learn fine-grained anatomies in a whole-to-part manner.
  • Figure 3: ACE preserves the compositionality of anatomical structures in its learned embedding space. As seen, ACE's distribution is narrower and taller than those of DINO \cite{caron2021emerging}, DropPos \cite{wang2023droppos}, and SelfPatch \cite{yun2022patch}, with the mean similarity between embeddings of patches and their compositional parts closer to 1.
  • Figure 4: ACE preserves the decompositionality of anatomical structures in its learned embedding space. As seen, ACE outperforms SSL baselines by a large margin, achieving a high accuracy of 89.01%, which is 30% higher than the second-best baseline.
  • Figure 5: ACE captures semantics-rich features in its learned embedding space. As seen, ACE achieves higher retrieval accuracy compared with other SSL baselines.
  • ...and 8 more figures
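
The patch correspondence and part-to-whole objective described in the Figure 1 and Figure 2 captions can be illustrated with a small sketch. It is not the authors' implementation: the function names (`parent_cell`, `sum_embeddings`, `composition_loss`), the cosine similarity, the softmax-style contrastive form, and the `temperature` value are all assumptions made for illustration. Only two ingredients come from the captions: four fine patches map to one coarse patch, and the whole-patch embedding is compared against the sum of its part embeddings.

```python
import math

def parent_cell(row, col, factor=2):
    """Coarse-grid cell containing fine-grid cell (row, col).
    factor=2 mirrors the caption's 'four patches correspond to one patch'."""
    return (row // factor, col // factor)

def sum_embeddings(vectors):
    """Compose part embeddings by element-wise summation (Figure 1b)."""
    return [sum(values) for values in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def composition_loss(fine, coarse, temperature=0.2):
    """Hypothetical part-to-whole contrastive loss.
    fine:   {(row, col): embedding} on the fine grid (crop C_1)
    coarse: {(row, col): embedding} on the coarse grid (crop C_2)
    Paired (composed, coarse) embeddings are pulled together and
    unpaired ones pushed apart via a softmax over cosine similarities."""
    groups = {}
    for (row, col), embedding in fine.items():
        groups.setdefault(parent_cell(row, col), []).append(embedding)
    composed = {cell: sum_embeddings(parts) for cell, parts in groups.items()}

    cells = sorted(coarse)
    total = 0.0
    for cell in cells:
        logits = [cosine(composed[cell], coarse[other]) / temperature
                  for other in cells]
        peak = max(logits)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
        positive = cosine(composed[cell], coarse[cell]) / temperature
        total += log_norm - positive
    return total / len(cells)

def one_hot(index, size=4):
    return [1.0 if i == index else 0.0 for i in range(size)]

# Toy data: a 4x4 fine grid whose cells share their parent's direction.
fine = {(r, c): one_hot(2 * (r // 2) + (c // 2))
        for r in range(4) for c in range(4)}
matched = {(r, c): one_hot(2 * r + c) for r in range(2) for c in range(2)}
shifted = {(r, c): one_hot((2 * r + c + 1) % 4)
           for r in range(2) for c in range(2)}

loss_matched = composition_loss(fine, matched)
loss_shifted = composition_loss(fine, shifted)
print(loss_matched < loss_shifted)
```

On this toy data the loss is lower when each coarse embedding agrees with the sum of its four parts than when the coarse embeddings are assigned to the wrong cells. The whole-to-part (decomposition) direction, which the caption describes as symmetrical, is not shown here.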