Mask-HybridGNet: Graph-based segmentation with emergent anatomical correspondence from pixel-level supervision

Nicolás Gaggion; Maria J. Ledesma-Carbayo; Stergios Christodoulidis; Maria Vakalopoulou; Enzo Ferrante

TL;DR

Mask-HybridGNet is a framework that trains graph-based segmentation models directly on standard pixel-wise masks, eliminating the need for manual landmark annotations. It ensures anatomical plausibility by enforcing boundary connectivity through a fixed graph adjacency matrix.

Abstract

Graph-based medical image segmentation represents anatomical structures using boundary graphs, providing fixed-topology landmarks and inherent population-level correspondences. However, clinical adoption of these methods has been hindered by a major requirement: they need training datasets with manually annotated landmarks that maintain point-to-point correspondences across patients, and such datasets rarely exist in practice. We introduce Mask-HybridGNet, a framework that trains graph-based models directly using standard pixel-wise masks, eliminating the need for manual landmark annotations. Our approach aligns variable-length ground truth boundaries with fixed-length landmark predictions by combining Chamfer distance supervision with edge-based regularization, which ensures local smoothness and a regular landmark distribution; predictions are further refined via differentiable rasterization. A significant emergent property of this framework is that predicted landmark positions become consistently associated with specific anatomical locations across patients without explicit correspondence supervision. This implicit atlas learning enables temporal tracking, cross-slice reconstruction, and morphological population analyses. Beyond direct segmentation, Mask-HybridGNet can extract correspondences from existing segmentation masks, allowing it to generate stable anatomical atlases from any high-quality pixel-based model. Experiments across chest radiography, cardiac ultrasound, cardiac MRI, and fetal imaging demonstrate that our model achieves competitive results against state-of-the-art pixel-based methods, while ensuring anatomical plausibility by enforcing boundary connectivity through a fixed graph adjacency matrix. This framework leverages the vast availability of standard segmentation masks to build structured models that maintain topological integrity and provide implicit correspondences.
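
To make the alignment between a variable-length boundary and a fixed-length landmark set concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a symmetric Chamfer distance with squared Euclidean distances and uses the variance of edge lengths on a closed contour as a stand-in for an edge-based regularizer; the exact loss formulations and the differentiable rasterization step are not shown here. The function names and the toy circle data are hypothetical.

```python
import math


def chamfer_distance(pred, target):
    """Symmetric Chamfer distance between two 2D point sets.

    Assumption: squared Euclidean distances, averaged per direction.
    The two sets may have different lengths.
    """
    def one_way(a, b):
        # Mean over points in `a` of the squared distance to the nearest point in `b`.
        return sum(
            min((ax - bx) ** 2 + (ay - by) ** 2 for bx, by in b)
            for ax, ay in a
        ) / len(a)

    return one_way(pred, target) + one_way(target, pred)


def uniform_edge_length_penalty(pred):
    """Variance of edge lengths along a closed contour.

    Assumption: node i is connected to node (i + 1) mod n, i.e. a circular graph.
    The penalty is zero when all landmarks are equally spaced.
    """
    n = len(pred)
    lengths = [math.dist(pred[i], pred[(i + 1) % n]) for i in range(n)]
    mean = sum(lengths) / n
    return sum((length - mean) ** 2 for length in lengths) / n


def circle(n, r=1.0):
    """Hypothetical toy contour: n points evenly spaced on a circle of radius r."""
    return [
        (r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


# A dense "ground truth" boundary (64 contour pixels) and a sparse
# fixed-length prediction (8 landmarks) of a slightly smaller shape.
target = circle(64)
pred = circle(8, r=0.9)

data_term = chamfer_distance(pred, target)
reg_term = uniform_edge_length_penalty(pred)
print(f"Chamfer: {data_term:.4f}, edge-length variance: {reg_term:.2e}")
```

Because the Chamfer term only matches each point to its nearest neighbour in the other set, it places no constraint on which landmark index lands where; a regularizer on edge lengths is one way to discourage landmarks from clustering along the boundary.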

Paper Structure (43 sections, 24 equations, 10 figures, 10 tables)

Introduction
Related Work
Pixel-based Anatomical Segmentation
Landmark and Graph-Based Anatomical Segmentation
Contour-Based Methods with Differentiable Rendering
Positioning Our Contribution
Mask-HybridGNet: Learning Implicit Anatomical Correspondences for Graph-based Medical Image Segmentation
Problem Formulation
Adjacency Matrix Construction
Graph Neural Network Architecture
Loss Function: Data Terms
Loss Function: Regularization Terms
I) Uniform Edge Length Loss:
II) Elasticity Loss:
III) Curvature Loss:
...and 28 more sections

Figures (10)

Figure 1: Framework overview. Mask-HybridGNet enables training graph-based segmentation models with standard pixel-level supervision, without manual landmark annotations. (Left) Given a set of medical imaging datasets with potentially heterogeneous pixel-level masks, we first automatically extract variable-length contour pixels from the masks. (Middle) Graph structures are generated from dataset statistics, which determine fixed landmark counts and construct adjacency matrices. This allows for training our Mask-HybridGNet models. (Right) Our model produces fixed-topology boundary graphs where landmark indices implicitly represent consistent anatomical locations across patients, without explicit correspondence supervision.
Figure 2: Architectural overview of the Mask-HybridGNet framework. (Left) The standard Mask-HybridGNet architecture couples a CNN encoder with a graph decoder via Image-to-Graph Skip Connections (IGSC). (Right) The Dual variant introduces an auxiliary CNN decoder following a U-Net topology. In this setup, the IGSC layers are redirected to sample feature maps from the auxiliary decoder instead of the encoder. Both variants share the same graph decoder, utilize a variational bottleneck for shape modeling, and are trained using a combination of Chamfer distance, edge-based regularization, and differentiable rasterization. The Dual model additionally incorporates a pixel-wise loss for the auxiliary decoder branch.
Figure 3: Graph representations for chest X-ray and echocardiography anatomical structures. The figure illustrates the two graph representation strategies employed in our framework. Left: Independent graph representations treat each anatomical structure separately with circular graphs. For chest X-ray: left lung (blue), right lung (green), heart (orange), left clavicle (cyan), and right clavicle (purple). For echocardiography: left ventricular endocardium (red), left ventricular epicardium (green), and left atrium (blue). Right: Unified graph representations model shared anatomical boundaries, where nodes can belong to multiple organs. Cyan contour segments indicate shared interfaces between adjacent structures, enabling joint modeling of anatomically connected regions.
Figure 4: Emergent anatomical correspondences learned through mask supervision. The figure demonstrates that our framework successfully learns anatomically meaningful landmark correspondences without explicit point-level supervision. The left column shows the ground-truth contours, the middle column shows results from independent graph representations, and the right column shows results from unified graph representations. Each row displays a subset of organs for two different subjects, revealing consistent anatomical locations on a subset of equidistant nodes.
Figure 5: Temporal cardiac analysis demonstrating landmark correspondence tracking using a unified graph representation. The figure displays comprehensive cardiac segmentation and tracking for a representative test patient with normal systolic function (EF = 54%) across two-chamber (top two rows) and four-chamber (bottom two rows) views. For each view, the first row compares the model prediction against the ground truth at end-diastole and end-systole. The second row visualizes the temporal motion tracking of landmarks for the left ventricular endocardium (LV Endo), epicardium (LV Epi), and left atrium (LA).
...and 5 more figures
