L2GNet: Optimal Local-to-Global Representation of Anatomical Structures for Generalized Medical Image Segmentation

Vandan Gorade, Sparsh Mittal, Neethi Dasu, Rekha Singhal, KC Santosh, Debesh Jha

TL;DR

L2GNet addresses the challenge of modeling long-range, anatomically coherent dependencies in medical image segmentation by replacing conventional dot-product self-attention with a mechanism based on discrete optimal transport (OT). It maps continuous latent representations to discrete codes, aligns them to a learnable reference via Sinkhorn-based OT in a reproducing kernel Hilbert space (RKHS), and performs weighted pooling to produce a globally aware embedding that is passed to the decoder. The method introduces a local-to-global mapper (L2GMapper) and uses multiple references to capture relations among the relevant regions. It outperforms CLS, DLS, and CDLS baselines on the Synapse and ACDC datasets while reducing computational cost. The approach offers annotation-efficient, generalizable segmentation with robust handling of intra-class and inter-class anatomical dependencies, and opens avenues for 3D extensions and integration with larger foundation models.

Abstract

Continuous Latent Space (CLS) and Discrete Latent Space (DLS) models, like AttnUNet and VQUNet, have excelled in medical image segmentation. In contrast, Synergistic Continuous and Discrete Latent Space (CDLS) models show promise in handling fine and coarse-grained information. However, they struggle with modeling long-range dependencies. CLS or CDLS-based models, such as TransUNet or SynergyNet, are adept at capturing long-range dependencies. Since they rely heavily on feature pooling or aggregation using self-attention, they may capture dependencies among redundant regions. This hinders comprehension of anatomical structure content, poses challenges in modeling intra-class and inter-class dependencies, increases false negatives, and compromises generalization. Addressing these issues, we propose L2GNet, which learns global dependencies by relating discrete codes obtained from DLS using optimal transport and aligning codes on a trainable reference. L2GNet achieves discriminative on-the-fly representation learning without the additional weight matrix used in self-attention models, making it computationally efficient for medical applications. Extensive experiments on multi-organ segmentation and cardiac datasets demonstrate L2GNet's superiority over state-of-the-art methods, including the CDLS method SynergyNet, offering a novel approach to enhance deep learning models' performance in medical image analysis.
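
The pipeline described above (quantize latents into discrete codes, align the codes to a reference with Sinkhorn-based optimal transport, then pool with the transport weights) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function names (`quantize`, `sinkhorn_plan`, `ot_pool`), the Gaussian kernel, the uniform marginals, and the values of `sigma`, `eps`, and `n_iters` are all assumptions chosen for illustration, and the reference is a fixed random array here rather than a trained parameter.

```python
import numpy as np


def quantize(latents, codebook):
    """Map each continuous latent vector to its nearest codebook entry."""
    # Pairwise squared Euclidean distances, shape (n_latents, n_entries).
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return codebook[d2.argmin(axis=1)]


def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT plan for a cost matrix of shape (n, m).

    Uniform marginals on both sides are an illustrative assumption.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # mass on the n discrete codes
    b = np.full(m, 1.0 / m)  # mass on the m reference slots
    K = np.exp(-cost / eps)  # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)      # rescale rows towards marginal a
        v = b / (K.T @ u)    # rescale columns towards marginal b
    # Columns match b exactly after the final v update;
    # rows match a only up to the convergence of the iteration.
    return u[:, None] * K * v[None, :]


def ot_pool(codes, reference, sigma=2.0, eps=0.1, n_iters=200):
    """Align codes to reference slots and pool them with the OT plan."""
    d2 = ((codes[:, None, :] - reference[None, :, :]) ** 2).sum(-1)
    # Squared distance in the RKHS of a Gaussian kernel k:
    # ||phi(x) - phi(y)||^2 = k(x, x) + k(y, y) - 2 k(x, y) = 2 - 2 k(x, y).
    cost = 2.0 - 2.0 * np.exp(-d2 / (2.0 * sigma ** 2))
    plan = sinkhorn_plan(cost, eps, n_iters)
    # Each column of the plan sums to 1/m, so multiplying by m turns every
    # pooled row into a weighted average of the codes.
    pooled = (plan.T @ codes) * reference.shape[0]
    return plan, pooled


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents = rng.normal(size=(16, 8))    # 16 hypothetical encoder features
    codebook = rng.normal(size=(32, 8))   # 32 hypothetical code vectors
    reference = rng.normal(size=(4, 8))   # 4 hypothetical reference slots
    plan, pooled = ot_pool(quantize(latents, codebook), reference)
    print(plan.shape, pooled.shape)
```

In this sketch the pooled output has one row per reference slot, so its size depends on the number of references rather than on the number of input positions. The alignment step itself introduces no projection matrices beyond the reference, which is one way to read the efficiency claim in the abstract.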

Paper Structure

This paper contains 5 sections, 5 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Workflow of the proposed L2GNet.
  • Figure 2: Color-coded segmentation maps on the Synapse and ACDC datasets. First three rows (Synapse): yellow is the liver, blue the right kidney, green the left kidney, and light blue the pancreas. Last row (ACDC): blue, purple, and yellow represent the RV, LV, and MYO, respectively.