H2ASeg: Hierarchical Adaptive Interaction and Weighting Network for Tumor Segmentation in PET/CT Images

Jinpeng Lu, Jingyun Chen, Linghan Cai, Songhan Jiang, Yongbing Zhang

TL;DR

This work tackles automatic tumor segmentation in PET/CT images, where existing methods fuse PET and CT features suboptimally. It introduces H2ASeg, a hierarchical network that combines Modality-Cooperative Spatial Attention (MCSA) and Target-Aware Modality Weighting (TAMW) to model cross-modal dependencies and emphasize tumor-relevant information. Experiments on AutoPET-II and Hecktor2022 report state-of-the-art Dice scores, and ablations indicate that MCSA and TAMW contribute complementary gains. The approach offers a modular, efficient way to exploit modality complementarity, with the potential to streamline clinical workflows.

Abstract

Positron emission tomography (PET) combined with computed tomography (CT) imaging is routinely used in cancer diagnosis and prognosis by providing complementary information. Automatically segmenting tumors in PET/CT images can significantly improve examination efficiency. Traditional multi-modal segmentation solutions mainly rely on concatenation operations for modality fusion, which fail to effectively model the non-linear dependencies between PET and CT modalities. Recent studies have investigated various approaches to optimize the fusion of modality-specific features for enhancing joint representations. However, modality-specific encoders used in these methods operate independently, inadequately leveraging the synergistic relationships inherent in PET and CT modalities, for example, the complementarity between semantics and structure. To address these issues, we propose a Hierarchical Adaptive Interaction and Weighting Network termed H2ASeg to explore the intrinsic cross-modal correlations and transfer potential complementary information. Specifically, we design a Modality-Cooperative Spatial Attention (MCSA) module that performs intra- and inter-modal interactions globally and locally. Additionally, a Target-Aware Modality Weighting (TAMW) module is developed to highlight tumor-related features within multi-modal features, thereby refining tumor segmentation. By embedding these modules across different layers, H2ASeg can hierarchically model cross-modal correlations, enabling a nuanced understanding of both semantic and structural tumor features. Extensive experiments demonstrate the superiority of H2ASeg, outperforming state-of-the-art methods on AutoPet-II and Hecktor2022 benchmarks. The code is released at https://github.com/JinPLu/H2ASeg.

Paper Structure (13 sections, 5 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: Overview of the proposed H2ASeg, which is an encoder-decoder framework. MCSA modules perform modality interaction in the encoder. TAMW modules adjust multi-modal features for refining segmentation.
  • Figure 2: Structure of modality-cooperative spatial attention module. (a) presents the overall architecture. (b) shows a bi-directional spatial attention mechanism. "Win." is the abbreviation of window, "Proj." is projection, "Mul." means multiply, "Concat. & Conv." denotes concatenation and convolution.
  • Figure 3: Qualitative results of different methods. Boxes in PET highlight areas that are easily misclassified. For segmentation results, the red area is the true positive, the blue is the false negative, and the yellow is the false positive.
  • Figure 4: Effects of MCSA and TAMW on AutoPet-II.
  • Figure 5: Effects (%) of PET/CT in TAMW for emphasis at different depths.
  • ...and 1 more figure
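
The Dice score cited in the TL;DR can be related to the true-positive, false-negative, and false-positive regions colored in Figure 3. The following is a minimal sketch of the standard Dice coefficient, Dice = 2TP / (2TP + FP + FN), on flattened binary masks. It is not taken from the H2ASeg code release; the function names, the toy masks, and the convention for two empty masks are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], target: Sequence[int]) -> Tuple[int, int, int]:
    """Count (TP, FP, FN) voxels over two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)  # red in Figure 3
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)  # yellow in Figure 3
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)  # blue in Figure 3
    return tp, fp, fn


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Standard Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


# Hypothetical toy masks (1 = tumor voxel, 0 = background).
pred = [0, 1, 1, 1, 0, 0, 1, 0]
target = [0, 1, 1, 0, 0, 1, 1, 0]

print(confusion_counts(pred, target))  # (3, 1, 1)
print(dice_score(pred, target))        # 0.75
```

In this toy case, three voxels are true positives, one is a false positive, and one is a false negative, so Dice = 6 / 8 = 0.75. Benchmarks may average the score per case or handle empty masks differently, so reported numbers depend on the evaluation protocol used.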