Attentional Triple-Encoder Network in Spatiospectral Domains for Medical Image Segmentation

Kristin Qi, Xinhan Di

TL;DR

The paper tackles OCT retinal image segmentation by leveraging both spatial and spectral information, which previous methods typically treat in isolation. It introduces a two-stage triple-encoder network with CNN, Fast Fourier Convolution, and a CNN-former to build spatial and spectral representations, followed by cross-attention-based fusion. The model achieves a Dice score of 0.864 on the Duke OCT dataset, outperforming prior work, with strong performance on fluid and ILM layers; ablation confirms the importance of cross-attention. This cross-domain fusion approach could improve automated retinal structure segmentation and aid diagnostic workflows, warranting validation on additional datasets.
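
To make the spectral branch more concrete, here is a minimal NumPy sketch of the kind of frequency-domain transform used in Fast Fourier Convolution: a real 2-D FFT, a pointwise (1x1) channel mix over stacked real and imaginary parts, and an inverse FFT. The function name `spectral_transform`, the `weight` layout, and the `relu` flag are assumptions for illustration; the paper's actual layer configuration is not given in this summary.

```python
import numpy as np

def spectral_transform(x, weight, relu=True):
    """Hypothetical FFC-style spectral transform (illustrative only).

    x:      real feature map of shape (C, H, W)
    weight: pointwise mixing matrix of shape (2*C_out, 2*C), acting on
            the stacked real and imaginary parts of the spectrum
    """
    c, h, w = x.shape
    # Real FFT over the two spatial axes: shape (C, H, W//2 + 1), complex.
    freq = np.fft.rfft2(x, axes=(-2, -1))
    # Stack real and imaginary parts along the channel axis: (2C, H, W//2 + 1).
    stacked = np.concatenate([freq.real, freq.imag], axis=0)
    # A 1x1 convolution is a per-frequency linear map over channels.
    mixed = np.einsum("oc,chw->ohw", weight, stacked)
    if relu:
        mixed = np.maximum(mixed, 0.0)
    # Split back into real and imaginary halves and return to the spatial domain.
    c_out = mixed.shape[0] // 2
    freq_out = mixed[:c_out] + 1j * mixed[c_out:]
    return np.fft.irfft2(freq_out, s=(h, w), axes=(-2, -1))
```

Because every frequency coefficient depends on all spatial positions, a pointwise operation in this domain affects the whole image at once, which is the usual motivation for pairing such a branch with a local CNN branch.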

Abstract

Retinal Optical Coherence Tomography (OCT) segmentation is essential for diagnosing pathology. Traditional methods focus on either spatial or spectral domains, overlooking their combined dependencies. We propose a triple-encoder network that integrates CNNs for spatial features, Fast Fourier Convolution (FFC) for spectral features, and attention mechanisms to capture global relationships across both domains. Attention fusion modules integrate convolution and cross-attention to further enhance features. Our method achieves an average Dice score improvement from 0.855 to 0.864, outperforming prior work.
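
For reference, the Dice score quoted above measures the overlap between a predicted mask $A$ and a ground-truth mask $B$ as $2|A \cap B| / (|A| + |B|)$. The sketch below assumes integer label maps and an unweighted mean over classes; the summary does not state how the average is taken, so `mean_dice` and its averaging scheme are illustrative assumptions.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return float(2.0 * intersection / denom)

def mean_dice(pred_labels, target_labels, num_classes):
    """Unweighted mean of per-class Dice scores over integer label maps (assumed scheme)."""
    pred_labels = np.asarray(pred_labels)
    target_labels = np.asarray(target_labels)
    scores = [
        dice_score(pred_labels == c, target_labels == c)
        for c in range(num_classes)
    ]
    return float(np.mean(scores))
```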

Paper Structure

This paper contains 4 sections, 2 equations, 1 figure, 2 tables.

Figures (1)

  • Figure 1: (A) Overview: multi-head cross-attention (MHCA) fuses features from three encoder branches. (B) Details of the CNN-former, consisting of four sub-blocks for feature enhancement. (C) Details of the MHCA, showing how bidirectional cross-attention processes two feature sets ($F_L$ and $F_R$).
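
The bidirectional cross-attention in panel (C) can be sketched in NumPy as follows: $F_L$ queries $F_R$, and $F_R$ queries $F_L$, each through standard scaled dot-product multi-head attention. Token-shaped inputs of size `(N, d)`, the residual connections, the separate projection matrices per direction, and all function names are assumptions for illustration; the caption does not specify how the two outputs are combined, so the sketch returns both.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_src, context, wq, wk, wv, num_heads):
    """Multi-head attention where queries come from one set and keys/values from another."""
    n, d = query_src.shape
    m = context.shape[0]
    assert d % num_heads == 0, "feature dim must be divisible by num_heads"
    dh = d // num_heads
    # Project, then split the feature axis into heads: (heads, tokens, dh).
    q = (query_src @ wq).reshape(n, num_heads, dh).transpose(1, 0, 2)
    k = (context @ wk).reshape(m, num_heads, dh).transpose(1, 0, 2)
    v = (context @ wv).reshape(m, num_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, n, m)
    attn = softmax(scores, axis=-1)                  # each query row sums to 1
    out = attn @ v                                   # (heads, n, dh)
    return out.transpose(1, 0, 2).reshape(n, d)      # merge heads back to (n, d)

def bidirectional_cross_attention(f_l, f_r, params_l, params_r, num_heads=4):
    """f_l attends to f_r and f_r attends to f_l; residual adds are an assumption."""
    l_out = f_l + cross_attention(f_l, f_r, *params_l, num_heads)
    r_out = f_r + cross_attention(f_r, f_l, *params_r, num_heads)
    return l_out, r_out
```

Each output keeps the token count of its own query set, so the two feature sets may have different lengths as long as they share the feature dimension `d`.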