Weakly Supervised Segmentation of Hyper-Reflective Foci with Compact Convolutional Transformers and SAM2

Olivier Morelle, Justus Bisten, Maximilian W. M. Wintergerst, Robert P. Finger, Thomas Schultz

TL;DR

Hyper-reflective foci (HRF) in optical coherence tomography (OCT) are small, diagnostically relevant spots. They are difficult to segment with traditional weakly supervised methods, which either downsample the input strongly or localize only at a coarse resolution. The authors present a high-resolution weakly supervised framework that combines attention-based Multiple Instance Learning (MIL) with Layer-wise Relevance Propagation (LRP) to prompt Segment Anything Model 2 (SAM 2), plus an iterative inference loop to recover multiple HRF. They also evaluate a Compact Convolutional Transformer (CCT) as a replacement for MIL; it adds a positional encoding and lets different regions of the OCT image exchange information. Results show that CCT consistently outperforms MIL, and that SAM 2 prompts derived from relevance maps reach a Dice score of about 0.33 after several iterations, close to the oracle score of about 0.35. The approach segments HRF at full resolution with minimal annotation effort and may apply to other small-structure biomarkers in OCT.
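For readers unfamiliar with the metric quoted above, the Dice score between a predicted mask P and a ground-truth mask G is 2|P ∩ G| / (|P| + |G|). The following is a minimal sketch of that standard definition on toy binary masks. The function name `dice_score`, the nested-list mask format, and the convention for two empty masks are assumptions made for illustration; the summary does not state how the paper aggregates the score (per B-scan or per volume) or how it handles empty masks.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks (equal-shaped nested lists of 0/1)."""
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred_flat, truth_flat))
    total = sum(pred_flat) + sum(truth_flat)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 3 true pixels, 3 predicted pixels, 1 pixel in common.
truth = [[1, 1, 0, 0],
         [1, 0, 0, 0]]
pred = [[0, 1, 1, 0],
        [0, 0, 1, 0]]
print(round(dice_score(pred, truth), 3))  # 2 * 1 / (3 + 3) = 0.333
```

For structures that cover only a few pixels, a boundary error of a single pixel already lowers the score noticeably, which is one reason Dice values for small lesions tend to be much lower than for large anatomical structures.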

Abstract

Weakly supervised segmentation has the potential to greatly reduce the annotation effort for training segmentation models for small structures such as hyper-reflective foci (HRF) in optical coherence tomography (OCT). However, most weakly supervised methods either involve a strong downsampling of input images or only achieve localization at a coarse resolution, both of which are unsatisfactory for small structures. We propose a novel framework that increases the spatial resolution of a traditional attention-based Multiple Instance Learning (MIL) approach by using Layer-wise Relevance Propagation (LRP) to prompt the Segment Anything Model 2 (SAM 2), and increases recall with iterative inference. Moreover, we demonstrate that replacing MIL with a Compact Convolutional Transformer (CCT), which adds a positional encoding and permits an exchange of information between different regions of the OCT image, leads to a further and substantial increase in segmentation accuracy.
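The idea of turning a relevance map into prompts and repeating the process to raise recall can be illustrated with a toy loop. This is a sketch under stated assumptions, not the authors' procedure: the abstract does not say how prompts are chosen or how the relevance map changes between iterations. Here the strongest remaining relevance peak is used as a point prompt, and relevance inside masks that were already found is set to zero. `segment_from_point` is a hypothetical callable standing in for a promptable segmenter such as SAM 2 (it is not the SAM 2 API), and `make_toy_segmenter`, `min_relevance`, and `max_iters` are likewise illustrative names.

```python
def make_toy_segmenter(image, threshold=0.5):
    """Toy stand-in for a promptable segmenter: 4-connected flood fill over bright pixels."""
    h, w = len(image), len(image[0])

    def segment(r, c):
        region, stack = set(), [(r, c)]
        while stack:
            i, j = stack.pop()
            if (i, j) in region or not (0 <= i < h and 0 <= j < w):
                continue
            if image[i][j] <= threshold:
                continue
            region.add((i, j))
            stack.extend([(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)])
        return region

    return segment


def iterative_prompting(relevance, segment_from_point, min_relevance=0.5, max_iters=5):
    """Collect masks by prompting at successive relevance peaks (illustrative only)."""
    remaining = [row[:] for row in relevance]  # copy so the input is not modified
    combined = set()
    for _ in range(max_iters):
        # The strongest remaining relevance value becomes the next point prompt.
        peak, r, c = max(
            (v, i, j) for i, row in enumerate(remaining) for j, v in enumerate(row)
        )
        if peak < min_relevance:
            break
        # Always include the prompt pixel so each iteration makes progress.
        mask = segment_from_point(r, c) | {(r, c)}
        combined |= mask
        # Assumed update rule: suppress relevance inside masks already found.
        for i, j in mask:
            remaining[i][j] = 0.0
    return combined


# Toy image with two bright spots and a hypothetical relevance map.
image = [[0.9, 0.9, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.8, 0.8]]
relevance = [[0.2, 1.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.7, 0.1]]
segmenter = make_toy_segmenter(image)
single_pass = iterative_prompting(relevance, segmenter, max_iters=1)
multi_pass = iterative_prompting(relevance, segmenter, max_iters=5)
print(len(single_pass), len(multi_pass))
```

In this toy setup a single pass recovers only the spot with the strongest relevance, while further iterations also reach the weaker second spot, which is the recall effect the abstract attributes to iterative inference.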

Paper Structure

This paper contains 13 sections, 1 figure, and 2 tables.

Figures (1)

  • Figure 1.1: Example segmentations from the test set. The ground truth segmentation is shown on top and the output of our best performing weakly supervised segmentation model on the bottom.