Image-level Regression for Uncertainty-aware Retinal Image Segmentation

Trung Dang, Huy Hoang Nguyen, Aleksei Tiulpin

TL;DR

This work addresses retinal vessel segmentation under annotator uncertainty by reframing the task as image-level regression. It introduces SAUNA, a transform that converts hard ground-truth masks into signed soft labels based on boundary proximity and vessel thickness, enabling uncertainty-aware learning without multiple annotations. The method combines a generalized Jaccard metric loss operating on the $[-1,1]^D$ domain with a stable Focal-L1 loss for pixel-level regression, yielding strong performance with low-resolution (LR) inputs while remaining far more efficient than high-resolution (HR) baselines. Across five retinal datasets, the approach improves IoU, Dice, and accuracy, and generalizes better to external datasets, especially when paired with UNet-like architectures. The results suggest that uncertainty-aware, regression-based segmentation can match or exceed the performance of HR baselines at much greater throughput.
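
To make the idea of signed soft labels concrete, the following is a minimal 1D sketch in pure Python. It is not the SAUNA formula, which this summary does not spell out. The function name `signed_soft_labels_1d`, the `margin` parameter, and the normalization choices (foreground scaled by the largest half-thickness in the row, background scaled by a fixed margin) are all assumptions made for illustration.

```python
def signed_soft_labels_1d(mask, margin=3.0):
    """Map a 1D binary mask to signed soft labels in [-1, 1].

    Hypothetical illustration only, NOT the paper's SAUNA transform.
    Foreground pixels get positive values that grow with the distance
    to the annotation boundary; background pixels get negative values.
    """
    n = len(mask)
    if 1 not in mask:          # no vessel at all: fully confident background
        return [-1.0] * n
    if 0 not in mask:          # everything is vessel: fully confident foreground
        return [1.0] * n

    def boundary_distance(i):
        # Distance to the nearest pixel of the opposite class, minus 0.5
        # because the boundary lies halfway between two neighbouring pixels.
        nearest = min(abs(i - j) for j in range(n) if mask[j] != mask[i])
        return nearest - 0.5

    dist = [boundary_distance(i) for i in range(n)]

    # Assumed thickness normalization: the thickest vessel peaks at 1,
    # so thinner vessels peak at lower (less certain) values.
    max_half_thickness = max(d for d, m in zip(dist, mask) if m == 1)

    labels = []
    for d, m in zip(dist, mask):
        if m == 1:
            labels.append(d / max_half_thickness)
        else:
            labels.append(-min(1.0, d / margin))  # saturate far from vessels
    return labels


row = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
soft = signed_soft_labels_1d(row)
```

In this toy row, the centre of the five-pixel vessel receives the largest label, while the one-pixel vessel receives a small positive value, which mirrors the intuition that thin structures carry more annotation uncertainty.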

Abstract

Accurate retinal vessel (RV) segmentation is a crucial step in the quantitative assessment of retinal vasculature, which is needed for the early detection of retinal diseases and other conditions. Numerous studies have been conducted to tackle the problem of segmenting vessels automatically using a pixel-wise classification approach. The common practice of creating ground truth labels is to categorize pixels as foreground and background. This approach is, however, biased, and it ignores the uncertainty of a human annotator when it comes to annotating, e.g., thin vessels. In this work, we propose a simple and effective method that casts the RV segmentation task as an image-level regression. For this purpose, we first introduce a novel Segmentation Annotation Uncertainty-Aware (SAUNA) transform, which adds pixel uncertainty to the ground truth using the pixel's closeness to the annotation boundary and vessel thickness. To train our model with soft labels, we generalize the earlier proposed Jaccard metric loss to arbitrary hypercubes for soft Jaccard index (Intersection-over-Union) optimization. Additionally, we employ a stable version of the Focal-L1 loss for pixel-wise regression. We conduct thorough experiments and compare our method to a diverse set of baselines across 5 retinal image datasets. Our empirical results indicate that the integration of the SAUNA transform and these segmentation losses led to significant performance boosts for different segmentation models. Particularly, our methodology enables UNet-like architectures to substantially outperform computationally intensive baselines. Our implementation is available at https://github.com/Oulu-IMEDS/SAUNA.
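
As a rough illustration of what a Jaccard metric loss on a hypercube could look like, the sketch below applies the L1-norm form of the Jaccard metric loss after shifting both inputs from $[\alpha, \beta]^D$ to $[0, \beta - \alpha]^D$. The shift-by-$\alpha$ construction is an assumption made here for illustration; the paper's exact generalization is not reproduced in this summary, and the function name `jaccard_metric_loss` is hypothetical.

```python
def jaccard_metric_loss(a, b, alpha=-1.0, beta=1.0):
    """L1-form Jaccard metric loss for vectors in [alpha, beta]^D.

    Sketch under an assumption: inputs are shifted by alpha so that they
    become non-negative, then the L1 form
        1 - (|a| + |b| - |a - b|) / (|a| + |b| + |a - b|)
    is evaluated. This is not claimed to be the paper's definition.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if any(not (alpha <= v <= beta) for v in list(a) + list(b)):
        raise ValueError("inputs must lie in [alpha, beta]")

    shifted_a = [x - alpha for x in a]   # now in [0, beta - alpha]
    shifted_b = [y - alpha for y in b]

    norm_a = sum(shifted_a)              # L1 norm of a non-negative vector
    norm_b = sum(shifted_b)
    diff = sum(abs(x - y) for x, y in zip(shifted_a, shifted_b))

    denom = norm_a + norm_b + diff
    if denom == 0:                       # both inputs sit at the alpha corner
        return 0.0
    return 1.0 - (norm_a + norm_b - diff) / denom


# Hard labels encoded as +1 (vessel) and -1 (background):
pred = [1.0, -1.0, -1.0, -1.0]
target = [1.0, 1.0, -1.0, -1.0]
hard_loss = jaccard_metric_loss(pred, target)   # 0.5, i.e. 1 - IoU with IoU = 1/2
```

On hard labels this sketch reduces to one minus the usual Jaccard index. It is also symmetric in its arguments and zero for identical inputs, which are properties a semi-metric is expected to satisfy.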

Paper Structure (27 sections, 7 theorems, 23 equations, 8 figures, 3 tables)


Key Result

Proposition 1

$\Delta_{\mathrm{JML}}$ is a semi-metric in $[\alpha, \beta]^{D} \subseteq \mathbb{R}^D$. Specifically, $\forall \mathbf{a}, \mathbf{b} \in [\alpha, \beta]^{D}$, we have

Figures (8)

  • Figure 1: Comparisons of performance and throughput across methods using high-resolution (HR) and low-resolution (LR) inputs on the FIVES test set. The x-axis uses a log scale. The red line indicates the best result among the baselines. "Conventional" indicates baselines using binary masks (hard labels). The corresponding quantitative results are in \ref{tab:exp_arch_comparisons}.
  • Figure 2: Our workflow of image-level regression for retinal image segmentation. Our primary contributions are the SAUNA transform (see \ref{sc:sauna}), an extension of the Jaccard metric loss \cite{wang2023jaccard} (see \ref{sc:gjml}), and a stable version of the Focal-L1 loss \cite{dang2024singr} (see \ref{sc:focal_l1}).
  • Figure 3: Illustration of the transformation from a 0-1 ground truth (GT) mask to its associated SAUNA map (best viewed in color): (a) 2D GT mask with an orange projection, (b) corresponding transformations in the 1D projection.
  • Figure 4: 2D loss surfaces of the Focal-L1 loss and its stable version with $\gamma=1$. Colors represent loss magnitudes. The dashed yellow lines in (a-b) indicate the projections shown in (c).
  • Figure 5: Performance gains of different DL architectures utilizing our approach with LR inputs. The dashed red line indicates the performance of the most competitive HR-based baseline, MAGF-Net \cite{li2023magf}.
  • ...and 3 more figures

Theorems & Definitions (14)

  • Proposition 1: Jaccard Metric Loss on a hypercube in $\mathbb{R}^D$
  • proof
  • Proposition 2: Stable Focal-L1
  • proof
  • Proposition 3: Stable Focal-L1 as a lower bound of Focal-L1
  • proof
  • Proposition 1: Jaccard Metric Loss on a hypercube in $\mathbb{R}^D$
  • proof
  • Lemma 2.1
  • proof
  • ...and 4 more