SAUGE: Taming SAM for Uncertainty-Aligned Multi-Granularity Edge Detection

Xing Liufu, Chaolei Tan, Xiaotong Lin, Yonggang Qi, Jinxuan Li, Jian-Fang Hu

TL;DR

SAUGE tackles uncertainty in edge labeling arising from multi-annotator variability by aligning edge granularity with data-driven uncertainty. It builds on a frozen SAM backbone by introducing a lightweight Side Transfer Network that progressively fuses intermediate SAM features to produce coarse-to-fine edge maps, supervised with pseudo labels at varied granularities and a diversity-promoting loss. The method enables edge outputs at arbitrary granularity with strong cross-dataset generalization, achieving state-of-the-art results on BSDS500, Multicue, and NYUDv2 while maintaining parameter efficiency. This approach advances practical edge detection by explicitly modeling uncertainty through granularity-aware, SAM-informed representations.

Abstract

Edge labels typically come at various granularity levels owing to the varying preferences of annotators, so handling the subjectivity of per-pixel labels has been a focal point for edge detection. Previous methods often employ a simple voting strategy to diminish such label uncertainty or impose a strong assumption that labels follow a pre-defined distribution, e.g., Gaussian. In this work, we show that the Segment Anything Model (SAM) provides strong prior knowledge for modeling the uncertainty in edge labels. Our key insight is that the intermediate SAM features inherently correspond to object edges at various granularities, which reflect the different edge options that arise from uncertainty. Therefore, we attempt to align uncertainty with granularity by regressing intermediate SAM features from different layers to object edges at multiple granularity levels. In doing so, the model can fully and explicitly explore diverse "uncertainties" in a data-driven fashion. Specifically, we inject a lightweight module (~1.5% additional parameters) into the frozen SAM to progressively fuse and adapt its intermediate features to estimate edges from coarse to fine. It is crucial to normalize the granularity level of human edge labels to match their innate uncertainty. For this, we simply apply linear blending to the real edge labels at hand to create pseudo labels with varying granularities. Consequently, our uncertainty-aligned edge detector can flexibly produce edges at any desired granularity (including an optimal one). Thanks to SAM, our model uniquely demonstrates strong generalizability for cross-dataset edge detection. Extensive experimental results on BSDS500, Multicue and NYUDv2 validate our model's superiority.
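The linear blending step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the function name `blend_edge_labels`, the choice of the per-pixel minimum over annotators as the "coarse" label and the per-pixel maximum as the "fine" label, and the convention that a granularity value `alpha` of 0 means coarse and 1 means fine.

```python
import numpy as np


def blend_edge_labels(annotations, alpha):
    """Linearly blend a coarse and a fine edge label into one pseudo label.

    annotations: array-like of shape (A, H, W), one binary edge map per annotator.
    alpha: granularity in [0, 1]; 0 yields the coarse label, 1 the fine label.

    NOTE: how the coarse and fine endpoints are built is an assumption of this
    sketch, not a detail taken from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ann = np.asarray(annotations, dtype=np.float32)
    coarse = ann.min(axis=0)  # pixels that every annotator marked as edge
    fine = ann.max(axis=0)    # pixels that at least one annotator marked as edge
    # Linear interpolation between the two endpoint labels.
    return (1.0 - alpha) * coarse + alpha * fine


# Two hypothetical annotators on a 2x2 image: they agree on one edge pixel
# and disagree on another.
annotators = [
    [[1, 0], [0, 0]],
    [[1, 1], [0, 0]],
]
pseudo_label = blend_edge_labels(annotators, alpha=0.5)
```

Under these assumptions, the pixel both annotators agree on keeps a value of 1 for every `alpha`, while the contested pixel scales with `alpha`, so sweeping `alpha` produces a family of pseudo labels ordered from coarse to fine.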

Paper Structure

This paper contains 17 sections, 11 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Comparison of edges obtained by SAM and manually annotated ground truth: (a) shows images from the BSDS set; (b) illustrates edge maps generated by SAM; (c) presents edge annotations; (d) compares (b) and (c), where green indicates the shared edges, red represents edges in (b) but not in (c), and blue represents edges in (c) but not in (b).
  • Figure 2: The overall framework of SAUGE. (a) illustrates the pipeline of SAUGE. We extract the intermediate SAM features and feed them into STN, which constructs edges at multiple granularity levels to align uncertainty with granularity. The final output $\hat{Y}^u$ is obtained by merging the features of side outputs. We devise losses $\{L_{side}, L_{differ}, L_{guide}\}$ to supervise the side outputs, promote the pairwise diversity among them, and guide edge learning using SAM masks, respectively. (b) shows the features extracted from SAM. (c) demonstrates the generation of $\hat{Y}^\alpha$ at any granularity level $\alpha$ in a controllable manner.
  • Figure 3: Qualitative comparison results on the BSDS test set. * indicates a zero-shot method.
  • Figure 4: Precision-recall curves for the BSDS500 test set.
  • Figure 5: Qualitative comparison with MuGE for different edge granularities $\alpha$ under the SS-VOC setting.
  • ...and 1 more figure
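The color coding described for panel (d) of Figure 1 can be reproduced with a short sketch. This is an illustration only: the helper name `edge_difference_overlay` is hypothetical, and it assumes both edge maps are already binarized and compared pixel by pixel, whereas the figure may have been produced with a tolerance-based matching that the caption does not specify.

```python
import numpy as np


def edge_difference_overlay(pred, gt):
    """Color-code agreement between two binary edge maps as an RGB image.

    Green: edge in both maps. Red: edge only in `pred`. Blue: edge only in `gt`.
    Exact per-pixel comparison is an assumption of this sketch.
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("edge maps must have the same shape")
    overlay = np.zeros(pred.shape + (3,), dtype=np.uint8)  # black background
    overlay[pred & gt] = (0, 255, 0)    # shared edges
    overlay[pred & ~gt] = (255, 0, 0)   # in the predicted map only
    overlay[~pred & gt] = (0, 0, 255)   # in the annotation only
    return overlay


# Hypothetical 2x2 maps standing in for a SAM edge map and a human annotation.
sam_edges = [[1, 1], [0, 0]]
human_edges = [[1, 0], [1, 0]]
overlay = edge_difference_overlay(sam_edges, human_edges)
```

The returned array has shape (H, W, 3) and can be passed to any image viewer or saved directly as an RGB image.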