Glass Segmentation with Multi Scales and Primary Prediction Guiding

Zhiyu Xu, Qingliang Chen

TL;DR

This work tackles RGB-based glass-like object segmentation, a task hindered by transparency and reflective boundaries. It introduces MGNet, a multi-scale framework featuring Fine-Rescaling and Merging (FRM), Hierarchical Channel-Down Decoder (HCDD), and Primary Prediction Guiding (PPG), augmented by an Uncertainty-Aware Loss (UAL) to suppress ambiguous regions. By processing multi-scale inputs (0.7x, 1.0x, 1.2x) with a shared ResNeXt101 backbone and refining coarse predictions through PPG, MGNet achieves state-of-the-art performance on Trans10k, GSD, and PMD under consistent training settings, highlighting improved transferability and RGB-only applicability. The approach reduces reliance on boundary cues or auxiliary modalities, enabling robust, practical glass-like object segmentation in real-world scenes.
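The multi-scale input step can be sketched as follows. This is a minimal illustration, not the authors' implementation: only the three scale factors come from the summary above. The helper names, the nearest-neighbour resize, and the rounding of sizes to a multiple of 32 are assumptions made for the example.

```python
# Scale factors quoted in the summary; everything else here is hypothetical.
SCALES = (0.7, 1.0, 1.2)


def scaled_sizes(height, width, scales=SCALES, multiple=32):
    """Return one (h, w) per scale, rounded to the nearest multiple of `multiple`.

    Rounding to a multiple is an assumption: strided backbones usually
    expect input sizes divisible by their total stride.
    """
    sizes = []
    for s in scales:
        h = max(multiple, int(round(height * s / multiple)) * multiple)
        w = max(multiple, int(round(width * s / multiple)) * multiple)
        sizes.append((h, w))
    return sizes


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2D list of pixel values."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def build_pyramid(image, scales=SCALES, multiple=32):
    """Resize one image to every scale; each copy would go to the same encoder."""
    h, w = len(image), len(image[0])
    return [
        resize_nearest(image, out_h, out_w)
        for out_h, out_w in scaled_sizes(h, w, scales, multiple)
    ]


if __name__ == "__main__":
    # A 384x384 input yields one size per scale factor.
    print(scaled_sizes(384, 384))
```

Because the encoder is shared, the three resized copies pass through the same weights; only the input resolution differs between branches.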

Abstract

Glass-like objects are ubiquitous in daily life, yet existing methods struggle to segment them. Their transparency makes them hard to detect against cluttered backgrounds, and their vague boundaries make exact contours hard to recover. Moving machines that ignore glass risk crashing into transparent barriers or misinterpreting objects reflected in mirrors, so it is important to locate glass-like objects accurately and recover their complete contours. In this paper, inspired by scale-integration strategies and refinement methods, we propose a new network, MGNet, which consists of a Fine-Rescaling and Merging module (FRM) that improves the extraction of spatial relationships and a Primary Prediction Guiding module (PPG) that better mines the leftover semantics from the fused features. Moreover, we supervise the model with a novel loss function that includes an uncertainty-aware loss to produce high-confidence segmentation maps. Unlike existing glass segmentation models, which must be trained with different settings for different datasets, our model is trained under consistent settings and achieves superior performance on three popular public datasets. Code is available at
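To illustrate how an uncertainty-aware term can push predictions toward high confidence, the sketch below penalises probabilities near 0.5. The abstract does not give the formula, so the quadratic form `1 - |2p - 1|^2` is an assumption here, as are the function names. The penalty is largest at p = 0.5 and vanishes at p = 0 or p = 1.

```python
def uncertainty_penalty(p):
    """Per-pixel penalty: 1 at p = 0.5 (most ambiguous), 0 at p = 0 or p = 1.

    Assumed form, not taken from the abstract: 1 - |2p - 1|^2.
    """
    return 1.0 - abs(2.0 * p - 1.0) ** 2


def uncertainty_aware_loss(prob_map):
    """Mean penalty over a 2D list of predicted foreground probabilities."""
    values = [p for row in prob_map for p in row]
    return sum(uncertainty_penalty(p) for p in values) / len(values)


if __name__ == "__main__":
    confident = [[0.0, 1.0], [1.0, 0.0]]
    ambiguous = [[0.5, 0.5], [0.5, 0.5]]
    print(uncertainty_aware_loss(confident))  # no penalty for confident pixels
    print(uncertainty_aware_loss(ambiguous))  # maximal penalty for ambiguous pixels
```

A term like this needs no ground truth, so in practice it would be added to a supervised segmentation loss rather than used on its own.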

Paper Structure (14 sections, 8 equations, 7 figures, 4 tables)


Figures (7)

  • Figure 1: Visualization of our predicted results. Compared with the ground truths, our segmentation maps are fine-grained, complete, and strongly discriminative, mainly owing to our combination of multi-scale and refinement strategies. Predictions in ambiguous regions also benefit from UAL.
  • Figure 2: The overall framework of MGNet. The fine-rescaling and merging module (FRM) integrates full-channel features of different levels, extracted by a shared encoder, to mine critical clues from different scales. The hierarchical channel-down decoder (HCDD) further enhances feature discrimination by constructing a multi-path structure inside the features while reducing their channel count. A coarse probability map is then generated and sent to the primary prediction guiding module (PPG), which completes the coarse-to-fine process by mining leftover semantics and eliminating uncertainty. After refinement, a final probability map of the glass-like object in the input image is obtained. The number of guiding iterations can be adjusted according to practical needs.
  • Figure 3: Illustration of the fine-rescaling and merging module (FRM).
  • Figure 4: Illustration of the hierarchical channel-down unit (HCDU).
  • Figure 5: Illustration of the primary prediction guiding module (PPG).
  • ...and 2 more figures