GEM: Boost Simple Network for Glass Surface Segmentation via Segment Anything Model and Data Synthesis
Jing Hao, Moyun Liu, Kuo Feng Hung
TL;DR
This work addresses the challenging task of segmenting glass surfaces, whose transparency and reflections produce ambiguous boundaries. It proposes GEM, a lightweight SAM-based segmentation framework with a discerning query selection module, together with S-GSD, a large-scale synthetic glass dataset generated with ControlNet and Stable Diffusion for transfer learning. GEM achieves state-of-the-art performance on the GSD-S validation set (IoU +2.1%), and pretraining on S-GSD brings further gains in both zero-shot and finetuning settings (e.g., IoU improvements of 0.026 for GEM-Tiny and 0.018 for GEM-Base). The study shows the potential of combining visual foundation models with synthetic data for specialized perception tasks, reveals that the gains saturate as the synthetic data scale grows, and points to directions for future AIGC-assisted segmentation research.
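The results above are reported as IoU (intersection over union) between predicted and ground-truth glass masks. As a reading aid, here is a minimal sketch of per-image IoU for two binary masks, assuming 1 marks glass pixels and 0 marks background. The function name `binary_iou` is hypothetical, and the paper's exact evaluation protocol (for example, how scores are averaged across images) is not specified here. On this 0 to 1 scale, an improvement of 0.026 corresponds to 2.6 points.

```python
from typing import Sequence


def binary_iou(pred: Sequence[Sequence[int]], gt: Sequence[Sequence[int]]) -> float:
    """Hypothetical helper: IoU of two same-shaped binary masks (1 = glass, 0 = background)."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            p, g = bool(p), bool(g)
            inter += int(p and g)  # pixel is glass in both masks
            union += int(p or g)   # pixel is glass in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
print(binary_iou(pred, gt))  # 2 shared glass pixels / 4 pixels in the union
```

Returning 1.0 for two empty masks is only one possible convention. Benchmarks differ on this edge case, so check the official evaluation code before comparing numbers.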
Abstract
Detecting glass regions is challenging because of the ambiguity introduced by their transparency and reflection properties. Transparent glass takes on the visual appearance of both the arbitrary background scene transmitted through it and the objects reflected on it, so it has no fixed pattern. Recent visual foundation models, trained on vast amounts of data, have shown impressive performance in both image perception and image generation. To segment glass surfaces more accurately, we make full use of two visual foundation models: Segment Anything (SAM) and Stable Diffusion. Specifically, we devise a simple glass surface segmentor named GEM, which consists only of a SAM backbone, a simple feature pyramid, a discerning query selection module, and a mask decoder. The discerning query selection module adaptively identifies glass surface features and assigns them as the initial queries of the mask decoder. We also propose S-GSD, a Synthetic but photorealistic large-scale Glass Surface Detection dataset generated with a diffusion model at four scales, containing 1×, 5×, 10×, and 20× the size of the original real dataset. This dataset is a feasible source for transfer learning: larger synthetic datasets improve transfer learning, although the improvement gradually saturates as the amount of data increases. Extensive experiments demonstrate that GEM achieves a new state-of-the-art on the GSD-S validation set (IoU +2.1%). Code and datasets are available at: https://github.com/isbrycee/GEM-Glass-Segmentor.
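The abstract says the discerning query selection module picks glass surface features and uses them to initialize the mask decoder's queries. The exact design is not given here. The sketch below shows one generic way such a step can work, a top-k selection similar in spirit to query selection in DETR-style decoders, and it is an assumption rather than the authors' module. Each flattened feature (one per spatial location) is scored by a linear "glass" classifier, and the k highest-scoring features become the initial queries. The names `select_queries`, `weight`, `bias`, and `k` are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import List, Sequence, Tuple


def stable_sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def select_queries(
    features: Sequence[Sequence[float]],
    weight: Sequence[float],
    bias: float,
    k: int,
) -> Tuple[List[int], List[List[float]]]:
    """Hypothetical top-k query selection (not the paper's exact module).

    features: one feature vector per flattened spatial location.
    weight, bias: an assumed learned linear head scoring "glass-likeness".
    Returns the indices of the k highest-scoring locations (best first)
    and copies of their features, used as initial decoder queries.
    """
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of features")
    scores = [
        stable_sigmoid(sum(w * f for w, f in zip(weight, feat)) + bias)
        for feat in features
    ]
    # Rank locations by score, highest first, and keep the top k.
    top = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:k]
    queries = [list(features[i]) for i in top]
    return top, queries


features = [[0.1, 0.0], [2.0, 1.0], [-1.0, 0.5], [0.5, 3.0]]
indices, queries = select_queries(features, weight=[1.0, 1.0], bias=0.0, k=2)
print(indices)  # locations 3 and 1 have the largest scores
print(queries)
```

Because the sigmoid is monotonic, ranking by probability gives the same order as ranking by raw logits. The probabilities are kept only because they are easier to interpret as "glass-likeness". A real implementation would operate on batched tensors and might also feed the selected locations into positional embeddings.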
