Glass Surface Detection: Leveraging Reflection Dynamics in Flash/No-flash Imagery
Tao Yan, Hao Huang, Yiwei Lu, Zeyu Wang, Ke Xu, Yinghui Wang, Xiaojun Chang, Rynson W. H. Lau
TL;DR
Glass surfaces are difficult to detect because they are transparent and have weak features. The authors introduce NFGlassNet, a detector driven by no-flash/flash image pairs. It uses a Reflection Contrast Mining Module (RCMM) and a Reflection Guided Attention Module (RGAM) to extract reflection cues and fuse them with glass features, enabling accurate localization. A new NFGD dataset (~3.3K pairs) supports learning how reflections appear and disappear under varied illumination. Experiments show that NFGlassNet surpasses state-of-the-art methods on multiple benchmarks, validating the effectiveness of reflection-based cues for glass surface detection.
Abstract
Glass surfaces are ubiquitous in daily life and typically appear colorless, transparent, and lacking distinctive features. These characteristics make glass surface detection a challenging computer vision task. Existing glass surface detection methods typically rely on boundary cues (e.g., window and door frames) or reflection cues to locate glass surfaces, but they fail to fully exploit the intrinsic properties of the glass itself for accurate localization. We observe that in most real-world scenes, the illumination intensity in front of a glass surface differs from that behind it, which causes the reflections visible on the glass to vary. Specifically, when the camera is on the brighter side of the glass and the flash is fired toward the darker side, existing reflections on the glass surface tend to disappear. Conversely, when the camera is on the darker side and the flash is fired toward the brighter side, distinct reflections appear on the glass surface. Based on this phenomenon, we propose NFGlassNet, a novel glass surface detection method that leverages the reflection dynamics present in flash/no-flash imagery. Specifically, we propose a Reflection Contrast Mining Module (RCMM) for extracting reflections, and a Reflection Guided Attention Module (RGAM) for fusing reflection and glass surface features for accurate glass surface detection. To train our network, we also construct a dataset of 3.3K no-flash/flash image pairs captured in various scenes, with corresponding ground truth annotations. Extensive experiments demonstrate that our method outperforms state-of-the-art methods. Our code, model, and dataset will be made available upon acceptance of the manuscript.
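The observation above suggests a simple hand-crafted cue: pixels whose appearance changes sharply between the no-flash and flash shots are candidates for reflections on glass. The sketch below is NOT the authors' RCMM or RGAM, whose internals the abstract does not describe. It is a minimal illustration under stated assumptions: single-channel, spatially aligned images with intensities in [0, 1]; a naive per-pixel absolute difference standing in for learned reflection contrast; and a sigmoid gate standing in for attention-based fusion. The function names (`reflection_contrast`, `candidate_mask`, `guided_fusion`) and the `threshold` and `gain` values are hypothetical.

```python
import math
from typing import List

# Hypothetical representation: a single-channel image or feature map, row-major.
Grid = List[List[float]]


def _check_same_shape(a: Grid, b: Grid) -> None:
    """Raise if two grids differ in height or in any row's width."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("inputs must have the same shape")


def reflection_contrast(no_flash: Grid, flash: Grid) -> Grid:
    """Naive stand-in for reflection mining: per-pixel |no_flash - flash|.

    Assumes both shots are aligned and share the same intensity scale.
    """
    _check_same_shape(no_flash, flash)
    return [
        [abs(a - b) for a, b in zip(row_nf, row_f)]
        for row_nf, row_f in zip(no_flash, flash)
    ]


def candidate_mask(contrast: Grid, threshold: float = 0.3) -> List[List[bool]]:
    """Mark pixels whose flash-induced change exceeds a (hypothetical) threshold."""
    return [[c > threshold for c in row] for row in contrast]


def guided_fusion(glass_feat: Grid, contrast: Grid,
                  threshold: float = 0.3, gain: float = 10.0) -> Grid:
    """Toy residual gating: amplify features where the reflection cue is strong.

    attention = sigmoid(gain * (contrast - threshold)), in (0, 1)
    fused     = feature * (1 + attention)
    """
    _check_same_shape(glass_feat, contrast)
    fused: Grid = []
    for row_f, row_c in zip(glass_feat, contrast):
        out_row = []
        for f, c in zip(row_f, row_c):
            attention = 1.0 / (1.0 + math.exp(-gain * (c - threshold)))
            out_row.append(f * (1.0 + attention))
        fused.append(out_row)
    return fused


if __name__ == "__main__":
    # Toy 2x2 scene: the right column changes strongly when the flash fires.
    no_flash = [[0.2, 0.8], [0.5, 0.5]]
    flash = [[0.2, 0.1], [0.5, 0.9]]
    contrast = reflection_contrast(no_flash, flash)
    print(candidate_mask(contrast))  # [[False, True], [False, True]]
    print(guided_fusion([[1.0, 1.0], [1.0, 1.0]], contrast))
```

A raw difference also responds to any surface the flash lights up, not only to reflections on glass, which is presumably why the paper learns the contrast (RCMM) and fuses it with glass features (RGAM) rather than thresholding it directly.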
