Enhancing Shape Perception and Segmentation Consistency for Industrial Image Inspection
Guoxuan Mao, Ting Cao, Ziyang Li, Yuan Dong
TL;DR
This work tackles segmentation consistency for fixed components in industrial image inspection under real-time constraints. It introduces SPENet, a shape-aware two-path network employing a Spatial Path for shape extraction, a decoupled body–edge module, and a Variable Boundary Domain to supervise boundary information. A novel Consistency Mean Square Error ($CMSE$) metric is proposed to quantify segmentation consistency for fixed parts. On the PVC dataset, SPENet achieves state-of-the-art mIoU and CMSE performance with real-time speed, and shows competitive results on CityScapes, highlighting practical impact for industrial QA and potential transfer to other industrial vision tasks.
Abstract
Semantic segmentation stands as a pivotal research focus in computer vision. In the context of industrial image inspection, conventional semantic segmentation models fail to maintain the segmentation consistency of fixed components across varying contextual environments due to a lack of perception of object contours. Given the real-time constraints and limited computing capability of industrial image detection machines, it is also necessary to create efficient models to reduce computational complexity. In this work, a Shape-Aware Efficient Network (SPENet) is proposed, which focuses on the shapes of objects to achieve excellent segmentation consistency by separately supervising the extraction of boundary and body information from images. In SPENet, a novel method is introduced for describing fuzzy boundaries to better adapt to real-world scenarios named Variable Boundary Domain (VBD). Additionally, a new metric, Consistency Mean Square Error(CMSE), is proposed to measure segmentation consistency for fixed components. Our approach attains the best segmentation accuracy and competitive speed on our dataset, showcasing significant advantages in CMSE among numerous state-of-the-art real-time segmentation networks, achieving a reduction of over 50% compared to the previously top-performing models.
