LangGas: Introducing Language in Selective Zero-Shot Background Subtraction for Semi-Transparent Gas Leak Detection with a New Dataset

Wenqi Guo, Yiyang Du, Shan Du

TL;DR

This work addresses the scarcity of annotated gas-leak datasets by introducing SimGas, a synthetic, video-based dataset with accurate segmentation ground truth for semi-transparent leaks. It presents LangGas, a selective zero-shot pipeline that combines background subtraction, vision-language model filtering with carefully chosen prompts, temporal consistency checks, and SAM-based segmentation to localize leaks without labeled training data. The method achieves an IoU of 0.69 on SimGas and shows promising qualitative transfer to GasVid, indicating potential for practical deployment in industrial monitoring. The approach is lightweight enough for near-real-time operation, and the work points to future extensions such as moving-camera handling via optical flow and broader applicability to other detection tasks.

Abstract

Gas leakage poses a significant hazard that requires prevention. Traditionally, human inspection has been used for detection, a slow and labour-intensive process. Recent research has applied machine learning techniques to this problem, yet there remains a shortage of high-quality, publicly available datasets. This paper introduces a synthetic dataset, SimGas, featuring diverse backgrounds, interfering foreground objects, diverse leak locations, and precise segmentation ground truth. We propose a zero-shot method that combines background subtraction, zero-shot object detection, filtering, and segmentation to leverage this dataset. Experimental results indicate that our approach significantly outperforms baseline methods based solely on background subtraction and zero-shot object detection with segmentation, reaching an IoU of 69%. We also present an analysis of various prompt configurations and threshold settings to provide deeper insights into the performance of our method. Finally, because GasVid lacks ground truth, we evaluated our method on it qualitatively and obtained decent results on this real-world dataset. The dataset, code, and full qualitative results are available at https://github.com/weathon/Lang-Gas.
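
For readers unfamiliar with the metric, the reported IoU (intersection over union) compares a predicted mask with the ground-truth mask pixel by pixel. The snippet below is a minimal sketch of that computation, not the authors' evaluation code: the function name `mask_iou`, the nested-list mask format, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
def mask_iou(pred, truth):
    """IoU of two same-sized binary masks given as lists of rows (hypothetical helper)."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1  # pixel is set in both masks
            if p or t:
                union += 1  # pixel is set in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
print(mask_iou(pred, truth))  # 2 shared pixels out of 4 covered pixels: 0.5
```

An IoU of 0.69 therefore means that, on average, the overlap between predicted and true leak regions covers 69% of their combined area.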

Paper Structure

This paper contains 22 sections, 2 equations, 7 figures, 4 tables, 3 algorithms.

Figures (7)

  • Figure 1: Method overview: Our method for gas leak detection involves background subtraction, zero-shot object detection, non-maximum suppression (NMS), temporal filtering, and segmentation. First, background subtraction is used to identify the moving parts in the video. Then, two text prompts (positive and negative prompts) are employed to guide a zero-shot object detector in detecting leaks. We use the prompt "white steam" because it is more commonly recognized than phrases explicitly mentioning gas leaks. NMS and temporal filtering are then applied to remove extra boxes and fix false positives or negatives based on past temporal information. Finally, a segmentation model—such as the Segment Anything Model 2 (SAM 2)—is used to convert the bounding boxes into segmentation masks.
  • Figure 2: Preview of our dataset. These images are selected from 10 different videos. For each side-by-side subplot, the left one is the input frame, and the right one is the un-thresholded ground truth. Some of these videos use GasVid as background, while others use DALL-E 2 generated backgrounds.
  • Figure 3: BGS-only baseline on our dataset with different morphological closing operation sizes
  • Figure 4: Performance of VLM with different prompts and thresholds
  • Figure 5: Performance of Method with and without Temporal Filter across Different VLM Thresholds ($\tau_{VLM}$)
  • ...and 2 more figures
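
The Figure 1 caption mentions two post-processing steps, NMS and temporal filtering, that can be illustrated without any model. The sketch below is one plausible reading under stated assumptions, not the authors' implementation: the greedy NMS is the standard textbook variant, and the temporal filter is a simple causal majority vote swapped in because the caption does not specify the actual rule. The names `box_iou`, `nms`, `temporal_majority`, the 0.5 threshold, and the window size of 3 are all hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep  # indices of surviving boxes, best score first


def temporal_majority(flags, window=3):
    """Assumed stand-in for temporal filtering: causal majority vote over recent frames."""
    smoothed = []
    for t in range(len(flags)):
        recent = flags[max(0, t - window + 1): t + 1]  # current frame and past frames only
        smoothed.append(sum(recent) * 2 > len(recent))
    return smoothed


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)                      # the second box overlaps the first and is dropped
filled = temporal_majority([True, True, False, True, True])      # one missed frame is filled in
cleaned = temporal_majority([False, False, True, False, False])  # one spurious frame is removed
print(kept, filled, cleaned)
```

The majority vote uses only the current and earlier frames, which matches the caption's reference to "past temporal information" and would keep such a filter usable in a streaming setting.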