Interactive Occlusion Boundary Estimation through Exploitation of Synthetic Data

Lintao Xu, Chaohui Wang

TL;DR

Occlusion boundaries (OBs) are crucial for scene understanding, but high-quality ground truths (GTs) are scarce and often subjective. The paper presents MS3PE, a multi-scribble-guided Interactive Occlusion Boundary Estimation (IOBE) framework that combines MSIM (the multi-scribble interaction mechanism) with TPE-Net, a three-encoding-path network enhanced with a multi-scale strip convolutional module (FEM), to refine OB predictions. Training relies on synthetic data: Mesh2OB generates OB GTs from 3D scenes to build OB-FUTURE, a large synthetic OB benchmark, while OB-LIGM serves as a real-world benchmark. Key contributions include Mesh2OB for automatic OB GT generation from 3D scenes, OB-FUTURE with 19,186 synthetic indoor scenes, and OB-LIGM for high-quality real-world evaluation; together they enable effective OB benchmark construction and model training without domain adaptation. Experiments show that MS3PE surpasses both adapted interactive segmentation baselines and fully automatic methods, and that annotation time drops significantly with either machine-simulated or human scribbles, demonstrating the practical viability of interactive OB labeling and scalable dataset creation.
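To make the strip-convolution idea concrete, here is a minimal sketch under stated assumptions; it is not the paper's FEM. It sums the responses of horizontal (1×k) and vertical (k×1) kernels at several lengths k, so that elongated structures such as boundaries aligned with a strip accumulate evidence along their length. The function names `conv_strip` and `multi_scale_strip`, the kernel lengths (3, 5, 7), and the fixed box (averaging) weights are hypothetical; a trained network would learn these weights instead.

```python
from typing import List, Tuple

Grid = List[List[float]]


def conv_strip(img: Grid, kernel: List[float], horizontal: bool) -> Grid:
    """Apply a 1xk (horizontal) or kx1 (vertical) kernel with zero padding.

    The output has the same size as the input; an odd kernel length is assumed.
    """
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, k in enumerate(kernel):
                # Offset along the strip direction only.
                dy, dx = (0, i - r) if horizontal else (i - r, 0)
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:  # zero padding outside the image
                    acc += k * img[yy][xx]
            out[y][x] = acc
    return out


def multi_scale_strip(img: Grid, sizes: Tuple[int, ...] = (3, 5, 7)) -> Grid:
    """Sum horizontal and vertical strip responses over several kernel lengths.

    Hypothetical stand-in for a learned module: fixed box kernels of weight 1/k.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for k in sizes:
        box = [1.0 / k] * k
        for horizontal in (True, False):
            resp = conv_strip(img, box, horizontal)
            for y in range(h):
                for x in range(w):
                    out[y][x] += resp[y][x]
    return out
```

On a 7×7 image containing a single horizontal line, a pixel on the line receives a full response from every horizontal strip, while a pixel three rows away picks up only the longest (k = 7) vertical strip.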

Abstract

Occlusion boundaries (OBs) geometrically localize occlusion events in 2D images and provide critical cues for scene understanding. In this paper, we present the first systematic study of Interactive Occlusion Boundary Estimation (IOBE), introducing MS3PE, a novel multi-scribble-guided deep-learning framework that advances IOBE through two key innovations: (1) an intuitive multi-scribble interaction mechanism, and (2) a three-encoding-path network enhanced with multi-scale strip convolutions. MS3PE surpasses baselines adapted from seven state-of-the-art interactive segmentation methods, and our real-user experiment demonstrates its strong potential for OB benchmark construction. In addition, to address the scarcity of well-annotated real-world data, we propose training IOBE models on synthetic data and develop Mesh2OB, the first automated tool for generating precise ground-truth OBs from 3D scenes with self-occlusions explicitly handled. Mesh2OB enables the creation of the OB-FUTURE synthetic benchmark, which facilitates generalizable training without domain adaptation. Finally, we introduce OB-LIGM, a high-quality real-world benchmark of 120 meticulously annotated high-resolution images that advances evaluation standards in OB research. Source code and resources are available at https://github.com/xul-ops/IOBE.
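The abstract states that Mesh2OB generates ground-truth OBs from 3D scenes with self-occlusions explicitly handled. The snippet below is only a hypothetical illustration of why that matters, not Mesh2OB itself (which works from 3D meshes): starting from a rendered instance-ID map and depth map, it marks a pixel as an OB when a 4-neighbor either belongs to a different object and lies farther away, or belongs to the same object but lies farther by more than a depth-jump threshold (a self-occlusion). The function name `occlusion_boundaries`, the threshold `jump`, and the convention of placing OBs on the nearer (occluding) side are assumptions. A rule based on instance boundaries alone would miss the second case.

```python
from typing import List


def occlusion_boundaries(
    ids: List[List[int]], depth: List[List[float]], jump: float = 0.1
) -> List[List[bool]]:
    """Hypothetical OB extraction from rendered instance IDs and depth.

    A pixel is an OB if some 4-neighbor is farther away and either belongs to
    another object, or to the same object with a depth gap larger than `jump`
    (self-occlusion). OBs are placed on the nearer, occluding side.
    """
    h, w = len(ids), len(ids[0])
    ob = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w):
                    continue
                gap = depth[ny][nx] - depth[y][x]  # > 0: neighbor is farther away
                if gap <= 0:
                    continue  # this pixel is not on the occluding (nearer) side
                other_object = ids[ny][nx] != ids[y][x]
                self_occlusion = not other_object and gap > jump
                if other_object or self_occlusion:
                    ob[y][x] = True
                    break
    return ob
```

For example, an object of depth 1.0 in front of a background of depth 5.0 yields OBs along the object's edge pixels, and a single object whose depth jumps from 1.0 to 2.0 yields an OB at the jump, whereas a smooth slope below the threshold yields none.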

Paper Structure

This paper contains 10 sections, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: Overall workflow. During testing, given an image from the real-world benchmark OB-LIGM, MS3PE (i) predicts initial occlusion boundaries (OBs), (ii) receives all false negative (FN) and false positive (FP) scribbles from a human annotator, and (iii) outputs the refined result. For training, we generate synthetic data by applying Mesh2OB to the 3D-FUTURE dataset [fu20213d], creating the synthetic benchmark OB-FUTURE.
  • Figure 2: Benchmark visualization. Red curves indicate OB ground truths. Blue boxes highlight representative regions containing missing, incomplete, and/or erroneous OBs.
  • Figure 3: MS3PE framework: integration of the deep network TPE-Net into the interaction mechanism MSIM, which incorporates the boundary scribble approach, one-shot interaction strategy, two-stage training scheme, scribble simulator, etc. MS3PE is modular in terms of both the deep network and the interaction mechanism.
  • Figure 4: Comparison of OB generation quality, Mesh2OB vs. [qiu2020pixel]. Mesh2OB's result is the union of the white and red OBs; the result of [qiu2020pixel] comprises only the white OBs.
  • Figure 5: Qualitative results, with quantitative metrics displayed in the top-right corners (columns 2 and 4).
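Figures 1 and 3 mention FN/FP scribbles, a one-shot interaction strategy, and a scribble simulator. The following is a minimal sketch of one plausible simulator, written under assumptions and not taken from the paper: ground-truth OB pixels with no predicted OB within a tolerance become FN candidates, predicted OB pixels with no ground truth within that tolerance become FP candidates, and each sufficiently large 8-connected component is returned as one scribble. All scribbles are then rasterized at once into two extra input maps, mirroring the idea of submitting every correction in a single round. The names `simulate_scribbles`, `scribble_channels`, `dilate`, and `components`, the parameters `tol` and `min_len`, and the two-channel encoding are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[bool]]
Pixel = Tuple[int, int]


def dilate(mask: Mask, r: int) -> Mask:
    """Square dilation with radius r (a pixel-distance tolerance)."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - r), min(h, y + r + 1)):
                    for xx in range(max(0, x - r), min(w, x + r + 1)):
                        out[yy][xx] = True
    return out


def components(mask: Mask) -> List[List[Pixel]]:
    """8-connected components found by breadth-first search."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps: List[List[Pixel]] = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                comp: List[Pixel] = []
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(comp)
    return comps


def simulate_scribbles(pred: Mask, gt: Mask, tol: int = 1, min_len: int = 3):
    """Return (FN scribbles, FP scribbles), each a list of pixel lists."""
    h, w = len(gt), len(gt[0])
    near_pred, near_gt = dilate(pred, tol), dilate(gt, tol)
    fn = [[gt[y][x] and not near_pred[y][x] for x in range(w)] for y in range(h)]
    fp = [[pred[y][x] and not near_gt[y][x] for x in range(w)] for y in range(h)]
    fn_scribbles = [c for c in components(fn) if len(c) >= min_len]
    fp_scribbles = [c for c in components(fp) if len(c) >= min_len]
    return fn_scribbles, fp_scribbles


def scribble_channels(fn, fp, h: int, w: int) -> List[List[List[float]]]:
    """Rasterize all scribbles at once into two maps: [FN map, FP map]."""
    maps = [[[0.0] * w for _ in range(h)] for _ in range(2)]
    for ch, scribbles in enumerate((fn, fp)):
        for scribble in scribbles:
            for y, x in scribble:
                maps[ch][y][x] = 1.0
    return maps
```

The `min_len` filter is one way to keep isolated single-pixel errors from becoming scribbles; the tolerance `tol` avoids penalizing predictions that are off by a pixel.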