Interactive Occlusion Boundary Estimation through Exploitation of Synthetic Data
Lintao Xu, Chaohui Wang
TL;DR
Occlusion boundaries (OBs) are crucial for scene understanding, but high-quality ground truths (GTs) are scarce and often subjective. The paper presents MS³PE, a multi-scribble-guided framework for Interactive Occlusion Boundary Estimation (IOBE). It combines a multi-scribble interaction mechanism (MSIM) with a three-encoding-path network (TPE-Net) and a multi-scale strip convolutional module (FEM) to refine OB predictions. Three resources support training and evaluation: Mesh2OB, a tool that automatically generates OB ground truth from 3D scenes; OB-FUTURE, a large synthetic benchmark with 19,186 synthetic indoor scenes, generated with Mesh2OB; and OB-LIGM, a high-quality real-world benchmark. Together they enable OB benchmark construction and model training without domain adaptation. Experiments show that MS³PE surpasses adapted interactive segmentation baselines and fully automatic methods, with significant reductions in annotation time when using machine-simulated or human scribbles, which demonstrates the practical viability of interactive OB labeling and scalable dataset creation.
Abstract
Occlusion boundaries (OBs) geometrically localize occlusion events in 2D images and provide critical cues for scene understanding. In this paper, we present the first systematic study of Interactive Occlusion Boundary Estimation (IOBE), introducing MS³PE, a novel multi-scribble-guided deep-learning framework that advances IOBE through two key innovations: (1) an intuitive multi-scribble interaction mechanism, and (2) a 3-encoding-path network enhanced with multi-scale strip convolutions. MS³PE surpasses adapted baselines from seven state-of-the-art interactive segmentation methods and, in our real-user experiment, demonstrates strong potential for OB benchmark construction. In addition, to address the scarcity of well-annotated real-world data, we propose using synthetic data for training IOBE models and develop Mesh2OB, the first automated tool for generating precise ground-truth OBs from 3D scenes with self-occlusions explicitly handled. Mesh2OB enables the creation of the OB-FUTURE synthetic benchmark, which facilitates generalizable training without domain adaptation. Finally, we introduce OB-LIGM, a high-quality real-world benchmark comprising 120 meticulously annotated high-resolution images, advancing evaluation standards in OB research. Source code and resources are available at https://github.com/xul-ops/IOBE.
