Rewis3d: Reconstruction Improves Weakly-Supervised Semantic Segmentation

Jonas Ernst; Wolfgang Boettcher; Lukas Hoyer; Jan Eric Lenssen; Bernt Schiele

TL;DR

Rewis3d is a framework that leverages recent advances in feed-forward 3D reconstruction to improve weakly supervised semantic segmentation on 2D images. It enforces semantic consistency between 2D images and reconstructed 3D point clouds, using state-of-the-art feed-forward reconstruction to generate reliable geometric supervision.

Abstract

We present Rewis3d, a framework that leverages recent advances in feed-forward 3D reconstruction to significantly improve weakly supervised semantic segmentation on 2D images. Obtaining dense, pixel-level annotations remains a costly bottleneck for training segmentation models. Alleviating this issue, sparse annotations offer an efficient weakly-supervised alternative. However, they still incur a performance gap. To address this, we introduce a novel approach that leverages 3D scene reconstruction as an auxiliary supervisory signal. Our key insight is that 3D geometric structure recovered from 2D videos provides strong cues that can propagate sparse annotations across entire scenes. Specifically, a dual student-teacher architecture enforces semantic consistency between 2D images and reconstructed 3D point clouds, using state-of-the-art feed-forward reconstruction to generate reliable geometric supervision. Extensive experiments demonstrate that Rewis3d achieves state-of-the-art performance in sparse supervision, outperforming existing approaches by 2-7% without requiring additional labels or inference overhead.
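The cross-modal consistency idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `recon_conf` input, and the choice to multiply the teacher's top class probability by a reconstruction confidence are all assumptions, loosely following the "dual confidence mechanism" mentioned in the Figure 3 caption. The teacher of one modality provides pseudo-labels, and the student of the other modality is penalized with a weighted cross-entropy on corresponding elements.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_consistency_loss(student_logits, teacher_logits, recon_conf):
    """Confidence-weighted cross-entropy against teacher pseudo-labels.

    student_logits: class logits from the student of one modality, one list per element
    teacher_logits: class logits from the teacher of the other modality,
                    for the corresponding elements (pixel <-> 3D point)
    recon_conf:     per-element reconstruction confidence in [0, 1]
                    (hypothetical input, assumed to come from the reconstruction model)
    """
    weighted_sum, weight_total = 0.0, 0.0
    for s, t, r in zip(student_logits, teacher_logits, recon_conf):
        t_prob = softmax(t)
        pseudo_label = max(range(len(t_prob)), key=t_prob.__getitem__)
        # Assumed combination: prediction certainty times reconstruction quality.
        weight = t_prob[pseudo_label] * r
        s_prob = softmax(s)
        weighted_sum += -weight * math.log(s_prob[pseudo_label] + 1e-12)
        weight_total += weight
    # Normalize by the total weight; return 0 if nothing is trusted.
    return weighted_sum / weight_total if weight_total > 0 else 0.0


# Toy example: a 3D teacher supervising a 2D student on two corresponding elements.
teacher_3d = [[4.0, 0.0], [0.0, 4.0]]
conf = [1.0, 0.5]
loss_good = weighted_consistency_loss([[3.0, 0.0], [0.0, 3.0]], teacher_3d, conf)
loss_bad = weighted_consistency_loss([[0.0, 3.0], [3.0, 0.0]], teacher_3d, conf)
```

In this toy setup a student that agrees with the teacher receives a lower loss than one that disagrees, and elements with zero reconstruction confidence contribute nothing. Running the same function with the roles of the two modalities swapped gives the other direction of the bidirectional transfer.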

Paper Structure (25 sections, 7 equations, 11 figures, 13 tables)

Introduction
Related Work
Weakly-Supervised Segmentation.
Rewis3d
Framework Overview
3D Scene Reconstruction and Preprocessing
Dual Student-Teacher Architecture
Weighted Cross-Modal Consistency
Training Objective
Experiments
Experimental Setup
Baselines
Main Results
Generalization to Diverse Annotation Types
Qualitative Results
...and 10 more sections

Figures (11)

Figure 1: Rewis3d -- Left: Our method (Rewis3d) greatly improves performance for weakly supervised segmentation trained with point and scribble labels. Notably, it improves robustness to changes in object scale and yields more precise class boundaries. Right: We consistently outperform previous SOTA methods on a range of datasets and a variety of sparse annotations by significant margins.
Figure 2: Conceptual overview of weakly-supervised segmentation approaches. (a) Traditional methods rely solely on sparse 2D annotations, limiting supervision propagation. (b) Our proposed method Rewis3d introduces a 3D branch, enforcing cross-modal consistency (CMC) between 2D predictions and 3D predictions from reconstructed geometry.
Figure 3: Overview of the training pipeline. Our framework operates in two stages. Base Training (blue and green) establishes independent student-teacher setups for each modality using sparse supervision. Cross-Modal Consistency (orange) introduces our core contribution: bidirectional knowledge transfer where the teacher of one modality supervises the student of the other, weighted by our dual confidence mechanism leveraging prediction certainty and reconstruction quality.
Figure 4: Sparse label accumulation. First, an image sequence is unprojected to a 3D point cloud via a multi-view reconstruction model. We then establish correspondences between the 3D points and the 2D pixels in the source images. This allows labels to be accumulated in 3D space and, by projection, in the 2D images.
Figure 5: Qualitative comparison across outdoor and indoor datasets. Rewis3d produces sharper boundaries, more accurate fine-grained predictions, and better long-range segmentation compared to the Mean Teacher baseline (EMA) and TEL, even in regions where 3D reconstruction is uncertain. Colormaps are provided in the appendix.
...and 6 more figures
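The label accumulation described in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline: `accumulate_labels` is a hypothetical helper, the pixel-to-point correspondences are assumed to be given as a dictionary (with pixels from different frames that observe the same surface sharing one point id), and conflicts are resolved by majority vote, which the source does not specify.

```python
from collections import Counter, defaultdict


def accumulate_labels(sparse_labels, pixel_to_point):
    """Lift sparse 2D labels to 3D points and project them back to all frames.

    sparse_labels:  {(frame, row, col): class_id} for annotated pixels only
    pixel_to_point: {(frame, row, col): point_id} correspondences, assumed to
                    come from the multi-view reconstruction
    Returns (point_labels, pixel_labels).
    """
    # Step 1: collect class votes per 3D point from the annotated pixels.
    votes = defaultdict(Counter)
    for pixel, class_id in sparse_labels.items():
        point_id = pixel_to_point.get(pixel)
        if point_id is not None:
            votes[point_id][class_id] += 1

    # Step 2: assign each voted point its most frequent class (assumed rule).
    point_labels = {p: c.most_common(1)[0][0] for p, c in votes.items()}

    # Step 3: project the point labels back to every corresponding pixel.
    pixel_labels = {
        pixel: point_labels[point_id]
        for pixel, point_id in pixel_to_point.items()
        if point_id in point_labels
    }
    return point_labels, pixel_labels


# Toy example: point 7 is seen in frames 0 and 1, but only frame 0 is annotated.
pixel_to_point = {(0, 10, 12): 7, (1, 11, 40): 7, (1, 50, 60): 8}
sparse_labels = {(0, 10, 12): 3}
point_labels, pixel_labels = accumulate_labels(sparse_labels, pixel_to_point)
```

Here the single annotation in frame 0 reaches the corresponding pixel in frame 1 through the shared 3D point, while the pixel attached to the unlabeled point 8 stays unlabeled.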
