Semantic Segmentation from Image Labels by Reconstruction from Structured Decomposition

Xuanrui Zeng

TL;DR

The paper addresses weakly supervised semantic segmentation from image tags by reframing the problem as reconstruction from a structured decomposition of the input image into mask-lets and image-lets. It introduces two neural networks, a mask network $f_m$ and a decomposition network $f_x$, which produce $M$ and $X$ respectively; the image is then reconstructed as $\hat{I}_{c,h,w} = \sum_k M_{k,h,w} \cdot X_{k,c,h,w}$. The learning objective combines a reconstruction loss $L_{recon}$, a mask regularization term $L_{mask}$, and a class-guided loss $L_{cls}$ that leverages a pretrained classifier $g$. The overall loss $L = L_{recon} + \lambda_m L_{mask} + \lambda_c L_{cls}$ drives joint training of $f_m$ and $f_x$, with $g$ kept fixed, to encourage accurate segmentation while mitigating background ambiguity. Experiments on a toy binary dog segmentation task show that the method yields crisp object masks and is robust to background bias, which highlights the potential of structured decomposition for weak supervision and suggests an extension to multi-class segmentation. The approach offers a reconstruction-based regularization framework that can be combined with standard CNN backbones to reduce labeling costs in practical segmentation tasks.
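The reconstruction formula and the weighted loss in the TL;DR can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `reconstruct` and `total_loss`, the default weights, the softmax-normalized masks, and the mean-squared-error reconstruction term are all assumptions made for illustration, since the summary does not define $L_{recon}$, $L_{mask}$, or $L_{cls}$ in detail.

```python
import numpy as np


def reconstruct(masks, imagelets):
    """Compute I_hat[c, h, w] = sum_k M[k, h, w] * X[k, c, h, w].

    masks:     array of shape (K, H, W)
    imagelets: array of shape (K, C, H, W)
    returns:   array of shape (C, H, W)
    """
    return np.einsum("khw,kchw->chw", masks, imagelets)


def total_loss(l_recon, l_mask, l_cls, lambda_m=1.0, lambda_c=1.0):
    """Combine the three terms as L = L_recon + lambda_m * L_mask + lambda_c * L_cls.

    The default weights are hypothetical placeholders, not values from the paper.
    """
    return l_recon + lambda_m * l_mask + lambda_c * l_cls


rng = np.random.default_rng(0)
K, C, H, W = 2, 3, 4, 4
image = rng.random((C, H, W))

# Assumption: masks are normalized over k with a softmax, so they sum to 1 per pixel.
logits = rng.normal(size=(K, H, W))
masks = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Sanity case: if every image-let equals the input, the reconstruction is exact.
imagelets = np.broadcast_to(image, (K, C, H, W))
recon = reconstruct(masks, imagelets)

# Assumption: mean squared error stands in for the reconstruction loss.
l_recon = float(np.mean((recon - image) ** 2))
```

In this sanity case the masks carry no information, because any partition reproduces the input; this is the kind of under-constraint that the additional terms $L_{mask}$ and $L_{cls}$ are meant to address.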

Abstract

Weakly supervised semantic segmentation (WSSS) from image tags remains challenging due to its under-constrained nature. Most mainstream work focuses on extracting class activation maps (CAMs) and imposing various additional regularizations. In contrast, we propose to frame WSSS as a problem of reconstruction from a decomposition of the image using its mask, under which most of these regularizations are embedded implicitly in the framework of the new problem. Our approach has demonstrated promising results in initial experiments and has shown robustness against the problem of background ambiguity. Our code is available at \url{https://github.com/xuanrui-work/WSSSByRec}.

Paper Structure (16 sections, 8 equations, 3 figures, 2 tables)

Introduction
Related works
Method
Loss Functions
$L_\text{recon}$
$L_\text{mask}$
$L_\text{cls}$
Overall Loss
Experiments
Dataset
Architecture
Training
Qualitative Results
Conclusion
Future work
...and 1 more sections

Figures (3)

Figure 1: Overview of our pipeline for weak segmentation.
Figure 2: Qualitative results on 50 randomly selected training samples each for "dog-present" images (first 50) and "dog-absent" images (last 50). In each square, left: input image; right: output mask overlaid on the input image.
Figure 3: Qualitative results on 50 randomly selected validation samples each for "dog-present" images (first 50) and "dog-absent" images (last 50). In each square, left: input image; right: output mask overlaid on the input image.
