Table of Contents
Fetching ...

Semantic Segmentation from Image Labels by Reconstruction from Structured Decomposition

Xuanrui Zeng

TL;DR

The paper addresses weakly supervised semantic segmentation from image tags by reframing the problem as reconstruction from a structured decomposition of the input image into mask-lets and image-lets. It introduces two neural networks, a mask network $f_m$ and a decomposition network $f_x$, to produce $M$ and $X$, enabling a reconstructed image $\hat{I}$ via $\hat{I}_{c,h,w} = \sum_k M_{k,h,w} \cdot X_{k,c,h,w}$; the learning objective combines a reconstruction loss $L_{recon}$, a mask regulation $L_{mask}$, and a class-guided loss $L_{cls}$ that leverages a pretrained classifier $g$. The overall loss $L = L_{recon} + \lambda_m L_{mask} + \lambda_c L_{cls}$ guides joint training of $f_m$ and $f_x$, with $g$ fixed, to encourage accurate segmentation while mitigating background ambiguity. Experiments on a toy binary dog segmentation task show the method yields crisp object masks and robustness to background bias, highlighting the potential of structured decomposition to improve weak supervision and suggesting avenues for extending to multi-class segmentation. The approach offers a principled reconstruction-based regularization framework that can be integrated with standard CNN backbones to reduce labeling costs in practical segmentation tasks.

Abstract

Weakly supervised image segmentation (WSSS) from image tags remains challenging due to its under-constraint nature. Most mainstream work focus on the extraction of class activation map (CAM) and imposing various additional regularization. Contrary to the mainstream, we propose to frame WSSS as a problem of reconstruction from decomposition of the image using its mask, under which most regularization are embedded implicitly within the framework of the new problem. Our approach has demonstrated promising results on initial experiments, and shown robustness against the problem of background ambiguity. Our code is available at \url{https://github.com/xuanrui-work/WSSSByRec}.

Semantic Segmentation from Image Labels by Reconstruction from Structured Decomposition

TL;DR

The paper addresses weakly supervised semantic segmentation from image tags by reframing the problem as reconstruction from a structured decomposition of the input image into mask-lets and image-lets. It introduces two neural networks, a mask network and a decomposition network , to produce and , enabling a reconstructed image via ; the learning objective combines a reconstruction loss , a mask regulation , and a class-guided loss that leverages a pretrained classifier . The overall loss guides joint training of and , with fixed, to encourage accurate segmentation while mitigating background ambiguity. Experiments on a toy binary dog segmentation task show the method yields crisp object masks and robustness to background bias, highlighting the potential of structured decomposition to improve weak supervision and suggesting avenues for extending to multi-class segmentation. The approach offers a principled reconstruction-based regularization framework that can be integrated with standard CNN backbones to reduce labeling costs in practical segmentation tasks.

Abstract

Weakly supervised image segmentation (WSSS) from image tags remains challenging due to its under-constraint nature. Most mainstream work focus on the extraction of class activation map (CAM) and imposing various additional regularization. Contrary to the mainstream, we propose to frame WSSS as a problem of reconstruction from decomposition of the image using its mask, under which most regularization are embedded implicitly within the framework of the new problem. Our approach has demonstrated promising results on initial experiments, and shown robustness against the problem of background ambiguity. Our code is available at \url{https://github.com/xuanrui-work/WSSSByRec}.
Paper Structure (16 sections, 8 equations, 3 figures, 2 tables)

This paper contains 16 sections, 8 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Overview of our pipeline for weak segmentation.
  • Figure 2: Qualitative results on 50 randomly selected training samples for "dog-present" images (first 50) and "dog-absent" (last 50) images respectively. Left, each square: input image. Right, each square: output mask overlayed on the input image.
  • Figure 3: Qualitative results on 50 randomly selected validation samples for "dog-present" images (first 50) and "dog-absent" (last 50) images respectively. Left, each square: input image. Right, each square: output mask overlayed on the input image.