Semantic Segmentation from Image Labels by Reconstruction from Structured Decomposition
Xuanrui Zeng
TL;DR
The paper addresses weakly supervised semantic segmentation from image tags by reframing the problem as reconstruction from a structured decomposition of the input image into mask-lets and image-lets. It introduces two neural networks, a mask network $f_m$ and a decomposition network $f_x$, which produce the mask-lets $M$ and the image-lets $X$; the image is then reconstructed as $\hat{I}_{c,h,w} = \sum_k M_{k,h,w} \cdot X_{k,c,h,w}$. The learning objective combines a reconstruction loss $L_{recon}$, a mask regularization term $L_{mask}$, and a class-guided loss $L_{cls}$ that leverages a pretrained classifier $g$. The overall loss $L = L_{recon} + \lambda_m L_{mask} + \lambda_c L_{cls}$ is used to train $f_m$ and $f_x$ jointly, with $g$ kept fixed, to encourage accurate segmentation while mitigating background ambiguity. Experiments on a toy binary dog segmentation task show that the method yields crisp object masks and is robust to background bias, which highlights the potential of structured decomposition to improve weak supervision and suggests extending the approach to multi-class segmentation. The approach offers a principled reconstruction-based regularization framework that can be integrated with standard CNN backbones to reduce labeling costs in practical segmentation tasks.
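The reconstruction formula and the weighted loss above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the networks $f_m$ and $f_x$ are replaced by random arrays, the softmax over $k$ (so that the masks sum to one at each pixel) is an assumption made here for illustration, and the function names `reconstruct` and `total_loss` as well as all numeric values are hypothetical. The individual loss terms are passed in as plain numbers because their exact definitions are not given in this summary.

```python
import numpy as np


def reconstruct(M, X):
    """Reconstruct an image from mask-lets and image-lets.

    M: array of shape (K, H, W), one mask per component k.
    X: array of shape (K, C, H, W), one image-let per component k.
    Returns I_hat of shape (C, H, W), where
    I_hat[c, h, w] = sum_k M[k, h, w] * X[k, c, h, w].
    """
    return np.einsum("khw,kchw->chw", M, X)


def total_loss(l_recon, l_mask, l_cls, lambda_m, lambda_c):
    """Weighted sum L = L_recon + lambda_m * L_mask + lambda_c * L_cls."""
    return l_recon + lambda_m * l_mask + lambda_c * l_cls


# Stand-ins for the network outputs (hypothetical shapes and values).
rng = np.random.default_rng(0)
K, C, H, W = 2, 3, 4, 4

# Assumption: masks are normalized with a softmax over k.
logits = rng.normal(size=(K, H, W))
M = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

X = rng.uniform(size=(K, C, H, W))
I_hat = reconstruct(M, X)

# Sanity check: if every image-let equals the same image I and the masks
# sum to one per pixel, the reconstruction returns I unchanged.
I = rng.uniform(size=(C, H, W))
I_same = reconstruct(M, np.stack([I] * K))
```

Under the sum-to-one assumption, each pixel of $\hat{I}$ is a convex combination of the corresponding image-let pixels, which is why the sanity check recovers `I` exactly up to floating-point error.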
Abstract
Weakly supervised semantic segmentation (WSSS) from image tags remains challenging due to its under-constrained nature. Most mainstream work focuses on extracting class activation maps (CAMs) and imposing various additional regularizations. In contrast, we propose to frame WSSS as a problem of reconstruction from a decomposition of the image using its mask, under which most of these regularizations are embedded implicitly within the framework of the new problem. Our approach has demonstrated promising results in initial experiments and has shown robustness against the problem of background ambiguity. Our code is available at \url{https://github.com/xuanrui-work/WSSSByRec}.
