Competing for pixels: a self-play algorithm for weakly-supervised segmentation

Shaheer U. Saeed; Shiqi Huang; João Ramalhinho; Iani J. M. B. Gayo; Nina Montaña-Brown; Ester Bonmati; Stephen P. Pereira; Brian Davidson; Dean C. Barratt; Matthew J. Clarkson; Yipeng Hu

TL;DR

This work tackles weakly supervised segmentation (WSS) by turning ROI localization into a competitive, two-agent RL self-play game in which patch-level scores come from an object presence detector trained solely on image-level labels. A termination rule and task-based rewards encourage the agents to exhaust the ROI-containing patches precisely, addressing the over- and under-segmentation common in WSS. Across VOC, COCO, and two medical datasets, the proposed RL self-play method (RLSP) outperforms state-of-the-art approaches supervised with image-level labels, and ablations highlight the importance of self-play and task-based rewards. The framework offers a scalable, time-efficient approach to segmentation under weak supervision and suggests future extensions to multi-task and meta-learning settings.

Abstract

Weakly-supervised segmentation (WSS) methods, reliant on image-level labels indicating object presence, lack explicit correspondence between labels and regions of interest (ROIs), posing a significant challenge. Despite this, WSS methods have attracted attention due to their much lower annotation costs compared to fully-supervised segmentation. Leveraging reinforcement learning (RL) self-play, we propose a novel WSS method that gamifies image segmentation of an ROI. We formulate segmentation as a competition between two agents that compete to select ROI-containing patches until all such patches are exhausted. The score at each time-step, used to compute the reward for agent training, represents the likelihood of object presence within the selection, as determined by an object presence detector pre-trained using only image-level binary classification labels of object presence. Additionally, we propose a game termination condition that can be called by either side upon exhaustion of all ROI-containing patches, followed by the selection of a final patch from each agent. Upon termination, the calling agent is incentivised if ROI-containing patches are exhausted or disincentivised if an ROI-containing patch is found by the competitor. This competitive setup ensures minimisation of over- or under-segmentation, a common problem with WSS methods. Extensive experimentation across four datasets demonstrates significant performance improvements over recent state-of-the-art methods. Code: https://github.com/s-sd/spurl/tree/main/wss
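
The game described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the trained RL agents are replaced by a greedy selection rule, and `presence_score` is a hypothetical stand-in for the pre-trained object presence detector (here just the fraction of object pixels in a patch). The names `split_patches` and `play_game`, the threshold, the unit termination bonus, and the use of the union of selected patches as the mask are all assumptions made for illustration.

```python
import numpy as np

def presence_score(patch):
    # Hypothetical stand-in for the object presence detector:
    # the fraction of "object" pixels in the patch.
    return float(patch.mean())

def split_patches(image, size):
    # Map (row, col) patch index -> patch array.
    # Assumes both image dimensions are divisible by `size`.
    h, w = image.shape
    return {(r, c): image[r * size:(r + 1) * size, c * size:(c + 1) * size]
            for r in range(h // size) for c in range(w // size)}

def play_game(image, size=2, threshold=0.5):
    patches = split_patches(image, size)
    remaining = set(patches)
    selected = []
    rewards = {0: 0.0, 1: 0.0}
    player, caller = 0, None

    def best_remaining():
        # Greedy rule (assumption): highest-scoring unselected patch.
        return max(sorted(remaining), key=lambda k: presence_score(patches[k]))

    while remaining:
        pick = best_remaining()
        score = presence_score(patches[pick])
        if score < threshold:
            # The current player calls termination: it believes the
            # ROI-containing patches are exhausted.
            caller = player
            break
        remaining.remove(pick)
        selected.append(pick)
        rewards[player] += score  # per-step reward taken as the score (assumption)
        player = 1 - player

    if caller is not None:
        # After termination the competitor selects a final patch; the caller
        # is rewarded if it holds no ROI, and penalised otherwise.
        final = best_remaining()
        found_roi = presence_score(patches[final]) >= threshold
        rewards[caller] += -1.0 if found_roi else 1.0

    # Toy segmentation: union of all patches selected during the game.
    mask = np.zeros_like(image)
    for r, c in selected:
        mask[r * size:(r + 1) * size, c * size:(c + 1) * size] = 1
    return mask, rewards, caller

# Toy image: the object occupies the top half of a 4x4 grid.
image = np.zeros((4, 4), dtype=int)
image[:2, :] = 1
mask, rewards, caller = play_game(image)
```

With greedy agents the termination call is always correct, so the penalty branch never fires in this toy; in the setting the abstract describes, the agents are learned policies, and the risk that the competitor still finds an ROI-containing patch is what discourages terminating too early.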

Paper Structure (16 sections, 5 equations, 4 figures, 6 tables, 1 algorithm)

Introduction
Summary of contributions
Related work
Reinforcement learning self-play
Weakly supervised segmentation
Methods
Competing functions and the environment
Gamified segmentation using self-play
Experiments and results
Datasets for evaluation
Network architectures and hyper-parameters
Comparisons with the state-of-the-art
Ablation studies
Comparisons to baselines
Quantifying over- and under-segmentation
...and 1 more sections

Figures (4)

Figure 1: Competitive self-play for segmentation using only weak signals generated by an object presence detector (itself trained using only image-level classification labels). At inference the trained agent plays against a dummy opponent to segment the image.
Figure 2: Overview of the gamified WSS segmentation using RL self-play.
Figure 3: Left: VOC; Right: CMPS. 'Heatmap': object presence detector score heatmap. Bottom: VOC; time-points at inference. RLSP obtains its final mask by a pixel-level majority vote over three inference runs.
Figure 4: VOC test set segmentation results (left to right: image; ground truth; proposed).
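
The pixel-level majority vote mentioned in the Figure 3 caption can be sketched as follows. This is a minimal illustration under the assumption that each inference run yields a binary mask of the same shape; the function name `majority_vote` and the example masks are hypothetical and not taken from the paper's code.

```python
import numpy as np

def majority_vote(masks):
    # Stack the binary masks along a new leading axis and count, per pixel,
    # how many runs marked that pixel as foreground.
    stacked = np.stack([np.asarray(m, dtype=bool) for m in masks])
    votes = stacked.sum(axis=0)
    # A pixel is foreground if strictly more than half of the runs agree.
    return votes * 2 > len(masks)

# Three made-up masks standing in for three inference runs.
run_a = [[1, 0], [1, 1]]
run_b = [[1, 0], [0, 1]]
run_c = [[0, 1], [0, 1]]
final_mask = majority_vote([run_a, run_b, run_c])
```

With an odd number of runs, such as three, a strict majority always exists, so no tie-breaking rule is needed.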
