AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation

Tongfei Chen; Shuo Yang; Yuguang Yang; Linlin Yang; Runtang Guo; Changbai Li; He Long; Chunyu Xie; Dawei Leng; Baochang Zhang

TL;DR

AMLRIS introduces Alignment-Aware Masked Learning (AML), a training strategy for referring image segmentation (RIS) that explicitly estimates pixel-level vision-language alignment, filters out poorly aligned regions during optimization, and focuses learning on trustworthy cues.

Abstract

Referring Image Segmentation (RIS) aims to segment the object in an image that is identified by a natural language expression. The paper introduces Alignment-Aware Masked Learning (AML), a training strategy that improves RIS by explicitly estimating pixel-level vision-language alignment, filtering out poorly aligned regions during optimization, and focusing on trustworthy cues. The approach achieves state-of-the-art performance on the RefCOCO datasets and improves robustness to diverse descriptions and scenarios.
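To make "estimate alignment, then filter poorly aligned regions" concrete, here is a minimal NumPy sketch under stated assumptions; it is not the paper's implementation. It assumes that each image patch is scored by its maximum cosine similarity to any word embedding of the expression (a guess suggested only by the name "PatchMax Matching Evaluation"), and that a fixed fraction of the lowest-scoring patches is masked out. The function names `patch_alignment_scores` and `alignment_keep_mask` and the `drop_ratio` parameter are hypothetical.

```python
import numpy as np

def patch_alignment_scores(patch_feats, word_feats, eps=1e-8):
    """Hypothetical per-patch alignment score: max cosine similarity to any word.

    patch_feats: (N, d) visual patch embeddings
    word_feats:  (L, d) word embeddings of the referring expression
    returns:     (N,) scores, approximately in [-1, 1]
    """
    p = patch_feats / (np.linalg.norm(patch_feats, axis=1, keepdims=True) + eps)
    w = word_feats / (np.linalg.norm(word_feats, axis=1, keepdims=True) + eps)
    sim = p @ w.T            # (N, L) patch-word cosine similarities
    return sim.max(axis=1)   # similarity to the best-matching word

def alignment_keep_mask(scores, drop_ratio=0.25):
    """Boolean mask that drops the `drop_ratio` fraction of lowest-scoring patches."""
    n_drop = int(round(drop_ratio * scores.size))
    keep = np.ones(scores.size, dtype=bool)
    if n_drop > 0:
        keep[np.argsort(scores)[:n_drop]] = False  # lowest alignment first
    return keep

# Usage on random features (assumed shapes: 16 patches, 4 words, dim 8).
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))
words = rng.normal(size=(4, 8))
scores = patch_alignment_scores(patches, words)
keep = alignment_keep_mask(scores, drop_ratio=0.25)
masked_patches = patches * keep[:, None]  # zero out poorly aligned patches
```

In this sketch the mask zeroes input patch features; excluding those patches from the loss would be an equally plausible reading of "filtering during optimization". Unlike the Random Mask strategy compared in Figure 4, which patches are dropped here depends on the alignment scores rather than on chance.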

Paper Structure (50 sections, 67 equations, 8 figures, 18 tables, 1 algorithm)

Introduction
Related Work
Method
  Preliminary
  PatchMax Matching Evaluation
  Alignment-Aware Filtering Masking (AFM)
  AML Training Framework
Experiment
  Experimental Setup
  Comparative Experiment on RIS datasets
  Robustness under Cross-Dataset and Visual Perturbations
  Ablation & Analysis
Conclusion & Limitation
Dataset
Details
...and 35 more sections

Figures (8)

Figure 1: Overall results of AMLRIS, including qualitative examples, benchmark comparisons (oIoU), and cross-dataset robustness evaluation, where the model is trained only on RefCOCO+ and evaluated on RefCOCOg and RefCOCO under seven perturbation scenarios.
Figure 2: Overview of Alignment-aware Masked Learning (AML) framework.
Figure 3: Architecture of the PatchMax Matching Evaluation (PMME) and Alignment-Aware Filtering Masking (AFM) modules.
Figure 4: Qualitative and quantitative comparison with the Random Mask strategy.
Figure 5: Further comparison of prediction maps from the Baseline, Random Mask (RM), and AML.
...and 3 more figures
