AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation

Tongfei Chen, Shuo Yang, Yuguang Yang, Linlin Yang, Runtang Guo, Changbai Li, He Long, Chunyu Xie, Dawei Leng, Baochang Zhang

TL;DR

Introduces Alignment-Aware Masked Learning (AML), a training strategy that enhances Referring Image Segmentation (RIS) by explicitly estimating pixel-level vision-language alignment, filtering out poorly aligned regions during optimization, and focusing on trustworthy cues.

Abstract

Referring Image Segmentation (RIS) aims to segment the object in an image that is described by a natural-language expression. The paper introduces Alignment-Aware Masked Learning (AML), a training strategy that enhances RIS by explicitly estimating pixel-level vision-language alignment, filtering out poorly aligned regions during optimization, and focusing on trustworthy cues. The approach achieves state-of-the-art performance on the RefCOCO benchmarks and improves robustness to diverse descriptions and scenarios.
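
The abstract does not say how alignment is estimated or how poorly aligned regions are excluded. The code below is a minimal sketch of one plausible reading, not the authors' implementation. It makes two assumptions. First, alignment is the cosine similarity between each pixel feature and a single sentence embedding. Second, "filtering" means that pixels with alignment below a threshold `tau` are dropped from a per-pixel binary cross-entropy loss. The names `alignment_map`, `masked_bce`, and `tau` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)

def alignment_map(pixel_feats, text_feat):
    """Hypothetical per-pixel alignment: cosine similarity of every pixel
    feature with one sentence embedding (assumption, not the paper's module)."""
    return [[cosine(f, text_feat) for f in row] for row in pixel_feats]

def masked_bce(pred, target, align, tau=0.0):
    """Binary cross-entropy averaged only over pixels whose alignment score
    is at least tau; poorly aligned pixels contribute nothing to the loss."""
    total, count = 0.0, 0
    for p_row, t_row, a_row in zip(pred, target, align):
        for p, t, a in zip(p_row, t_row, a_row):
            if a < tau:
                continue  # filtered out: no loss term, hence no gradient
            p = min(max(p, 1e-7), 1 - 1e-7)  # clamp to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count if count else 0.0

# Toy 2x2 feature map with 2-D features (made-up values).
text_feat = [1.0, 0.0]
pixel_feats = [[[1.0, 0.0], [0.0, 1.0]],
               [[-1.0, 0.0], [0.9, 0.1]]]
align = alignment_map(pixel_feats, text_feat)
pred = [[0.9, 0.5], [0.2, 0.8]]
target = [[1, 0], [0, 1]]
loss = masked_bce(pred, target, align, tau=0.5)
# Only pixels (0, 0) and (1, 1) pass tau=0.5, so loss = -(ln 0.9 + ln 0.8) / 2
print(round(loss, 4))  # 0.1643
```

In this sketch, masking the loss rather than the input means that the ignored pixels produce no gradient. Optimization is therefore driven only by regions where the visual and textual features agree, which is one way to "focus on trustworthy cues."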

Paper Structure (50 sections, 67 equations, 8 figures, 18 tables, 1 algorithm)

Figures (8)

  • Figure 1: Overall results of AMLRIS, including qualitative examples, benchmark comparisons (oIoU), and cross-dataset robustness evaluation, where the model is trained only on RefCOCO+ and evaluated on RefCOCOg and RefCOCO under seven perturbation scenarios.
  • Figure 2: Overview of the Alignment-aware Masked Learning (AML) framework.
  • Figure 3: Architecture of the PMME and AFM modules.
  • Figure 4: Qualitative and quantitative comparison with the Random Mask strategy.
  • Figure 5: Further comparison of prediction maps: Baseline, Random Mask (RM), and AML.
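
Figure 4 compares AML with a Random Mask strategy. The code below is a hypothetical sketch of how the two selection rules differ, not the paper's procedure. It assumes that both strategies drop the same fraction `ratio` of pixels, that AML drops the pixels with the lowest precomputed alignment scores, and that Random Mask ignores the scores. The functions `random_mask` and `alignment_aware_mask` are illustrative names.

```python
import random

def random_mask(num_pixels, ratio, rng):
    """Random Mask baseline: drop a uniformly random subset of pixels.
    Returns a keep-mask (True = pixel kept)."""
    k = int(num_pixels * ratio)
    dropped = set(rng.sample(range(num_pixels), k))
    return [i not in dropped for i in range(num_pixels)]

def alignment_aware_mask(scores, ratio):
    """Hypothetical AML-style rule: drop the k pixels with the lowest
    alignment scores so that well-aligned pixels are kept."""
    k = int(len(scores) * ratio)
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending alignment
    dropped = set(order[:k])
    return [i not in dropped for i in range(len(scores))]

# Flattened per-pixel alignment scores (made-up values for illustration).
scores = [0.9, -0.3, 0.7, 0.1, 0.8, -0.5]
print(alignment_aware_mask(scores, 0.5))  # [True, False, True, False, True, False]
print(random_mask(len(scores), 0.5, random.Random(0)))  # 3 of 6 kept, chosen without regard to scores
```

Both rules remove the same number of pixels, so any difference in results comes from which pixels are selected. Under these assumptions, this isolates the variable the Random Mask comparison is meant to probe.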