In-Context Matting

He Guo; Zixuan Ye; Zhiguo Cao; Hao Lu

TL;DR

In-context matting introduces a new task where a single reference image with user priors guides automatic alpha estimation across a batch of target images sharing the same foreground. IconMatting leverages a Stable Diffusion–based feature extractor and a novel in-context similarity mechanism (inter- and intra-similarity) to match reference context to targets, followed by a matting head that fuses guidance with original image details. The approach, validated on ICM-57 and AIM-500, achieves competitive accuracy compared to trimap-based matting while maintaining automation, demonstrating the promise of context-driven matting. The work also provides a new dataset, training strategies, and extensions to video, highlighting the practicality and impact of combining context-guided matching with automatic matting.

Abstract

We introduce in-context matting, a novel task setting of image matting. Given a reference image of a certain foreground and guided priors such as points, scribbles, and masks, in-context matting enables automatic alpha estimation on a batch of target images of the same foreground category, without additional auxiliary input. This setting marries good performance in auxiliary input-based matting and ease of use in automatic matting, which finds a good trade-off between customization and automation. To overcome the key challenge of accurate foreground matching, we introduce IconMatting, an in-context matting model built upon a pre-trained text-to-image diffusion model. Conditioned on inter- and intra-similarity matching, IconMatting can make full use of reference context to generate accurate target alpha mattes. To benchmark the task, we also introduce a novel testing dataset ICM-57, covering 57 groups of real-world images. Quantitative and qualitative results on the ICM-57 testing set show that IconMatting rivals the accuracy of trimap-based matting while retaining the automation level akin to automatic matting. Code is available at https://github.com/tiny-smart/in-context-matting

Paper Structure (45 sections, 5 equations, 12 figures, 8 tables)

Introduction
Related Work
Image Matting.
In-Context Learning in Vision.
In-Context Matting with Diffusion Models
Problem Setup
Overall Architecture
In-Context Feature Extractor
Backbone Selection.
Preliminary on Stable Diffusion.
In-Context Similarity
Observation.
Inter-Similarity.
Intra-Similarity.
Matting Head
...and 30 more sections

Figures (12)

Figure 1: In-Context Matting. This novel task setting for image matting enables automatic natural image matting of target images of a certain object category conditioned on a reference image of the same category, with user-provided priors such as masks and scribbles on the reference image only. Note that our approach exhibits remarkable cross-domain matting quality.
Figure 2: IconMatting integrates a Stable Diffusion-derived feature extractor, an in-context similarity module, and a matting head. It processes a target image $I_t$, a reference image $I_r$, and an RoI map $M_{RoI}$. Both reference and target image features and target self-attention maps are extracted and used. In-context similarity uses the in-context query from the reference image to create a guidance map, which, combined with self-attention maps, assists in locating the target object. The matting head finally generates the target alpha matte.
Figure 3: Observations on the inter- and intra-similarities.
Figure 4: ICM-57 examples. The dataset encompasses foreground subjects including humans, animals, plants, and various common objects. It contains both instances from the same category and the same entity.
Figure 5: Qualitative results of different image matting methods. Our method can predict the alpha matte of the matting target specified by the reference input, offering notable prediction accuracy while avoiding interference from unrelated foreground elements.
...and 7 more figures
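
The matching step described in the Figure 2 caption (an in-context query taken from the reference region, compared against target features, then combined with target self-attention) can be pictured with a small NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function name `guidance_map`, the flattened feature shapes, the use of cosine similarity, the averaging over query vectors, and the propagation by a single multiplication with the self-attention matrix are all hypothetical choices made here for clarity.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def guidance_map(ref_feats, roi_mask, tgt_feats, tgt_self_attn):
    """Toy guidance map from reference context (hypothetical helper).

    ref_feats:     (N_r, C) reference features, flattened over positions
    roi_mask:      (N_r,)   True where the user prior marks the foreground
    tgt_feats:     (N_t, C) target features, flattened over positions
    tgt_self_attn: (N_t, N_t) target self-attention, one row per position
    Returns an (N_t,) map rescaled to [0, 1].
    """
    roi_mask = np.asarray(roi_mask, dtype=bool)

    # In-context query: reference features inside the region of interest.
    query = l2_normalize(ref_feats[roi_mask])   # (K, C)
    keys = l2_normalize(tgt_feats)              # (N_t, C)

    # Assumed inter-similarity: cosine similarity between every query
    # vector and every target position, averaged over the query vectors.
    inter = (query @ keys.T).mean(axis=0)       # (N_t,)

    # Assumed intra-similarity step: spread the coarse match through the
    # target's own self-attention so related positions share the score.
    refined = tgt_self_attn @ inter             # (N_t,)

    # Rescale to [0, 1] so the map can be read as soft guidance.
    lo, hi = refined.min(), refined.max()
    return (refined - lo) / (hi - lo + 1e-8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(16, 8))
    tgt = rng.normal(size=(20, 8))
    mask = np.zeros(16, dtype=bool)
    mask[:4] = True
    attn = np.full((20, 20), 1.0 / 20)          # uniform attention placeholder
    print(guidance_map(ref, mask, tgt, attn).shape)
```

In the actual model, per the Figure 2 caption, the features and self-attention maps come from a Stable Diffusion-derived extractor and the guidance is passed to a matting head; the sketch stands in random or hand-built arrays for both.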
