Matting by Generation
Zhixiang Wang, Baiang Li, Jian Wang, Yu-Lun Liu, Jinwei Gu, Yung-Yu Chuang, Shin'ichi Satoh
TL;DR
Image matting recovers the alpha matte α from the compositing equation C = αF + (1−α)B, a highly ill-posed problem because the foreground F, the background B, and α are all unknown at every pixel. This work reframes matting as conditional generation, using a pre-trained latent diffusion model as a prior to produce high-resolution, detail-rich mattes. It supports both guidance-free and guidance-based matting, including text and spatial cues, and reaches high resolution through a patch-based inference strategy guided by low-resolution mattes. Experiments on three real-world benchmarks show quantitative improvements and visually faithful boundaries, supporting the utility of latent-diffusion priors for matting. Although diffusion-based inference is slower than regression, the approach offers a flexible, scalable, and effective matting paradigm for editing and compositing.
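To make the ill-posedness of C = αF + (1−α)B concrete, the following minimal sketch composites a single RGB pixel and shows that two different (α, F, B) triples yield the same observed color. The function name `composite` and the pixel values are illustrative assumptions and are not taken from the paper.

```python
def composite(alpha, fg, bg):
    """Blend one RGB pixel with the compositing equation C = alpha*F + (1 - alpha)*B.

    Hypothetical helper for illustration only; colors are floats in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


# Explanation 1: a half-transparent red foreground over a blue background.
c1 = composite(0.5, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))

# Explanation 2: a fully opaque purple foreground over a green background.
c2 = composite(1.0, (0.5, 0.0, 0.5), (0.0, 1.0, 0.0))

# Both explanations produce the same observed pixel, so C alone
# cannot determine alpha without an additional prior or guidance.
same_pixel = c1 == c2
```

Each pixel gives three equations (one per color channel) but seven unknowns (α plus three channels each for F and B), which is why a strong prior such as a pre-trained generative model is useful for constraining the solution.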
Abstract
This paper introduces an innovative approach for image matting that redefines the traditional regression-based task as a generative modeling challenge. Our method harnesses the capabilities of latent diffusion models, enriched with extensive pre-trained knowledge, to regularize the matting process. We present architectural innovations that empower our model to produce mattes with superior resolution and detail. The proposed method is versatile and can perform both guidance-free and guidance-based image matting, accommodating a variety of additional cues. Our comprehensive evaluation across three benchmark datasets demonstrates the superior performance of our approach, both quantitatively and qualitatively. The results not only reflect our method's robust effectiveness but also highlight its ability to generate visually compelling mattes that approach photorealistic quality. The project page for this paper is available at https://lightchaserx.github.io/matting-by-generation/
