Towards Natural Image Matting in the Wild via Real-Scenario Prior

Ruihao Xia, Yu Liang, Peng-Tao Jiang, Hao Zhang, Qianru Sun, Yang Tang, Bo Li, Pan Zhou

TL;DR

This work proposes COCO-Matting, a new matting dataset built on the COCO dataset, and SEMat, a framework that revamps the network architecture and training objectives of SAM-based matting. Together they prove effective in interactive natural image matting.

Abstract

Recent approaches attempt to adapt powerful interactive segmentation models, such as SAM, to interactive matting by fine-tuning them on synthetic matting datasets. However, models trained on synthetic data fail to generalize to complex scenes and scenes with occlusion. We address this challenge by proposing COCO-Matting, a new matting dataset based on the COCO dataset. Specifically, the construction of COCO-Matting comprises accessory fusion and mask-to-matte, which select real-world complex images from COCO and convert semantic segmentation masks into matting labels. The resulting COCO-Matting contains 38,251 human instance-level alpha mattes in complex natural scenarios. Furthermore, existing SAM-based matting methods extract intermediate features and masks from a frozen SAM and train only a lightweight matting decoder with end-to-end matting losses, which does not fully exploit the potential of the pre-trained SAM. Thus, we propose SEMat, which revamps the network architecture and training objectives. For the network architecture, the proposed feature-aligned transformer learns to extract fine-grained edge and transparency features, and the proposed matte-aligned decoder aims to segment matting-specific objects and convert coarse masks into high-precision mattes. For the training objectives, the proposed regularization loss and trimap loss aim to retain the prior from the pre-trained model and push the matting logits extracted from the mask decoder to contain trimap-based semantic information. Extensive experiments across seven diverse datasets demonstrate the superior performance of our method, proving its efficacy in interactive natural image matting. We open-source our code, models, and dataset at https://github.com/XiaRho/SEMat.
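
For readers unfamiliar with the terms used in the abstract, the following minimal sketch illustrates what an alpha matte and a trimap are. It is not taken from the SEMat code base: the function names, the class ids, and the threshold `eps` are assumptions made purely for illustration. It relies only on the standard compositing equation I = alpha * F + (1 - alpha) * B and on the common convention that a trimap labels pixels as background, unknown, or foreground.

```python
from typing import List

Matte = List[List[float]]  # alpha values in [0, 1], stored row by row

# Hypothetical class ids, chosen for this illustration only.
BACKGROUND, UNKNOWN, FOREGROUND = 0, 1, 2


def alpha_to_trimap(alpha: Matte, eps: float = 1e-3) -> List[List[int]]:
    """Label each pixel as background, unknown, or foreground.

    Pixels with alpha <= eps are background, pixels with alpha >= 1 - eps
    are foreground, and everything in between is the unknown region.
    The threshold eps is an assumed value, not one from the paper.
    """
    trimap = []
    for row in alpha:
        labels = []
        for a in row:
            if a <= eps:
                labels.append(BACKGROUND)
            elif a >= 1.0 - eps:
                labels.append(FOREGROUND)
            else:
                labels.append(UNKNOWN)
        trimap.append(labels)
    return trimap


def composite(alpha: Matte, fg: Matte, bg: Matte) -> Matte:
    """Apply I = alpha * F + (1 - alpha) * B to single-channel images."""
    return [
        [a * f + (1.0 - a) * b for a, f, b in zip(row_a, row_f, row_b)]
        for row_a, row_f, row_b in zip(alpha, fg, bg)
    ]


# A 1x3 toy matte: fully transparent, half transparent, fully opaque.
toy_alpha = [[0.0, 0.5, 1.0]]
toy_trimap = alpha_to_trimap(toy_alpha)
toy_image = composite(toy_alpha, [[1.0, 1.0, 1.0]], [[0.0, 0.0, 0.0]])
```

A binary segmentation mask only distinguishes the two extreme classes, whereas a matte also carries fractional opacity in the unknown region (hair, edges, transparent objects). This is the gap that a mask-to-matte conversion has to close.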

Paper Structure

This paper contains 17 sections, 6 equations, 11 figures, 7 tables, 1 algorithm.

Figures (11)

  • Figure 1: Improvements of our proposed COCO-Matting dataset and SEMat framework.
  • Figure 2: The construction of our proposed COCO-Matting dataset is divided into two parts: Accessory Fusion and Mask-to-Matte. Finally, a comparison of original masks and alpha mattes is shown.
  • Figure 3: Given an image and a box prompt, the alpha matte is obtained by the process of our proposed Feature-Aligned Transformer and Matte-Aligned Decoder sequentially. Furthermore, combined with the traditional matting loss, the frozen SAM and trimap annotations are introduced to calculate the regularization and trimap loss during training.
  • Figure 4: Qualitative matting results of MAM, SmartMat, and our SEMat (SAM) on different datasets. See more matting results in the Appendix.
  • Figure 5: Qualitative matting results on the HIM-2K dataset with InstMatt, SEMat (HQ-SAM), and SEMat (SAM2).
  • ...and 6 more figures