ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation
Ting Zhang, Zhiqiang Yuan, Yeshuang Zhu, Jinchao Zhang
TL;DR
This work tackles generating animated stickers with high-quality transparent channels, a task where existing video matting struggles with semi-open regions and diffusion-based methods suffer from temporal flicker. It introduces ILDiff, which combines implicit layout distillation of SAM features with a temporal modeling branch to enforce layout-aware, temporally coherent alpha channels within a latent diffusion framework. A new Transparent Animated Sticker Dataset (TASD) with 0.32M samples and a 200-sample TASD-T test set is provided to support evaluation and future research. Empirical results show ILDiff delivers finer and smoother transparent channels than strong baselines such as Matting Anything and Layer Diffusion, and ablations highlight the importance of the temporal depth in the layout adapter. The work offers practical advances for animated sticker generation and resources for the community by releasing code and TASD.
Abstract
High-quality animated stickers usually contain transparent channels, which current video generation models often ignore. Existing methods for generating fine-grained animated transparent channels fall roughly into two groups: video matting algorithms and diffusion-based algorithms. Video matting methods perform poorly on semi-open areas in stickers, while diffusion-based methods usually model a single image, which leads to local flicker when they are applied to animated stickers. In this paper, we first propose ILDiff, a method that generates animated transparent channels through implicit layout distillation and addresses two shortcomings of existing methods: the collapse of semi-open areas and the lack of temporal modeling. Second, we create the Transparent Animated Sticker Dataset (TASD), which contains 0.32M high-quality samples with transparent channels, to provide data support for related fields. Extensive experiments demonstrate that ILDiff produces finer and smoother transparent channels than other methods such as Matting Anything and Layer Diffusion. Our code and dataset will be released at https://xiaoyuan1996.github.io.
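To make the notions of a transparent channel and local flicker concrete, the following is a minimal illustrative sketch, not the method or the evaluation protocol of this paper. It shows standard alpha compositing of a foreground value over a background, and a simple flicker proxy defined here purely for illustration: the mean absolute change in alpha between consecutive frames. The names `composite` and `alpha_flicker` and the nested-list frame representation are assumptions made for this example.

```python
from typing import List

# One alpha frame: rows x cols of values in [0, 1] (assumed representation).
Frame = List[List[float]]


def composite(fg: float, bg: float, alpha: float) -> float:
    """Standard alpha compositing of one foreground value over a background."""
    return alpha * fg + (1.0 - alpha) * bg


def alpha_flicker(alphas: List[Frame]) -> float:
    """Illustrative flicker proxy (not the paper's metric):
    mean absolute alpha change between consecutive frames."""
    if len(alphas) < 2:
        return 0.0
    total = 0.0
    count = 0
    for prev, curr in zip(alphas, alphas[1:]):
        for row_prev, row_curr in zip(prev, curr):
            for a_prev, a_curr in zip(row_prev, row_curr):
                total += abs(a_curr - a_prev)
                count += 1
    return total / count if count else 0.0


# A temporally stable alpha sequence versus one where a pixel toggles.
steady = [[[1.0, 0.5]], [[1.0, 0.5]], [[1.0, 0.5]]]
flickering = [[[1.0, 0.5]], [[1.0, 0.0]], [[1.0, 0.5]]]

print(alpha_flicker(steady))      # no frame-to-frame change
print(alpha_flicker(flickering))  # larger value: the second pixel toggles
```

Under this proxy, a per-frame model that predicts each alpha map independently can score worse than a temporally aware one even when every individual frame looks plausible, which is the failure mode the abstract attributes to single-image diffusion methods.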
