Video Generation with Stable Transparency via Shiftable RGB-A Distribution Learner
Authors
Haotian Dong, Wenjing Wang, Chen Li, Jing Lyu, Di Lin
Abstract
Generating RGB-A videos, which include alpha channels for transparency, has wide applications. However, current methods often suffer from low quality due to confusion between RGB and alpha. In this paper, we address this problem by learning shiftable RGB-A distributions. We adjust both the latent space and noise space, shifting the alpha distribution outward while preserving the RGB distribution, thereby enabling stable transparency generation without compromising RGB quality. Specifically, for the latent space, we propose a transparency-aware bidirectional diffusion loss during VAE training, which shifts the RGB-A distribution according to likelihood. For the noise space, we propose shifting the mean of diffusion noise sampling and applying a Gaussian ellipse mask to provide transparency guidance and controllability. Additionally, we construct a high-quality RGB-A video dataset. Compared to state-of-the-art methods, our model excels in visual quality, naturalness, transparency rendering, inference convenience, and controllability. The released model is available on our website: https://donghaotian123.github.io/Wan-Alpha/.