LCUDiff: Latent Capacity Upgrade Diffusion for Faithful Human Body Restoration
Jue Gong, Zihan Zhou, Jingkai Wang, Shu Li, Libo Liu, Jianliang Lan, Yulun Zhang
TL;DR
LCUDiff tackles the fidelity gap in human body restoration (HBR) under degradation by upgrading the latent space of a pretrained latent diffusion model from $4$ to $16$ channels. Channel Splitting Distillation keeps the anchor channels aligned with the pretrained latent while the added channels learn high-frequency details. Prior-Preserving Adaptation smoothly bridges the mismatch between the frozen $4$-channel UNet and the expanded latent, and a Decoder Router selects between decoders on a per-sample basis, yielding better pixel-level and perceptual fidelity without extra inference cost. The approach is validated on synthetic and real-world datasets, where it achieves superior $\text{DISTS}$ and $\text{PSNR}/\text{PSNRY}$ scores and improved no-reference metrics while preserving one-step efficiency. The code and model are slated for open-source release, and the work demonstrates practical improvements for robust HBR in real-world scenarios.
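The channel-splitting idea can be illustrated with a minimal sketch of a fine-tuning objective. It assumes, hypothetically, that the distillation term is a mean squared error between the first four channels of the new 16-channel encoder output and the latent of the frozen pretrained 4-channel VAE, and that the remaining twelve channels are shaped only by a reconstruction loss computed elsewhere. The names `csd_distill_loss`, `csd_total_loss`, and the weight `lam` are illustrative, not taken from the LCUDiff code, and latents are plain nested lists (channels by flattened spatial positions) to keep the example dependency-free.

```python
from typing import List, Tuple

# Hypothetical layout: a latent is a list of channels, each a flat list of spatial values.
Latent = List[List[float]]


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared error between two equally sized flat lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def split_channels(z: Latent, num_anchor: int = 4) -> Tuple[Latent, Latent]:
    """Split a 16-channel latent into anchor channels and extra channels."""
    return z[:num_anchor], z[num_anchor:]


def csd_distill_loss(z16: Latent, z4_ref: Latent) -> float:
    """Distillation term (assumed MSE): only the anchor channels are pulled toward
    the frozen pretrained 4-channel latent; the extra channels are unconstrained."""
    anchor, _extra = split_channels(z16, num_anchor=len(z4_ref))
    return sum(mse(a, r) for a, r in zip(anchor, z4_ref)) / len(z4_ref)


def csd_total_loss(recon_loss: float, z16: Latent, z4_ref: Latent, lam: float = 1.0) -> float:
    """Hypothetical objective: a reconstruction loss (which trains all 16 channels)
    plus a weighted distillation loss on the anchor channels only."""
    return recon_loss + lam * csd_distill_loss(z16, z4_ref)


if __name__ == "__main__":
    z4_ref = [[0.0, 1.0] for _ in range(4)]
    # Anchor channels match the reference exactly; the 12 extra channels are arbitrary.
    z16 = [list(c) for c in z4_ref] + [[5.0, -3.0] for _ in range(12)]
    print(csd_distill_loss(z16, z4_ref))  # 0.0
```

Because the extra channels never enter the distillation term in this sketch, they are free to encode details the anchor channels do not capture, while the anchor channels stay close to what the pretrained diffusion prior expects.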
Abstract
Existing methods for restoring degraded human-centric images often struggle with insufficient fidelity, particularly in human body restoration (HBR). Recent diffusion-based restoration methods commonly adapt pre-trained text-to-image diffusion models, whose variational autoencoder (VAE) can significantly bottleneck restoration fidelity. We propose LCUDiff, a stable one-step framework that upgrades a pre-trained latent diffusion model from a 4-channel to a 16-channel latent space. For VAE fine-tuning, we use channel splitting distillation (CSD) to keep the first four channels aligned with the pre-trained priors while allocating the additional channels to encode high-frequency details. We further design prior-preserving adaptation (PPA) to smoothly bridge the mismatch between the 4-channel diffusion backbone and the higher-dimensional 16-channel latent. In addition, we propose a decoder router (DeR) that performs per-sample decoder routing using restoration-quality score annotations, improving visual quality across diverse conditions. Experiments on synthetic and real-world datasets show competitive results, with higher fidelity and fewer artifacts under mild degradations, while preserving one-step efficiency. The code and model will be available at https://github.com/gobunu/LCUDiff.
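One common way to connect a frozen 4-channel backbone to a wider latent without disturbing its prior is a pair of learnable 1x1 projections, initialized so that at the start of training the backbone sees exactly the anchor channels and writes back only into them. The sketch below uses this identity/zero initialization as a stand-in; the abstract does not specify how PPA is implemented, so `identity_init`, `conv1x1`, and `adapted_denoiser` are hypothetical names, and `unet4` stands in for any frozen 4-channel denoiser.

```python
from typing import Callable, List

Latent = List[List[float]]  # channels x flattened spatial positions (hypothetical layout)
Matrix = List[List[float]]  # out_channels x in_channels weights of a 1x1 convolution


def identity_init(out_ch: int, in_ch: int) -> Matrix:
    """Weights that copy channel i to channel i for i < min(out_ch, in_ch) and zero the rest."""
    return [[1.0 if o == i else 0.0 for i in range(in_ch)] for o in range(out_ch)]


def conv1x1(z: Latent, w: Matrix) -> Latent:
    """Apply a 1x1 convolution: each output channel is a weighted sum of the input channels."""
    n = len(z[0])
    return [
        [sum(w[o][i] * z[i][p] for i in range(len(z))) for p in range(n)]
        for o in range(len(w))
    ]


def adapted_denoiser(
    z16: Latent, unet4: Callable[[Latent], Latent], w_in: Matrix, w_out: Matrix
) -> Latent:
    """Project 16 -> 4 channels, run the frozen 4-channel denoiser, project 4 -> 16."""
    z4 = conv1x1(z16, w_in)
    return conv1x1(unet4(z4), w_out)


if __name__ == "__main__":
    z16 = [[float(c), float(c) + 0.5] for c in range(16)]
    w_in = identity_init(4, 16)   # passes through the 4 anchor channels
    w_out = identity_init(16, 4)  # writes back into the anchors; extra channels start at zero
    frozen_unet = lambda z: [[2.0 * x for x in ch] for ch in z]  # toy stand-in for the UNet
    out = adapted_denoiser(z16, frozen_unet, w_in, w_out)
    print(out[:2], out[4])  # [[0.0, 1.0], [2.0, 3.0]] [0.0, 0.0]
```

With this initialization the adapted model initially reproduces the pretrained 4-channel behavior on the anchor channels, and training can then learn to make use of the twelve extra channels.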
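Per-sample decoder routing can be sketched as a cheap scoring step followed by a single decode. This assumes, hypothetically, that each training sample carries an annotated quality score for every candidate decoder, that the routing target is simply the best-scoring decoder, and that at inference a learned `router` predicts those scores so that only the chosen decoder runs. Decoder names such as `dec_a` and `dec_b`, and the functions below, are placeholders rather than LCUDiff's actual interface.

```python
from typing import Callable, Dict, List, Tuple

Decoder = Callable[[List[float]], List[float]]


def best_decoder_label(annotated_scores: Dict[str, float]) -> str:
    """Routing target from quality-score annotations: the decoder whose output scored highest."""
    return max(annotated_scores, key=annotated_scores.get)


def route_and_decode(
    latent: List[float],
    router: Callable[[List[float]], Dict[str, float]],
    decoders: Dict[str, Decoder],
) -> Tuple[str, List[float]]:
    """Score every decoder with the router, then run only the selected decoder."""
    predicted = router(latent)
    choice = max(predicted, key=predicted.get)
    return choice, decoders[choice](latent)


if __name__ == "__main__":
    # Placeholder decoders and router, for illustration only.
    decoders = {
        "dec_a": lambda z: [10.0 * x for x in z],
        "dec_b": lambda z: [x + 1.0 for x in z],
    }
    router = lambda z: {"dec_a": 0.2, "dec_b": 0.9}
    print(best_decoder_label({"dec_a": 0.61, "dec_b": 0.74}))  # dec_b
    print(route_and_decode([1.0, 2.0], router, decoders))  # ('dec_b', [2.0, 3.0])
```

Running only the selected decoder keeps the per-image cost close to that of a single decoder plus the router. Python's `max` returns the first maximal key in dictionary order, which gives a deterministic tie-break.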
