Latent Harmony: Synergistic Unified UHD Image Restoration via Latent Space Regularization and Controllable Refinement
Yidi Liu, Xueyang Fu, Jie Huang, Jie Xiao, Dong Li, Wenlong Zhang, Lei Bai, Zheng-Jun Zha
TL;DR
Latent Harmony tackles UHD image restoration by decoupling latent-space generalization from high-frequency reconstruction. It first builds a robust LH-VAE latent space using progressive degradation perturbations, degradation-invariant semantic constraints, and latent equivariance. It then refines restoration with two high-frequency guided LoRA modules (FHF-LoRA for fidelity and PHF-LoRA for perception), each driven by a dedicated high-frequency loss. An inference-time parameter α gives explicit control over the fidelity-perception trade-off. Across UHD and standard-resolution benchmarks, Latent Harmony achieves state-of-the-art performance with improved efficiency and generalization, while preserving latent structure and enabling flexible output customization.
Abstract
Ultra-High Definition (UHD) image restoration faces a trade-off between computational efficiency and high-frequency detail retention. While Variational Autoencoders (VAEs) improve efficiency via latent-space processing, their Gaussian constraint often discards degradation-specific high-frequency information, hurting reconstruction fidelity. To overcome this, we propose Latent Harmony, a two-stage framework that redefines VAEs for UHD restoration by jointly regularizing the latent space and enforcing high-frequency-aware reconstruction. In Stage One, we introduce LH-VAE, which enhances semantic robustness through visual semantic constraints and progressive degradation perturbations, while latent equivariance strengthens high-frequency reconstruction. Stage Two jointly trains this refined VAE with a restoration model using High-Frequency Low-Rank Adaptation (HF-LoRA): an encoder LoRA guided by a fidelity-oriented high-frequency alignment loss to recover authentic details, and a decoder LoRA driven by a perception-oriented loss to synthesize realistic textures. Both LoRA modules are trained via alternating optimization with selective gradient propagation to preserve the pretrained latent structure. At inference, a tunable parameter α enables flexible fidelity-perception trade-offs. Experiments show Latent Harmony achieves state-of-the-art performance across UHD and standard-resolution tasks, effectively balancing efficiency, perceptual quality, and reconstruction accuracy.
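To make the role of an inference-time scale on a low-rank update concrete, here is a minimal sketch. It rests on an assumption: the abstract does not state how α enters the model, so the example simply treats α as a multiplier on a rank-1 LoRA update added to a frozen weight. The names `matmul`, `lora_weight`, `W`, `B`, and `A` are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only. ASSUMPTION: alpha scales a low-rank (LoRA) update
# that is added to a frozen weight; the paper's actual mechanism may differ.
# All names below are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_weight(w, b, a, alpha):
    """Return W + alpha * (B @ A): frozen weight plus a scaled low-rank update."""
    delta = matmul(b, a)
    return [[w_ij + alpha * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen pretrained weight (2x2)
B = [[1.0], [2.0]]            # LoRA up-projection (2x1), rank 1
A = [[0.5, -0.5]]             # LoRA down-projection (1x2)

# Sweep alpha to see the effective weight move away from the frozen one.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, lora_weight(W, B, A, alpha))
```

Under this assumption, α = 0 leaves the pretrained weight untouched and larger values apply more of the adapter's update, which is one simple way a single scalar could steer a trade-off without retraining.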
