Latent Harmony: Synergistic Unified UHD Image Restoration via Latent Space Regularization and Controllable Refinement

Yidi Liu, Xueyang Fu, Jie Huang, Jie Xiao, Dong Li, Wenlong Zhang, Lei Bai, Zheng-Jun Zha

TL;DR

Latent Harmony tackles UHD image restoration by decoupling latent-space generalization from high-frequency reconstruction. It first builds a robust LH-VAE latent space using progressive degradation perturbations, degradation-invariant semantic constraints, and latent equivariance. It then refines restoration with high-frequency-guided LoRA modules (FHF-LoRA for fidelity and PHF-LoRA for perception), each driven by a dedicated high-frequency loss. An inference-time parameter $\alpha$ enables explicit control over the fidelity-perception trade-off. Across UHD and standard-resolution benchmarks, Latent Harmony achieves state-of-the-art performance with improved efficiency and generalization, while preserving latent structure and enabling flexible output customization.

Abstract

Ultra-High Definition (UHD) image restoration faces a trade-off between computational efficiency and high-frequency detail retention. While Variational Autoencoders (VAEs) improve efficiency via latent-space processing, their Gaussian constraint often discards degradation-specific high-frequency information, hurting reconstruction fidelity. To overcome this, we propose Latent Harmony, a two-stage framework that redefines VAEs for UHD restoration by jointly regularizing the latent space and enforcing high-frequency-aware reconstruction. In Stage One, we introduce LH-VAE, which enhances semantic robustness through visual semantic constraints and progressive degradation perturbations, while latent equivariance strengthens high-frequency reconstruction. Stage Two jointly trains this refined VAE with a restoration model using High-Frequency Low-Rank Adaptation (HF-LoRA): an encoder LoRA guided by a fidelity-oriented high-frequency alignment loss to recover authentic details, and a decoder LoRA driven by a perception-oriented loss to synthesize realistic textures. Both LoRA modules are trained via alternating optimization with selective gradient propagation to preserve the pretrained latent structure. At inference, a tunable parameter $\alpha$ enables flexible fidelity-perception trade-offs. Experiments show Latent Harmony achieves state-of-the-art performance across UHD and standard-resolution tasks, effectively balancing efficiency, perceptual quality, and reconstruction accuracy.
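To make the low-rank adaptation idea concrete, the following is a minimal NumPy sketch of a generic LoRA-style linear layer whose adapter strength can be changed at inference time. It is an illustration only, not the authors' implementation: the class name `LoRALinear`, the `scale` argument, and the initialization are assumptions, and the abstract does not specify how $\alpha$ is mapped onto the encoder and decoder adapters.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a low-rank update: y = x W^T + scale * x A^T B^T.

    Hypothetical illustration of generic LoRA, not the paper's HF-LoRA.
    """

    def __init__(self, weight, rank, rng):
        out_dim, in_dim = weight.shape
        self.weight = weight                             # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, (rank, in_dim))   # trainable down-projection
        self.B = np.zeros((out_dim, rank))               # trainable up-projection (zero init)

    def __call__(self, x, scale=1.0):
        base = x @ self.weight.T                 # output of the frozen layer
        update = (x @ self.A.T) @ self.B.T       # low-rank correction
        return base + scale * update             # scale=0 recovers the frozen layer


rng = np.random.default_rng(0)
layer = LoRALinear(rng.normal(size=(4, 6)), rank=2, rng=rng)
layer.B = rng.normal(size=(4, 2))  # stand-in for a trained adapter
x = rng.normal(size=(3, 6))

for scale in (0.0, 0.5, 1.0):
    shift = np.abs(layer(x, scale) - layer(x, 0.0)).max()
    print(f"scale={scale:.1f}  max deviation from frozen layer={shift:.4f}")
```

Because the update is additive, the output varies linearly with `scale`, which is one reason a single inference-time knob can move smoothly between two behaviours. Whether $\alpha$ acts as such a scale in Latent Harmony is an assumption here.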

Paper Structure

This paper contains 17 sections, 9 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Comparison with existing mainstream methods. Our method outperforms existing standard and UHD all-in-one approaches by leveraging latent regularization, achieving superior efficiency and generalization without requiring degradation-aware branches, while enabling adjustable fidelity and perceptual quality during inference.
  • Figure 2: Motivation Analysis. (a) t-SNE visualization of VAE latents under diverse degradations, showing Baseline2's degradation-sensitive clustering versus our method's semantic clustering. (b) Cross-degradation cosine similarity (CDCS) analysis, with higher CDCS in high-frequency bands. (c) DCT spectral analysis, revealing Baseline1's low high-frequency components and Baseline2's elevated components, indicating a reconstruction-generalization trade-off via latent high-frequency proportions. (d) Fine-tuning loss comparison, highlighting stable downstream gains with high-frequency loss. (e) HF-LoRA experiments, demonstrating optimal fidelity and perceptual gains from encoder (fidelity loss) and decoder (perceptual loss) fine-tuning. (Note: all metrics in (e) are normalized to a positive scale, where higher values indicate better performance.)
  • Figure 3: Framework Overview. Stage 1: LH-VAE training employs progressive degradation perturbation, degradation-invariant visual semantic loss $L_{INV}$, and latent space equivariance loss $L_{Eqv}$ to construct a robust, generalizable latent space. Stage 2: Latent space restoration leverages $R_\theta$ and high-frequency-guided LoRA fine-tuning, with Fidelity-oriented HF-LoRA (FHF-LoRA) for the encoder and Perception-oriented HF-LoRA (PHF-LoRA) for the decoder, enabling adjustable fidelity and perceptual quality via parameter $\alpha$ during inference. Results of $\alpha$ tuning are shown in the upper panel, with metrics normalized positively, where higher values indicate better performance.
  • Figure 4: Visual comparison with other all-in-one methods on four types of degradation removal.
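The Figure 2 caption refers to two measurements, a cosine similarity between latents (panel b) and a DCT-based high-frequency proportion (panel c). The sketch below shows one plausible way to compute such quantities with NumPy. The function names, the `cutoff` rule for what counts as "high frequency", and the use of a plain cosine similarity are all assumptions for illustration; the paper's exact CDCS and spectral definitions are not given in this section.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # DC row uses a different normalization
    return mat


def dct2(x):
    """2D DCT-II of a single-channel array, applied along rows and columns."""
    h, w = x.shape
    return dct_matrix(h) @ x @ dct_matrix(w).T


def high_freq_ratio(x, cutoff=0.5):
    """Share of spectral energy at normalized frequency u/h + v/w >= cutoff.

    The cutoff rule is a hypothetical choice made for this sketch.
    """
    energy = dct2(x) ** 2
    h, w = energy.shape
    u = np.arange(h)[:, None] / h
    v = np.arange(w)[None, :] / w
    total = energy.sum()
    if total == 0.0:
        return 0.0
    return float(energy[(u + v) >= cutoff].sum() / total)


def cosine_similarity(a, b):
    """Cosine similarity between two arrays after flattening."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


flat = np.ones((8, 8))                                            # only a DC component
checker = (-1.0) ** np.add.outer(np.arange(8), np.arange(8))      # fastest alternation
print("flat:", high_freq_ratio(flat), "checker:", high_freq_ratio(checker))
```

Under these assumptions, a latent channel dominated by smooth structure yields a ratio near 0 and one dominated by fine alternation yields a ratio near 1. Applying `cosine_similarity` to latents of the same image under two different degradations would give a per-pair similarity in the spirit of panel (b).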