Efficient Image Restoration via Latent Consistency Flow Matching

Elad Cohen, Idan Achituve, Idit Diamant, Arnon Netzer, Hai Victor Habi

TL;DR

ELIR tackles efficient image restoration by operating in a compact latent space and guiding restoration with Latent Consistency Flow Matching (LCFM), which blends latent flow modeling with consistency constraints to minimize the number of neural function evaluations (NFEs). A lightweight convolutional architecture, built from a Tiny AutoEncoder, a small coarse estimator, and a U-Net-based velocity field, keeps the model at tens of millions of parameters and delivers high FPS, making it suitable for edge devices. The method achieves competitive distortion and perceptual quality across blind face restoration, super-resolution, denoising, and inpainting while significantly reducing computational cost relative to diffusion and flow-based baselines. Theoretical bounds on the Wasserstein-2 distance link encoder-decoder and vector-field errors to reconstruction quality, and extensive ablations demonstrate the efficiency and importance of the coarse estimator, encoder fine-tuning, and multi-segment latent training.
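As a rough illustration of the latent flow modeling mentioned above, the sketch below builds a generic linear-path flow-matching training pair: a point on the straight line between a source latent and a target latent, and the constant velocity a vector field would be trained to predict there. It is not the LCFM objective itself (the consistency term is omitted), and the function names, latent shapes, and random latents are all hypothetical.

```python
import numpy as np


def flow_matching_pair(z0, z1, t):
    """Return a point on the straight path z0 -> z1 and its target velocity.

    Generic linear-path flow matching (an assumption for illustration,
    not the paper's exact LCFM loss).
    """
    z_t = (1.0 - t) * z0 + t * z1  # straight-line interpolation at time t
    target_v = z1 - z0             # velocity is constant along a straight line
    return z_t, target_v


def flow_matching_loss(v_pred, target_v):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - target_v) ** 2))


rng = np.random.default_rng(0)
z0 = rng.normal(size=(4, 8))  # hypothetical source latents (e.g. from the LQ side)
z1 = rng.normal(size=(4, 8))  # hypothetical target latents (e.g. encoded HQ images)

z_t, target_v = flow_matching_pair(z0, z1, t=0.25)

# A perfect predictor outputs exactly z1 - z0, so its loss is zero.
oracle_loss = flow_matching_loss(z1 - z0, target_v)
```

Because the path is a straight line, following the target velocity for the remaining time (here 0.75) from `z_t` lands on `z1`. This straightness is what makes few-step integration plausible.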

Abstract

Recent advances in generative image restoration (IR) have demonstrated impressive results. However, these methods are hindered by their substantial size and computational demands, rendering them unsuitable for deployment on edge devices. This work introduces ELIR, an Efficient Latent Image Restoration method. ELIR addresses the distortion-perception trade-off within the latent space and produces high-quality images using a latent consistency flow-based model. In addition, ELIR introduces an efficient and lightweight architecture. Consequently, ELIR is 4$\times$ smaller and faster than state-of-the-art diffusion and flow-based approaches for blind face restoration, enabling deployment on resource-constrained devices. Comprehensive evaluations of various image restoration tasks and datasets show that ELIR achieves competitive performance compared to state-of-the-art methods, effectively balancing distortion and perceptual quality metrics while significantly reducing model size and computational cost. The code is available at: https://github.com/eladc-git/ELIR

Paper Structure

This paper contains 39 sections, 1 theorem, 15 equations, 11 figures, 11 tables, and 1 algorithm.

Key Result

Theorem 7.1

Let $\bm{x}\in\mathcal{X}\subseteq\mathbb{R}^{d_{x}}$ be a random vector that represents an HQ image, $\hat{\bm{x}}=\mathcal{D}\left(\hat{\bm{z}}_1\right)\in\mathcal{X}\subseteq\mathbb{R}^{d_{x}}$ be a vector that represents the reconstructed image using a random latent variable $\hat{\bm{z}}_1$ [...] where $\bm{z}_0$ is some predefined source distribution (usually a standard Gaussian distribution), [...]
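For reference, the Wasserstein-2 distance in which the paper's bounds are stated (per the TL;DR) has the following standard definition. The symbols $p$, $q$, $\pi$, $\bm{u}$, and $\bm{v}$ are generic notation chosen here for illustration and are not taken from the paper.

$$
W_2(p, q) = \left( \inf_{\pi \in \Pi(p, q)} \mathbb{E}_{(\bm{u}, \bm{v}) \sim \pi} \left[ \lVert \bm{u} - \bm{v} \rVert_2^2 \right] \right)^{1/2}
$$

Here $\Pi(p, q)$ denotes the set of all joint distributions (couplings) whose marginals are $p$ and $q$. A small $W_2$ between the distribution of $\hat{\bm{x}}$ and that of $\bm{x}$ means reconstructed images are distributed much like HQ images.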

Figures (11)

  • Figure 1: ELIR's Performance: Comparison between ELIR and state-of-the-art baseline methods. ELIR is the smallest and fastest method while maintaining competitive results. Metrics such as LPIPS and #Params, where smaller is better, are inverted and normalized for display. The results were obtained using the CelebA-Test dataset for blind face restoration.
  • Figure 2: ELIR Overview. During training, we optimize the encoder $\mathcal{E}_{\omega}$, coarse estimator $g_{\phi}$, and the vector field $\bm{v}_{\theta}$ for a specific IR task. During inference, we predict a consistent linear direction from LQ toward the HQ images, yielding high-quality results and balancing distortion and perception. Both training and inference are conducted in the latent space.
  • Figure 3: BFR Visual Results. Visual comparisons between ELIR and baseline models sampled from CelebA-Test for blind face restoration. HQ and LQ refer to high-quality (ground truth) and low-quality (inputs) images.
  • Figure 4: BSR Visual Results. Visual comparisons between ELIR and baseline models sampled from ImageNet-Validation for blind super-resolution. HQ and LQ refer to high-quality (ground truth) and low-quality (inputs) images.
  • Figure 5: Face Restoration Visual Results. Visual results of ELIR for face super resolution ($\times$8), denoising, and inpainting.
  • ...and 6 more figures
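The inference path described in the Figure 2 caption (encode the LQ image, apply the coarse estimator, follow the vector field toward the HQ latent, decode) can be sketched as below. This is a minimal sketch with toy stand-ins, not the released ELIR code. The function names, the ordering of encoder and coarse estimator, the use of plain Euler integration, and the step count are all assumptions made for illustration.

```python
import numpy as np


def restore(lq, encode, coarse, velocity, decode, num_steps=5):
    """Latent restoration sketch: encode -> coarse estimate -> Euler ODE -> decode.

    Each Euler step calls `velocity` once, so `num_steps` is the number of
    function evaluations spent on the flow.
    """
    z = coarse(encode(lq))            # assumed order: coarse estimator acts on latents
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        z = z + dt * velocity(z, t)   # explicit Euler step along the vector field
    return decode(z)


# --- Toy stand-ins (hypothetical; real components would be trained networks) ---
hq = np.full((2, 4), 3.0)             # pretend ground-truth HQ "image"


def encode(x):
    return 0.5 * x                    # toy encoder


def decode(z):
    return 2.0 * z                    # toy decoder, exact inverse of the toy encoder


def coarse(z):
    return z + 0.1                    # toy coarse estimator: a small latent correction


def velocity(z, t):
    # Velocity of a straight-line path that reaches the HQ latent at t = 1.
    return (encode(hq) - z) / (1.0 - t)


lq = np.zeros((2, 4))                 # pretend degraded input
restored = restore(lq, encode, coarse, velocity, decode, num_steps=5)
```

With this idealized straight-line velocity, the final Euler step lands on the target latent, so `restored` matches `hq` up to floating-point error. A learned vector field would only approximate this behavior.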

Theorems & Definitions (1)

  • Theorem 7.1