Tuning Real-World Image Restoration at Inference: A Test-Time Scaling Paradigm for Flow Matching Models

Purui Bai; Junxian Duan; Pin Wang; Jinhua Hao; Ming Sun; Chao Zhou; Huaibo Huang

Abstract

Although diffusion-based real-world image restoration (Real-IR) has achieved remarkable progress, efficiently leveraging ultra-large-scale pre-trained text-to-image (T2I) models and fully exploiting their potential remain significant challenges. To address these challenges, we propose ResFlow-Tuner, an image restoration framework built on the state-of-the-art flow matching model FLUX.1-dev, which integrates unified multi-modal fusion (UMMF) with test-time scaling (TTS) to achieve unprecedented restoration performance. Our approach fully leverages the advantages of the Multi-Modal Diffusion Transformer (MM-DiT) architecture by encoding multi-modal conditions into a unified sequence that guides the synthesis of high-quality images. Furthermore, we introduce a training-free test-time scaling paradigm tailored for image restoration. During inference, this technique dynamically steers the denoising direction using feedback from a reward model (RM), thereby achieving significant performance gains with controllable computational overhead. Extensive experiments demonstrate that our method achieves state-of-the-art performance across multiple standard benchmarks. This work not only validates the capabilities of flow matching models in low-level vision tasks but, more importantly, proposes a novel and efficient inference-time scaling paradigm suitable for large pre-trained models.
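The idea of steering the denoising direction with reward-model feedback can be illustrated with a small toy sketch. This is not the paper's algorithm: it assumes a generic greedy search in which several stochastic candidates are proposed at each step and the highest-scoring one is kept. The names `greedy_reward_search`, `toy_step`, and `toy_reward` are hypothetical, and the scalar state stands in for an image latent.

```python
import random


def greedy_reward_search(x, step_fn, reward_fn, n_steps, n_candidates, rng):
    """Greedy test-time search (illustrative assumption, not the paper's method).

    At every step, propose `n_candidates` next states with `step_fn`
    and keep the one that `reward_fn` scores highest.
    """
    for i in range(n_steps):
        candidates = [step_fn(x, i, rng) for _ in range(n_candidates)]
        x = max(candidates, key=reward_fn)
    return x


def toy_step(x, i, rng):
    # Hypothetical stochastic update: pull halfway toward 1.0, then perturb.
    return x + 0.5 * (1.0 - x) + rng.gauss(0.0, 0.2)


def toy_reward(x):
    # Hypothetical verifier that prefers states close to 1.2.
    return -abs(x - 1.2)


if __name__ == "__main__":
    rng = random.Random(0)
    result = greedy_reward_search(0.0, toy_step, toy_reward, 10, 4, rng)
    print(result, toy_reward(result))
```

In this sketch the extra inference cost grows linearly with `n_candidates`, which is one simple way to read "controllable computational overhead"; a real system would replace the toy functions with a sampler step and a learned reward model.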

Paper Structure (23 sections, 5 equations, 10 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Method
Flow-based Image Restoration Framework
Stage-I: Multi-Modal Condition Preparation
Stage-II: Unified Multi-Modal Fusion (UMMF) and ODE Solving
Test-Time Scaling for ODE-based Flow Models
ODE-SDE Transformation Framework.
Formalizing TTS as Trajectory Optimization in Latent Space.
ODE-aware Search with Verifier Ensemble.
Experiments
Experimental Setup
Comparison with State-of-the-Art Methods
Evaluation on Downstream Benchmarks
Ablation Study
...and 8 more sections

Figures (10)

Figure 1: ResFlow-Tuner delivers superior performance on both synthetic (the first row) and real-world (the second row) benchmarks, excelling in terms of perceptual quality and objective image quality assessment.
Figure 2: Architecture of the proposed ResFlow-Tuner. ResFlow-Tuner enhances training performance through the seamless integration of multi-modal guidance. During inference, it adopts a greedy optimization strategy for path selection, augmented by our Multi-Step Partial Denoising Estimator (MSPDE) for more accurate path evaluation.
Figure 3: Qualitative comparisons on both synthetic (the first row) and real-world (the last three rows) benchmarks. Please zoom in for a better view.
Figure 4: User Study Results. (a) Average ranking of the six methods across all participants and test images, with error bars. (b) Top-K ratios (K=1,2,3,4,5) demonstrating our method's consistency in producing high-quality results across diverse image content.
Figure 5: Visual comparisons for ablation study on ResFlow-Tuner (1/2).
...and 5 more figures
