Joint Geometric and Trajectory Consistency Learning for One-Step Real-World Super-Resolution

Chengyan Deng, Zhangquan Chen, Li Yu, Kai Zhang, Xue Zhou, Wang Zhang

TL;DR

GTASR (Geometric Trajectory Alignment Super-Resolution) is a simple yet effective consistency training paradigm for Real-ISR. It introduces a Trajectory Alignment strategy that rectifies the tangent vector field via full-path projection, and a Dual-Reference Structural Rectification mechanism that enforces strict structural constraints.

Abstract

Diffusion-based Real-World Image Super-Resolution (Real-ISR) achieves impressive perceptual quality but suffers from high computational costs due to iterative sampling. While recent distillation approaches that leverage large-scale Text-to-Image (T2I) priors have enabled one-step generation, they are typically hindered by prohibitive parameter counts and by the capability bounds inherited from their teacher models. As a lightweight alternative, Consistency Models offer efficient inference but face two critical limitations: the accumulation of consistency drift inherent to transitive training, and a phenomenon we term "Geometric Decoupling", in which the generative trajectory achieves pixel-wise alignment yet fails to preserve structural coherence. To address these challenges, we propose GTASR (Geometric Trajectory Alignment Super-Resolution), a simple yet effective consistency training paradigm for Real-ISR. Specifically, we introduce a Trajectory Alignment (TA) strategy that rectifies the tangent vector field via full-path projection, and a Dual-Reference Structural Rectification (DRSR) mechanism that enforces strict structural constraints. Extensive experiments show that GTASR outperforms representative baselines while maintaining minimal latency. The code and model will be released at https://github.com/Blazedengcy/GTASR.
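For context on the consistency-training baseline ($\mathcal{L}_\mathrm{CT}$) that GTASR builds on, the following is a minimal NumPy sketch of the standard consistency training objective of Song et al. (2023). It shows the skip/output parameterization that pins $f(x, \epsilon) = x$, and a loss that compares the online prediction at $t_{n+1}$ with a stop-gradient reference $f_{\theta^-}$ at $t_n$ under shared noise. It is not GTASR's implementation: the toy affine "network", `SIGMA_DATA`, `EPS`, the squared-L2 distance, and the omission of the low-resolution conditioning image are all assumptions made for illustration.

```python
import numpy as np

SIGMA_DATA = 0.5  # assumed data standard deviation (value commonly used for images)
EPS = 0.002       # assumed smallest timestep; f(x, EPS) must return x unchanged

def c_skip(t):
    # Equals 1 at t = EPS, which enforces the boundary condition f(x, EPS) = x
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)

def c_out(t):
    # Equals 0 at t = EPS, so the network output is ignored at the boundary
    return SIGMA_DATA * (t - EPS) / np.sqrt(SIGMA_DATA**2 + t**2)

def consistency_fn(params, x_t, t):
    # f(x_t, t) = c_skip(t) * x_t + c_out(t) * F(x_t, t)
    # F is a hypothetical toy "network": an elementwise affine map w * x_t + b
    w, b = params
    return c_skip(t) * x_t + c_out(t) * (w * x_t + b)

def ct_loss(params, x0, z, t_n, t_next):
    # Online branch: the noisier point on the trajectory, at t_{n+1}
    pred_online = consistency_fn(params, x0 + t_next * z, t_next)
    # Reference branch (theta^-): same parameter values, treated as constants.
    # In an autodiff framework this call would be wrapped in stop_gradient;
    # NumPy computes no gradients, so here it is simply a second evaluation.
    params_ref = params
    pred_ref = consistency_fn(params_ref, x0 + t_n * z, t_n)
    # Squared L2 distance between the two endpoint estimates (simplified metric)
    return float(np.mean((pred_online - pred_ref) ** 2))

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)  # stand-in for a clean high-resolution sample
z = rng.standard_normal(16)   # shared Gaussian noise for both timesteps
print(ct_loss((0.1, 0.0), x0, z, t_n=0.5, t_next=0.8))
```

Because both branches share the same noise `z`, the loss only penalizes disagreement between endpoint estimates from adjacent points on one trajectory; this is the transitive constraint whose accumulated drift the abstract identifies as a limitation.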

Paper Structure

This paper contains 32 sections, 39 equations, 13 figures, 9 tables, and 1 algorithm.

Figures (13)

  • Figure 1: The pipeline of the proposed GTASR, which uses a two-stage training scheme. In Stage I, we train the online model $f_{\theta}$ with the standard $\mathcal{L}_\mathrm{CT}$ augmented by our proposed $\mathcal{L}_\mathrm{TA}$; the reference parameters $\theta^-$ are updated at every step with the stop-gradient online parameters. In Stage II, we address geometric decoupling by introducing a target model $f_{\theta'}$, initialized with the pre-trained weights from Stage I and updated via periodic synchronization to ensure temporal alignment. We feed the intermediate states $\hat{x}_{t'}$ and $x_{t'}$ into $f_{\theta'}$ to obtain the trajectory endpoints ($x^\mathrm{fake}_0$ and $x^\mathrm{real}_0$). These endpoints are used directly in $\mathcal{L}_\mathrm{DTM}$, while their structure maps, extracted with the Sobel operator, are used in the proposed DRSR objectives ($\mathcal{L}_\mathrm{Stab}$ and $\mathcal{L}_\mathrm{Rect}$). Finally, the gradients of $\mathcal{L}_\mathrm{CT}$, $\mathcal{L}_\mathrm{DTM}$, $\mathcal{L}_\mathrm{Stab}$, and $\mathcal{L}_\mathrm{Rect}$ are backpropagated to $f_{\theta}$ to jointly enhance perceptual realism and structural integrity.
  • Figure 2: Visual analysis of consistency across timesteps. (Left) Baseline CT suffers from severe trajectory drift and detail loss as $t$ increases, particularly in the eye region. (Right) With the TA strategy, sharp features are preserved even at large timesteps.
  • Figure 3: Visual comparison of different strategies. (a) The baseline $\mathcal{L}_\mathrm{CT}$ fails to model fine details. (b) Adding $\mathcal{L}_\mathrm{TP}$ to $\mathcal{L}_\mathrm{CT}$ guides the reconstruction direction, but the metallic grille remains blurred and poorly defined. (c) Our method restores intricate mesh textures with superior clarity.
  • Figure 4: Visualization of Geometric Decoupling. Evaluated on 3,000 ImageNet-Test images, the Baseline (blue) achieves positional convergence (low y-axis error) but fails to maintain structural stability (high x-axis variance). In contrast, GTASR (red) minimizes errors on both metrics, effectively resolving the decoupling.
  • Figure 5: Visual comparisons of different methods on two synthetic examples of the ImageNet-Test dataset.
  • ...and 8 more figures
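The Figure 1 caption states that the DRSR objectives operate on structure maps extracted with the Sobel operator, and Figure 4 frames Geometric Decoupling as low pixel-level error coexisting with poor structural stability. The NumPy sketch below illustrates both ideas on a toy example: it computes Sobel gradient-magnitude maps and compares two degraded versions of a step-edge image that have identical pixel MSE to the reference but very different structure-map errors. The helper names and the L1 distance between Sobel maps (`structure_error`) are hypothetical; the exact forms of $\mathcal{L}_\mathrm{Stab}$ and $\mathcal{L}_\mathrm{Rect}$ are not given here, so this is not the paper's loss.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter2d_valid(img, kernel):
    # 'valid' 2-D cross-correlation (no padding), written out for clarity
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * img[i:i + h - kh + 1, j:j + w - kw + 1]
    return out

def structure_map(img):
    # Sobel gradient magnitude of a grayscale image
    gx = filter2d_valid(img, SOBEL_X)
    gy = filter2d_valid(img, SOBEL_Y)
    return np.sqrt(gx**2 + gy**2)

def pixel_error(a, b):
    # Pixel-wise mean squared error
    return float(np.mean((a - b) ** 2))

def structure_error(a, b):
    # Hypothetical structural term: L1 distance between Sobel structure maps
    return float(np.mean(np.abs(structure_map(a) - structure_map(b))))

ref = np.zeros((8, 8))
ref[:, 4:] = 1.0                 # reference: a sharp vertical step edge
offset = ref + 0.25              # brightness shift, edge geometry intact
blurred = ref.copy()
blurred[:, 3:5] = 0.5            # edge smeared over two columns

for name, cand in [("offset", offset), ("blurred", blurred)]:
    print(name, pixel_error(cand, ref), structure_error(cand, ref))
```

Both candidates have a pixel MSE of 0.0625, yet the brightness-shifted image keeps the edge intact (structure error 0, because each Sobel kernel sums to zero) while the blurred one smears it (structure error 4/3). A pixel-only objective cannot tell them apart, which is the kind of gap a structure-map term is meant to close.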