HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

Jiazi Bu, Pengyang Ling, Yujie Zhou, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang

TL;DR

HiFlow tackles the challenge of high-resolution image generation without retraining by introducing a virtual reference flow derived from a low-resolution sampling trajectory. It uses flow-aligned guidance across initialization, direction, and acceleration to preserve structure, ensure low-frequency consistency, and enhance detail fidelity in high-resolution synthesis. The approach is model-agnostic and demonstrates competitive or superior results against training-based methods, while offering practical benefits such as faster inference and compatibility with LoRA, ControlNet, and quantization. Extensive experiments on Flux-based backbones and various architectures corroborate its effectiveness and versatility in delivering high-quality 2K–4K outputs. Overall, HiFlow provides a robust, training-free pathway to unlock the resolution potential of pre-trained flow models for T2I generation.

Abstract

Text-to-image (T2I) diffusion/flow models have drawn considerable attention recently due to their remarkable ability to deliver flexible visual creations. Still, high-resolution image synthesis presents formidable challenges due to the scarcity and complexity of high-resolution content. Recent approaches have investigated training-free strategies to enable high-resolution image synthesis with pre-trained models. However, these techniques often struggle with generating high-quality visuals and tend to exhibit artifacts or low-fidelity details, as they typically rely solely on the endpoint of the low-resolution sampling trajectory while neglecting intermediate states that are critical for preserving structure and synthesizing finer detail. To this end, we present HiFlow, a training-free and model-agnostic framework to unlock the resolution potential of pre-trained flow models. Specifically, HiFlow establishes a virtual reference flow within the high-resolution space that effectively captures the characteristics of low-resolution flow information, offering guidance for high-resolution generation through three key aspects: initialization alignment for low-frequency consistency, direction alignment for structure preservation, and acceleration alignment for detail fidelity. By leveraging such flow-aligned guidance, HiFlow substantially elevates the quality of high-resolution image synthesis of T2I models and demonstrates versatility across their personalized variants. Extensive experiments validate HiFlow's capability in achieving superior high-resolution image quality over state-of-the-art methods.
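Two of the ingredients named above, the predicted clean sample at an intermediate step and low-frequency consistency between a reference and a high-resolution sample, can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes the common rectified-flow convention $X_t = (1 - t) X_0 + t\,\epsilon$ with velocity $v = \epsilon - X_0$, and it uses a hard FFT low-pass mask as a stand-in for whatever frequency split the paper actually uses. The function names and the `cutoff` parameter are hypothetical.

```python
import numpy as np

def predicted_clean_sample(x_t, v_t, t):
    """Estimate X_{0<-t} from a noisy sample and its velocity.

    Assumption: x_t = (1 - t) * x_0 + t * noise and v_t = noise - x_0,
    so x_0 = x_t - t * v_t. Other flow parameterizations differ.
    """
    return x_t - t * v_t

def low_pass(x, cutoff):
    """Keep spatial frequencies with |f| <= cutoff (cycles per pixel) on both axes."""
    fy = np.fft.fftfreq(x.shape[0])[:, None]
    fx = np.fft.fftfreq(x.shape[1])[None, :]
    mask = (np.abs(fy) <= cutoff) & (np.abs(fx) <= cutoff)
    # The mask is symmetric in frequency, so the inverse transform of a
    # real input is real up to round-off; np.real drops that round-off.
    return np.real(np.fft.ifft2(np.fft.fft2(x) * mask))

def align_low_frequency(x_high, x_ref, cutoff=0.1):
    """Swap the low-frequency band of x_high for that of x_ref (hypothetical helper).

    x_ref is assumed to be already resized to the shape of x_high.
    """
    return x_high - low_pass(x_high, cutoff) + low_pass(x_ref, cutoff)
```

With these definitions, `align_low_frequency` leaves the high-frequency content of `x_high` untouched while its low-frequency band matches the reference, which is the intuition behind keeping the coarse layout of the low-resolution result. Direction and acceleration alignment are not sketched here because the abstract does not specify them in enough detail.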

Paper Structure

This paper contains 20 sections, 17 equations, 18 figures, 6 tables.

Figures (18)

  • Figure 1: Gallery of HiFlow. The proposed HiFlow enables pre-trained text-to-image flow models (Flux.1.0-dev integrated with various LoRA models) to synthesize high-resolution images with high fidelity and rich details in a training-free manner. All prompts are listed in the appendix.
  • Figure 2: T2I models suffer significant quality degradation in high-resolution image generation.
  • Figure 3: Observations. (a) Distribution discrepancy between the predicted clean sample $X_{0\leftarrow t}$ and the clean sample $X_0$. (b) Comparison of constant and time-dependent direction guidance. The former exhibits artifacts, while the latter preserves structure better. (c) Visualization of acceleration. (d) Effect of acceleration alignment, validating its role in generating high-fidelity details.
  • Figure 4: Pipeline of HiFlow. HiFlow constructs reference flow from low-resolution sampling trajectory to offer guidance for high-resolution generation in initialization, direction, and acceleration.
  • Figure 5: Visual comparison of synthesized 2K and 4K images. HiFlow yields high-resolution images characterized by high-fidelity details and coherent structure. Best viewed zoomed in.
  • ...and 13 more figures