HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
Jiazi Bu, Pengyang Ling, Yujie Zhou, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang
TL;DR
HiFlow tackles high-resolution image generation without retraining by introducing a virtual reference flow derived from a low-resolution sampling trajectory. This reference flow supplies flow-aligned guidance in three forms: initialization alignment for low-frequency consistency, direction alignment for structure preservation, and acceleration alignment for detail fidelity. The approach is model-agnostic and achieves results competitive with or superior to training-based methods. It also offers practical benefits such as faster inference and compatibility with LoRA, ControlNet, and quantization. Extensive experiments on Flux-based backbones and other architectures corroborate its effectiveness and versatility in producing high-quality 2K–4K outputs. Overall, HiFlow provides a robust, training-free pathway to unlock the resolution potential of pre-trained flow models for T2I generation.
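To make "direction" and "acceleration" of a flow concrete, here is a minimal formalization. It assumes the rectified-flow parameterization used by Flux-style models (time runs from t = 1, pure noise, to t = 0, clean latent) and a linear upsampling operator \mathcal{U}. The symbols are our own illustration, not the paper's notation or equations.

```latex
\begin{aligned}
x_t &= (1-t)\,x_0 + t\,\epsilon, \qquad v_t = \frac{\mathrm{d}x_t}{\mathrm{d}t} = \epsilon - x_0
  && \text{(rectified flow)}\\
\tilde{x}_t &= \mathcal{U}\big(x_t^{\mathrm{low}}\big)
  && \text{(virtual reference flow)}\\
\tilde{v}_t &= \frac{\mathrm{d}\tilde{x}_t}{\mathrm{d}t} = \mathcal{U}\big(v_t^{\mathrm{low}}\big), \qquad
\tilde{a}_t = \frac{\mathrm{d}^2\tilde{x}_t}{\mathrm{d}t^2}
  && \text{(direction, acceleration)}\\
\tilde{v}_{t_k} &\approx \frac{\tilde{x}_{t_{k+1}} - \tilde{x}_{t_k}}{t_{k+1} - t_k}, \qquad
\tilde{a}_{t_k} \approx \frac{\tilde{v}_{t_{k+1}} - \tilde{v}_{t_k}}{t_{k+1} - t_k}
  && \text{(discrete schedule } t_0 > t_1 > \dots\text{)}
\end{aligned}
```

The identity \tilde{v}_t = \mathcal{U}(v_t^{\mathrm{low}}) holds only because \mathcal{U} is assumed linear and time-independent. For a single straight path with fixed x_0 and \epsilon, the acceleration is zero. Sampled trajectories bend because the model's predicted velocity changes from step to step, so \tilde{a} measures how the low-resolution trajectory curves.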
Abstract
Text-to-image (T2I) diffusion/flow models have recently drawn considerable attention for their remarkable ability to produce flexible visual creations. Still, high-resolution image synthesis remains formidably challenging because high-resolution content is both scarce and complex. Recent approaches have investigated training-free strategies that enable high-resolution synthesis with pre-trained models. However, these techniques often struggle to generate high-quality visuals and tend to exhibit artifacts or low-fidelity details, because they typically rely solely on the endpoint of the low-resolution sampling trajectory while neglecting the intermediate states that are critical for preserving structure and synthesizing finer detail. To this end, we present HiFlow, a training-free and model-agnostic framework that unlocks the resolution potential of pre-trained flow models. Specifically, HiFlow establishes a virtual reference flow within the high-resolution space that captures the characteristics of the low-resolution flow and guides high-resolution generation through three key aspects: initialization alignment for low-frequency consistency, direction alignment for structure preservation, and acceleration alignment for detail fidelity. By leveraging this flow-aligned guidance, HiFlow substantially improves the quality of high-resolution images synthesized by T2I models and demonstrates versatility across their personalized variants. Extensive experiments show that HiFlow achieves superior high-resolution image quality compared with state-of-the-art methods.
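The abstract describes the three alignments only at a high level, so the NumPy code here is one possible reading under stated assumptions. It is not HiFlow's published algorithm. Assumed choices: rectified-flow Euler sampling from t = 1 to t = 0; nearest-neighbour upsampling to build the reference flow from every low-resolution state; a binary FFT low-pass filter with a hypothetical `cutoff`. It also assumes the following hypothetical rules. Initialization alignment re-noises the reference state while keeping only its low frequencies. Direction alignment swaps in the reference velocity's low frequencies. Acceleration alignment applies a finite-difference blend weighted by `accel_weight`. All function names are hypothetical. A toy analytic velocity field stands in for a pre-trained model.

```python
import numpy as np


def upsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) latent (a stand-in for any upsampler)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def low_pass(x: np.ndarray, cutoff: float) -> np.ndarray:
    """Keep spatial frequencies with radius <= cutoff (cycles per pixel) via a binary FFT mask."""
    _, h, w = x.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    mask = np.sqrt(fy**2 + fx**2) <= cutoff
    spectrum = np.fft.fft2(x, axes=(1, 2)) * mask
    return np.real(np.fft.ifft2(spectrum, axes=(1, 2)))


def euler_sample(velocity_fn, x, ts):
    """Rectified-flow Euler sampler (t runs 1 -> 0); returns every state, not just the endpoint."""
    traj = [x]
    for t, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t) * velocity_fn(x, t)
        traj.append(x)
    return traj


def align_direction(v, v_ref, cutoff):
    """Direction alignment (hypothetical): low frequencies from the reference, high from the model."""
    return low_pass(v_ref, cutoff) + (v - low_pass(v, cutoff))


def flow_aligned_sample(velocity_fn, lr_traj, ts, factor, k0, cutoff, accel_weight, rng):
    # Virtual reference flow: upsample every low-res state, then take finite-difference velocities.
    x_ref = [upsample(x, factor) for x in lr_traj]
    v_ref = [(x_ref[k + 1] - x_ref[k]) / (ts[k + 1] - ts[k]) for k in range(len(ts) - 1)]

    # Initialization alignment (hypothetical): start at step k0 with the reference state's
    # low frequencies plus high-frequency noise scaled by the noise level t.
    noise = ts[k0] * rng.standard_normal(x_ref[k0].shape)
    x = low_pass(x_ref[k0], cutoff) + (noise - low_pass(noise, cutoff))

    v_prev = None
    for k in range(k0, len(ts) - 1):
        v = align_direction(velocity_fn(x, ts[k]), v_ref[k], cutoff)
        if v_prev is not None and accel_weight > 0:
            # Acceleration alignment (hypothetical): pull the step-to-step velocity change
            # toward the reference trajectory's velocity change.
            v = v + accel_weight * ((v_ref[k] - v_ref[k - 1]) - (v - v_prev))
        x = x + (ts[k + 1] - ts[k]) * v
        v_prev = v
    return x


def toy_velocity(target):
    """Exact rectified-flow velocity toward a fixed target (stands in for a trained model)."""
    return lambda x, t: (x - target) / t


rng = np.random.default_rng(0)
target_lr = rng.standard_normal((4, 16, 16))
target_hr = upsample(target_lr, 2)
ts = np.linspace(1.0, 0.0, 21)

lr_traj = euler_sample(toy_velocity(target_lr), rng.standard_normal((4, 16, 16)), ts)
x_hr = flow_aligned_sample(toy_velocity(target_hr), lr_traj, ts, factor=2, k0=8,
                           cutoff=0.1, accel_weight=0.3, rng=rng)
print(x_hr.shape)  # (4, 32, 32)
```

Building the reference from every state of the low-resolution trajectory, rather than only its endpoint, reflects the abstract's point about intermediate states. The frequency split lets the reference fix the coarse layout while leaving high-frequency detail to the high-resolution model.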
