FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution

Shuai Wang, Zexian Li, Tianhui Song, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang

TL;DR

FlowDCN is a purely convolution-based generative model with linear time and memory complexity that efficiently generates high-quality images at arbitrary resolutions. It achieves a state-of-the-art 4.30 sFID on the ImageNet benchmark, along with comparable resolution extrapolation results.

Abstract

Arbitrary-resolution image generation remains a challenging task in AI-generated content (AIGC), as it requires handling varying resolutions and aspect ratios while maintaining high visual quality. Existing transformer-based diffusion methods suffer from quadratic computation cost and limited resolution extrapolation capabilities, making them less effective for this task. In this paper, we propose FlowDCN, a purely convolution-based generative model with linear time and memory complexity that can efficiently generate high-quality images at arbitrary resolutions. Equipped with a newly designed learnable group-wise deformable convolution block, FlowDCN offers greater flexibility and can handle different resolutions with a single model. FlowDCN achieves a state-of-the-art 4.30 sFID on the $256\times256$ ImageNet benchmark and comparable resolution extrapolation results, surpassing transformer-based counterparts in convergence speed (requiring only $\frac{1}{5}$ of the images), visual quality, parameter count ($8\%$ reduction), and FLOPs ($20\%$ reduction). We believe FlowDCN offers a promising solution for scalable and flexible image synthesis.

Paper Structure

This paper contains 29 sections, 12 equations, 3 figures, 9 tables.

Figures (3)

  • Figure 1: The Architecture of Our FlowDCN and MultiScale DCN Block.
  • Figure 2: Visual Comparison with SiT. Best viewed zoomed in. We sample from both FlowDCN-XL/2 and SiT-XL/2 with an Euler ODE solver for 2, 3, 4, 5, 8, and 10 steps, using the same latent noise. In the few-step sampling scenario, FlowDCN generates slightly clearer, higher-quality images.
  • Figure 3: Visual Comparison of $S_\text{max}$ Adjustment. Images are shown at three resolutions: $512\times512$, $256\times512$, and $512\times256$. We start from the same latent noise and sample with an Euler SDE solver for 250 steps. With $S_\text{max}$ adjustment, the sampled images consistently look better.
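
The Figure 2 caption refers to sampling with an Euler ODE solver at a small number of steps. As a rough illustration of what such a solver does, the sketch below integrates dx/dt = v(x, t) with fixed-size Euler steps. It is a generic toy, not the paper's sampler: the names `euler_ode_sample`, `toy_velocity`, and `TARGET` are hypothetical, the time direction (noise at t = 0, sample at t = 1) is an assumption, and the hand-written velocity field stands in for a trained network such as FlowDCN or SiT.

```python
from typing import Callable, List

Vector = List[float]


def euler_ode_sample(
    velocity: Callable[[Vector, float], Vector],
    x0: Vector,
    num_steps: int,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # the last evaluation is at t = 1 - dt, never at t = 1
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical stand-in for a learned velocity field: it points from the
# current state toward a fixed target along a straight line.
TARGET: Vector = [1.0, -2.0, 0.5]


def toy_velocity(x: Vector, t: float) -> Vector:
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(TARGET, x)]


if __name__ == "__main__":
    noise = [0.3, 0.1, -0.7]  # the same starting noise for every step count
    for steps in (2, 3, 4, 5, 8, 10):
        sample = euler_ode_sample(toy_velocity, noise, steps)
        error = max(abs(s - t) for s, t in zip(sample, TARGET))
        print(steps, sample, error)
```

Because the toy trajectories are straight lines, the step count barely matters here. With a learned velocity field the trajectories are generally curved, so fewer Euler steps mean larger discretization error, which is the regime the few-step comparison in Figure 2 examines.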