Fractal Generative Models

Tianhong Li, Qinyi Sun, Lijie Fan, Kaiming He

TL;DR

The paper introduces fractal generative models, a modular framework that recursively composes generative modules to create self-similar architectures for modeling high-dimensional, non-sequential data. It instantiates this idea with autoregressive generators, forming FractalAR and FractalMAR variants, and demonstrates pixel-by-pixel image generation on ImageNet, achieving competitive likelihoods on 64×64 images and high-quality 256×256 samples with scalable compute. The approach leverages a divide-and-conquer, hierarchical structure to reduce computation compared to full-scale attention and tokenization-based methods, while enabling interpretable, controllable generation. These results suggest fractal modularization as a promising paradigm for future generative modeling across data with intrinsic multi-scale structure.

Abstract

Modularization is a cornerstone of computer science, abstracting complex functions into atomic building blocks. In this paper, we introduce a new level of modularization by abstracting generative models into atomic generative modules. Analogous to fractals in mathematics, our method constructs a new type of generative model by recursively invoking atomic generative modules, resulting in self-similar fractal architectures that we call fractal generative models. As a running example, we instantiate our fractal framework using autoregressive models as the atomic generative modules and examine it on the challenging task of pixel-by-pixel image generation, demonstrating strong performance in both likelihood estimation and generation quality. We hope this work could open a new paradigm in generative modeling and provide a fertile ground for future research. Code is available at https://github.com/LTH14/fractalgen.
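The recursive composition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `fractal_generate`, the parameters `num_levels` and `branching`, and the arithmetic rule standing in for a learned autoregressive module are all assumptions made for illustration. The sketch only shows the control flow, in which each generator conditions on its parent's output, produces outputs one at a time, and hands each one to a generator at the next level.

```python
import random


def fractal_generate(level, num_levels, branching, context, rng):
    """Toy recursive generator that returns a flat list of leaf values.

    Hypothetical stand-in for a fractal generative model: real modules
    would be learned autoregressive networks, not arithmetic rules.
    """
    if level == num_levels:
        # Leaf: emit one "pixel" value conditioned on the parent context.
        return [(context + rng.randint(0, 255)) % 256]

    outputs = []
    prev = 0  # summary of what this module has generated so far
    for _ in range(branching):
        # One autoregressive step: the conditioning value for the child
        # depends on the parent context and on earlier siblings.
        child_context = (context * 31 + prev + rng.randint(0, 255)) % 256
        child_outputs = fractal_generate(
            level + 1, num_levels, branching, child_context, rng
        )
        outputs.extend(child_outputs)
        prev = child_outputs[-1]
    return outputs


rng = random.Random(0)
pixels = fractal_generate(0, num_levels=3, branching=4, context=0, rng=rng)
print(len(pixels))  # 4 ** 3 = 64 leaf values
```

In this toy setup the number of leaf values is `branching ** num_levels`, while the recursion depth is only `num_levels`, which gives a rough picture of the divide-and-conquer structure the TL;DR refers to.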

Paper Structure

This paper contains 29 sections, 7 figures, 6 tables.

Figures (7)

  • Figure 1: Our fractal framework can generate high-quality images pixel-by-pixel. We show the generation process of a 256×256 image by recursively calling autoregressive models in autoregressive models. We also provide example videos in our GitHub repository to illustrate the generation process.
  • Figure 2: Instantiation of our fractal method on pixel-by-pixel image generation. In each fractal level, an autoregressive model receives the output from the previous generator, concatenates it with the corresponding image patches, and employs multiple transformer blocks to produce a set of outputs for the next generators.
  • Figure 3: Pixel-by-pixel generation results from FractalMAR-H on ImageNet 256×256. Our fractal method can generate high-quality high-resolution images in a pixel-by-pixel manner, with an average throughput of 1.29 seconds per image. More qualitative results are in the figure labeled fig:more-qualitative.
  • Figure 4: Conditional pixel-by-pixel prediction results, including image inpainting (first row), outpainting (second row), uncropping (outpainting on a large mask, third row), and class-conditional editing (inpainting with another class label, fourth row).
  • Figure 5: Two variants for autoregressive modeling. The AR variant models the sequence in a raster-scan order using a causal transformer, while the MAR variant models the sequence in a random order using a bidirectional transformer. Both are valid generators to build our fractal framework.
  • ...and 2 more figures
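
The two orderings named in the Figure 5 caption can be made concrete with a small sketch. It is an illustration only, not the paper's code: the helper names `raster_order`, `random_order`, and `causal_mask` are hypothetical, and the mask is a plain nested list rather than a tensor fed to a transformer.

```python
import random


def raster_order(height, width):
    """Positions in raster-scan order (row by row, left to right)."""
    return [(r, c) for r in range(height) for c in range(width)]


def random_order(height, width, seed=0):
    """The same positions in a shuffled order, as a random-order variant visits them."""
    order = raster_order(height, width)
    random.Random(seed).shuffle(order)
    return order


def causal_mask(n):
    """mask[i][j] is True when step i may attend to step j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


print(raster_order(2, 2))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(causal_mask(3)[1])   # [True, True, False]
```

A causal transformer would apply a mask like `causal_mask` along the raster order, whereas a bidirectional transformer attends over all positions and instead decides which positions to predict next according to the random order.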