FGM-HD: Boosting Generation Diversity of Fractal Generative Models through Hausdorff Dimension Induction

Haowei Zhang, Yuanpei Zhao, Ji-Zhe Zhou, Mao Li

TL;DR

This work tackles the limited diversity of fractal-based image generation by introducing the Hausdorff Dimension (HD) as a structural diversity cue. It combines three components: a learnable HD estimator that predicts HD from image embeddings; a Monotonic Momentum-Driven Scheduling (MMDS) strategy that progressively weights the HD loss in the training objective $L_{total} = L_{gen} + \lambda(t) L_{HD}$; and HD-guided rejection sampling that keeps only outputs whose estimated HD exceeds a threshold $\tau$ at inference. The method reports a 39% improvement in Recall on ImageNet over vanilla FGMs while preserving comparable image quality, and, to the authors' knowledge, is the first to integrate HD into Fractal Generative Models (FGMs). The authors also present the approach as a generalizable optimization framework for hybrid-loss models, with potential extensions to conditional and multi-modal generation using perceptually aligned HD estimation.
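To make the objective $L_{total} = L_{gen} + \lambda(t) L_{HD}$ concrete, here is a minimal Python sketch. This summary does not give the MMDS update rule, so the schedule below is only a hypothetical stand-in: a momentum-smoothed ramp whose weight never decreases. The class name `MonotoneMomentumSchedule`, the parameters `lam_max`, `beta`, and `step`, and the scalar `signal` are all assumptions made for illustration, not the authors' definitions.

```python
class MonotoneMomentumSchedule:
    """Hypothetical stand-in for MMDS: the weight lambda never decreases."""

    def __init__(self, lam_max=1.0, beta=0.9, step=0.1):
        self.lam_max = lam_max    # assumed upper bound on lambda
        self.beta = beta          # assumed momentum coefficient
        self.step = step          # assumed step size per update
        self.lam = 0.0
        self.velocity = 0.0

    def update(self, signal):
        # Clamp the signal at zero so the velocity stays non-negative,
        # which guarantees that lambda is monotonically non-decreasing.
        self.velocity = (self.beta * self.velocity
                         + (1.0 - self.beta) * max(signal, 0.0))
        self.lam = min(self.lam_max, self.lam + self.step * self.velocity)
        return self.lam


def total_loss(l_gen, l_hd, lam):
    """Hybrid objective: L_total = L_gen + lambda(t) * L_HD."""
    return l_gen + lam * l_hd


# Usage: ramp lambda over 50 steps with a constant (assumed) signal.
schedule = MonotoneMomentumSchedule()
lams = [schedule.update(signal=1.0) for _ in range(50)]
```

In this sketch the generation loss dominates early in training, when $\lambda$ is close to zero, and the HD term gains influence as the weight ramps up. That matches the "progressive" weighting described above, though the actual MMDS rule may differ.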

Abstract

Improving the diversity of generated results while maintaining high visual quality remains a significant challenge in image generation tasks. Fractal Generative Models (FGMs) are efficient in generating high-quality images, but their inherent self-similarity limits the diversity of output images. To address this issue, we propose a novel approach based on the Hausdorff Dimension (HD), a widely recognized concept in fractal geometry used to quantify structural complexity, which aids in enhancing the diversity of generated outputs. To incorporate HD into FGM, we propose a learnable HD estimation method that predicts HD directly from image embeddings, addressing computational cost concerns. However, simply introducing HD into a hybrid loss is insufficient to enhance diversity in FGMs due to: 1) degradation of image quality, and 2) limited improvement in generation diversity. To this end, during training, we adopt an HD-based loss with a monotonic momentum-driven scheduling strategy to progressively optimize the hyperparameters, obtaining optimal diversity without sacrificing visual quality. Moreover, during inference, we employ HD-guided rejection sampling to select geometrically richer outputs. Extensive experiments on the ImageNet dataset demonstrate that our FGM-HD framework yields a 39% improvement in output diversity compared to vanilla FGMs, while preserving comparable image quality. To our knowledge, this is the very first work introducing HD into FGM. Our method effectively enhances the diversity of generated outputs while offering a principled theoretical contribution to FGM development.
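As a point of reference for what "quantifying structural complexity" can look like in practice, the sketch below estimates the box-counting dimension of a binary image with NumPy. This is a classical, non-learned technique and not the paper's learnable estimator. The box-counting dimension is also not the Hausdorff dimension in general (it is an upper bound on it), so treat it only as a commonly used proxy. The function name `box_counting_dimension` and the default `box_sizes` are assumptions chosen for illustration.

```python
import numpy as np


def box_counting_dimension(mask, box_sizes=(1, 2, 4, 8, 16)):
    """Estimate the box-counting dimension of a 2D binary mask.

    Counts the boxes of side s that contain at least one foreground pixel,
    then fits log N(s) = -D * log s + c and returns D.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask has no foreground pixels")
    h, w = mask.shape
    counts = []
    for s in box_sizes:
        # Crop so that both sides are divisible by the box size.
        hh, ww = h - h % s, w - w % s
        blocks = mask[:hh, :ww].reshape(hh // s, s, ww // s, s)
        occupied = blocks.any(axis=(1, 3))  # True where a box is non-empty
        counts.append(occupied.sum())
    slope, _ = np.polyfit(np.log(box_sizes), np.log(counts), 1)
    return -slope


# Sanity checks on shapes with known dimension.
filled = np.ones((64, 64), dtype=bool)   # a filled square
line = np.zeros((64, 64), dtype=bool)
line[0, :] = True                        # a straight line
dim_filled = box_counting_dimension(filled)
dim_line = box_counting_dimension(line)
```

On these two shapes the estimate should come out close to 2 and 1 respectively. For natural images one would first need a binarization step such as edge detection or thresholding, which is outside the scope of this sketch.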

Paper Structure

This paper contains 31 sections, 2 equations, 10 figures, 7 tables, 2 algorithms.

Figures (10)

  • Figure 1: Overview of the FGM process, where the image is generated from sparse patches to $256 \times 256$ resolution through recursive refinement at smaller scales (e.g., $16 \times 16$ and $4 \times 4$ blocks). A shared generative module is reused across scales, capturing global and fine-grained structural details.
  • Figure 2: Overview of the proposed FGM-HD framework. (a) Training and Sampling Strategies: During training (gray line), input noise is recursively processed by the FGM (purple section), and the generated images are evaluated by the HD estimation module to compute HD loss. Then, the HD loss is dynamically weighted by the MMDS strategy (green section) through $\lambda(t)$ to balance image quality and structural diversity during optimization. During inference (blue line), a batch of samples is generated from noise via FGM and passed through the Sampling Strategy (blue section). Only geometrically richer outputs with HD values above the threshold ($\tau$) are retained while others are discarded and regenerated, ensuring structurally diverse outputs without modifying the generator architecture. (b) Hausdorff Dimension Estimation: The HD estimation (yellow section) is performed using a multi-scale convolutional network built upon the ResNet152 architecture, enabling accurate and efficient HD prediction directly from image embeddings.
  • Figure 3: Evolution of image quality and HD variance across training epochs. Early-stage generations are noisy with unstable HD values, while later epochs yield high-quality images with more reliable HD estimation.
  • Figure 4: Performance trends of evaluation metrics under varying HD thresholds.
  • Figure 5: Visualization of the MMDS strategy. The blue curve shows the variation of $\lambda$ optimized by MMDS, the red curve represents the fixed exponential loss (final value: 3.14), and the green curve shows the MMDS loss (final value: 2.85). The MMDS strategy leads to a smoother and lower loss trajectory during training.
  • ...and 5 more figures
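
The inference path described in the Figure 2 caption (generate, estimate HD, keep samples above $\tau$, regenerate the rest) can be sketched as a plain rejection-sampling loop. The callables `generate` and `estimate_hd`, the `max_tries` safety cap, and the toy stand-ins in the usage lines are all hypothetical. In the paper these roles are played by the FGM generator and the ResNet152-based HD estimator.

```python
import random


def hd_guided_rejection_sample(generate, estimate_hd, tau, n_samples,
                               max_tries=10_000):
    """Keep drawing samples until n_samples have estimated HD above tau.

    generate:    zero-argument callable returning one sample (assumed)
    estimate_hd: callable mapping a sample to a scalar HD estimate (assumed)
    max_tries:   cap on generator calls so the loop always terminates
    """
    kept = []
    tries = 0
    while len(kept) < n_samples and tries < max_tries:
        sample = generate()
        tries += 1
        if estimate_hd(sample) > tau:  # retain geometrically richer outputs
            kept.append(sample)
        # Otherwise the sample is discarded and a new one is generated.
    return kept, tries


# Toy usage: each "image" is just a number that doubles as its HD estimate.
rng = random.Random(0)
kept, tries = hd_guided_rejection_sample(
    generate=lambda: rng.uniform(1.0, 2.0),
    estimate_hd=lambda x: x,
    tau=1.5,
    n_samples=8,
)
```

Raising `tau` makes the filter stricter and increases the number of generator calls per retained sample. That cost is one side of the trade-off behind the threshold study in Figure 4.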