Hyper-Transforming Latent Diffusion Models

Ignacio Peis, Batuhan Koyuncu, Isabel Valera, Jes Frellsen

TL;DR

This paper extends diffusion-based generative modeling to continuous function representations by learning a distribution over the parameters of implicit neural representations (INRs). It introduces Latent Diffusion Models of Implicit Neural Representations (LDMI), in which a Transformer-based Hyper-Transformer Decoder (HD) maps latent codes to INR weights, enabling uncertainty-aware function generation across modalities. LDMI can be trained in full or in a hyper-transforming mode that adapts a pre-trained latent diffusion model by training only the decoder, which aids scalability and transferability. Experiments on natural images, 3D shapes, and climate data show high-quality generation and accurate reconstructions at arbitrary resolutions, supporting LDMI as a unified framework for structured function representations.

Abstract

We introduce a novel generative framework for functions by integrating Implicit Neural Representations (INRs) and Transformer-based hypernetworks into latent variable models. Unlike prior approaches that rely on MLP-based hypernetworks with scalability limitations, our method employs a Transformer-based decoder to generate INR parameters from latent variables, addressing both representation capacity and computational efficiency. Our framework extends latent diffusion models (LDMs) to INR generation by replacing standard decoders with a Transformer-based hypernetwork, which can be trained either from scratch or via hyper-transforming: a strategy that fine-tunes only the decoder while freezing the pre-trained latent space. This enables efficient adaptation of existing generative models to INR-based representations without requiring full retraining. We validate our approach across multiple modalities, demonstrating improved scalability, expressiveness, and generalization over existing INR-based generative models. Our findings establish a unified and flexible framework for learning structured function representations.
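To make the pipeline in the abstract concrete, here is a minimal sketch of the data flow: a latent code is decoded into INR parameters, and the resulting function is then queried at any resolution. Everything in it is an illustrative assumption rather than the paper's implementation: `hyper_decoder` is a toy linear map standing in for the Transformer-based decoder, `inr` is a one-hidden-layer sine network over a 1D coordinate, and the latent `z` is drawn at random where a latent diffusion sampler would normally supply it.

```python
import math
import random


def hyper_decoder(z, params):
    """Toy linear hypernetwork: latent z -> flat INR parameter vector.

    Hypothetical stand-in for the Transformer-based decoder in the abstract.
    """
    return [
        sum(w * zi for w, zi in zip(row, z)) + b
        for row, b in zip(params["W"], params["b"])
    ]


def inr(coord, theta, hidden=8):
    """One-hidden-layer INR: scalar coordinate in [0, 1] -> scalar value."""
    w1 = theta[:hidden]
    b1 = theta[hidden:2 * hidden]
    w2 = theta[2 * hidden:3 * hidden]
    b2 = theta[3 * hidden]
    h = [math.sin(w * coord + b) for w, b in zip(w1, b1)]
    return sum(w * hi for w, hi in zip(w2, h)) + b2


def render(theta, resolution):
    """Evaluate the INR on a uniform grid with `resolution` >= 2 points."""
    return [inr(i / (resolution - 1), theta) for i in range(resolution)]


rng = random.Random(0)
latent_dim, hidden = 4, 8
n_params = 3 * hidden + 1  # w1, b1, w2 (hidden each) plus scalar b2

# Randomly initialised hypernetwork parameters (untrained, for illustration).
params = {
    "W": [[rng.gauss(0, 1) for _ in range(latent_dim)] for _ in range(n_params)],
    "b": [rng.gauss(0, 1) for _ in range(n_params)],
}

# Assumption: in an LDM this latent would come from the diffusion sampler.
z = [rng.gauss(0, 1) for _ in range(latent_dim)]
theta = hyper_decoder(z, params)

coarse = render(theta, 5)  # same function, 5 samples
fine = render(theta, 9)    # same function, 9 samples
```

Because the decoded object is a function and not a fixed grid, `coarse` and `fine` are two samplings of the same signal: every second entry of `fine` coincides with an entry of `coarse`.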

Paper Structure

This paper contains 37 sections, 25 equations, 12 figures, 5 tables, 2 algorithms.

Figures (12)

  • Figure 1: Samples from LDMI at multiple resolutions.
  • Figure 3: Diagram of the Hyper-Transformer Decoder (HD). The latent variable ${\bm{z}}$ is tokenized and processed by a Transformer Encoder. A Transformer Decoder, initialized with learnable grouped weights $\bar{{\mathbf{W}}}_l^\text{i}$, cross-attends to the latent tokens to generate the set of grouped weights ${\mathbf{W}}_l^\text{o}$. The full weight matrices ${\mathbf{W}}_l$ are then reconstructed by combining the grouped weights with learnable template weights $\bar{{\mathbf{W}}}_l^\text{b}$. Biases $\mathbf{b}_l$ are learned as global parameters.
  • Figure 4: Schematic of the DDIM-based latent diffusion.
  • Figure 5: Uncurated samples from GASP, Functa (diffusion-based) and our LDMI trained with CelebA-HQ at 64$\times$64 resolution (a). In (b), our model was trained on CelebA-HQ at 256$\times$256 using the hyper-transforming approach.
  • Figure 6: Reconstructions from Functa (diffusion-based), VAMoH, and our LDMI compared to ground truth (GT) across three datasets (a)--(c). In (d) and (e) LDMI was trained by hyper-transforming pre-trained LDMs.
  • ...and 7 more figures
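
The Figure 3 caption says the full weight matrices are reconstructed by combining grouped weights with learnable template weights, but it does not state the combination rule. The sketch below shows one plausible rule, assumed here and not confirmed by the caption: each grouped column is repeated to cover its group of output columns and then multiplied elementwise with the template. The function name `reconstruct_weights` and the toy shapes are hypothetical.

```python
def reconstruct_weights(grouped, template):
    """Expand grouped weights to a full matrix using a template.

    grouped:  d_in x G matrix (list of rows), one column per group.
    template: d_in x d_out matrix (list of rows), with d_out divisible by G.

    Assumed rule (not taken from the caption): column j of the result is
    grouped column (j // (d_out // G)) times template column j, elementwise.
    """
    d_out = len(template[0])
    n_groups = len(grouped[0])
    if d_out % n_groups != 0:
        raise ValueError("d_out must be divisible by the number of groups")
    rep = d_out // n_groups
    return [
        [g_row[j // rep] * t_row[j] for j in range(d_out)]
        for g_row, t_row in zip(grouped, template)
    ]


# Toy example: d_in = 2, G = 2 groups, d_out = 4 output columns.
grouped = [[1.0, 2.0],
           [3.0, 4.0]]
template = [[1.0, 0.5, 2.0, 1.0],
            [2.0, 1.0, 0.5, 4.0]]
full = reconstruct_weights(grouped, template)
```

Under this assumption the decoder only has to emit `G` columns per layer instead of `d_out`, which is one way grouping could keep the number of generated tokens small.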