Hyper-Transforming Latent Diffusion Models
Ignacio Peis, Batuhan Koyuncu, Isabel Valera, Jes Frellsen
TL;DR
This paper extends diffusion-based generative modeling to continuous function representations by learning distributions over the parameters of implicit neural representations (INRs). It introduces Latent Diffusion Models of Implicit Neural Representations (LDMI), whose Transformer-based Hyper-Transformer Decoder (HD) maps latent codes to INR weights, enabling uncertainty-aware function generation across modalities. LDMI can be trained in full, or in a hyper-transforming mode that adapts a pre-trained latent diffusion model by training only the decoder, which promotes scalability and transferability. Experiments on natural images, 3D shapes, and climate data show high-quality generation and accurate reconstructions at arbitrary resolutions, highlighting LDMI's versatility and its potential as a unified framework for structured function representations.
Abstract
We introduce a novel generative framework for functions by integrating Implicit Neural Representations (INRs) and Transformer-based hypernetworks into latent variable models. Unlike prior approaches that rely on MLP-based hypernetworks with scalability limitations, our method employs a Transformer-based decoder to generate INR parameters from latent variables, addressing both representation capacity and computational efficiency. Our framework extends latent diffusion models (LDMs) to INR generation by replacing standard decoders with a Transformer-based hypernetwork, which can be trained either from scratch or via hyper-transforming: a strategy that fine-tunes only the decoder while freezing the pre-trained latent space. This enables efficient adaptation of existing generative models to INR-based representations without requiring full retraining. We validate our approach across multiple modalities, demonstrating improved scalability, expressiveness, and generalization over existing INR-based generative models. Our findings establish a unified and flexible framework for learning structured function representations.
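To make the central idea concrete, the following is a minimal sketch of a hypernetwork that turns a latent code into INR weights, with the resulting INR queried at two resolutions. It is not the paper's architecture: a single linear map stands in for the Transformer-based decoder, the latent is drawn from a Gaussian rather than a diffusion prior, and all names and sizes (`LATENT_DIM`, `HIDDEN`, `hypernetwork`, `inr`, `grid`) are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
LATENT_DIM, HIDDEN, IN_DIM, OUT_DIM = 8, 16, 2, 3

# Shapes of the INR parameters the hypernetwork must produce.
INR_SHAPES = [(IN_DIM, HIDDEN), (HIDDEN,), (HIDDEN, OUT_DIM), (OUT_DIM,)]
N_PARAMS = sum(int(np.prod(s)) for s in INR_SHAPES)

# Stand-in hypernetwork: one linear map from latent code to flat INR weights.
# (The paper uses a Transformer-based decoder here; this is a simplification.)
H = rng.normal(scale=0.1, size=(LATENT_DIM, N_PARAMS))


def hypernetwork(z):
    """Map a latent code z to a list of INR parameter arrays."""
    flat = z @ H
    params, offset = [], 0
    for shape in INR_SHAPES:
        size = int(np.prod(shape))
        params.append(flat[offset:offset + size].reshape(shape))
        offset += size
    return params


def inr(coords, params):
    """Evaluate a one-hidden-layer sine MLP at coordinates of shape (N, IN_DIM)."""
    w1, b1, w2, b2 = params
    return np.sin(coords @ w1 + b1) @ w2 + b2


def grid(resolution):
    """Regular grid of 2D coordinates in [-1, 1], flattened to (resolution**2, 2)."""
    axis = np.linspace(-1.0, 1.0, resolution)
    xx, yy = np.meshgrid(axis, axis, indexing="ij")
    return np.stack([xx.ravel(), yy.ravel()], axis=-1)


# In an LDM the latent would be sampled by the diffusion model; here it is Gaussian.
z = rng.normal(size=LATENT_DIM)
params = hypernetwork(z)

# The same generated function can be rendered at any resolution.
low = inr(grid(8), params).reshape(8, 8, OUT_DIM)
high = inr(grid(15), params).reshape(15, 15, OUT_DIM)
```

Because the generated object is a function of continuous coordinates, the two renders agree wherever their grids coincide. Under the same assumptions, hyper-transforming would correspond to keeping the source of `z` frozen and updating only the parameters of the hypernetwork (`H` in this sketch).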
