HyperFields: Towards Zero-Shot Generation of NeRFs from Text
Sudarshan Babu, Richard Liu, Avery Zhou, Michael Maire, Greg Shakhnarovich, Rana Hanocka
TL;DR
HyperFields tackles the bottleneck of text-to-3D NeRF generation by learning a shared, dynamic hypernetwork that maps text embeddings to NeRF weights, enabling zero-shot synthesis of unseen scenes. It introduces NeRF distillation to supervise training across many scenes, combining progressive, activation-conditioned weight generation with a distillation loss to avoid the pitfalls of score distillation sampling (SDS). The approach achieves zero-shot generalization to novel in-distribution prompt combinations and accelerated convergence on out-of-distribution prompts, with substantial speedups over per-scene optimization and compatibility with high-detail teacher models such as ProlificDreamer. The work demonstrates a scalable path toward open-vocabulary, fast text-to-NeRF generation, with the potential to generalize to other implicit 3D representations. Limitations include dependence on teacher model quality and biases inherited from diffusion models.
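To make "progressive, activation-conditioned weight generation" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the class `LayerHyperNet`, the function `dynamic_forward`, the single linear projection per layer, the use of mean activations as the conditioning signal, and all dimensions are hypothetical choices made for this example.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Build a rows x cols matrix of small Gaussian values."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LayerHyperNet:
    """Hypothetical per-layer module: (text embedding, activation summary) -> weights."""

    def __init__(self, text_dim, in_dim, out_dim, rng):
        self.in_dim, self.out_dim = in_dim, out_dim
        # Assumption: one linear map from the conditioning vector to the
        # flattened weight matrix of the target MLP layer.
        self.proj = random_matrix(out_dim * in_dim, text_dim + in_dim, rng)

    def generate(self, text_emb, mean_act):
        flat = matvec(self.proj, text_emb + mean_act)  # list concatenation
        return [flat[r * self.in_dim:(r + 1) * self.in_dim]
                for r in range(self.out_dim)]


def dynamic_forward(hypernets, text_emb, points):
    """Run points through an MLP whose weights are generated layer by layer."""
    acts = points
    for i, hnet in enumerate(hypernets):
        # Summarise the current activations so the next layer's weights
        # depend on both the prompt and what earlier layers produced.
        mean_act = [sum(col) / len(acts) for col in zip(*acts)]
        weights = hnet.generate(text_emb, mean_act)
        acts = [matvec(weights, a) for a in acts]
        if i < len(hypernets) - 1:
            acts = [[max(0.0, v) for v in a] for a in acts]  # ReLU, hidden layers only
    return acts


rng = random.Random(0)
text_dim, hidden = 8, 16
hypernets = [
    LayerHyperNet(text_dim, 3, hidden, rng),       # 3D point -> hidden
    LayerHyperNet(text_dim, hidden, hidden, rng),  # hidden -> hidden
    LayerHyperNet(text_dim, hidden, 4, rng),       # hidden -> 4 raw outputs
]
text_emb = [rng.gauss(0.0, 1.0) for _ in range(text_dim)]  # stand-in for a real text embedding
points = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(5)]
out = dynamic_forward(hypernets, text_emb, points)
```

Each of the five sample points yields four raw values, which a NeRF would further map to colour and density. The point of the sketch is the data flow: weights for a layer are produced only after the preceding layers have run, so they can depend on both the prompt and the intermediate activations.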
Abstract
We introduce HyperFields, a method for generating text-conditioned Neural Radiance Fields (NeRFs) with a single forward pass and (optionally) some fine-tuning. Key to our approach are: (i) a dynamic hypernetwork, which learns a smooth mapping from text token embeddings to the space of NeRFs; (ii) NeRF distillation training, which distills scenes encoded in individual NeRFs into one dynamic hypernetwork. These techniques enable a single network to fit over a hundred unique scenes. We further demonstrate that HyperFields learns a more general map between text and NeRFs, and consequently is capable of predicting novel in-distribution and out-of-distribution scenes, either zero-shot or with a few fine-tuning steps. Fine-tuning HyperFields benefits from accelerated convergence thanks to the learned general map, and is capable of synthesizing novel scenes 5 to 10 times faster than existing neural optimization-based methods. Our ablation experiments show that both the dynamic architecture and NeRF distillation are critical to the expressivity of HyperFields.
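As a rough illustration of NeRF distillation, the sketch below compares images rendered by a student (the hypernetwork-generated NeRF) with images rendered by per-scene teacher NeRFs, averaged over prompts and viewpoints. The abstract does not specify the loss, so the pixel-wise mean squared error used here is an assumption, as are the function names `distillation_loss` and `multi_scene_distillation_loss` and the toy renderers in the usage lines.

```python
def distillation_loss(student_pixels, teacher_pixels):
    """Mean squared error between two equally sized lists of RGB pixels."""
    if len(student_pixels) != len(teacher_pixels):
        raise ValueError("student and teacher renders must have the same size")
    total, count = 0.0, 0
    for s_pix, t_pix in zip(student_pixels, teacher_pixels):
        for s_c, t_c in zip(s_pix, t_pix):
            total += (s_c - t_c) ** 2
            count += 1
    return total / count


def multi_scene_distillation_loss(student_render, teacher_renders, views):
    """Average the loss over every (prompt, view) pair.

    student_render(prompt, view) -> list of RGB pixels   (hypothetical interface)
    teacher_renders: dict mapping prompt -> callable(view) -> list of RGB pixels
    """
    losses = [
        distillation_loss(student_render(prompt, view), teacher(view))
        for prompt, teacher in teacher_renders.items()
        for view in views
    ]
    return sum(losses) / len(losses)


# Toy usage: each "teacher" renders a constant colour regardless of view.
teachers = {
    "red chair": lambda view: [[1.0, 0.0, 0.0]] * 4,
    "blue pot": lambda view: [[0.0, 0.0, 1.0]] * 4,
}


def grey_student(prompt, view):
    # An untrained student that ignores the prompt and renders mid-grey.
    return [[0.5, 0.5, 0.5]] * 4


loss = multi_scene_distillation_loss(grey_student, teachers, views=[0, 90, 180])
```

Because the target is a fixed teacher render rather than a diffusion-model score, the supervision signal for a given prompt and view is deterministic, which is one way to read the claim that distillation sidesteps SDS pitfalls.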
