DNF: Unconditional 4D Generation with Dictionary-based Neural Fields
Xinyi Zhang, Naiqi Li, Angela Dai
TL;DR
DNF addresses unconditional 4D generation of deforming shapes with a dictionary-based neural field representation that decouples shape from motion through a shared dictionary, obtained by applying singular value decomposition (SVD) to MLP weights. A transformer-based diffusion model operates in the weight space of the dictionary-encoded 4D fields, using separate shape and motion streams and a sliding-window strategy to handle long sequences. Key contributions include dictionary-based fine-tuning with a compressed dictionary and residual extensions, and per-shape coefficient vectors that preserve contiguity while enabling high fidelity. Experiments on DeformingThings4D demonstrate state-of-the-art generation quality and generalization to unseen identities, giving a compact, scalable approach for high-dimensional dynamic 3D data.
Abstract
While diffusion-based 3D generative models have achieved remarkable success for shapes, 4D generative modeling remains challenging due to the complexity of object deformations over time. We propose DNF, a new 4D representation for unconditional generative modeling that efficiently models deformable shapes with disentangled shape and motion while capturing high-fidelity details in the deforming objects. To achieve this, we propose a dictionary learning approach that disentangles 4D motion from shape, with both represented as neural fields. Shape and motion are each represented in a learned latent space, where each deformable shape is described by its global shape and motion latent codes, shape-specific coefficient vectors, and shared dictionary information. This captures both shape-specific detail and global shared information in the learned dictionary. Our dictionary-based representation balances fidelity, contiguity, and compression well. Combined with a transformer-based diffusion model, our method is able to generate effective, high-fidelity 4D animations.
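As a rough illustration of the idea of a shared dictionary with shape-specific coefficient vectors, the NumPy sketch below factors a single weight matrix with a truncated SVD, keeps the singular vectors as the shared dictionary, and lets each shape own only a short coefficient vector. This is not the authors' implementation. The layer size, the rank, the helper names `build_dictionary` and `layer_weight`, and the random perturbation standing in for per-shape fine-tuning are all assumptions made for the example.

```python
import numpy as np


def build_dictionary(weight, rank):
    """Truncated SVD of one weight matrix: returns (U_r, s_r, Vt_r)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :]


def layer_weight(u, coeff, vt):
    """Rebuild a weight matrix as U @ diag(coeff) @ Vt.

    U and Vt act as the shared dictionary; coeff is specific to one shape.
    """
    return (u * coeff) @ vt  # broadcasting scales column k of U by coeff[k]


rng = np.random.default_rng(0)

# Hypothetical stand-in for one layer of a pretrained, shared MLP.
shared_weight = rng.standard_normal((64, 32))

# Compressed dictionary: keep only the leading 16 singular directions.
rank = 16
u, s, vt = build_dictionary(shared_weight, rank)

# Each shape stores `rank` coefficients instead of a full 64 x 32 matrix.
coeff_a = s.copy()                                  # initialised from the SVD
coeff_b = s * rng.uniform(0.5, 1.5, size=s.shape)   # placeholder for fine-tuning

w_a = layer_weight(u, coeff_a, vt)
w_b = layer_weight(u, coeff_b, vt)

# Frobenius error introduced by truncating the dictionary to `rank`.
truncation_error = np.linalg.norm(shared_weight - w_a)
print(f"per-shape parameters: {coeff_a.size} vs full layer: {shared_weight.size}")
print(f"rank-{rank} truncation error: {truncation_error:.3f}")
```

In this toy setting the per-shape storage drops from 2048 weights to 16 coefficients for the layer, which conveys the compression side of the trade-off. The residual extensions mentioned in the TL;DR, which would recover detail lost to truncation, are not modelled here.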
