Diffusing Differentiable Representations
Yash Savani, Marc Finzi, J. Zico Kolter
TL;DR
The paper addresses training-free sampling of differentiable representations (diffreps) with pretrained diffusion models. It pulls the reverse-time dynamics back from image space to the diffrep parameter space, deriving a probability flow ODE (PF-ODE) for the parameters that includes a (JᵀJ)⁻¹ term, where J is the Jacobian of the rendering function with respect to the diffrep parameters. It also enforces a consistency constraint, integrated via RePaint, so that samples remain renderable across views. The result is true sampling rather than mode-seeking, with more detail and diversity for images, panoramas, and NeRFs than existing techniques, at competitive runtimes. The work extends pretrained diffusion models to multi-view and 3D-conditional generation.
Abstract
We introduce a novel, training-free method for sampling differentiable representations (diffreps) using pretrained diffusion models. Rather than merely mode-seeking, our method achieves sampling by "pulling back" the dynamics of the reverse-time process, from the image space to the diffrep parameter space, and updating the parameters according to this pulled-back process. We identify an implicit constraint on the samples induced by the diffrep and demonstrate that addressing this constraint significantly improves the consistency and detail of the generated objects. Our method yields diffreps with substantially improved quality and diversity for images, panoramas, and 3D NeRFs compared to existing techniques. Our approach is a general-purpose method for sampling diffreps, expanding the scope of problems that diffusion models can tackle.
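
To make the "pulling back" step concrete, the following is a minimal sketch of how an image-space velocity can be mapped to a parameter-space update through a (JᵀJ)⁻¹ Jᵀ factor. It is an illustration under stated assumptions, not the authors' implementation: the toy renderer `render`, its Jacobian `jacobian`, the helper `pull_back`, and the vector `v_image` (standing in for whatever velocity the reverse-time process prescribes in image space) are all hypothetical names chosen for this example.

```python
import numpy as np

def render(theta):
    """Hypothetical toy 'renderer': maps 2 parameters to a 3-pixel image."""
    a, b = theta
    return np.array([a, b, a * b])

def jacobian(theta):
    """Analytic Jacobian of `render` (3 pixels x 2 parameters)."""
    a, b = theta
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [b,   a]])

def pull_back(J, v):
    """Map an image-space velocity v to a parameter-space velocity.

    Solves the normal equations (J^T J) dtheta = J^T v, which is the
    least-squares solution dtheta = (J^T J)^{-1} J^T v when J has
    full column rank.
    """
    return np.linalg.solve(J.T @ J, J.T @ v)

theta = np.array([0.5, -1.5])           # current diffrep parameters (toy values)
J = jacobian(theta)
v_image = np.array([0.2, -0.1, 0.4])    # assumed image-space velocity (placeholder)

dtheta = pull_back(J, v_image)          # pulled-back parameter velocity
residual = v_image - J @ dtheta         # part of v_image the renderer cannot realize
```

In this sketch, `J @ dtheta` is the closest image-space motion to `v_image` that a change in parameters can produce, and `residual` is the component that no parameter update can realize. A nonzero residual illustrates why a generic image-space velocity need not correspond to any valid rendering.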
