Curved Diffusion: A Generative Model With Optical Geometry Control
Andrey Voynov, Amir Hertz, Moab Arar, Shlomi Fruchter, Daniel Cohen-Or
TL;DR
Diffusion-based image synthesis typically neglects camera geometry, and this work addresses that gap. It introduces Curved Diffusion, a framework that injects arbitrary curved rendering geometry into a text-to-image diffusion model via per-pixel coordinate conditioning and, for more general surfaces, metric tensor conditioning. A self-attention reweighting scheme based on local warp density is proposed to maintain fidelity in warped regions. The approach enables controllable generation of lens effects, photospheres, and spherical textures with a single model. It is supported by quantitative distortion-fidelity metrics and human evaluations, both of which indicate geometry-aware improvements. Together, these contributions extend the practical applicability of diffusion models to VR, immersive visuals, and geometry-consistent texture synthesis.
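To make the notion of local warp density concrete, the sketch below computes one plausible version of it: the absolute Jacobian determinant of a per-pixel coordinate map, estimated with finite differences. This definition, the function name `warp_density`, and its parameters are assumptions made for illustration. The summary above does not specify how the density is defined or how it enters the self-attention weights, so the sketch stops at the density itself.

```python
import numpy as np

def warp_density(coords, spacing_y, spacing_x):
    """Estimate |det J| of a coordinate map of shape (H, W, 2).

    coords[..., 0] holds warped x, coords[..., 1] holds warped y.
    spacing_y and spacing_x are the grid steps of the unwarped pixel grid.
    This definition of "warp density" is an assumption, not the paper's.
    """
    x, y = coords[..., 0], coords[..., 1]
    # np.gradient returns derivatives along axis 0 (rows) first, then axis 1 (columns).
    dx_drow, dx_dcol = np.gradient(x, spacing_y, spacing_x)
    dy_drow, dy_dcol = np.gradient(y, spacing_y, spacing_x)
    # Determinant of the 2x2 Jacobian of (x, y) with respect to (col, row).
    return np.abs(dx_dcol * dy_drow - dx_drow * dy_dcol)

# Usage: an identity grid has density 1; a uniform 2x stretch has density 4.
ys = np.linspace(-1.0, 1.0, 5)
xs = np.linspace(-1.0, 1.0, 5)
grid_x, grid_y = np.meshgrid(xs, ys)
identity = np.stack([grid_x, grid_y], axis=-1)
step = xs[1] - xs[0]
density_identity = warp_density(identity, step, step)
density_stretched = warp_density(2.0 * identity, step, step)
```

Under this reading, regions where the map stretches the image have density above 1 and compressed regions have density below 1, which is the kind of per-pixel signal a reweighting scheme could consume.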
Abstract
State-of-the-art diffusion models can generate highly realistic images conditioned on signals such as text, segmentation maps, and depth. However, one essential aspect is often overlooked: the specific camera geometry used during image capture, and the influence of different optical systems on the final appearance of the scene. This study introduces a framework that tightly integrates a text-to-image diffusion model with the particular lens geometry used in image rendering. Our method is based on per-pixel coordinate conditioning, which enables control over the rendering geometry. Notably, we demonstrate the manipulation of curvature properties, achieving diverse visual effects such as fish-eye, panoramic views, and spherical texturing, all with a single diffusion model.
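As an illustration of what a per-pixel coordinate condition could look like, the following sketch builds a coordinate map for a simple radial lens model and attaches it to an image as two extra channels. The polynomial distortion model, the strength parameter `k`, the function names, and the channel-concatenation step are all hypothetical choices for this example. The abstract does not state how the coordinates are computed or fed to the network.

```python
import numpy as np

def radial_coordinate_map(height, width, k=0.5):
    """Return an (H, W, 2) array of warped (x, y) coordinates.

    The unwarped grid spans [-1, 1] on both axes. k is a hypothetical
    distortion strength; k = 0 gives the identity grid.
    """
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    x, y = np.meshgrid(xs, ys)  # both have shape (H, W)
    r2 = x**2 + y**2
    scale = 1.0 + k * r2  # simple polynomial radial model (assumption)
    return np.stack([x * scale, y * scale], axis=-1)

def add_coordinate_channels(image, coords):
    """Concatenate a coordinate map to an (H, W, C) image as two extra channels."""
    return np.concatenate([image, coords], axis=-1)

# Usage: condition a dummy RGB image on a distorted coordinate grid.
coords = radial_coordinate_map(5, 5, k=0.5)
identity = radial_coordinate_map(5, 5, k=0.0)
image = np.zeros((5, 5, 3))
conditioned = add_coordinate_channels(image, coords)
```

Because each pixel carries its own target coordinates, changing the map (for example, swapping the radial model for a spherical projection) changes the requested geometry without changing the model, which matches the single-model claim above.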
