CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model
Xiaoding Yuan, Shitao Tang, Kejie Li, Alan Yuille, Peng Wang
TL;DR
This work addresses 360-degree panorama outpainting from a single camera-free image, that is, an image whose camera intrinsics are unknown. It introduces CamFreeDiff, a diffusion-based pipeline that jointly learns a 3-DOF homography, parameterized by $(f,\phi,\psi)$, mapping the input view to a predefined canonical view, thereby establishing pixel-level correspondences with the eight target views. A frozen Stable Diffusion encoder supports text-guided generation, while an MLP-based homography estimator integrates end to end, in a fully differentiable way, with a correspondence-aware attention mechanism across views. Experiments on Matterport3D and on the out-of-domain Structured3D dataset demonstrate strong robustness to camera-free inputs and superior generalization, with the new-view variant delivering the best quality.
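A homography between two views that differ only by a camera rotation has the standard form $H = K_{\text{canon}} R K_{\text{in}}^{-1}$, so three parameters suffice once the canonical camera is fixed. The sketch below builds such an $H$ from $(f,\phi,\psi)$ under stated assumptions that the summary above does not confirm: $f$ is the unknown input focal length, $\phi$ and $\psi$ are rotations about the x- and z-axes, both images are 512×512, and the canonical view has a 90-degree field of view. The function names (`intrinsics`, `rotation`, `homography`, `warp`) are hypothetical.

```python
import numpy as np

def intrinsics(f, w, h):
    """Pinhole intrinsics with the principal point at the image center."""
    return np.array([[f, 0.0, w / 2.0],
                     [0.0, f, h / 2.0],
                     [0.0, 0.0, 1.0]])

def rotation(phi, psi):
    """Rotation about the x-axis by phi, then about the z-axis by psi (radians)."""
    cx, sx = np.cos(phi), np.sin(phi)
    cz, sz = np.cos(psi), np.sin(psi)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return rz @ rx

def homography(f, phi, psi, w=512, h=512, f_canon=256.0):
    """Hypothetical 3-DOF homography H = K_canon @ R @ inv(K_in).

    f_canon = w / 2 gives the assumed canonical view a 90-degree field of view.
    """
    k_in = intrinsics(f, w, h)
    k_canon = intrinsics(f_canon, w, h)
    return k_canon @ rotation(phi, psi) @ np.linalg.inv(k_in)

def warp(H, pts):
    """Map an (N, 2) array of pixel coordinates through H (with perspective divide)."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    out = homog @ H.T
    return out[:, :2] / out[:, 2:3]

# With the canonical focal length and no rotation, H reduces to the identity.
print(np.allclose(homography(256.0, 0.0, 0.0), np.eye(3)))  # True
```

Every step is a matrix product followed by a perspective divide, so the same expression written in an autodiff framework stays differentiable in $(f,\phi,\psi)$; this is the property that allows a homography estimator to be trained end to end with the diffusion model.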
Abstract
This paper introduces the Camera-free Diffusion (CamFreeDiff) model for 360-degree image outpainting from a single camera-free image and a text description. Unlike existing approaches such as MVDiffusion, it does not require predefined camera poses. Instead, our model predicts a homography directly within the multi-view diffusion framework. The core of our approach is to formulate camera estimation as predicting the homography transformation from the input view to a predefined canonical view. The homography provides point-level correspondences between the input image and the target panoramic images, which correspondence-aware attention then uses to connect the views in a fully differentiable manner. Qualitative and quantitative experimental results demonstrate our model's strong robustness and generalization ability for 360-degree image outpainting in the challenging setting of camera-free inputs.
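Once the input has been mapped into the canonical view, point-level correspondences with the panoramic target views follow from camera geometry alone. The sketch below assumes eight 512×512 perspective views with a 90-degree field of view, spaced 45 degrees apart in yaw, with the canonical view coinciding with view 0; these layout choices, and the names `world_to_camera` and `correspondences`, are illustrative assumptions rather than details taken from the paper. It back-projects a canonical pixel to a ray and reprojects that ray into each view.

```python
import numpy as np

F = 256.0                    # assumed focal length: 90-degree FoV for 512x512 views
W_IMG, H_IMG = 512, 512
K = np.array([[F, 0.0, W_IMG / 2.0], [0.0, F, H_IMG / 2.0], [0.0, 0.0, 1.0]])
K_INV = np.linalg.inv(K)

def world_to_camera(theta):
    """World-to-camera rotation for a camera yawed by theta radians about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])

def correspondences(pixel, n_views=8):
    """Project a canonical-view pixel into n_views target views spaced evenly in yaw.

    The canonical view is assumed to coincide with target view 0. Returns
    {view_index: (u, v)} for every view in which the pixel lands inside the image.
    """
    ray = K_INV @ np.array([pixel[0], pixel[1], 1.0])
    hits = {}
    for k in range(n_views):
        cam = world_to_camera(2.0 * np.pi * k / n_views) @ ray
        if cam[2] <= 0:      # direction lies behind this camera
            continue
        u, v, _ = K @ (cam / cam[2])
        if 0 <= u < W_IMG and 0 <= v < H_IMG:
            hits[k] = (float(u), float(v))
    return hits

print(correspondences((384.0, 256.0)))
```

For example, the pixel (384, 256), a quarter of the width right of center, lands in view 0 and also in view 1 near u ≈ 170.7, because adjacent 90-degree views overlap by 45 degrees. In an MVDiffusion-style correspondence-aware attention layer, features at such matched locations attend to one another; that attention layer itself is not sketched here.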
