I2V3D: Controllable image-to-video generation with 3D guidance
Zhiyuan Zhang, Dongdong Chen, Jing Liao
TL;DR
I2V3D combines traditional computer graphics with diffusion-based synthesis to enable precise 3D control in image-to-video generation from a single image. It consists of a 3D reconstruction and rendering stage followed by a 3D-guided two-stage video generation pipeline: keyframe generation with LoRA multi-view augmentation and geometric guidance, then training-free 3D-guided interpolation. Extensive ablations demonstrate gains in temporal coherence and 3D controllability. The approach supports arbitrary starting frames, extended sequences, and 3D scene editing (adding, copying, replacing, or editing objects), and it reports superior quantitative and qualitative results against strong baselines. The framework lowers the expertise barrier for CG-quality video creation and offers a flexible path from static imagery to controllable, photorealistic animations.
Abstract
We present I2V3D, a novel framework for animating static images into dynamic videos with precise 3D control, leveraging the strengths of both 3D geometry guidance and advanced generative models. Our approach combines the precision of a computer graphics pipeline, enabling accurate control over elements such as camera movement, object rotation, and character animation, with the visual fidelity of generative AI to produce high-quality videos from coarsely rendered inputs. To support animations with any starting point and extended sequences, we adopt a two-stage generation process guided by 3D geometry: 1) 3D-Guided Keyframe Generation, where a customized image diffusion model refines rendered keyframes to ensure consistency and quality, and 2) 3D-Guided Video Interpolation, a training-free approach that generates smooth, high-quality video frames between keyframes using bidirectional guidance. Experimental results highlight the effectiveness of our framework in producing controllable, high-quality animations from single input images by harmonizing 3D geometry with generative models. The code for our framework will be publicly released.
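The control flow of the two-stage process described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `select_keyframes`, `animate`, `refine_keyframe`, and `interpolate` are hypothetical, the fixed keyframe stride is an assumption, and the two models are passed in as opaque callables because the abstract does not specify their interfaces.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def select_keyframes(num_frames: int, stride: int) -> List[int]:
    """Pick every `stride`-th frame index, always including the last frame.

    A fixed stride is an assumption made for this sketch.
    """
    if num_frames <= 0 or stride <= 0:
        raise ValueError("num_frames and stride must be positive")
    indices = list(range(0, num_frames, stride))
    if indices[-1] != num_frames - 1:
        indices.append(num_frames - 1)
    return indices


def animate(
    rendered: Sequence[Frame],
    refine_keyframe: Callable[[Frame], Frame],
    interpolate: Callable[[Frame, Frame, Sequence[Frame]], List[Frame]],
    stride: int,
) -> List[Frame]:
    """Run the two stages over a sequence of coarse 3D renders.

    `refine_keyframe` stands in for the customized image diffusion model.
    `interpolate` stands in for the training-free interpolation step; it
    receives both bounding keyframes plus the coarse renders of the segment
    as guidance, and must return one frame per guidance frame.
    """
    key_idx = select_keyframes(len(rendered), stride)

    # Stage 1: refine only the selected coarse renders into keyframes.
    keyframes: Dict[int, Frame] = {i: refine_keyframe(rendered[i]) for i in key_idx}

    # Stage 2: fill in the frames between each pair of adjacent keyframes.
    video: List[Frame] = []
    for start, end in zip(key_idx, key_idx[1:]):
        guidance = rendered[start:end + 1]
        segment = interpolate(keyframes[start], keyframes[end], guidance)
        if len(segment) != len(guidance):
            raise ValueError("interpolate must return one frame per guidance frame")
        # Drop the segment's last frame; it opens the next segment.
        video.extend(segment[:-1])
    video.append(keyframes[key_idx[-1]])
    return video
```

Splitting the sequence at keyframes means each interpolation call is bounded on both sides by a refined frame, which is what allows guidance from both directions and lets the sequence be extended by appending more segments.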
