SymphoMotion: Joint Control of Camera Motion and Object Dynamics for Coherent Video Generation

Guiyu Zhang, Yabo Chen, Xunzhi Xiang, Junchao Huang, Zhongyu Wang, Li Jiang

Abstract

Controlling both camera motion and object dynamics is essential for coherent and expressive video generation, yet current methods typically handle only one motion type or rely on ambiguous 2D cues that entangle camera-induced parallax with true object movement. We present SymphoMotion, a unified motion-control framework that jointly governs camera trajectories and object dynamics within a single model. SymphoMotion features a Camera Trajectory Control mechanism that integrates explicit camera paths with geometry-aware cues to ensure stable, structurally consistent viewpoint transitions, and an Object Dynamics Control mechanism that combines 2D visual guidance with 3D trajectory embeddings to enable depth-aware, spatially coherent object manipulation. To support large-scale training and evaluation, we further construct RealCOD-25K, a comprehensive real-world dataset containing paired camera poses and object-level 3D trajectories across diverse indoor and outdoor scenes, addressing a key data gap in unified motion control. Extensive experiments and user studies show that SymphoMotion significantly outperforms existing methods in visual fidelity, camera controllability, and object-motion accuracy, establishing a new benchmark for unified motion control in video generation. Code and data are publicly available at https://grenoble-zhang.github.io/SymphoMotion/.
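
The ambiguity of 2D cues mentioned in the abstract can be illustrated with a small, self-contained sketch. It is not taken from the paper: it assumes an idealized pinhole camera with no rotation, and the focal length, step size, and helper names (`project`, `track_2d`) are hypothetical. It shows that a static object seen by a camera translating right and an object moving left in front of a static camera can yield the same 2D track, so a 2D trajectory alone cannot tell the two apart.

```python
# Illustrative sketch only; all constants and names are assumptions, not the paper's.
FOCAL = 500.0  # focal length in pixels (assumed)

def project(point, cam_pos):
    """Project a 3D world point into a rotation-free pinhole camera at cam_pos looking down +Z."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    return (FOCAL * x / z, FOCAL * y / z)

def track_2d(points, cam_positions):
    """Per-frame 2D image positions of an object given per-frame camera positions."""
    return [project(p, c) for p, c in zip(points, cam_positions)]

frames = 5
step = 0.1  # displacement per frame in world units (assumed)

# Case A: static object, camera translates right along +X.
static_obj = [(0.0, 0.0, 4.0)] * frames
moving_cam = [(step * t, 0.0, 0.0) for t in range(frames)]

# Case B: static camera, object moves left along -X.
moving_obj = [(-step * t, 0.0, 4.0) for t in range(frames)]
static_cam = [(0.0, 0.0, 0.0)] * frames

track_a = track_2d(static_obj, moving_cam)
track_b = track_2d(moving_obj, static_cam)

# The 2D tracks coincide even though the 3D object trajectories differ.
same_2d = all(
    abs(a[0] - b[0]) < 1e-9 and abs(a[1] - b[1]) < 1e-9
    for a, b in zip(track_a, track_b)
)
print("identical 2D tracks:", same_2d)
print("identical 3D object paths:", static_obj == moving_obj)
```

Specifying the camera path and the object's 3D trajectory separately, as the two lists per case do here, removes this ambiguity, which is the kind of input separation the abstract describes.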

Paper Structure

This paper contains 23 sections, 7 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Overview of SymphoMotion. Built on Wan-I2V [wang2025wan], SymphoMotion introduces two complementary mechanisms for simultaneous control of camera and object motion: Camera Trajectory Control (CTC) and Object Dynamics Control (ODC). Given a reference image, a text prompt, and the specified camera and object trajectories, CTC employs the Viewpoint Control Module (VCM) to integrate 3D geometric priors with camera motion for precise camera trajectory control. In parallel, ODC, powered by the Object Motion Module (OMM), combines 2D visual guidance with 3D motion cues to achieve dynamic and spatially coherent object manipulation.
  • Figure 2: Inference pipeline of SymphoMotion. Users can specify camera motion and interactively draw 3D trajectories of selected objects through our interface, and the system generates videos that align with the user-defined camera and object motion.
  • Figure 3: RealCOD-25K dataset construction pipeline.
  • Figure 4: Independent camera motion control.
  • Figure 5: Simultaneous control over camera and object motions. MotionCtrl struggles to generate realistic object dynamics, causing objects to disappear from view, whereas SymphoMotion achieves high-quality simultaneous control.
  • ...and 6 more figures