PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention

Yipeng Chen; Zhichao Ye; Zhenzhou Fang; Xinyu Chen; Xiaoyu Zhang; Jialing Liu; Nan Wang; Haomin Liu; Guofeng Zhang

TL;DR

PostCam tackles post-capture camera-trajectory editing for dynamic scenes. It introduces a query-shared cross-attention mechanism that jointly ingests 6-DoF camera poses and rendered video frames into a shared conditioning space. A two-stage training regime first learns camera motion from pose cues and then refines motion and appearance using rendered visual information, enabling precise pose control and high-fidelity generation. On real and synthetic datasets, PostCam outperforms state-of-the-art methods by over 20% in camera-control precision and view consistency while delivering top-tier video quality. The approach promises robust, editable viewpoint generation for dynamic scenes, and the authors plan to release code and data to support future research.

Abstract

We propose PostCam, a framework for novel-view video generation that enables post-capture editing of camera trajectories in dynamic scenes. We find that existing video recapture methods rely on suboptimal camera motion injection strategies, which not only limit camera control precision but also produce videos that fail to preserve fine visual details from the source video. To achieve more accurate and flexible motion manipulation, PostCam introduces a query-shared cross-attention module that integrates two distinct control signals: 6-DoF camera poses and 2D rendered video frames. By fusing them into a unified representation within a shared feature space, the model can extract underlying motion cues, improving both control precision and generation quality. Furthermore, we adopt a two-stage training strategy: the model first learns coarse camera control from pose inputs, and then incorporates visual information to refine motion accuracy and enhance visual fidelity. Experiments on both real-world and synthetic datasets demonstrate that PostCam outperforms state-of-the-art methods by over 20% in camera control precision and view consistency, while achieving the highest video generation quality. Our project webpage is publicly available at: https://cccqaq.github.io/PostCam.github.io/
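The abstract does not spell out how the query-shared cross-attention is wired, so the following is only a minimal pure-Python sketch of one plausible reading, not the authors' implementation. Video latent tokens are projected once into queries, and that single query matrix is shared by two cross-attention branches: one attends to already-embedded 6-DoF pose tokens, the other to tokens from the rendered frames. The branch outputs are summed and added back to the video tokens. The class name `QuerySharedCrossAttention`, the summation fusion, the residual update, the toy token counts, and the `forward_for_stage` switch (pose-only in stage 1, poses plus rendered frames in stage 2, mirroring the two-stage schedule) are all hypothetical. Weights are random and nothing is trained.

```python
import math
import random
from typing import Optional

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    # Numerically stable softmax applied to each row independently.
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def attend(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[x * scale for x in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class QuerySharedCrossAttention:
    """Hypothetical block: one query projection shared by a pose branch and a
    rendered-video branch (an illustrative reading, not PostCam's code)."""

    def __init__(self, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_q = rand_matrix(dim, dim, rng)  # shared by both branches
        self.w_k_pose = rand_matrix(dim, dim, rng)
        self.w_v_pose = rand_matrix(dim, dim, rng)
        self.w_k_render = rand_matrix(dim, dim, rng)
        self.w_v_render = rand_matrix(dim, dim, rng)

    def __call__(self, video: Matrix, poses: Matrix,
                 renders: Optional[Matrix] = None) -> Matrix:
        q = matmul(video, self.w_q)  # queries computed once from video tokens
        fused = attend(q, matmul(poses, self.w_k_pose), matmul(poses, self.w_v_pose))
        if renders is not None:
            # The second branch reuses the same q, so both control signals are
            # read against one shared query space before being summed.
            fused = add(fused, attend(q, matmul(renders, self.w_k_render),
                                      matmul(renders, self.w_v_render)))
        return add(video, fused)  # residual update of the video tokens


def forward_for_stage(block, stage: int, video, poses, renders):
    # Assumed schedule: stage 1 = pose-only conditioning (coarse camera control);
    # stage 2 = poses plus rendered frames (refine motion and visual fidelity).
    return block(video, poses, renders if stage == 2 else None)


# Toy usage: 6 video tokens, 3 pose tokens, 5 rendered-frame tokens, dim 4.
rng = random.Random(1)
video = rand_matrix(6, 4, rng)
poses = rand_matrix(3, 4, rng)
renders = rand_matrix(5, 4, rng)
block = QuerySharedCrossAttention(dim=4)
y1 = forward_for_stage(block, 1, video, poses, renders)
y2 = forward_for_stage(block, 2, video, poses, renders)
```

Sharing the query projection means the pose and rendered-frame branches are both queried by the same video-token representation, which is one way to read "fusing them into a unified representation within a shared feature space". A practical model would instead use multi-head attention over learned pose and frame encoders.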

