Tokenizing Motion: A Generative Approach for Scene Dynamics Compression
Shanzhi Yin, Zihan Zhang, Bolin Chen, Shiqi Wang, Yan Ye
TL;DR
The paper introduces Dynamics-Codec, a generative video compression framework based on motion pattern priors, which replaces content priors with compact motion tokens. It pairs a dense-to-sparse optical-flow tokenizer with a flow-driven diffusion generator (built on Stable Video Diffusion and MOFA) to reconstruct inter-frame dynamics from key-frame data at ultra-low bitrates. Empirical results show substantial BD-rate savings over VVC and ECM on both motion-dynamics and talking-face datasets, along with favorable subjective quality, at the cost of higher decoding latency due to diffusion. By leveraging motion priors and pre-trained diffusion generators, the method generalizes well across diverse scenes, suggesting practical viability for scene-dynamics-centric video coding.
Abstract
This paper proposes a novel generative video compression framework that leverages motion pattern priors, derived from subtle dynamics in common scenes (e.g., swaying flowers or a boat drifting on water), rather than relying on video content priors (e.g., talking faces or human bodies). These compact motion priors enable a new approach to ultra-low bitrate communication while achieving high-quality reconstruction across diverse scene contents. At the encoder side, motion priors can be streamlined into compact representations via a dense-to-sparse transformation. At the decoder side, these priors facilitate the reconstruction of scene dynamics using an advanced flow-driven diffusion model. Experimental results illustrate that the proposed method can achieve superior rate-distortion performance and outperform the state-of-the-art conventional video codec, Enhanced Compression Model (ECM), on scene dynamics sequences. The project page can be found at https://github.com/xyzysz/GNVDC.
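
To make the dense-to-sparse idea concrete, the following is a minimal sketch, not the paper's tokenizer. It assumes a dense optical-flow field stored as an H x W x 2 array and reduces it by block averaging, then expands it again with nearest-neighbour upsampling. The function names `dense_to_sparse` and `sparse_to_dense` and the `stride` parameter are hypothetical and chosen only for illustration; the actual framework relies on a learned tokenizer and a diffusion generator rather than this fixed scheme.

```python
import numpy as np

def dense_to_sparse(flow, stride=8):
    """Illustrative only: average the flow inside each stride x stride block.

    flow: array of shape (H, W, 2) holding per-pixel (dx, dy) displacements.
    Returns an array of shape (H // stride, W // stride, 2).
    """
    h, w, _ = flow.shape
    hs, ws = h // stride, w // stride
    # Crop so that the field divides evenly into blocks.
    cropped = flow[: hs * stride, : ws * stride]
    blocks = cropped.reshape(hs, stride, ws, stride, 2)
    return blocks.mean(axis=(1, 3))

def sparse_to_dense(sparse, stride=8):
    """Illustrative only: nearest-neighbour upsampling of the sparse field."""
    return np.repeat(np.repeat(sparse, stride, axis=0), stride, axis=1)

# Toy example: a uniform horizontal drift of 1.5 pixels per frame.
flow = np.zeros((64, 64, 2), dtype=np.float32)
flow[..., 0] = 1.5

sparse = dense_to_sparse(flow, stride=8)
recon = sparse_to_dense(sparse, stride=8)
ratio = flow.size // sparse.size  # number of dense values per sparse value
```

In this toy case the sparse field holds 64 times fewer values than the dense one and the smooth motion is recovered exactly, which hints at why subtle, spatially coherent scene dynamics are a good fit for sparse motion representations. Real motion is less regular, which is where a generative decoder would be expected to fill in the detail.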
