Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
Ziyu Wang, Yue Xu, Cewu Lu, Yong-Lu Li
TL;DR
This paper addresses the challenge of video dataset distillation by systematically studying temporal condensation and proposing a taxonomy across four factors: the number of synthetic frames $N_{syn}$, the number of real frames $N_{real}$, the number of segments $K$, and the interpolation algorithm $\mathcal{I}$. It reveals that temporal information is often underutilized in distillation and that dense temporal data yields diminishing returns, motivating a static-dynamic disentanglement: first distill static memory from still frames, then compensate motion with a learnable dynamic memory block $\mathcal{H}$. The authors demonstrate state-of-the-art performance on multiple video benchmarks while using substantially reduced storage (often under $50\%$ of the baseline), and show that their approach generalizes across architectures and scales. This work offers a practical route to memory-efficient video distillation and provides a foundation for further exploration of temporal condensation strategies in large-scale video datasets.
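To make the taxonomy concrete, the following is a minimal NumPy sketch of how $N_{syn}$ synthetic frames might be expanded to $N_{real}$ frames by an interpolation algorithm $\mathcal{I}$, and how a video might be cut into $K$ segments. The function names `interpolate_frames` and `split_segments`, the two interpolation modes, and the `(frames, height, width, channels)` layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def interpolate_frames(syn, n_real, mode="linear"):
    """Expand synthetic frames of shape (N_syn, H, W, C) to n_real frames.

    Hypothetical stand-in for the interpolation algorithm I in the taxonomy.
    """
    syn = np.asarray(syn, dtype=np.float32)
    n_syn = syn.shape[0]
    if n_syn == 1:
        # A single still image: every output frame is a copy of it.
        return np.repeat(syn, n_real, axis=0)

    # Positions of the n_real target frames on the synthetic timeline.
    t = np.linspace(0.0, n_syn - 1, num=n_real)

    if mode == "duplicate":
        # Nearest-frame duplication (no blending between frames).
        return syn[np.rint(t).astype(int)]
    if mode != "linear":
        raise ValueError(f"unknown interpolation mode: {mode!r}")

    lo = np.floor(t).astype(int)          # left neighbour index
    hi = np.minimum(lo + 1, n_syn - 1)    # right neighbour index, clamped
    w = (t - lo).astype(np.float32).reshape(-1, 1, 1, 1)
    return (1.0 - w) * syn[lo] + w * syn[hi]


def split_segments(video, k):
    """Split a video of shape (T, H, W, C) into k temporal segments."""
    return np.array_split(np.asarray(video), k, axis=0)


# Usage: 2 synthetic frames expanded to 5, and a 16-frame clip cut into 4 segments.
syn = np.stack([np.zeros((2, 2, 1)), np.ones((2, 2, 1))])
expanded = interpolate_frames(syn, n_real=5)
segments = split_segments(np.zeros((16, 2, 2, 1)), k=4)
```

Setting `n_syn == 1` corresponds to the still-image extreme of the taxonomy, where the synthetic data carries no temporal dimension at all.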
Abstract
Recently, dataset distillation has paved the way towards efficient machine learning, especially for image datasets. However, the distillation of videos, which are characterized by an additional temporal dimension, remains underexplored. In this work, we provide the first systematic study of video distillation and introduce a taxonomy to categorize temporal compression. Our investigation reveals that temporal information is usually not well learned during distillation, and that the temporal dimension of the synthetic data contributes little. These observations motivate our unified framework, which disentangles the dynamic and static information in videos. It first distills the videos into still images as static memory and then compensates for the missing dynamic and motion information with a learnable dynamic memory block. Our method achieves state-of-the-art performance on video datasets at different scales, with a notably smaller storage budget. Our code is available at https://github.com/yuz1wan/video_distillation.
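The static-dynamic idea can be illustrated with a short NumPy sketch. The composition rule below (adding a single-channel, per-frame residual to a still RGB image) and the names `compose_video` and `storage_ratio` are assumptions chosen for brevity; they are not the paper's dynamic memory block, which is learnable and whose exact form is not given here. In practice both arrays would be optimized during distillation rather than fixed by hand.

```python
import numpy as np


def compose_video(static_image, dynamic_memory):
    """Combine a static image with per-frame dynamic residuals.

    static_image:   (H, W, C) still image acting as static memory.
    dynamic_memory: (T, H, W, 1) single-channel residuals (assumed layout).
    Returns a video of shape (T, H, W, C).
    """
    static_image = np.asarray(static_image, dtype=np.float32)
    dynamic_memory = np.asarray(dynamic_memory, dtype=np.float32)
    # Broadcasting copies the still image across time and the residual
    # across colour channels.
    return static_image[None, ...] + dynamic_memory


def storage_ratio(t, h, w, c):
    """Stored values for (static + dynamic) relative to a full T-frame video,
    under the single-channel residual assumption above."""
    full_video = t * h * w * c
    disentangled = h * w * c + t * h * w
    return disentangled / full_video


# Usage: an 8-frame clip built from one 4x4 RGB image plus motion in frame 3.
static = np.full((4, 4, 3), 0.5)
dynamic = np.zeros((8, 4, 4, 1))
dynamic[3] = 0.1
video = compose_video(static, dynamic)
ratio = storage_ratio(t=16, h=112, w=112, c=3)
```

Under this toy layout the stored fraction is $(C + T) / (T \cdot C)$, so the saving grows with the number of frames; the actual budgets reported by the authors depend on their own memory design.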
