A Large-Scale Study on Video Action Dataset Condensation
Yang Chen, Sheng Guo, Bo Zheng, Limin Wang
TL;DR
The paper tackles the problem of condensing large-scale video action datasets by extending three representative condensation approaches to the space-time domain and proposing a unified evaluation protocol. It examines temporal processing, finding simple sliding-window sampling effective, and analyzes labeling, augmentation, and loss choices, showing that labeling methods often dominate performance while temporal design shapes consistency and efficiency. Comprehensive ablations on four action datasets (HMDB51, UCF101, SSv2, K400) show that dataset distillation methods excel on harder datasets while sample selection can perform well on easier ones, and the study achieves state-of-the-art results under the proposed protocol. The work enables data-efficient video action recognition at scale and offers practical guidance on algorithm choice and evaluation for future video condensation research.
Abstract
Recently, dataset condensation has made significant progress in the image domain. Unlike images, videos have an additional temporal dimension that carries considerable redundant information, which makes condensation even more important. However, video dataset condensation remains underexplored. We aim to bridge this gap with a large-scale study built on systematic design and fair comparison. Specifically, our work examines three key aspects to provide empirical insights: (1) temporal processing of video data, (2) the evaluation protocol for video dataset condensation, and (3) adaptation of condensation algorithms to the space-time domain. From this study, we derive several observations: (i) labeling methods greatly influence condensation performance, (ii) simple sliding-window sampling is effective for temporal processing, and (iii) dataset distillation methods perform better in challenging scenarios, while sample selection methods excel in easier ones. Furthermore, we propose a unified evaluation protocol for the fair comparison of different condensation algorithms and achieve state-of-the-art results on four widely used action recognition datasets: HMDB51, UCF101, SSv2, and K400. Our code is available at https://github.com/MCG-NJU/Video-DC.
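To make the idea of sliding-window sampling concrete, here is a minimal, generic sketch in Python. It is not the authors' implementation from the Video-DC repository. It assumes a video is a sequence of frames and that windows are fixed-length runs of consecutive frames taken at a fixed stride. The function names `sliding_window_indices` and `sliding_window_clips` and the parameters `window` and `stride` are hypothetical, chosen for illustration.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def sliding_window_indices(num_frames: int, window: int, stride: int) -> List[List[int]]:
    """Hypothetical helper: frame indices for every full window of `window`
    consecutive frames, advancing the window start by `stride` frames."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if num_frames < window:
        # Assumption: videos shorter than one window yield no clips
        # (a real pipeline might pad or loop frames instead).
        return []
    return [
        list(range(start, start + window))
        for start in range(0, num_frames - window + 1, stride)
    ]


def sliding_window_clips(frames: Sequence[Frame], window: int, stride: int) -> List[List[Frame]]:
    """Hypothetical helper: gather the actual frames for each window."""
    return [
        [frames[i] for i in idx]
        for idx in sliding_window_indices(len(frames), window, stride)
    ]


if __name__ == "__main__":
    # An 8-frame video, 4-frame windows, stride 2 -> overlapping clips.
    print(sliding_window_indices(8, 4, 2))
    # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

Setting `stride` equal to `window` yields non-overlapping clips. A smaller stride yields overlapping clips, which trades extra computation for denser temporal coverage.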
