Scalable Frame Sampling for Video Classification: A Semi-Optimal Policy Approach with Reduced Search Space
Junho Lee, Jeongwoo Shin, Seung Woo Ko, Seongsu Ha, Joonseok Lee
TL;DR
The paper tackles scalable frame sampling for video classification by introducing a semi-optimal policy that reduces the search space from $O(T^N)$ to $O(T)$ under a frame-independence assumption that is observed to hold at practical frame rates. It presents SOSampler, which learns this policy by distilling per-frame classifier confidence through a pairwise ranking loss and a label-guidance loss, enabling effective selection of $N$ frames from $T$ candidate frames. Extensive experiments across ActivityNet-v1.3, Mini-Kinetics, Mini-Sports1M, and COIN with CNN and Transformer backbones show that the semi-optimal approach yields stable, high performance for both small and large values of $N$ and $T$, often outperforming methods that search the full combinatorial space. The proposed method is also more computationally efficient, achieving higher throughput with lower GFLOPs. Overall, the work shifts the focus from exploring large search spaces to exploiting principled, independence-based per-frame scoring to achieve scalable, accurate video classification.
Abstract
Given a video with $T$ frames, frame sampling is the task of selecting $N \ll T$ frames so as to maximize the performance of a fixed video classifier. Both brute-force search and most existing methods suffer from the vast search space of $\binom{T}{N}$ candidate subsets, especially when $N$ gets large. To address this challenge, we introduce a novel perspective that reduces the search space from $O(T^N)$ to $O(T)$. Instead of exploring the entire $O(T^N)$ space, our proposed semi-optimal policy estimates the value of each frame independently from its per-frame confidence and selects the top $N$ frames, significantly reducing the computational complexity. We verify that our semi-optimal policy efficiently approximates the optimal policy, particularly under practical settings. Additionally, through extensive experiments on various datasets and model architectures, we demonstrate that learning our semi-optimal policy ensures stable and high performance regardless of the values of $N$ and $T$.
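The selection step described in the abstract can be illustrated with a minimal sketch. It assumes the per-frame scores are already available; in the paper they would come from the learned sampler, whereas the values below are hypothetical and hard-coded. The function names `select_top_n_frames` and `search_space_sizes` are illustrative, and returning the chosen indices in temporal order is an assumption rather than something the abstract states.

```python
import math


def select_top_n_frames(confidences, n):
    """Return the indices of the n highest-scoring frames, in temporal order."""
    if not 0 < n <= len(confidences):
        raise ValueError("n must be between 1 and the number of frames")
    # Rank frame indices by their independently estimated score (descending).
    ranked = sorted(range(len(confidences)),
                    key=lambda t: confidences[t], reverse=True)
    # Keep the top n and restore temporal order (an assumption of this sketch).
    return sorted(ranked[:n])


def search_space_sizes(t, n):
    """Compare the number of N-frame subsets with the T per-frame scores."""
    return math.comb(t, n), t


# Hypothetical per-frame confidences for a 6-frame video.
scores = [0.10, 0.80, 0.30, 0.95, 0.20, 0.60]
selected = select_top_n_frames(scores, 3)
subsets, per_frame = search_space_sizes(len(scores), 3)
print(selected, subsets, per_frame)
```

With $T = 6$ and $N = 3$, an exhaustive search would evaluate 20 subsets, while the per-frame policy scores only 6 frames and sorts them. The gap widens quickly as $N$ and $T$ grow.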
