Let's Split Up: Zero-Shot Classifier Edits for Fine-Grained Video Understanding
Kaiting Liu, Hazel Doughty
TL;DR
The paper tackles the rigidity of fixed taxonomies in video understanding by introducing category splitting, a task in which an existing classifier is edited to refine a coarse label into fine-grained subcategories while preserving its other predictions. The proposed zero-shot method exploits latent compositional structure in video classifiers, using modifier retrieval and alignment to edit only the classification head; low-shot fine-tuning provides additional gains. On the proposed SSv2-Split and FineGym-Split benchmarks, zero-shot and low-shot edits substantially outperform vision-language baselines in generality (accuracy on the newly split categories) while maintaining near-perfect locality (performance on the remaining categories). The work highlights the latent compositionality of video backbones and offers a practical, data-efficient path to adapting taxonomy granularity in specialized domains.
Abstract
Video recognition models are typically trained on fixed taxonomies that are often too coarse, collapsing distinctions in object, manner, or outcome under a single label. As tasks and definitions evolve, such models cannot accommodate emerging distinctions, and collecting new annotations and retraining to support them is costly. To address these challenges, we introduce category splitting, a new task in which an existing classifier is edited to refine a coarse category into finer subcategories while preserving accuracy elsewhere. We propose a zero-shot editing method that leverages the latent compositional structure of video classifiers to expose fine-grained distinctions without additional data. We further show that low-shot fine-tuning, while simple, is highly effective and benefits from our zero-shot initialization. Experiments on our new video benchmarks for category splitting demonstrate that our method substantially outperforms vision-language baselines, improving accuracy on the newly split categories without sacrificing performance on the rest. Project page: https://kaitingliu.github.io/Category-Splitting/.
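
To make the task concrete, the following is a minimal sketch of what editing only a linear classification head could look like. It is not the authors' method: the function names `split_category` and `locality`, the idea of forming each subcategory weight as the coarse weight plus a hypothetical modifier direction, and the toy locality measure are all assumptions made purely for illustration.

```python
import numpy as np

def split_category(W, b, coarse_idx, modifier_dirs):
    """Replace one coarse row of a linear head with one row per subcategory.

    W: (num_classes, dim) weights, b: (num_classes,) biases.
    modifier_dirs: (num_sub, dim) hypothetical offsets (an assumption of this
    sketch) added to the coarse class weight to form each subcategory weight.
    Kept classes retain their weights; new rows are appended at the end.
    """
    keep = [i for i in range(W.shape[0]) if i != coarse_idx]
    sub_W = W[coarse_idx] + modifier_dirs            # broadcast over num_sub rows
    sub_b = np.full(len(modifier_dirs), b[coarse_idx])
    W_new = np.vstack([W[keep], sub_W])
    b_new = np.concatenate([b[keep], sub_b])
    return W_new, b_new

def locality(W, b, W_new, b_new, coarse_idx, X):
    """Fraction of samples outside the coarse class whose prediction is unchanged."""
    old = np.argmax(X @ W.T + b, axis=1)
    new = np.argmax(X @ W_new.T + b_new, axis=1)
    mask = old != coarse_idx
    # Classes after the removed row shift down by one index in the edited head.
    old_mapped = old - (old > coarse_idx).astype(int)
    return float(np.mean(new[mask] == old_mapped[mask]))

# Toy head with 3 classes over 4-dimensional features; class 1 is split in two.
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
b = np.zeros(3)
modifier_dirs = np.array([[0.0, 0.0, 0.0, 1.0],
                          [0.0, 0.0, 0.0, -1.0]])
W_new, b_new = split_category(W, b, coarse_idx=1, modifier_dirs=modifier_dirs)

X = np.array([[1.0, 0.0, 0.0, 0.3],    # originally class 0
              [0.0, 0.0, 1.0, 0.2],    # originally class 2
              [0.0, 1.0, 0.0, 0.5]])   # originally the coarse class 1
preds = np.argmax(X @ W_new.T + b_new, axis=1)
loc = locality(W, b, W_new, b_new, coarse_idx=1, X=X)
```

In this toy setup the backbone features are untouched and only the head changes, so the rows for the kept classes are identical before and after the edit. Predictions on other classes can still change if a new subcategory row outscores them, which is why locality has to be measured rather than assumed.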
