UniComp: Rethinking Video Compression Through Informational Uniqueness
Chao Yuan, Shimin Chen, Minliang Lin, Limeng Qiao, Guanglu Wan, Lin Ma
TL;DR
UniComp reframes video compression around information uniqueness rather than attention. It formulates the problem as minimizing the conditional entropy H(X|S) of the full tokens X given the retained tokens S, and derives a reconstruction-error bound tied to token uniqueness. The framework combines Frame Group Fusion, Token Allocation, and Spatial Dynamic Compression to adaptively reduce temporal, global, and spatial redundancy while preserving semantically unique content. Theoretical bounds and extensive long-video experiments show that UniComp outperforms state-of-the-art compression methods, remains robust across backbones and frame counts, and improves efficiency. The approach is a practical, plug-and-play solution for scalable multimodal understanding of long video sequences.
Abstract
Unlike attention-based compression methods, this paper presents UniComp, a video compression framework driven by information uniqueness that aims to maximize the information fidelity of video representations under constrained computational budgets. Taking an information-theoretic perspective, we formulate visual token compression as an optimization problem that minimizes the conditional entropy of the full tokens given the retained tokens, which serves as the reconstruction error. To this end, we introduce the notion of information uniqueness, which measures the intrinsic redundancy among tokens and links it to the reconstruction error. Building on uniqueness, we design three modules (Frame Group Fusion, Token Allocation, and Spatial Dynamic Compression) that progressively perform semantic frame grouping, adaptive resource allocation, and fine-grained spatial compression. Extensive experiments demonstrate that UniComp consistently outperforms existing compression methods in preserving essential visual tokens under limited computational budgets, highlighting the pivotal role of information uniqueness in effective token compression.
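To make the link between uniqueness and reconstruction error concrete, here is a minimal sketch under loudly stated assumptions. The abstract does not define how uniqueness is computed, so this sketch assumes a hypothetical score (1 minus a token's maximum cosine similarity to any other token). It also replaces the conditional entropy H(X|S) with a hypothetical geometric proxy: the mean cosine gap between each token and its closest retained token. Selection uses a greedy coverage rule (farthest-point style) that we swapped in for illustration. All names (`uniqueness`, `select_tokens`, `reconstruction_proxy`) are hypothetical and are not UniComp's modules or API.

```python
from __future__ import annotations

import numpy as np


def _normalize(tokens: np.ndarray) -> np.ndarray:
    # Unit-normalize each token so dot products become cosine similarities.
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    return tokens / np.maximum(norms, 1e-12)


def uniqueness(tokens: np.ndarray) -> np.ndarray:
    """Hypothetical score: 1 - max cosine similarity to any *other* token.

    Near 0 for a token with a near-duplicate; larger when nothing else resembles it.
    """
    x = _normalize(tokens)
    sim = x @ x.T
    np.fill_diagonal(sim, -1.0)  # exclude each token's similarity to itself
    return 1.0 - sim.max(axis=1)


def select_tokens(tokens: np.ndarray, budget: int) -> list[int]:
    """Greedily keep the token least covered by those already kept (farthest-point style)."""
    x = _normalize(tokens)
    budget = min(budget, len(x))
    if budget <= 0:
        return []
    kept = [int(np.argmax(uniqueness(tokens)))]
    # coverage[i]: best cosine similarity of token i to any kept token so far
    coverage = x @ x[kept[0]]
    while len(kept) < budget:
        candidates = coverage.copy()
        candidates[kept] = np.inf  # never pick an already-kept token again
        nxt = int(np.argmin(candidates))
        kept.append(nxt)
        coverage = np.maximum(coverage, x @ x[nxt])
    return sorted(kept)


def reconstruction_proxy(tokens: np.ndarray, kept: list[int]) -> float:
    """Hypothetical stand-in for H(X|S): mean cosine gap to the best kept match."""
    if not kept:
        raise ValueError("kept must be non-empty")
    x = _normalize(tokens)
    best = (x @ x[kept].T).max(axis=1)
    return float(np.mean(1.0 - best))


# Demo: 12 tokens drawn from 4 "concepts", each repeated 3 times with small noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 8))
tokens = np.repeat(base, 3, axis=0) + 0.01 * rng.normal(size=(12, 8))
kept = select_tokens(tokens, budget=4)
print(kept, reconstruction_proxy(tokens, kept))
```

In the demo, tokens `3*g`, `3*g+1`, and `3*g+2` are near-duplicates of concept `g`. With a budget of 4, the greedy rule should keep one token per concept: once one copy is kept, its duplicates are almost fully covered, so keeping them would barely lower the proxy error. This is the intuition behind favoring unique tokens. The paper's actual modules and bounds may differ substantially.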
