Optimal Video Compression using Pixel Shift Tracking
Hitesh Saai Mananchery Panneerselvam, Smit Anand
TL;DR
This paper tackles video storage efficiency by removing cross-frame redundancy with Redundancy Removal using Shift (R2S), a pixel-shift-tracking method that identifies redundant pixels from their displacements and stores only the novel pixels together with the shift history $\lambda$. The approach combines pixel-point tracking (PIPs/PIPs++) with optional enhancements, such as a Similarity Search Matrix and Mask R-CNN-based object masking, to detect and omit redundant content across frames. It reports substantial storage savings (approximately 80–90%) with modest data loss (a few percent), and reconstructs frames by propagating from the initial frame using the stored shift data. The method is codec-agnostic and can be integrated with existing ML-based compression pipelines, offering a practical route to lower storage and bandwidth requirements in streaming and storage-constrained scenarios.
Abstract
Video accounts for approximately 85\% of all internet traffic, yet video encoding and compression have historically relied on hard-coded rules, which work well only up to a point. The last few years have seen a surge in ML-based video compression algorithms, many of which outperform legacy codecs. These models either encode video end to end or replace intermediate steps of legacy codecs with ML models to make those steps more efficient. Optimizing video storage is an essential aspect of video processing, and one way to achieve it is to avoid storing redundant data in each frame. In this paper, we introduce the removal of redundancies across subsequent frames of a video as the main approach to video compression. We call this method Redundancy Removal using Shift (R\textsuperscript{2}S). The method can be used with a variety of machine learning models, making compression more accessible and adaptable. In this study, we use a computer vision-based pixel point tracking method to identify redundant pixels and encode video for optimal storage.
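To make the idea of storing only novel pixels plus a shift concrete, the following is a minimal sketch and not the paper's implementation. It assumes a single, already known integer displacement for the whole frame, whereas the method described here tracks individual pixel points with a tracker. The function names (`shift_frame`, `encode`, `decode`), the packet layout, and the tolerance parameter `tol` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def shift_frame(frame, dy, dx):
    """Translate a frame by (dy, dx). Returns the shifted frame and a mask
    of the pixels that were actually filled from the source frame."""
    h, w = frame.shape[:2]
    shifted = np.zeros_like(frame)
    valid = np.zeros((h, w), dtype=bool)
    if abs(dy) >= h or abs(dx) >= w:
        return shifted, valid  # shift moves everything out of view
    # Destination and source windows that stay inside the frame.
    ys_dst = slice(max(dy, 0), min(h + dy, h))
    xs_dst = slice(max(dx, 0), min(w + dx, w))
    ys_src = slice(max(-dy, 0), min(h - dy, h))
    xs_src = slice(max(-dx, 0), min(w - dx, w))
    shifted[ys_dst, xs_dst] = frame[ys_src, xs_src]
    valid[ys_dst, xs_dst] = True
    return shifted, valid

def encode(prev, curr, dy, dx, tol=0):
    """Keep only the pixels of `curr` that the shifted `prev` cannot explain.
    Assumption: (dy, dx) is supplied by some external tracker."""
    predicted, valid = shift_frame(prev, dy, dx)
    diff = np.abs(curr.astype(np.int32) - predicted.astype(np.int32))
    if diff.ndim == 3:
        diff = diff.max(axis=2)  # a colour pixel is redundant only if all channels match
    novel_mask = ~(valid & (diff <= tol))
    return {"shift": (dy, dx), "novel_mask": novel_mask,
            "novel_pixels": curr[novel_mask]}

def decode(prev, packet):
    """Rebuild the current frame from the previous frame and the stored packet."""
    dy, dx = packet["shift"]
    recon, _ = shift_frame(prev, dy, dx)
    recon[packet["novel_mask"]] = packet["novel_pixels"]
    return recon

# Toy example: an 8x8 grayscale frame that slides 2 pixels to the right,
# with new content entering in the two leftmost columns.
rng = np.random.default_rng(0)
frame0 = rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
frame1, _ = shift_frame(frame0, 0, 2)
frame1[:, :2] = rng.integers(0, 256, size=(8, 2), dtype=np.uint8)

packet = encode(frame0, frame1, dy=0, dx=2)
restored = decode(frame0, packet)
stored_fraction = packet["novel_mask"].mean()  # share of pixels stored explicitly
```

With `tol=0` the reconstruction is exact. Raising `tol` marks more pixels as redundant at the cost of some reconstruction error, which loosely mirrors the trade-off between storage savings and data loss. In this toy layout the mask is stored uncompressed; a practical encoder would need a compact representation of it.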
