
Optimal Video Compression using Pixel Shift Tracking

Hitesh Saai Mananchery Panneerselvam, Smit Anand

TL;DR

This paper tackles video storage efficiency by removing cross-frame redundancy through Redundancy Removal using Shift (R²S), a pixel-shift-tracking method that identifies redundant pixels via their displacements and stores only the novel pixels along with the shift history $\lambda$. The approach combines pixel-point tracking (PIPs/PIPs++) with optional enhancements such as a Similarity Search Matrix and Mask R-CNN-based object masking to detect and omit redundant content across frames. It demonstrates substantial storage savings (approximately 80–90%) with modest data loss (a few percent), reconstructing frames by propagating from the initial frame using the stored shift data. The method is codec-agnostic and adaptable for integration with existing ML-based compression pipelines, offering a practical route to lower storage and bandwidth requirements in streaming and storage-constrained scenarios.

Abstract

Video comprises approximately 85\% of all internet traffic, yet video encoding and compression have historically relied on hard-coded rules, which have worked well but only up to a point. The last few years have seen a surge in ML-based video compression algorithms, many of which have outperformed several legacy codecs. These models range from encoding video end to end with an ML approach to replacing intermediate steps in legacy codecs with ML models to make those steps more efficient. Optimizing video storage is an essential aspect of video processing, and one possible way to achieve it is to avoid storing redundant data in each frame. In this paper, we introduce the removal of redundancies across subsequent frames of a given video as the main approach to video compression. We call this method Redundancy Removal using Shift (R\textsuperscript{2}S). The method can be used across various machine learning models, making compression more accessible and adaptable. In this study, we use a computer vision-based pixel point tracking method to identify redundant pixels and encode video for optimal storage.
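
The core idea of storing only novel pixels plus a shift can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single global integer shift `(dx, dy)` per frame (standing in for one entry of the shift history), supplied by some external tracker, and an exact-match tolerance `tol`. The function names `shift_frame`, `encode_frame`, and `decode_frame` are hypothetical and introduced only for illustration.

```python
import numpy as np

def shift_frame(frame, dx, dy):
    """Translate `frame` by (dx, dy) pixels (assumed global integer shift).

    Returns the shifted frame and a boolean mask of pixels that were
    actually covered by content from `frame`.
    """
    h, w = frame.shape[:2]
    shifted = np.zeros_like(frame)
    valid = np.zeros((h, w), dtype=bool)
    # Destination window that stays inside the image after the shift.
    x0, x1 = max(dx, 0), min(w + dx, w)
    y0, y1 = max(dy, 0), min(h + dy, h)
    if x0 < x1 and y0 < y1:
        shifted[y0:y1, x0:x1] = frame[y0 - dy:y1 - dy, x0 - dx:x1 - dx]
        valid[y0:y1, x0:x1] = True
    return shifted, valid

def encode_frame(prev, cur, shift, tol=0):
    """Keep only the pixels of `cur` that the shifted `prev` cannot explain."""
    predicted, valid = shift_frame(prev, *shift)
    diff = np.abs(cur.astype(np.int16) - predicted.astype(np.int16))
    if diff.ndim == 3:            # colour frame: compare the worst channel
        diff = diff.max(axis=2)
    redundant = valid & (diff <= tol)
    novel_mask = ~redundant
    return novel_mask, cur[novel_mask]

def decode_frame(prev, shift, novel_mask, novel_pixels):
    """Rebuild the frame from the previous frame, the shift, and novel pixels."""
    recon, _ = shift_frame(prev, *shift)
    recon[novel_mask] = novel_pixels
    return recon

# Toy example: a 6x8 grayscale frame that moves by (dx=3, dy=1).
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(6, 8), dtype=np.uint8)
cur = rng.integers(0, 256, size=(6, 8), dtype=np.uint8)   # new content
moved, covered = shift_frame(prev, 3, 1)
cur[covered] = moved[covered]                             # old content, shifted

mask, pixels = encode_frame(prev, cur, (3, 1))
restored = decode_frame(prev, (3, 1), mask, pixels)
print("stored fraction:", mask.mean())
print("lossless:", np.array_equal(restored, cur))
```

With `tol=0` the round trip is exact for this toy input. Raising `tol` would mark more pixels as redundant at the cost of some data loss, which is the kind of trade-off between stored size and loss that the figures in the paper report. A real tracker produces per-point, possibly sub-pixel displacements, which this sketch does not model.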

Paper Structure (16 sections, 8 equations, 4 figures, 1 table, 1 algorithm)

Figures (4)

  • Figure 1: Movement of a frame/block between frames. The grey area is the redundant part, which no longer needs to be stored
  • Figure 2: Representation of Frame compression using pixel point tracking
  • Figure 3: Panel (a) shows the compression percentage (size of the compressed image relative to the original image size) against the number of frames for which the model predicts the shift. Panel (b) shows the data loss against the same number of frames. The size reduction after compression differs substantially between PIPs and PIPs++. Compared to PIPs, the data loss for PIPs++ was high initially, dropped for a few frames, and then rose again at the end. This is because, in our experiment, PIPs++ performed better when predicting the pixel shift. These values will vary with the chosen model, video, and point-tracking method. The graph was created by running the PIPs model with a single tracking point on a 1920×1080 video, using 30 frames at 30 fps
  • Figure 4: Data Retrieval Procedure for decompression per frame