Cool-chic video: Learned video coding with 800 parameters

Thomas Leguay, Théo Ladune, Pierrick Philippe, Olivier Déforges

TL;DR

This work targets low-complexity learned video compression by building on the Cool-chic image codec and adding an inter-frame coding module to exploit temporal redundancies. The proposed decoder requires roughly 900 multiplications per decoded pixel and about 800 parameters overall, and its frame-wise encoding supports both low-delay and random-access configurations. Rate-distortion performance is reported to be close to AVC and better than previous overfitted codecs such as FFNeRV, while decoding complexity stays very low; the code is released as open source for further research. The study also highlights current limitations in motion estimation, high-rate performance, and encoding time, and outlines directions for improving the practical deployment of learned video codecs.

Abstract

We propose a lightweight learned video codec with 900 multiplications per decoded pixel and 800 parameters overall. To the best of our knowledge, this is one of the neural video codecs with the lowest decoding complexity. It is built upon the overfitted image codec Cool-chic and supplements it with an inter coding module to leverage the video's temporal redundancies. The proposed model is able to compress videos using both low-delay and random access configurations and achieves rate-distortion close to AVC while out-performing other overfitted codecs such as FFNeRV. The system is made open-source: orange-opensource.github.io/Cool-Chic.
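
To give a sense of scale for the 900 multiplications per decoded pixel quoted above, the short sketch below converts that figure into a per-frame and per-second count. The 1920x1080 resolution, the 30 frames per second, and the function name `decoder_multiplications` are illustrative assumptions that do not come from the paper, and a "pixel" is taken here to mean one luma sample position.

```python
def decoder_multiplications(width, height, mults_per_pixel=900):
    """Return the number of multiplications needed to decode one frame.

    mults_per_pixel defaults to the 900 figure quoted in the abstract;
    width and height are chosen by the caller.
    """
    return width * height * mults_per_pixel


# Hypothetical example: a 1920x1080 video at 30 frames per second.
# Neither the resolution nor the frame rate is taken from the paper.
per_frame = decoder_multiplications(1920, 1080)
per_second = per_frame * 30

print(f"multiplications per frame:  {per_frame:,}")
print(f"multiplications per second: {per_second:,}")
```

Under these assumptions, decoding one frame takes about 1.9 billion multiplications, or about 56 billion per second of video.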

Paper Structure (33 sections, 4 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Decoding of a video frame.
  • Figure 2: Bidirectional weighted motion compensation. In the prediction weighting $\bm{\beta}$, black corresponds to 0 and white to 1.
  • Figure 3: Decoding a B-frame using the inter coding module.
  • Figure 4: Rate-distortion performance on the CLIC 2024 dataset. PSNR is computed in the YUV420 domain.
  • Figure 5: Rate-distortion results on 3 CLIC 2024 videos in Random Access configuration.
  • ...and 1 more figure
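
The caption of Figure 2 mentions a per-pixel prediction weighting $\bm{\beta}$ that ranges from 0 (black) to 1 (white). A minimal sketch of how such a weighting could blend two motion-compensated predictions is given below. It assumes the common convex-combination form $\bm{\beta} \cdot \text{pred}_a + (1 - \bm{\beta}) \cdot \text{pred}_b$, which the captions above do not confirm. The function name `blend_predictions` and its arguments are hypothetical, and the warping of the reference frames is assumed to have happened already.

```python
import numpy as np


def blend_predictions(pred_a, pred_b, beta):
    """Blend two motion-compensated predictions with a per-pixel weight.

    Assumed form (not confirmed by the source): beta = 1 selects pred_a,
    beta = 0 selects pred_b, and values in between mix the two.
    """
    beta = np.clip(beta, 0.0, 1.0)  # keep the weights in [0, 1]
    return beta * pred_a + (1.0 - beta) * pred_b


# Toy 2x2 example: pred_a is all ones and pred_b is all zeros,
# so the blended result is equal to the weight map itself.
pred_a = np.ones((2, 2))
pred_b = np.zeros((2, 2))
beta = np.array([[0.0, 1.0], [0.5, 0.25]])

blended = blend_predictions(pred_a, pred_b, beta)
print(blended)
```

Which of the two references a white pixel favours is also an assumption here; swapping `pred_a` and `pred_b` gives the opposite convention.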