
PNVC: Towards Practical INR-based Video Compression

Ge Gao, Ho Man Kwan, Fan Zhang, David Bull

TL;DR

PNVC tackles the practical challenge of deploying implicit neural representation (INR)-based video codecs by combining autoencoder-based compression with per-frame overfitting in a pretrain-then-overfit framework. It introduces a reparameterized ModMixer backbone, hierarchical quality control with modulation-based entropy modeling, and scale-aware hierarchical positional encoding. Together these support both low-delay (LD) and random-access (RA) configurations with competitive rate-distortion performance and fast decoding. In the LD configuration, PNVC achieves nearly 35% BD-rate savings against HEVC HM 18.0, almost 10% more than HiNeRV and 5% more than VTM 20.0, while decoding 1080p content at over 20 FPS. This marks a meaningful advance toward real-world INR-based video coding. The work emphasizes practical considerations (latency constraints, per-frame adaptation, and decoding efficiency), paving the way for broader adoption of INR-based video codecs in streaming and conferencing scenarios.

Abstract

Neural video compression has recently demonstrated significant potential to compete with conventional video codecs in terms of rate-quality performance. However, these learned video codecs suffer from high decoding complexity (autoencoder-based methods) and/or long system delays (implicit neural representation (INR)-based models), which currently prevent their deployment in practical applications. In this paper, targeting a practical neural video codec, we propose a novel INR-based coding framework, PNVC, which combines autoencoder-based and overfitted solutions. Our approach benefits from several design innovations, including a new structural reparameterization-based architecture, hierarchical quality control, modulation-based entropy modeling, and scale-aware positional embedding. Supporting both low delay (LD) and random access (RA) configurations, PNVC outperforms existing INR-based codecs, achieving nearly 35% BD-rate savings against HEVC HM 18.0 (LD), almost 10% more than HiNeRV, one of the state-of-the-art INR-based codecs, and 5% more than VTM 20.0 (LD), while maintaining decoding speeds above 20 FPS for 1080p content. This represents an important step toward the practical deployment of INR-based video coding. The source code will be available for public evaluation.
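BD-rate expresses the average bitrate difference between two rate-quality curves at equal quality; negative values mean the tested codec needs fewer bits than the anchor (here, HM 18.0). The summary does not state which BD-rate implementation the authors used, so the following is only a minimal sketch of the classic cubic-fit Bjøntegaard calculation in Python/NumPy. The function name `bd_rate` and the rate/PSNR points are illustrative assumptions, not data from the paper.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (%) of a test codec relative to an anchor.

    Classic cubic-fit variant (illustrative sketch): fit log(rate) as a cubic
    polynomial in PSNR for each codec, integrate both fits over the PSNR range
    covered by both curves, and turn the mean log-rate gap into a percentage.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    psnr_a = np.asarray(psnr_anchor, dtype=float)
    psnr_t = np.asarray(psnr_test, dtype=float)

    # Cubic fits of log-rate as a function of quality (needs >= 4 RD points)
    poly_a = np.polyfit(psnr_a, log_ra, 3)
    poly_t = np.polyfit(psnr_t, log_rt, 3)

    # Integrate only over the quality interval shared by both curves
    lo = max(psnr_a.min(), psnr_t.min())
    hi = min(psnr_a.max(), psnr_t.max())
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)

    # Mean log-rate difference -> relative bitrate change in percent
    avg_diff = (area_t - area_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0


# Illustrative numbers only: the test codec reaches identical PSNR
# with 10% fewer bits at every rate point.
anchor_rates = [1000, 2000, 4000, 8000]  # kbps
anchor_psnr = [32.0, 34.5, 36.8, 38.9]   # dB
test_rates = [r * 0.9 for r in anchor_rates]
print(round(bd_rate(anchor_rates, anchor_psnr, test_rates, anchor_psnr), 2))  # -10.0
```

Fitting log-rate rather than raw rate makes a constant multiplicative bitrate saving show up as a constant vertical offset, which is why the example recovers exactly -10%. Standardization groups often use piecewise-cubic interpolation instead of a single cubic fit, which can give slightly different values on real RD curves.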

Paper Structure

This paper contains 28 sections, 7 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Radar plots illustrating the performance of the proposed PNVC codec (ours) and nine other conventional and neural video codecs in terms of coding efficiency (BD-rate measured by PSNR and MS-SSIM on the UVG and MCL-JCV datasets, against HM 18.0, LD), decoding speed (FPS), and coding latency (frames). It can be observed that PNVC performs well in all these aspects.
  • Figure 2: Illustration of the proposed PNVC framework.
  • Figure 3: The architecture of the reparameterized ModMixer block during training and inference.
  • Figure 4: RD performance comparison on the UVG and MCL-JCV datasets, where the two best performers of each codec type are plotted.
  • Figure 5: Visual quality comparison between HiNeRV and the proposed PNVC reconstructed content at similar bitrates.
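Figure 3 contrasts the training-time and inference-time forms of the reparameterized ModMixer block. The summary does not describe ModMixer's internal branches, so the sketch below only illustrates the general structural reparameterization idea (RepVGG-style branch merging) under assumed branches: a 3x3 convolution, a 1x1 convolution, and an identity shortcut that are summed during training and folded into a single 3x3 convolution for inference. The functions `conv2d`, `train_block`, and `reparameterize` are hypothetical, and normalization layers are omitted for brevity.

```python
import numpy as np


def conv2d(x, w, b):
    """Naive 'same' 2D cross-correlation with stride 1.

    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k, b: (C_out,).
    """
    _, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    y = np.zeros((w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            # Contribution of kernel tap (i, j) at every output position
            patch = xp[:, i:i + h, j:j + wd]
            y += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return y + b[:, None, None]


def train_block(x, w3, b3, w1, b1):
    """Hypothetical training-time block: 3x3 conv + 1x1 conv + identity, summed."""
    return conv2d(x, w3, b3) + conv2d(x, w1, b1) + x


def reparameterize(w3, b3, w1, b1):
    """Fold the three parallel branches into one 3x3 kernel and bias."""
    c = w3.shape[0]
    w = w3.copy()
    # A 1x1 kernel equals a 3x3 kernel that is zero except at the centre tap
    w[:, :, 1, 1] += w1[:, :, 0, 0]
    # The identity branch is a 3x3 kernel with 1 at the centre of channel o -> o
    w[np.arange(c), np.arange(c), 1, 1] += 1.0
    return w, b3 + b1


rng = np.random.default_rng(0)
C = 4
x = rng.standard_normal((C, 8, 8))
w3, b3 = rng.standard_normal((C, C, 3, 3)), rng.standard_normal(C)
w1, b1 = rng.standard_normal((C, C, 1, 1)), rng.standard_normal(C)
w, b = reparameterize(w3, b3, w1, b1)
print(np.allclose(train_block(x, w3, b3, w1, b1), conv2d(x, w, b)))  # True
```

Because convolution is linear, the summed branches collapse exactly into one kernel, so a reparameterized block can train with extra branches while decoding at the cost of a single convolution, which is consistent with the decoding-speed emphasis in Figure 1.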