PNVC: Towards Practical INR-based Video Compression
Ge Gao, Ho Man Kwan, Fan Zhang, David Bull
TL;DR
PNVC tackles the practical challenge of deploying INR-based video codecs by combining autoencoder-based compression with per-frame overfitting in a pretrain-then-overfit framework. It introduces a reparameterized ModMixer backbone, hierarchical quality control with a hierarchical entropy model, and scale-aware hierarchical positional encoding, supporting both low-delay (LD) and random-access (RA) operation with competitive rate-distortion performance and fast decoding. The approach achieves substantial BD-rate savings against HEVC HM 18.0 and compares favorably with HiNeRV and VTM 20.0 under LD/RA while maintaining 20+ FPS decoding at 1080p, a meaningful advance toward real-world INR-based video coding. The work emphasizes practical considerations (latency constraints, per-frame adaptation, and decoding efficiency), paving the way for broader adoption of INR-based video codecs in streaming and conferencing scenarios.
Abstract
Neural video compression has recently demonstrated significant potential to compete with conventional video codecs in terms of rate-quality performance. These learned video codecs are, however, associated with various issues related to decoding complexity (for autoencoder-based methods) and/or system delays (for implicit neural representation (INR) based models), which currently prevent them from being deployed in practical applications. In this paper, targeting a practical neural video codec, we propose a novel INR-based coding framework, PNVC, which combines autoencoder-based and overfitted solutions. Our approach benefits from several design innovations, including a new structural reparameterization-based architecture, hierarchical quality control, modulation-based entropy modeling, and scale-aware positional embedding. Supporting both low delay (LD) and random access (RA) configurations, PNVC outperforms existing INR-based codecs, achieving BD-rate savings of around 35% against HEVC HM 18.0 (LD), almost 10% more than one of the state-of-the-art INR-based codecs, HiNeRV, and 5% more than VTM 20.0 (LD), while maintaining 20+ FPS decoding speeds for 1080p content. This represents an important step forward for INR-based video coding, moving it towards practical deployment. The source code will be available for public evaluation.
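For readers unfamiliar with the BD-rate figures quoted above, the following is a minimal sketch of the commonly used Bjøntegaard delta-rate calculation: a cubic polynomial is fitted to log-bitrate as a function of PSNR for each codec, and the average gap between the two fits is taken over the shared PSNR range. The function name `bd_rate` and the rate/PSNR points are hypothetical and chosen only for illustration; the exact evaluation procedure used for PNVC is not specified in this abstract and may differ.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of the test codec relative to the anchor
    at equal PSNR. Negative values mean the test codec saves bitrate.

    Illustrative sketch of the usual cubic-fit formulation, not the
    evaluation script of any particular paper.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Integrate both fits over the PSNR interval covered by both curves.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    int_a = np.polyval(np.polyint(fit_a), hi) - np.polyval(np.polyint(fit_a), lo)
    int_t = np.polyval(np.polyint(fit_t), hi) - np.polyval(np.polyint(fit_t), lo)

    # Mean log-rate difference, converted back to a percentage.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical rate (kbps) / PSNR (dB) points, for illustration only.
    anchor_rate = [1000.0, 2000.0, 4000.0, 8000.0]
    psnr = [32.0, 35.0, 38.0, 41.0]
    # A test codec reaching the same PSNR at 65% of the anchor bitrate.
    test_rate = [0.65 * r for r in anchor_rate]
    print(f"BD-rate: {bd_rate(anchor_rate, psnr, test_rate, psnr):.2f}%")
```

In this toy setup the test codec needs 65% of the anchor bitrate at every quality point, so the result is approximately -35%, which is how a saving of that size should be read. With real measurements the two curves rarely share PSNR values, which is why the integration is restricted to the overlapping range.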
