Standard compliant video coding using low complexity, switchable neural wrappers
Yueyu Hu, Chenhao Zhang, Onur G. Guleryuz, Debargha Mukherjee, Yao Wang
TL;DR
The paper tackles the deployment bottleneck of neural video codecs by proposing a standard-compatible framework that wraps a conventional codec with switchable neural pre- and post-processors. It introduces a low-complexity neural post-processor (516 MACs/pixel) that handles different upsampling ratios. This post-processor is jointly optimized with a neural pre-processor, using a differentiable codec proxy to enforce rate constraints, and the rate-distortion optimal downsampling ratio $r$ is signaled per sequence. Empirical results on UVG and AOM CTC show BD-rate reductions of up to $-22.6\%$ over HEVC and $-9.3\%$ over VVC, with decoding times suitable for consumer hardware (e.g., $7.7$ ms per 1080p frame on a mid-range GPU). The approach delivers these gains with minimal added decoding complexity, suggesting a viable path for neural tools in next-generation standards.
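To put the 516 MACs/pixel figure in perspective, the per-frame cost at 1080p can be worked out directly. This assumes the count is per output (full-resolution) pixel and that a 1080p frame has $1920 \times 1080$ pixels; the paper's exact counting convention may differ.

$$
516\,\frac{\text{MACs}}{\text{pixel}} \times (1920 \times 1080)\,\text{pixels} = 516 \times 2{,}073{,}600 = 1{,}069{,}977{,}600 \approx 1.07 \times 10^{9}\ \text{MACs per frame}
$$

At 30 frames per second this is about $3.21 \times 10^{10}$ MACs/s. Separately, a decoding time of $7.7$ ms per frame corresponds to roughly $1000 / 7.7 \approx 130$ frames per second.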
Abstract
The proliferation of high-resolution videos puts great pressure on the storage and bandwidth of cloud video services, driving the development of next-generation video codecs. Despite the great progress made in neural video coding, existing approaches are still far from economical deployment when the tradeoff between complexity and rate-distortion performance is considered. To clear the roadblocks for neural video coding, in this paper we propose a new framework featuring standard compatibility, high performance, and low decoding complexity. We employ a set of jointly optimized neural pre- and post-processors, wrapping a standard video codec, to encode videos at different resolutions. The rate-distortion optimal downsampling ratio is signaled to the decoder at the per-sequence level for each target rate. We design a low-complexity neural post-processor architecture that can handle different upsampling ratios. The change of resolution exploits the spatial redundancy in high-resolution videos, while the neural wrapper further improves rate-distortion performance through end-to-end optimization with a codec proxy. Our lightweight post-processor architecture has a complexity of 516 MACs/pixel and achieves a 9.3% BD-rate reduction over VVC on the UVG dataset and 6.4% on AOM CTC Class A1. Our approach has the potential to further advance the performance of the latest video coding standards using neural processing with minimal added complexity.
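The per-sequence choice of the downsampling ratio can be illustrated with a Lagrangian rate-distortion search. This is a minimal sketch, not the paper's procedure. It assumes that each candidate ratio has been trial-encoded for the same target rate point, that distortion is the full-resolution MSE after post-processing, and that the encoder picks the ratio minimizing $J = D + \lambda R$. The names `TrialEncode` and `select_ratio` and the candidate ratios are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialEncode:
    """Hypothetical record of one trial encode of a sequence at ratio r."""
    ratio: float  # downsampling ratio r (1.0 = native resolution)
    bits: float   # total bits spent on the sequence at this ratio
    mse: float    # full-resolution MSE after neural post-processing


def select_ratio(trials: list[TrialEncode], lam: float) -> float:
    """Return the ratio r whose trial encode minimizes J = D + lam * R.

    Assumption: D is MSE and R is total bits; the paper's actual
    selection rule may differ.
    """
    if not trials:
        raise ValueError("need at least one trial encode")
    best = min(trials, key=lambda t: t.mse + lam * t.bits)
    return best.ratio


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    trials = [
        TrialEncode(ratio=1.0, bits=1_000_000, mse=20.0),
        TrialEncode(ratio=1.5, bits=600_000, mse=24.0),
        TrialEncode(ratio=2.0, bits=400_000, mse=40.0),
    ]
    for lam in (1e-6, 2e-5, 1e-4):
        print(lam, select_ratio(trials, lam))
```

In this toy example, a larger $\lambda$ (weighting rate more heavily, as at low target rates) favors more aggressive downsampling. Because only one ratio is signaled per sequence and target rate, the search runs once at the encoder. The decoder simply reads $r$ and upsamples accordingly.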

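Since the headline results are BD-rate figures, a sketch of the classic Bjøntegaard delta-rate computation may help readers compute such numbers from their own rate-PSNR points. It uses the original cubic-polynomial fit of log-rate versus PSNR. This is an assumption: the paper's evaluation may use another BD-rate variant, such as the piecewise cubic interpolation adopted in some common test conditions. The function name `bd_rate` is hypothetical. The sketch requires NumPy and at least four rate points per curve.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Bjøntegaard delta rate in percent (negative = bitrate savings).

    Classic cubic-polynomial variant: fit log(rate) as a cubic in PSNR for
    each codec, average both fits over the overlapping PSNR interval, and
    convert the mean log-rate gap into a percentage.
    """
    psnr_a = np.asarray(psnr_anchor, dtype=float)
    psnr_t = np.asarray(psnr_test, dtype=float)
    log_a = np.log(np.asarray(rate_anchor, dtype=float))
    log_t = np.log(np.asarray(rate_test, dtype=float))

    # Cubic fits of log-rate as a function of PSNR (needs >= 4 points each).
    fit_a = np.polyfit(psnr_a, log_a, 3)
    fit_t = np.polyfit(psnr_t, log_t, 3)

    # Integrate only over the PSNR interval covered by both curves.
    lo = max(psnr_a.min(), psnr_t.min())
    hi = min(psnr_a.max(), psnr_t.max())
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")

    int_a = np.polyint(fit_a)
    int_t = np.polyint(fit_t)
    mean_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    mean_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # exp of the mean log-rate difference gives the average rate ratio.
    return float((np.exp(mean_t - mean_a) - 1.0) * 100.0)
```

For example, if a test codec reaches the same PSNR values as the anchor with 10% fewer bits at every point, `bd_rate` returns approximately $-10.0$. This matches the sign convention of the reductions quoted in the abstract.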