Evaluation of NVENC Split-Frame Encoding (SFE) for UHD Video Transcoding
Kasidis Arunruangsirilert, Jiro Katto
TL;DR
This work assesses NVIDIA Split-Frame Encoding (SFE) for UHD transcoding. SFE splits each frame across multiple NVENC chips, encodes the parts in parallel, and stitches the results. Using ITE Series A 4K/8K test sequences, the study quantifies RD performance, encoding throughput, power consumption, and end-to-end latency across codecs (HEVC and AV1), presets, and tunings. The findings show that SFE delivers major throughput gains with only minor RD penalties in typical real-time configurations, enabling higher-quality 4K presets and real-time 8K encoding while maintaining or reducing latency and improving energy efficiency compared to software encoding. These results position SFE as a practical enabler for high-throughput, real-time UHD transcoding in data centers and edge environments. Further work is needed on subjective QoE and on interactions with advanced encoding features.
Abstract
The NVIDIA Encoder (NVENC), featured in modern NVIDIA GPUs, offers significant advantages over software encoders by providing comparable Rate-Distortion (RD) performance while consuming considerably less power. The increasing capability of consumer devices to capture footage in Ultra High-Definition (UHD) at 4K and 8K resolutions necessitates high-performance video transcoders for internet-based delivery. To address this demand, NVIDIA introduced Split-Frame Encoding (SFE), a technique that leverages the multiple on-die NVENC chips available in high-end GPUs. SFE splits a single UHD frame for parallel encoding across these physical encoders and subsequently stitches the results, which significantly improves encoding throughput. However, this approach is known to incur an RD performance penalty. The widespread adoption of NVIDIA GPUs in data centers, driven by the rise of Generative AI, means NVENC is poised to play a critical role in transcoding UHD video. To better understand the performance-efficiency tradeoff of SFE, this paper evaluates SFE's impact on RD performance, encoding throughput, power consumption, and end-to-end latency using standardized test sequences. The results show that for real-time applications, SFE nearly doubles encoding throughput with a negligible RD performance penalty. The added throughput enables the use of higher-quality presets for 4K and makes real-time 8K encoding feasible, which effectively offsets that penalty. Moreover, SFE adds no latency at 4K and can reduce it at 8K, positioning it as a key enabler for high-throughput, real-time UHD transcoding.
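The split, parallel-encode, and stitch flow described above can be illustrated with a toy sketch. This is not NVENC or its API: the "encoder" here is `zlib` standing in for a hardware engine, the frame is one byte per pixel, and the choice of horizontal strips is an assumption made purely for illustration, since the abstract does not state how SFE partitions a frame. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib


def split_frame(frame: bytes, width: int, height: int, parts: int) -> list[bytes]:
    """Cut a row-major, one-byte-per-pixel frame into up to `parts` horizontal strips."""
    rows_per_part = -(-height // parts)  # ceiling division
    strips = []
    for top in range(0, height, rows_per_part):
        bottom = min(top + rows_per_part, height)
        strips.append(frame[top * width : bottom * width])
    return strips


def encode_strip(strip: bytes) -> bytes:
    """Stand-in for one encoder engine (assumption: zlib, not a video codec)."""
    return zlib.compress(strip)


def split_frame_encode(frame: bytes, width: int, height: int, parts: int = 2) -> list[bytes]:
    """Encode each strip independently and in parallel; order is preserved."""
    strips = split_frame(frame, width, height, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return list(pool.map(encode_strip, strips))


def decode_and_stitch(chunks: list[bytes]) -> bytes:
    """Decode each chunk and concatenate the strips back into one frame."""
    return b"".join(zlib.decompress(chunk) for chunk in chunks)


if __name__ == "__main__":
    width, height = 16, 8
    frame = bytes((x * y) % 256 for y in range(height) for x in range(width))
    chunks = split_frame_encode(frame, width, height, parts=2)
    print(len(chunks), decode_and_stitch(chunks) == frame)
```

The sketch only mirrors the control flow: each strip is encoded without access to the others, and the outputs are joined in order afterward. It says nothing about the throughput, RD, or latency figures reported in the paper.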
