Evaluation of GPU Video Encoder for Low-Latency Real-Time 4K UHD Encoding
Kasidis Arunruangsirilert, Jiro Katto
TL;DR
This work assesses GPU-accelerated video encoders from NVIDIA, Intel, and AMD under Low-Latency and Ultra Low-Latency modes for 4K UHD real-time encoding. By evaluating rate-distortion (RD) performance (PSNR and VMAF) and end-to-end latency across H.264/AVC, HEVC, and AV1 against software encoders, it demonstrates that hardware encoders substantially reduce latency with only modest RD trade-offs. Ultra Low-Latency tuning can drive end-to-end delays to as low as 83 ms (5 frames) without material RD penalties, while quality presets minimally affect latency. The findings guide deployment choices for latency-critical applications: Intel is strong for HEVC RD and latency, NVIDIA offers balanced RD and latency, and AMD offers stable latency with lower RD performance. Future work will expand the datasets, network scenarios, power metrics, and subjective assessments.
Abstract
The demand for high-quality, real-time video streaming has grown exponentially, with 4K Ultra High Definition (UHD) becoming the new standard for many applications such as live broadcasting, TV services, and interactive cloud gaming. This trend has driven the integration of dedicated hardware encoders into modern Graphics Processing Units (GPUs). These encoders now support advanced codecs like HEVC and AV1 and feature specialized Low-Latency and Ultra Low-Latency tuning, targeting end-to-end (E2E) latencies of < 2 seconds and < 500 ms, respectively. As the demand for such capabilities grows toward the 6G era, a clear understanding of their performance implications is essential. In this work, we evaluate the low-latency encoding modes on GPUs from NVIDIA, Intel, and AMD from both Rate-Distortion (RD) performance and latency perspectives. The results are then compared against both the normal-latency tuning of hardware encoders and leading software encoders. The results show that hardware encoders achieve significantly lower E2E latency than software solutions with slightly better RD performance. While standard Low-Latency tuning yields a poor quality-latency trade-off, the Ultra Low-Latency mode reduces E2E latency to 83 ms (5 frames) without additional RD impact. Furthermore, hardware encoder latency is largely insensitive to quality presets, enabling high-quality, low-latency streams without compromise.
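The latency figures above are given both in milliseconds and in frames, so the conversion between the two can be made explicit. The sketch below is illustrative only: the 60 fps frame rate is an assumption inferred from "83 ms (5 frames)" and is not stated in this section, and the helper names are hypothetical.

```python
# Minimal sketch: converting latency between frames and milliseconds.
# ASSUMPTION: 60 fps, inferred from "83 ms (5 frames)"; the frame rate
# is not stated explicitly in the abstract.

ASSUMED_FPS = 60.0


def frames_to_ms(frames: float, fps: float) -> float:
    """Return the duration of `frames` frames in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames * 1000.0 / fps


def ms_to_frames(latency_ms: float, fps: float) -> float:
    """Return how many frame intervals fit into `latency_ms`."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return latency_ms * fps / 1000.0


if __name__ == "__main__":
    # Reported Ultra Low-Latency result: 5 frames of end-to-end delay.
    print(f"5 frames = {frames_to_ms(5, ASSUMED_FPS):.1f} ms")

    # Latency targets quoted for the two tuning modes, expressed in frames.
    for label, budget_ms in [("Low-Latency", 2000.0), ("Ultra Low-Latency", 500.0)]:
        frames = ms_to_frames(budget_ms, ASSUMED_FPS)
        print(f"{label} target of {budget_ms:.0f} ms = {frames:.0f} frames")
```

Under the 60 fps assumption, one frame lasts about 16.7 ms, so 5 frames correspond to roughly 83 ms, and the 2 s and 500 ms targets correspond to 120 and 30 frames, respectively.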
