ReTiDe: Real-Time Denoising for Energy-Efficient Motion Picture Processing with FPGAs

Changhong Li, Clément Bled, Rosa Fernandez, Shreejith Shanker

TL;DR

ReTiDe tackles energy-efficient, real-time denoising of high-resolution video by offloading inference to cloud-based FPGA DPUs. It introduces a lightweight, quantised denoiser (ReTiDe-Net) and an end-to-end client–server framework that integrates with NUKE, so it can be adopted without changing existing workflows. The model is quantised to INT8 using post-training quantisation (PTQ) combined with quantisation-aware training (QAT) and deployed on AMD DPU-based FPGAs, where it achieves up to 3,746.09 GOPS throughput and 203.59 GOPS/W energy efficiency while maintaining PSNR/SSIM on par with FP32 baselines for both colour and grayscale denoising. This work demonstrates that targeted hardware acceleration can provide practical, scalable denoising for encoding pipelines and post-production, with open-source code enabling broader adoption.

Abstract

Denoising is a core operation in modern video pipelines. In codecs, in-loop filters suppress sensor noise and quantisation artefacts to improve rate-distortion performance; in cinema post-production, denoisers are used for restoration, grain management, and plate clean-up. However, state-of-the-art deep denoisers are computationally intensive and, at scale, are typically deployed on GPUs, incurring high power and cost for real-time, high-resolution streams. This paper presents Real-Time Denoise (ReTiDe), a hardware-accelerated denoising system that serves inference on data-centre Field Programmable Gate Arrays (FPGAs). A compact convolutional model is quantised (post-training quantisation plus quantisation-aware fine-tuning) to INT8 and compiled for AMD Deep Learning Processor Unit (DPU)-based FPGAs. A client-server integration offloads computation from the host CPU/GPU to a networked FPGA service, while remaining callable from existing workflows, e.g., NUKE, without disrupting artist tooling. On representative benchmarks, ReTiDe delivers 37.71$\times$ higher throughput, measured in Giga Operations Per Second (GOPS), and 5.29$\times$ higher energy efficiency than prior FPGA denoising accelerators, with negligible degradation in Peak Signal-to-Noise Ratio (PSNR)/Structural Similarity Index (SSIM). These results indicate that specialised accelerators can provide practical, scalable denoising for both encoding pipelines and post-production, reducing energy per frame without sacrificing quality or workflow compatibility. Code is available at https://github.com/RCSL-TCD/ReTiDe.
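To make the INT8 quantisation and the PSNR metric mentioned above more concrete, the following is a minimal sketch of a symmetric per-tensor INT8 round trip on synthetic pixel values, followed by a PSNR measurement. It is an illustration only, not the paper's pipeline: the max-abs scaling rule, the function names (`quantise_int8`, `dequantise`, `psnr`) and the random test data are all assumptions made for this example, and the actual system relies on post-training quantisation with quantisation-aware fine-tuning of a full network.

```python
import math
import random


def quantise_int8(values):
    """Symmetric per-tensor quantisation (assumed scheme): floats -> (int8 codes, scale)."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude onto 127; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-128, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantise(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


def psnr(reference, estimate, peak=1.0):
    """Peak Signal-to-Noise Ratio in dB for signals with the given peak value."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical test data: 4096 pixel intensities in [0, 1).
random.seed(0)
pixels = [random.random() for _ in range(4096)]

codes, scale = quantise_int8(pixels)
restored = dequantise(codes, scale)
print(f"scale={scale:.6f}, PSNR after INT8 round trip: {psnr(pixels, restored):.2f} dB")
```

The round trip bounds the per-value error by half a quantisation step (`scale / 2`), which is why a well-calibrated INT8 model can stay close to its FP32 baseline in PSNR; the error that accumulates through a real network depends on calibration and fine-tuning, which this sketch does not model.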

Paper Structure

This paper contains 15 sections, 6 figures, 3 tables.

Figures (6)

  • Figure 1: ReTiDe model structure.
  • Figure 2: Diagram of the Vitis-NUKE integration.
  • Figure 3: Pre-processing of large input images, parallel hardware-accelerated noise reduction and post-processing.
  • Figure 4: NUKE User Interface.
  • Figure 5: Comparison of output results with other quantised image denoising models under the noise level of 35.
  • ...and 1 more figure