ReTiDe: Real-Time Denoising for Energy-Efficient Motion Picture Processing with FPGAs

Changhong Li; Clément Bled; Rosa Fernandez; Shreejith Shanker

TL;DR

ReTiDe tackles energy-efficient, real-time denoising of high-resolution video by offloading inference to cloud-based FPGA DPUs. It introduces a lightweight, quantised denoiser (ReTiDe-Net) and an end-to-end client-server framework that integrates with NUKE, so it can be adopted without changing existing workflows. The model is quantised to INT8 using post-training quantisation (PTQ) combined with quantisation-aware training (QAT) and deployed on AMD DPU-based FPGAs, where it achieves up to 3,746.09 GOPS throughput and 203.59 GOPS/W energy efficiency while maintaining PSNR/SSIM on par with FP32 baselines for both colour and grayscale denoising. The work demonstrates that targeted hardware acceleration can provide practical, scalable denoising for encoding pipelines and post-production, and the code is open source to enable broader adoption.

Abstract

Denoising is a core operation in modern video pipelines. In codecs, in-loop filters suppress sensor noise and quantisation artefacts to improve rate-distortion performance; in cinema post-production, denoisers are used for restoration, grain management, and plate clean-up. However, state-of-the-art deep denoisers are computationally intensive and, at scale, are typically deployed on GPUs, incurring high power and cost for real-time, high-resolution streams. This paper presents Real-Time Denoise (ReTiDe), a hardware-accelerated denoising system that serves inference on data-centre Field Programmable Gate Arrays (FPGAs). A compact convolutional model is quantised to INT8 (post-training quantisation plus quantisation-aware fine-tuning) and compiled for AMD Deep Learning Processor Unit (DPU)-based FPGAs. A client-server integration offloads computation from the host CPU/GPU to a networked FPGA service while remaining callable from existing workflows, e.g., NUKE, without disrupting artist tooling. On representative benchmarks, ReTiDe delivers 37.71$\times$ higher throughput, measured in Giga Operations Per Second (GOPS), and 5.29$\times$ higher energy efficiency than prior FPGA denoising accelerators, with negligible degradation in Peak Signal-to-Noise Ratio (PSNR)/Structural Similarity Index (SSIM). These results indicate that specialised accelerators can provide practical, scalable denoising for both encoding pipelines and post-production, reducing energy per frame without sacrificing quality or workflow compatibility. Code is available at https://github.com/RCSL-TCD/ReTiDe.
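As a rough illustration of the INT8 quantisation step described above, the following sketch shows symmetric per-tensor quantisation in plain Python. It is not ReTiDe's toolchain: the function names are hypothetical, and choosing the scale from the largest absolute value is an assumption made for simplicity (a production DPU flow selects scales through its own calibration procedure).

```python
def symmetric_scale(values):
    """Pick a per-tensor scale so the largest magnitude maps to 127 (assumed rule)."""
    peak = max(abs(v) for v in values)
    return peak / 127 if peak > 0 else 1.0


def quantise_int8(values, scale):
    """Round each float to the nearest step and clamp to the signed 8-bit range."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def dequantise(q_values, scale):
    """Map INT8 codes back to approximate floating-point values."""
    return [q * scale for q in q_values]


def fake_quantise(values):
    """Quantise then dequantise: the round trip a quantisation-aware forward pass simulates."""
    scale = symmetric_scale(values)
    return dequantise(quantise_int8(values, scale), scale)


# Illustrative weights only; these are not taken from ReTiDe-Net.
weights = [0.50, -0.25, 0.10, -1.27]
scale = symmetric_scale(weights)
q = quantise_int8(weights, scale)
restored = dequantise(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

For values inside the clamped range, the round-trip error is bounded by half a quantisation step (`scale / 2`). Quantisation-aware fine-tuning generally inserts a round trip like `fake_quantise` into the forward pass so that the network can adapt to this rounding error before deployment.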
