In-Loop Filtering via Trained Look-Up Tables
Zhuoyuan Li, Jiacheng Li, Yao Li, Li Li, Dong Liu, Feng Wu
TL;DR
This work tackles the practicality gap of neural network-based in-loop filtering by replacing heavy DNN inference with a LUT-based surrogate (LUT-ILF). It trains a compact network within a restricted receptive field, caches the network outputs into a clipped 4D LUT, and retrieves filtered pixels via interpolation, with finetuning to mitigate the effects of clipping. Enhanced by complementary reference indexing, progressive cascaded LUTs, and learnable pattern weighting, LUT-ILF supports modes from ultrafast to fast with controllable storage and modest increases in time. Integrated into VVC (VTM-11.0) with an RD-based decision, LUT-ILF achieves BD-rate reductions under the AI and RA configurations while keeping complexity and memory footprint practical, offering a scalable path toward practical neural network-inspired coding tools.
Abstract
In-loop filtering (ILF) is a key technology for removing artifacts in image/video coding standards. Recently, neural network-based in-loop filtering methods have achieved remarkable coding gains beyond the capability of advanced video coding standards, making them a powerful coding tool candidate for future video coding standards. However, deep neural networks bring heavy time and computational complexity and demand high-performance hardware, which makes them challenging to apply in general coding scenarios. To address this limitation, and inspired by explorations in image restoration, we propose an efficient and practical in-loop filtering scheme based on the look-up table (LUT). We train the DNN for in-loop filtering within a fixed filtering reference range, and cache the output values of the DNN into a LUT by traversing all possible inputs. At test time in the coding process, the filtered pixel is generated by locating the input pixels (the to-be-filtered pixel together with its reference pixels) and interpolating the cached filtered pixel values. To further enable a large filtering reference range within the limited storage cost of the LUT, we introduce an enhanced indexing mechanism in the filtering process and a clipping/finetuning mechanism in training. The proposed method is implemented in the Versatile Video Coding (VVC) reference software, VTM-11.0. Experimental results show that the ultrafast, very fast, and fast modes of the proposed method achieve average BD-rate reductions of 0.13%/0.34%/0.51% and 0.10%/0.27%/0.39% under the all intra (AI) and random access (RA) configurations, respectively. Notably, our method has friendly time and computational complexity, with only 101%/102%-104%/108% time increase and 0.13-0.93 kMACs/pixel, and a storage cost of only 164-1148 KB for a single model. Our solution may shed light on the evolution of practical neural network-based coding tools.
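The cache-then-interpolate idea described above can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes 8-bit inputs, a 4-pixel input pattern, a uniform sampling interval of 16 (so 17 sampled values per dimension), and quadrilinear interpolation. The function `toy_filter` is a hypothetical stand-in for the trained DNN, and the names `build_lut` and `lookup` are illustrative only.

```python
import itertools
import numpy as np

STEP = 16                  # assumed sampling interval of the clipped LUT
LEVELS = 256 // STEP + 1   # 17 sampled values per dimension: 0, 16, ..., 256


def toy_filter(p):
    """Hypothetical stand-in for the trained DNN.

    p has shape (..., 4): the to-be-filtered pixel and three reference pixels.
    """
    weights = np.array([0.5, 0.2, 0.2, 0.1])
    return p @ weights


def build_lut(fn):
    """Cache fn's output for every combination of sampled input values."""
    grid = np.arange(LEVELS) * STEP
    pts = np.stack(np.meshgrid(grid, grid, grid, grid, indexing="ij"), axis=-1)
    return fn(pts.astype(np.float64))  # shape (17, 17, 17, 17)


def lookup(lut, pixels):
    """Retrieve a filtered value by quadrilinear interpolation.

    pixels: four 8-bit integer values (target pixel plus three references).
    """
    pixels = np.asarray(pixels)
    idx = pixels // STEP            # lower grid index in each dimension
    frac = (pixels % STEP) / STEP   # fractional position inside the grid cell
    out = 0.0
    # Blend the 16 surrounding grid entries, weighted by proximity.
    for corner in itertools.product((0, 1), repeat=4):
        c = np.array(corner)
        w = np.prod(np.where(c == 1, frac, 1.0 - frac))
        out += w * lut[tuple(idx + c)]
    return out


lut = build_lut(toy_filter)
filtered = lookup(lut, [10, 200, 33, 255])
```

Under these assumptions the table holds 17^4 = 83,521 entries instead of the 256^4 entries a full 4D table over 8-bit inputs would need, which is why sampling the input space matters for storage. Because `toy_filter` is linear, interpolation reproduces it exactly here; a real network is nonlinear, so the retrieved value would only approximate its output.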
