In-Loop Filtering via Trained Look-Up Tables

Zhuoyuan Li, Jiacheng Li, Yao Li, Li Li, Dong Liu, Feng Wu

TL;DR

This work tackles the practicality gap of neural network–based in-loop filtering by replacing heavy DNN inference with a LUT-based surrogate (LUT-ILF). It trains a compact network within a restricted receptive field, caches outputs into a clipped $4D$ LUT, and retrieves filtered pixels via interpolation, with finetuning to mitigate clipping effects. Enhanced by complementary reference indexing, progressive cascaded LUTs, and learnable pattern weighting, LUT-ILF supports ultrafast to fast modes with controllable storage and modest increases in time. Integrated into VVC (VTM-11.0) with RD-based decision, LUT-ILF achieves BD-rate reductions on AI and RA configurations while maintaining practical complexity and memory footprints, offering a scalable path for practical neural-network–inspired coding tools.

Abstract

In-loop filtering (ILF) is a key technology for removing artifacts in image/video coding standards. Recently, neural network-based in-loop filtering methods have achieved remarkable coding gains beyond the capability of advanced video coding standards, making them a powerful coding tool candidate for future video coding standards. However, deep neural networks bring heavy time and computational complexity and demand high-performance hardware, which makes them challenging to apply in general coding scenarios. To address this limitation, inspired by explorations in image restoration, we propose an efficient and practical in-loop filtering scheme that adopts the look-up table (LUT). We train the DNN of in-loop filtering within a fixed filtering reference range, and cache the output values of the DNN into a LUT by traversing all possible inputs. At testing time in the coding process, the filtered pixel is generated by locating the input pixels (the to-be-filtered pixel with its reference pixels) and interpolating the cached filtered pixel values. To further enable a large filtering reference range with the limited storage cost of the LUT, we introduce an enhanced indexing mechanism in the filtering process and a clipping/finetuning mechanism in training. The proposed method is implemented in the Versatile Video Coding (VVC) reference software, VTM-11.0. Experimental results show that the ultrafast, very fast, and fast modes of the proposed method achieve on average 0.13%/0.34%/0.51% and 0.10%/0.27%/0.39% BD-rate reduction under the all intra (AI) and random access (RA) configurations, respectively. Notably, our method has friendly time and computational complexity: only 101%/102%-104%/108% time increase with 0.13-0.93 kMACs/pixel, and only 164-1148 KB storage cost for a single model. Our solution may shed light on the journey of practical neural network-based coding tool evolution.
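
The cache-then-interpolate idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: `toy_filter` is a fixed weighted average standing in for the trained network, the 8-bit input range is sampled every 16 levels (17 points per dimension), and retrieval uses plain quadrilinear interpolation, which may differ from the interpolation the authors use. The clipping and finetuning steps are not modeled.

```python
from itertools import product

STEP = 16               # sampling interval of the LUT (assumed, not from the paper)
GRID = 256 // STEP + 1  # 17 sample points per dimension for 8-bit pixels


def toy_filter(p0, p1, p2, p3):
    """Hypothetical stand-in for the trained DNN: a fixed weighted average."""
    return 0.5 * p0 + 0.25 * p1 + 0.125 * p2 + 0.125 * p3


def build_lut(fn):
    """Cache fn at every sampled 4-pixel input, stored as a flat row-major list."""
    return [fn(*(i * STEP for i in idx))
            for idx in product(range(GRID), repeat=4)]


def lookup(lut, pixels):
    """Locate the enclosing LUT cell and blend its 16 corner entries."""
    base = [min(p // STEP, GRID - 2) for p in pixels]        # lower corner index
    frac = [(p - b * STEP) / STEP for p, b in zip(pixels, base)]
    out = 0.0
    for corner in product((0, 1), repeat=4):
        weight = 1.0
        flat = 0
        for c, b, f in zip(corner, base, frac):
            weight *= f if c else 1.0 - f
            flat = flat * GRID + (b + c)                     # row-major flat index
        out += weight * lut[flat]
    return out


lut = build_lut(toy_filter)
filtered = lookup(lut, (100, 37, 200, 255))
```

Because the stand-in filter is affine, the interpolated value coincides with a direct evaluation of `toy_filter`; a real trained network is nonlinear, so interpolation between sampled entries would only approximate its output.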

Paper Structure (6 sections, 2 equations, 4 figures, 3 tables)

This paper contains 6 sections, 2 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Illustration of the basic look-up table-based in-loop filtering framework (LUT-ILF).
  • Figure 2: Illustration of the complementary reference indexing patterns in LUT-ILF-U (Pattern 1 only), LUT-ILF-V (Patterns 1$\sim$3), and LUT-ILF-F (Patterns 1$\sim$7). With the proposed indexing patterns, LUT-ILF can involve and address more reference pixels. For example, with Patterns 1$\sim$3, the 5$\times$5 reference range around $I_0$ is fully covered in LUT-ILF-V. With Patterns 1$\sim$7, the 7$\times$7 reference range around $I_0$ is fully covered in LUT-ILF-F. The reference pixels covered with the rotation ensemble trick are marked with dashed boxes.
  • Figure 3: Illustration of the LUT-ILF-V framework, which contains two parts. The left shows the input (to-be-filtered) pixel with its filtering reference range. The right shows the LUT-ILF-V process, in which the parallel and cascaded networks/LUTs are applied with reference and progressive indexing during training/testing. The reference range covered by each pattern with the rotation trick is marked with dashed boxes. For training, the convolution of each pattern can be implemented with standard convolutions and unfold/reshape operations. Conv2$\times$2-D2 denotes a convolutional layer with a dilation size of 2.
  • Figure 4: Selection results of LUT-ILF-F for Cactus ($1920\times1080$) on VTM-11.0 (AI configuration, QP 32, POC 29). Green blocks indicate blocks filtered by LUT-ILF.
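
The coverage claim in the Figure 2 caption (a few 4-pixel patterns plus rotation reaching a full 5$\times$5 window) can be checked with a small sketch. The three offset sets below are hypothetical shapes chosen for illustration; they are not read off Figure 2 and may differ from the paper's actual Patterns 1$\sim$3. Each offset is `(row, col)` relative to the to-be-filtered pixel $I_0$ at `(0, 0)`.

```python
def rotate90(offset):
    """Rotate a (row, col) offset by 90 degrees around the centre pixel."""
    r, c = offset
    return (c, -r)


def coverage(patterns):
    """Union of all pixels touched by each pattern under 0/90/180/270 rotation."""
    covered = set()
    for pattern in patterns:
        current = list(pattern)
        for _ in range(4):
            covered.update(current)
            current = [rotate90(o) for o in current]
    return covered


def full_window(radius):
    """All offsets of the (2*radius+1) x (2*radius+1) window around the centre."""
    span = range(-radius, radius + 1)
    return {(r, c) for r in span for c in span}


# Hypothetical 4-pixel indexing patterns (assumed shapes, not from the paper).
PATTERN_A = [(0, 0), (0, 1), (1, 0), (1, 1)]  # compact 2x2 block
PATTERN_B = [(0, 0), (0, 2), (2, 0), (2, 2)]  # 2x2 block with dilation 2
PATTERN_C = [(0, 0), (1, 1), (1, 2), (2, 1)]  # fills the remaining positions

covers_3x3 = coverage([PATTERN_A]) == full_window(1)
covers_5x5 = coverage([PATTERN_A, PATTERN_B, PATTERN_C]) == full_window(2)
```

Under these assumed shapes, one compact pattern with rotation already spans the 3$\times$3 neighbourhood, and the three patterns together span the 5$\times$5 window while each LUT is still indexed by only four pixels.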