JND-Guided Light-Weight Neural Pre-Filter for Perceptual Image Coding

Chenlong He, Zhijian Hao, Leilei Huang, Xiaoyang Zeng, Yibo Fan

TL;DR

This work tackles perceptual image coding by addressing the inefficiencies of existing JND-guided pre-filters through a unified benchmark and a compact neural approach. It introduces FJNDF-Pytorch, a standardized platform that combines multiple JND models, injection strategies, encoders, datasets, and objective metrics, enabling fair and reproducible comparisons. The authors then present a lightweight CNN framework trained with a novel frequency-domain loss that both distills the reference behavior and enforces physics-based constraints in the DCT domain, achieving state-of-the-art BD-BR gains across several encoders while maintaining low computational cost (7.15 GFLOPs for 1080p). Overall, the paper delivers a reproducible research platform and a principled learning approach that advances the balance between perceptual quality and efficiency in perceptual image coding, with strong empirical validation and broad applicability across encoders and datasets.

Abstract

Just Noticeable Distortion (JND)-guided pre-filtering is a promising technique for improving the perceptual compression efficiency of image coding. However, existing methods are often computationally expensive, and the field lacks standardized benchmarks for fair comparison. To address these challenges, this paper makes a twofold contribution. First, we develop and open-source FJNDF-Pytorch, a unified benchmark for frequency-domain JND-guided pre-filters. Second, leveraging this platform, we propose a complete learning framework for a novel, lightweight Convolutional Neural Network (CNN). Experimental results demonstrate that our proposed method achieves state-of-the-art compression efficiency, consistently outperforming competitors across multiple datasets and encoders. In terms of computational cost, our model is exceptionally lightweight, requiring only 7.15 GFLOPs to process a 1080p image, which is merely 14.1% of the cost of a recent lightweight network. Our work presents a robust, state-of-the-art solution that excels in both performance and efficiency, supported by a reproducible research platform. The open-source implementation is available at https://github.com/viplab-fudan/FJNDF-Pytorch.

Paper Structure

This paper contains 13 sections, 6 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Illustration of our main contributions. (a) We build a standardized benchmark for frequency-domain JND pre-filters. (b) Using this benchmark, we generate a dataset to train our proposed lightweight CNN via a supervised pipeline, which is optimized by a joint spatial and frequency loss function.
  • Figure 2: An overview of the FJNDF-Pytorch framework. (a) The modular toolbox, whose components are detailed in Table \ref{tab:components}, supports two primary pipelines. (b) A benchmarking pipeline for evaluating filters and generating training data. (c) A training pipeline that leverages this data to train learning-based pre-filters.
  • Figure 3: Training pipeline of our lightweight pre-filter network. Our core contribution, the hybrid loss, supervises the network. The spatial-domain loss ($L_{c},L_{m}$) is computed against a Ground Truth (GT) from a reference filter. The frequency-domain loss imposes a residual constraint ($L^{\text{res}}_{\text{dct}}$) against the GT and a conservation constraint ($L^{\text{cons}}_{\text{dct}}$) against the original input. The network backbone (MBR, FST and HDPA) is detailed in \cite{mobileNetIE_2025_yan}. MBR is re-parameterized into standard convolutions for inference.
  • Figure 4: Illustration of the frequency area partitioning for our proposed conservation constraint loss, $L^{\text{cons}}_{\text{dct}}$. An 8x8 DCT block is divided into a Low-Frequency ($LF$) and a High-Frequency ($HF$) area based on a zigzag scan order and a predefined threshold, $K$.
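
The LF/HF partition described in the Figure 4 caption can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' implementation: the function names `zigzag_order` and `partition_masks` are hypothetical, the zigzag is assumed to follow the usual JPEG-style scan, and the threshold $K$ is assumed to be the number of leading zigzag positions assigned to the $LF$ area (the caption does not say whether $K$ is a count or an index).

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in JPEG-style zigzag order."""
    def key(rc):
        r, c = rc
        s = r + c  # index of the anti-diagonal this position lies on
        # Odd diagonals are walked top-right to bottom-left (row increasing),
        # even diagonals bottom-left to top-right (row decreasing).
        return (s, r if s % 2 else -r)

    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def partition_masks(k, n=8):
    """Split an n x n DCT block into boolean LF and HF masks.

    Assumption: the first k zigzag positions form the LF area and the
    remaining n*n - k positions form the HF area.
    """
    if not 0 <= k <= n * n:
        raise ValueError("k must lie in [0, n*n]")
    lf = [[False] * n for _ in range(n)]
    for r, c in zigzag_order(n)[:k]:
        lf[r][c] = True
    hf = [[not v for v in row] for row in lf]
    return lf, hf


if __name__ == "__main__":
    lf_mask, _ = partition_masks(k=10)
    for row in lf_mask:
        print("".join("L" if v else "H" for v in row))
```

Running the script prints an 8x8 grid in which the `L` entries form a triangle around the DC coefficient at the top-left corner. Masks of this kind could then be used to treat the two coefficient groups separately inside a DCT-domain loss such as $L^{\text{cons}}_{\text{dct}}$.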