Parametric PerceptNet: A bio-inspired deep-net trained for Image Quality Assessment

Jorge Vila-Tomás, Pablo Hernández-Cámara, Valero Laparra, Jesús Malo

TL;DR

This work addresses image quality assessment by merging vision-science knowledge with deep learning through Parametric PerceptNet, a bio-inspired, fully parametric architecture that drastically reduces parameters while preserving interpretability. By enforcing biologically plausible layer functions and careful scaling, the model achieves competitive regression performance with a 3-order-of-magnitude reduction in parameters compared to nonparametric baselines. Ablation and visualization show that parameterizing Gabor and normalization stages preserves biophysical meaning and improves stability, whereas unconstrained training can induce feature spreading and reduce interpretability. The findings highlight the value of physics-informed priors in deep IQA models and motivate developing evaluation metrics beyond pure correlation to ensure human-like behavior and generalization in vision models.

Abstract

Human vision models are at the core of image processing. For instance, classical approaches to the problem of image quality are based on models that include knowledge about human vision. Nowadays, however, deep learning approaches obtain competitive results by simply treating this problem as a regression of human decisions and training a standard network on human-rated datasets. These approaches are easy to adapt to a particular problem and fit very efficiently when data is available. However, mainly because of their excess of parameters, they suffer from a lack of interpretability and from over-fitting. Here we propose a vision model that combines the best of both worlds by using a parametric neural network architecture. We parameterize the layers to have bioplausible functionality, and provide a set of bioplausible parameters. We analyzed different versions of the model and compared them with the non-parametric version. The parametric models achieve a reduction of three orders of magnitude in the number of parameters without a loss in regression performance. Furthermore, we show that the parametric models behave better during training and are easier to interpret as vision models. Interestingly, we find that, even when initialized with bioplausible parameters, the models diverge from bioplausible solutions when trained for regression using human-rated datasets, which we call the feature-spreading problem. This suggests that the deep learning approach is inherently flawed, and emphasizes the need to evaluate and train models beyond regression.
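To make the idea of a parametric layer concrete, the following minimal sketch builds one Gabor filter from a handful of generative parameters and compares that count with the number of free weights in an unconstrained filter of the same size. The function name `gabor_kernel`, the isotropic Gaussian envelope, the kernel size and all parameter values are illustrative assumptions, not the parametrization used in the paper.

```python
import math

def gabor_kernel(size, freq, theta, sigma, phase=0.0):
    """Build a size x size Gabor kernel from four generative parameters.

    Hypothetical parametrization for illustration only:
    freq  : spatial frequency of the carrier, in cycles per pixel
    theta : orientation of the carrier, in radians
    sigma : width of the isotropic Gaussian envelope, in pixels
    phase : phase offset of the carrier, in radians
    """
    half = (size - 1) / 2.0
    kernel = []
    for row in range(size):
        y = row - half
        line = []
        for col in range(size):
            x = col - half
            # Coordinate measured along the carrier direction.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * freq * x_rot + phase)
            line.append(envelope * carrier)
        kernel.append(line)
    return kernel

size = 31
kernel = gabor_kernel(size, freq=0.1, theta=math.pi / 4, sigma=5.0)

free_weights = size * size   # weights an unconstrained filter would learn
generative_params = 4        # freq, theta, sigma, phase

print(f"unconstrained filter: {free_weights} weights")
print(f"parametric filter:    {generative_params} parameters")
```

In this toy setup a single filter goes from 961 free weights to 4 parameters. The overall reduction reported in the abstract depends on the full architecture and is not reproduced by this sketch.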

Paper Structure

This paper contains 18 sections, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Scheme of the proposed Parametric PerceptNet. The architecture is the same as in [Hepburn20], but each layer is replaced by its functional version. The figure illustrates the widely known stages of the retina-LGN-V1 pathway: inspired by visual neuroscience, it is a specific cascade of linear + divisive normalization (DN) layers, as in [Martinez18, Malo24]. First, consistent with the Weber law, which possibly arises in the LMS photoreceptors [Stiles00, Brainard10, Fairchild13], the first (nonlinear) stage applies a DN [Hillis05, Hillis07] to the input RGB channels. The second (linear) stage performs a transform to color-opponent channels [Jameson57, Vila23]. The third (nonlinear) stage implements the adaptive nonlinearities of the RG-YB channels [Gegen92, Hita93, Laparra12, Laparra15]. The fourth (linear) stage applies center-surround kernels to the (achromatic, red-green, yellow-blue) outputs of the previous stage, following what is known about the LGN [DeAngelis97, Shapley11] and/or the CSFs [Campbell68, Mullen85, Li92, Li22]. Layer five (nonlinear) is a DN that represents generic energy masking regardless of frequency (i.e. in the spatial domain) [Watson02, Malo15, Martinez18]. Layer six (linear) applies Gabor wavelets mimicking the shape of V1 receptive fields [Hubel59, Hubel61, Blakemore69, Watson90, Simoncelli92, Olshausen96]. Finally, the last layer implements a DN to account for the known spatial-frequency and orientation masking that happens among Gabor channels [Ross91, Foley94, Heeger92, Watson97, Carandini12]. At the V1 stages, the linear gain of the Gabor filters (or the scale of the final DN) is critical to keep the bandwidth of the system as described by the CSF [Malo97a]. All the activation-like nonlinearities are implemented using DNs, as is natural in vision science [Carandini12]. Unlike ReLUs or sigmoids, these non-standard nonlinearities have parameters and are multidimensional [DN_in_enciclopedia19]. Therefore, their responses are not a single curve but multiple curves.
  • Figure 2: Positive bias induced by the parametrization. Histogram of Pearson correlations between human opinion (TID2008 [ponomarenko_tid2008_2009]) and 500 random initializations of the parametric and the nonparametric PerceptNet. A larger negative correlation means better alignment with human opinion.
  • Figure 3: Illustration of an optimization step in a parametric layer. The optimizable parameters are the generative parameters instead of the convolutional filters as a whole, which requires that we generate the convolutional filters at each step of the optimization so that the gradient can reach the generative parameters.
  • Figure 4: Learning curves for the three selected models. Training is performed on TID2008, while TID2013 is used for validation. Test results can be seen in Table \ref{tab:res_final_models}. Note that both parametric models are initialized in the same way. The first point of each curve is calculated after the first epoch of optimization.
  • Figure 5: Illustrative image in the first layers (Retina-LGN), in processing order (top to bottom), for the three models (left to right).
  • ...and 6 more figures
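The Figure 1 caption notes that divisive normalization responses are "not a single curve but multiple curves". A minimal sketch of why: with a generic DN of the form y_i = sign(x_i) |x_i|^γ / (β_i + Σ_j H_ij |x_j|^γ), the response of one channel depends on the activity of its neighbors. This canonical form, the function name `divisive_normalization`, and the values of `H`, `beta` and `gamma` are assumptions chosen for illustration; the paper's exact parametrization of each DN stage may differ.

```python
def divisive_normalization(x, H, beta, gamma=1.0):
    """Generic divisive normalization over a list of channel responses.

    y_i = sign(x_i) * |x_i|**gamma / (beta_i + sum_j H[i][j] * |x_j|**gamma)

    x     : channel responses
    H     : interaction matrix (how strongly channel j inhibits channel i)
    beta  : semisaturation constants, one per channel
    gamma : exponent applied to the response magnitudes
    """
    energy = [abs(v) ** gamma for v in x]
    out = []
    for i, v in enumerate(x):
        pool = beta[i] + sum(H[i][j] * energy[j] for j in range(len(x)))
        sign = 1.0 if v >= 0 else -1.0
        out.append(sign * energy[i] / pool)
    return out

# Hypothetical two-channel example.
H = [[1.0, 0.5],
     [0.5, 1.0]]
beta = [0.1, 0.1]

# Same input on channel 0, with and without activity on channel 1.
alone = divisive_normalization([1.0, 0.0], H, beta)
masked = divisive_normalization([1.0, 2.0], H, beta)

print("channel 0 alone: ", alone[0])
print("channel 0 masked:", masked[0])
```

The response of channel 0 to the same input drops when channel 1 is active, so plotting response against input gives a different curve for each level of neighboring activity.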
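The optimization step described in the Figure 3 caption can be sketched as follows: the filter is regenerated from its generative parameter at every step, and the gradient is taken with respect to that parameter rather than the filter weights. This is a toy illustration under several assumptions: a 1-D Gaussian filter with a single parameter `sigma`, a squared-error loss against a target filter, and central finite differences standing in for whatever gradient computation the authors actually use. All names and values are hypothetical.

```python
import math

def gaussian_filter(sigma, size=25):
    """Generate a normalized 1-D Gaussian filter from its width sigma."""
    half = size // 2
    w = [math.exp(-(k - half) ** 2 / (2.0 * sigma ** 2)) for k in range(size)]
    total = sum(w)
    return [v / total for v in w]

def loss(sigma, target):
    """Squared error between the generated filter and a target filter."""
    filt = gaussian_filter(sigma, len(target))  # regenerated on every call
    return sum((a - b) ** 2 for a, b in zip(filt, target))

def grad_sigma(sigma, target, eps=1e-5):
    """Central finite-difference gradient of the loss w.r.t. sigma.

    Stand-in for automatic differentiation through the filter generation.
    """
    return (loss(sigma + eps, target) - loss(sigma - eps, target)) / (2.0 * eps)

target = gaussian_filter(3.0)   # filter generated with the "true" width
sigma = 2.0                     # initial value of the generative parameter
learning_rate = 20.0

initial_loss = loss(sigma, target)
for step in range(200):
    # Only sigma is updated; the filter weights are never optimized directly.
    sigma -= learning_rate * grad_sigma(sigma, target)
final_loss = loss(sigma, target)

print(f"recovered sigma: {sigma:.4f}")
print(f"loss: {initial_loss:.6f} -> {final_loss:.3e}")
```

Because only `sigma` is optimized, the filter remains a Gaussian throughout training, which is the kind of constraint the parametric layers are meant to enforce.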