Modelling and analysis of the 8 filters from the "master key filters hypothesis" for depthwise-separable deep networks in relation to idealized receptive fields based on scale-space theory

Tony Lindeberg, Zahra Babaiee, Peyman M. Kiasari

TL;DR

Experimental results demonstrate that the idealized models of receptive fields have good predictive properties when the learned filters in depthwise-separable deep networks are replaced by idealized filters, thus showing that the learned filters can be well approximated by discrete scale-space filters.

Abstract

This paper presents the results of analysing and modelling a set of 8 "master key filters", which have been extracted by applying a clustering approach to the receptive fields learned in depthwise-separable deep networks based on the ConvNeXt architecture. For this purpose, we first compute spatial spread measures in terms of weighted mean values and weighted variances of the absolute values of the learned filters, which support the working hypotheses that: (i) the learned filters can be modelled by separable filtering operations over the spatial domain, and that (ii) the spatial offsets of those learned filters that are non-centered are rather close to half a grid unit. Then, we model the clustered "master key filters" in terms of difference operators applied to a spatial smoothing operation in terms of the discrete analogue of the Gaussian kernel, and demonstrate that the resulting idealized models of the receptive fields show good qualitative similarity to the learned filters. This modelling is performed in two different ways: (i) using possibly different values of the scale parameters in the coordinate directions for each filter, and (ii) using the same value of the scale parameter in both coordinate directions. Then, we perform the actual model fitting by either (i) requiring spatial spread measures in terms of spatial variances of the absolute values of the receptive fields to be equal, or (ii) minimizing the discrete $l_1$- or $l_2$-norms between the idealized receptive field models and the learned filters. Complementary experimental results then demonstrate that the idealized models of receptive fields have good predictive properties when the learned filters in depthwise-separable deep networks are replaced by idealized filters, thus showing that the learned filters can be well approximated by discrete scale-space filters.
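To make the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It builds the discrete analogue of the Gaussian kernel, $T(n; s) = e^{-s} I_n(s)$, applies a centered first-order difference in the horizontal direction, and computes weighted means and variances of the absolute filter values. The function names, the $7 \times 7$ support, the scale values and the exact normalization of the spread measures are assumptions made for illustration.

```python
import math

def bessel_i(n, s, terms=40):
    """Modified Bessel function I_n(s) of integer order, from its power series."""
    n = abs(n)
    return sum((s / 2.0) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))

def discrete_gaussian(s, radius=3):
    """Discrete analogue of the Gaussian kernel, T(n; s) = exp(-s) I_n(s)."""
    return [math.exp(-s) * bessel_i(n, s) for n in range(-radius, radius + 1)]

def centered_dx_gaussian(sx, sy, radius=3):
    """Centered first-order difference in x applied to separable smoothing.

    Hypothetical helper: rows hold the vertical index n, columns the
    horizontal index m, both running over [-radius, radius].
    """
    tx = discrete_gaussian(sx, radius + 1)  # one extra sample on each side
    ty = discrete_gaussian(sy, radius)
    # (T(m + 1; sx) - T(m - 1; sx)) / 2 for m in [-radius, radius]
    dx = [(tx[i + 2] - tx[i]) / 2.0 for i in range(2 * radius + 1)]
    return [[b * a for a in dx] for b in ty]

def spread_measures(h):
    """Weighted means and variances of |h| along the x and y axes (assumed form)."""
    ry, rx = len(h) // 2, len(h[0]) // 2
    pts = [(i - rx, j - ry, abs(v)) for j, row in enumerate(h) for i, v in enumerate(row)]
    total = sum(w for _, _, w in pts)
    mx = sum(x * w for x, _, w in pts) / total
    my = sum(y * w for _, y, w in pts) / total
    vx = sum((x - mx) ** 2 * w for x, _, w in pts) / total
    vy = sum((y - my) ** 2 * w for _, y, w in pts) / total
    return mx, my, vx, vy

# Illustrative scale values only; they are not taken from the paper.
h = centered_dx_gaussian(sx=1.0, sy=0.5)
mx, my, vx, vy = spread_measures(h)
print(f"means: ({mx:.3f}, {my:.3f}), variances: ({vx:.3f}, {vy:.3f})")
```

For a pure smoothing kernel with sufficiently large support, the variance of $T(\cdot; s)$ equals the scale parameter $s$, which is what makes a variance-based transfer of scale values between learned and idealized filters possible in principle.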

Paper Structure

This paper contains 41 sections, 58 equations, 6 figures, 11 tables.

Figures (6)

  • Figure 1: The original set of 8 "master key filters", obtained by applying a clustering technique to the receptive fields learned from depthwise-separable deep networks, as extracted by Babaiee et al. (BabKiaRusGro25-AAAI-master), "through greedy search on the ConvNeXt V2 Tiny model" developed by Woo et al. (WooDebHuCheLiuKweXi23-CVPR), in turn based on the regular ConvNeXt model developed by Liu et al. (LiuMaoWuFeiDarXie22-CVPR). (Horizontal axes: horizontal filter indices $m \in [-3, 3]$. Vertical axes: vertical filter indices $n \in [-3, 3]$.)
  • Figure 2: Visualizations of the learned filters with the corresponding results of fitting idealized models of these filters using scale-space operations: (top row) Alternative visualization of the original set of 8 "master key filters", extracted by Babaiee et al. (BabKiaRusGro25-AAAI-master), here complemented with a normalization of the filters, (i) by rescaling Filters 1--6 to give the same response to matching first-order discrete monomials under discrete convolution as for convolution of the corresponding continuous monomials with corresponding continuous Gaussian derivative operators, according to (\ref{eq-def-h1-norm})--(\ref{eq-def-h6-norm}), and (ii) by adding different constants to Filters 7--8, according to (\ref{eq-def-h7-dc})--(\ref{eq-def-h8-norm}), to minimize the variance-based spatial spread measure of the filters, and finally visualizing the data on a blue-red colour scale with the value 0 corresponding to white, to better reveal the polarities of the filter values. (rows 2--7) Idealized scale-space models of the filters, as computed with the different types of modelling approaches proposed in this paper: Method A in Section \ref{sec-method-A}, based on direct transfer of scale values from the variances for a continuous Gaussian derivative model; Method B in Section \ref{sec-method-B}, based on requiring the horizontal and the vertical discrete weighted variance-based spatial spread measures for the idealized receptive field models to be equal to the weighted discrete spatial spread measures for the corresponding learned filters; Method C1 in Section \ref{sec-method-C1}, based on minimizing the discrete $l_1$-norm between the idealized receptive field models and the normalized versions of the learned filters, using different values of the scale parameters in the horizontal and the vertical directions; Method C2 in Section \ref{sec-method-C2}, based on minimizing the same discrete $l_1$-norm, using the same value of the scale parameter in the horizontal and the vertical directions; Method D1 in Section \ref{sec-method-D1}, based on minimizing the discrete $l_2$-norm between the idealized receptive field models and the normalized versions of the learned filters, using different values of the scale parameters in the horizontal and the vertical directions; and Method D2 in Section \ref{sec-method-D2}, based on minimizing the same discrete $l_2$-norm, using the same value of the scale parameter in the horizontal and the vertical directions. (Note that the contrasts of Filters 2 and 3 are reversed in relation to the sign conventions for the corresponding "master key filters".) (Horizontal axes: horizontal filter indices $m \in [-3, 3]$. Vertical axes: vertical filter indices $n \in [-3, 3]$.)
  • Figure 3: Architectural overview of the ConvNeXt V2 Tiny network. Here, the abbreviation "LN" denotes "Layer Normalization", while "GRN" denotes "Global Response Normalization".
  • Figure 4: Computational molecules visualizing the computational functions of the different (top row) non-centered first-order difference operators and (bottom row) centered first-order and second-order difference operators considered in this paper.
  • Figure 5: Graphs of the error measure $\sqrt{\det V(|h_i(\cdot, \cdot) - C_i|)}$ for $i \in \{ 7, 8 \}$ when determining the DC-compensation constants $C_7 \approx -0.0118$ and $C_8 \approx -0.0386$ according to (\ref{eq-def-crit-determ-Chat}), in order to later renormalize the Gaussian-like "master key filters" to unit $l_1$-norm, a property that regular Gaussian kernels satisfy. (Horizontal axes: parameter value $C_i \in [-0.1, 0.1]$. Vertical axes: error measure $\sqrt{\det V(|h_i(\cdot, \cdot) - C_i|)}$.)
  • ...and 1 more figure
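The DC-compensation step described in the caption of Figure 5 can be illustrated with a small sketch: scan candidate constants $C \in [-0.1, 0.1]$ and keep the one that minimizes $\sqrt{\det V(|h(\cdot, \cdot) - C|)}$. Everything below is an assumption made for illustration: $V$ is taken to be the spatial covariance matrix weighted by $|h - C|$, the search is a plain grid search, and the filter is a synthetic sampled Gaussian with a known offset, not one of the learned "master key filters".

```python
import math

def spread(h, c):
    """sqrt(det V) for weights |h - c|, with V the weighted spatial covariance (assumed form)."""
    r = len(h) // 2
    pts = [(m, n, abs(h[n + r][m + r] - c))
           for n in range(-r, r + 1) for m in range(-r, r + 1)]
    total = sum(w for _, _, w in pts)
    mx = sum(m * w for m, _, w in pts) / total
    my = sum(n * w for _, n, w in pts) / total
    vxx = sum((m - mx) ** 2 * w for m, _, w in pts) / total
    vyy = sum((n - my) ** 2 * w for _, n, w in pts) / total
    vxy = sum((m - mx) * (n - my) * w for m, n, w in pts) / total
    return math.sqrt(max(vxx * vyy - vxy ** 2, 0.0))

def fit_dc_constant(h, lo=-0.1, hi=0.1, steps=201):
    """Grid search for the constant that minimizes the spread measure."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return min(grid, key=lambda c: spread(h, c))

def toy_filter(offset, radius=3):
    """Synthetic stand-in: a normalized sampled Gaussian plus a constant DC offset."""
    g = [math.exp(-0.5 * k * k) for k in range(-radius, radius + 1)]
    z = sum(g) ** 2
    return [[a * b / z + offset for b in g] for a in g]

h = toy_filter(offset=-0.02)
c_hat = fit_dc_constant(h)
print(f"estimated DC constant: {c_hat:.3f}")
print(f"spread before: {spread(h, 0.0):.3f}, after: {spread(h, c_hat):.3f}")
```

On this toy input the estimate lands close to the injected offset but need not coincide with it, since the truncated tails of the kernel also influence the spread measure. A finer grid or a one-dimensional minimizer could replace the grid search.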