
Unveiling the Unseen: Identifiable Clusters in Trained Depthwise Convolutional Kernels

Zahra Babaiee, Peyman M. Kiasari, Daniela Rus, Radu Grosu

TL;DR

This work investigates whether trained depthwise kernels in DS-CNNs encode identifiable, interpretable structure. It analyzes millions of trained filters across multiple architectures and uses a cosine-loss autoencoder to project kernels into a 1D latent space for unsupervised clustering, revealing recurring patterns that resemble Difference-of-Gaussians (DoG) functions and their derivatives. The authors demonstrate that roughly ten DoG-like patterns account for most depthwise filters across models, with over 95% and 90% of filters labeled in ConvNextV2 and ConvNeXt, respectively, and clear links to biological vision models. The findings advance interpretability and suggest bio-inspired directions for initialization and architecture design to enhance robustness and generalization.
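For orientation, one common way to write a cosine loss between a flattened kernel $k$ and its autoencoder reconstruction $\hat{k}$ is given below. This is an assumed standard form, not a definition taken from the paper; the symbols $k$, $\hat{k}$, and $\mathcal{L}_{\cos}$ are introduced here for illustration only.

```latex
\mathcal{L}_{\cos}(k, \hat{k})
  = 1 - \frac{\langle k, \hat{k} \rangle}{\lVert k \rVert_2 \, \lVert \hat{k} \rVert_2},
\qquad
\mathcal{L}_{\cos}(\alpha k, \hat{k}) = \mathcal{L}_{\cos}(k, \hat{k})
\quad \text{for any } \alpha > 0 .
```

Under this form the loss depends only on the shape of the kernel and not on its magnitude, which is one plausible reason to prefer it when grouping filters by pattern.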

Abstract

Recent advances in depthwise-separable convolutional neural networks (DS-CNNs) have led to novel architectures that surpass classical CNNs by a considerable margin in both scalability and accuracy. This paper reveals another striking property of DS-CNN architectures: discernible and explainable patterns emerge in their trained depthwise convolutional kernels in all layers. Through an extensive analysis of millions of trained filters, of different sizes and from various models, we used unsupervised clustering with autoencoders to categorize these filters. Astonishingly, the patterns converged into a few main clusters, each resembling a difference of Gaussians (DoG) function or its first- and second-order derivatives. Notably, we were able to classify over 95% and 90% of the filters from the state-of-the-art ConvNextV2 and ConvNeXt models, respectively. This finding is not merely a technological curiosity; it echoes the foundational models neuroscientists have long proposed for the vision systems of mammals. Our results thus deepen our understanding of the emergent properties of trained DS-CNNs and provide a bridge between artificial and biological visual processing systems. More broadly, they pave the way for more interpretable and biologically inspired neural network designs in the future.
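To make the DoG family concrete, the following is a minimal sketch that samples a DoG function and its first derivatives on a 7×7 grid and labels a kernel by its best sign-insensitive cosine match. It is not the authors' pipeline, which uses an autoencoder for clustering; template matching is swapped in here purely for illustration. The sigma values, the 0.8 threshold, and all function and template names are assumptions.

```python
import math

def gaussian(x, y, sigma):
    """Isotropic 2D Gaussian with unit integral over the plane."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def dog(x, y, s1=1.0, s2=2.0):
    """Difference of Gaussians: narrow centre minus wide surround (sigmas assumed)."""
    return gaussian(x, y, s1) - gaussian(x, y, s2)

def dog_dx(x, y, s1=1.0, s2=2.0):
    """Analytic first derivative of the DoG along x."""
    return -x / s1 ** 2 * gaussian(x, y, s1) + x / s2 ** 2 * gaussian(x, y, s2)

def sample(fn, size=7):
    """Sample fn on a size x size grid centred at the origin, flattened row-major."""
    r = size // 2
    return [fn(x, y) for y in range(-r, r + 1) for x in range(-r, r + 1)]

def cosine(a, b):
    """Cosine similarity between two flattened kernels."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

# Hypothetical template set: the DoG and its two first-order derivatives.
TEMPLATES = {
    "dog": sample(dog),
    "dog_dx": sample(dog_dx),
    "dog_dy": sample(lambda x, y: dog_dx(y, x)),  # the DoG is symmetric in x and y
}

def label(kernel, threshold=0.8):
    """Return the best-matching template name, or None if no match reaches threshold.

    The absolute value makes the match insensitive to the kernel's sign and scale.
    """
    name, score = max(
        ((n, abs(cosine(kernel, t))) for n, t in TEMPLATES.items()),
        key=lambda pair: pair[1],
    )
    return name if score >= threshold else None
```

For example, `label([-3.0 * v for v in TEMPLATES["dog"]])` matches the `"dog"` template, because a negated and rescaled kernel has the same shape, while a constant kernel falls below the threshold and is left unlabeled.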

Paper Structure (24 sections, 1 equation, 40 figures, 4 tables)

This paper contains 24 sections, 1 equation, 40 figures, 4 tables.

Figures (40)

  • Figure 1: 3D and 2D plots of the DoG function and its derivatives, used in neuroscience to model visual receptive fields and in image processing to detect edges and brightness changes.
  • Figure 2: Random samples of regular (i) and depthwise (ii) convolutional kernels across all layers. Depthwise convolutions show structured patterns while regular convolutions appear uninterpretable.
  • Figure 3: Reconstructed spectrum of 7×7 kernel filters from the 1D hidden code of the autoencoder model, trained on more than 1 million filters. Each segment of the spectrum is marked with the corresponding cluster label. (The autoencoder architecture is shown in a separate figure of the paper.)
  • Figure 4: Scatter plot of PCA applied to 7×7 ConvNeXt filters. Sample filters from the four clearest clusters are visualized beside the plot.
  • Figure 5: Random samples from each of the prominent classes of 7×7 kernels of ConvNextV2-tiny trained on ImageNet. Our method classifies the samples with notable accuracy.
  • ...and 35 more figures