The Master Key Filters Hypothesis: Deep Filters Are General

Zahra Babaiee, Peyman M. Kiasari, Daniela Rus, Radu Grosu

TL;DR

The paper questions the assumption that deeper CNN layers learn increasingly task-specific filters, focusing on depthwise separable CNNs (DS-CNNs). It introduces the Master Key Filters Hypothesis, asserting that depthwise filters converge to general spatial primitives that transfer across datasets, domains, and architectures. Through extensive cross-domain, cross-architecture, and semantically split ImageNet experiments, the study shows deep depthwise filters retain generality, while pointwise filters exhibit optimization-related challenges. These findings have practical implications for transfer learning and model design, suggesting larger, more diverse training data can yield universally reusable spatial features.

Abstract

This paper challenges the prevailing view that convolutional neural network (CNN) filters become increasingly specialized in deeper layers. Motivated by recent observations of clusterable repeating patterns in depthwise separable CNNs (DS-CNNs) trained on ImageNet, we extend this investigation across various domains and datasets. Our analysis of DS-CNNs reveals that deep filters maintain generality, contradicting the expected transition to class-specific filters. We demonstrate the generalizability of these filters through transfer learning experiments, showing that frozen filters from models trained on different datasets perform well and can be further improved when sourced from larger datasets. Our findings indicate that spatial features learned by depthwise separable convolutions remain generic across all layers, domains, and architectures. This research provides new insights into the nature of generalization in neural networks, particularly in DS-CNNs, and has significant implications for transfer learning and model design.
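To make the depthwise/pointwise distinction concrete, here is a minimal pure-Python sketch of the two halves of a depthwise separable convolution. It is illustrative only and not taken from the paper: the function names, the nested-list tensor layout, and the choice of valid padding with stride 1 are all assumptions made for this example.

```python
# Illustrative sketch (assumed layout: x[channel][row][col]); not the authors' code.

def depthwise_conv(x, filters):
    """Apply one k x k spatial filter per channel (valid padding, stride 1).

    x:       list of C channels, each an H x W list of lists
    filters: list of C filters, each a k x k list of lists
    """
    out = []
    for channel, filt in zip(x, filters):
        k = len(filt)
        h, w = len(channel), len(channel[0])
        rows = []
        for i in range(h - k + 1):
            row = []
            for j in range(w - k + 1):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        acc += channel[i + di][j + dj] * filt[di][dj]
                row.append(acc)
            rows.append(row)
        out.append(rows)
    return out


def pointwise_conv(x, weights):
    """Mix channels with a 1 x 1 convolution; no spatial extent.

    weights: list of C_out rows, each holding C_in mixing coefficients
    """
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wc * x[c][i][j] for c, wc in enumerate(row)) for j in range(w)]
         for i in range(h)]
        for row in weights
    ]
```

The depthwise step never mixes channels, so each of its filters is a purely spatial pattern; all cross-channel mixing happens in the pointwise step. This is why the two kinds of filters can be studied and transferred separately.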

Paper Structure

This paper contains 11 sections, 6 figures, and 7 tables.

Figures (6)

  • Figure 1: Random depthwise filters sampled from the first, middle, and last layers of ConvNeXt Base and HorNet Small trained on ImageNet. Spatial features in DS-CNNs follow similar patterns regardless of the model architecture and layer.
  • Figure 2: Overview of the experimental setup for depthwise filter transfers. Top: base model A is trained on source dataset A. Bottom: in transfer model B, the first n depthwise convolution layers (in this example, n = 3) are copied from base model A and frozen; the remaining layers are randomly initialized and then trained on target dataset B. This experiment tests the extent to which the filters at layer n are general or specific.
  • Figure 3: This figure replicates and extends the study by Yosinski et al. (2014) using ResNets and DS-CNNs. ImageNet was split into man-made (m) and natural (n) classes. Networks A and B are trained on the man-made and natural classes, respectively. The first n layers are transferred from A to B; this is denoted AnB. The plots show accuracy relative to the base models versus transfer depth. Each point indicates Network B's performance after transferring and freezing filters from A up to layer n, with the remaining layers trained on the natural subset. Notably, depthwise filters exhibit high transferability across all layers, maintaining consistent performance regardless of transfer depth. This suggests a high degree of generality in depthwise convolutional filters, in contrast with the 2014 experiment, where performance degrades when deeper layers are transferred between dissimilar domains.
  • Figure 4: ResNet-18: the Figure 3 experiment, but reversed. Layers are frozen from the last layer back to layer n while the earlier layers are trained. Following the same reasoning as Yosinski et al., these results would suggest that later layers are general while initial layers are specific.
  • Figure 5: Accuracy retention comparison across ResNet architectures. Deeper networks (ResNet-50/101/152) maintain substantially higher accuracy after transfer compared to shallower networks (ResNet-18/34) and Yosinski's model.
  • ...and 1 more figures
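
The transfer setup described in Figure 2 can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the model is a list of dictionaries rather than a real network, and the names `make_model`, `transfer_depthwise`, `trainable_parts`, and the `depthwise_frozen` flag are hypothetical. In a deep learning framework, the same idea would amount to copying the depthwise weights and excluding them from gradient updates.

```python
import copy
import random


def make_model(num_layers, channels, k=3, seed=0):
    """Build a toy DS-CNN: a list of layers with randomly initialized weights.

    Each layer holds `channels` depthwise k x k filters and a
    channels x channels pointwise mixing matrix.
    """
    rng = random.Random(seed)
    return [
        {
            "depthwise": [[[rng.gauss(0, 1) for _ in range(k)] for _ in range(k)]
                          for _ in range(channels)],
            "pointwise": [[rng.gauss(0, 1) for _ in range(channels)]
                          for _ in range(channels)],
            "depthwise_frozen": False,
        }
        for _ in range(num_layers)
    ]


def transfer_depthwise(base_model, n, seed=1):
    """Return transfer model B for a given base model A.

    The first n depthwise layers are copied from A and marked frozen;
    every other weight (including all pointwise weights) is freshly
    initialized at random and left trainable.
    """
    channels = len(base_model[0]["depthwise"])
    k = len(base_model[0]["depthwise"][0])
    target = make_model(len(base_model), channels, k, seed=seed)
    for layer_a, layer_b in zip(base_model[:n], target[:n]):
        layer_b["depthwise"] = copy.deepcopy(layer_a["depthwise"])
        layer_b["depthwise_frozen"] = True
    return target


def trainable_parts(model):
    """List the (layer_index, part) pairs an optimizer would still update."""
    parts = []
    for index, layer in enumerate(model):
        if not layer["depthwise_frozen"]:
            parts.append((index, "depthwise"))
        parts.append((index, "pointwise"))
    return parts
```

For example, `transfer_depthwise(make_model(5, 4), n=3)` yields a five-layer model whose first three depthwise layers match the base model and are frozen, while the remaining depthwise layers and all pointwise layers would then be trained on the target dataset. Sweeping n from 1 to the network depth gives the transfer-depth axis used in the Figure 3 plots.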