Explaining Model Overfitting in CNNs via GMM Clustering

Hui Dou, Xinyu Mu, Mengjun Yi, Feng Han, Jian Zhao, Furao Shen

TL;DR

This work tackles CNN interpretability and overfitting by clustering per-filter feature maps with a Gaussian Mixture Model (GMM) to identify anomaly filters linked to overfitting. For each filter, it clusters PCA-reduced activations $D^l \in \mathbb{R}^{\text{Batch}\times C\times 2}$, assesses clustering quality with the Calinski-Harabasz Index $CH = \frac{SSB /(K - 1)}{SSW /(N - K)}$ (where $SSB$ and $SSW$ are the between-cluster and within-cluster sums of squares, $N$ is the number of samples, and $K$ is the number of clusters), and dynamically selects $K$ per filter to discover learned patterns. Three experiments across AlexNet, LeNet-5, and a simple CNN on CIFAR-10/100 and Fashion-MNIST validate three hypotheses: anomaly filters increase with overfitting, outlier samples drive overfitting via larger gradients, and pruning anomaly filters enhances generalization. The results offer a practical, architecture-agnostic diagnostic tool for CNN overfitting and suggest a pruning-based route to improve generalization, with potential extensions to larger architectures and semantically informed clustering metrics. The key equations are the GMM data likelihood $p(\boldsymbol{x})=\sum_{k=1}^K \pi_k \mathcal{N}(\boldsymbol{x}|\mu_k, \Sigma_k)$ and the CH index given above.
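
As a worked illustration of the two formulas quoted above, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes 2-D points, to match the PCA-reduced activations, and the function names (`calinski_harabasz`, `gaussian_pdf_2d`, `gmm_density`) and the toy data are hypothetical choices made only for this example.

```python
import math


def calinski_harabasz(points, labels):
    """CH = (SSB / (K - 1)) / (SSW / (N - K)) for labelled points."""
    n = len(points)
    dim = len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or n <= k:
        raise ValueError("CH needs at least 2 clusters and N > K")
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    ssb = 0.0  # between-cluster sum of squares
    ssw = 0.0  # within-cluster sum of squares
    for members in clusters.values():
        centre = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        ssb += len(members) * sum((c - o) ** 2 for c, o in zip(centre, overall))
        ssw += sum((x - c) ** 2 for p in members for x, c in zip(p, centre))
    if ssw == 0.0:
        return math.inf  # every point sits exactly on its cluster centre
    return (ssb / (k - 1)) / (ssw / (n - k))


def gaussian_pdf_2d(x, mu, cov):
    """Density of a 2-D Gaussian N(x | mu, cov); cov is a 2x2 nested tuple."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu), using the 2x2 inverse.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def gmm_density(x, weights, means, covs):
    """Mixture density p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(w * gaussian_pdf_2d(x, m, s)
               for w, m, s in zip(weights, means, covs))


# Toy data (illustrative only): two tight groups far apart along the first axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_ch = calinski_harabasz(points, [0, 0, 1, 1])  # groups match the geometry
bad_ch = calinski_harabasz(points, [0, 1, 0, 1])   # groups cut across it

identity = ((1.0, 0.0), (0.0, 1.0))
p_origin = gmm_density((0.0, 0.0), [0.5, 0.5],
                       [(0.0, 0.0), (0.0, 0.0)], [identity, identity])
```

For the toy points, the labelling that matches the two groups gives $SSB = 100$ and $SSW = 4$, so $CH = (100/1)/(4/2) = 50$, while the labelling that cuts across the groups gives $CH = 0.08$. A higher CH value therefore indicates better-separated clusters, which is why the index can be used to compare candidate values of $K$.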

Abstract

Convolutional Neural Networks (CNNs) have demonstrated remarkable prowess in the field of computer vision. However, their opaque decision-making processes pose significant challenges for practical applications. In this study, we provide quantitative metrics for assessing CNN filters by clustering the feature maps corresponding to individual filters in the model via Gaussian Mixture Model (GMM). By analyzing the clustering results, we screen out some anomaly filters associated with outlier samples. We further analyze the relationship between the anomaly filters and model overfitting, proposing three hypotheses. This method is universally applicable across diverse CNN architectures without modifications, as evidenced by its successful application to models like AlexNet and LeNet-5. We present three meticulously designed experiments demonstrating our hypotheses from the perspectives of model behavior, dataset characteristics, and filter impacts. Through this work, we offer a novel perspective for evaluating the CNN performance and gain new insights into the operational behavior of model overfitting.

Paper Structure

This paper contains 13 sections, 6 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Given a pre-trained model, we cluster all the feature maps corresponding to the individual filter through the Gaussian Mixture Model.
  • Figure 2: Visualization of the clustering results. Each data point corresponds to one feature map generated by one input sample. In normal cases the data points are evenly distributed; in rare cases outlier points occur. A filter whose clustering result is of the rare type is an anomaly filter.
  • Figure 3: We cluster the feature maps of each individual filter separately, i.e., each data point in the clustering results corresponds to one feature map. Different colors represent different filters. We also categorize the clustering results into normal and rare cases. In this work, we mainly focus on the rare case, marked by the red box.
  • Figure 4: Training curves and the number of anomaly filters for a simple CNN on CIFAR-10. Overfitting is evident from the drop in the accuracy curve and the rise in the loss curve. The number of anomaly filters follows a trend similar to the loss curve: as the model overfits, the number of anomaly filters rises, with fluctuations.