Catch-Up Mix: Catch-Up Class for Struggling Filters in CNN

Minsoo Kang, Minkoo Kang, Suhyun Kim

TL;DR

Catch-up Mix addresses the problem that CNNs over-rely on a small subset of filters, which harms generalization and robustness. It introduces a feature-level mixup that targets slow-learning filters: guided by a relative filter influence (RFI) metric, it mixes the activation maps with relatively low $\ell_{2}$ norms, using a binary mask to blend source and target feature maps at a randomly chosen layer. The method improves generalization on CIFAR-100, Tiny-ImageNet, and fine-grained datasets, improves robustness to FGSM attacks, data corruptions, and deformations, and improves OOD detection, with flat loss landscapes as a byproduct. Overall, Catch-up Mix expands effective capacity usage, improves feature diversity, and provides practical robustness gains with minimal overhead, offering a principled way to utilize more of a network’s filters during training.

Abstract

Deep learning has made significant advances in computer vision, particularly in image classification tasks. Despite their high accuracy on training data, deep learning models often face challenges related to complexity and overfitting. One notable concern is that the model often relies heavily on a limited subset of filters for making predictions. This dependency can result in compromised generalization and an increased vulnerability to minor variations. While regularization techniques like weight decay, dropout, and data augmentation are commonly used to address this issue, they may not directly tackle the reliance on specific filters. Our observations reveal that the heavy reliance problem gets severe when slow-learning filters are deprived of learning opportunities due to fast-learning filters. Drawing inspiration from image augmentation research that combats over-reliance on specific image regions by removing and replacing parts of images, our idea is to mitigate the problem of over-reliance on strong filters by substituting highly activated features. To this end, we present a novel method called Catch-up Mix, which provides learning opportunities to a wide range of filters during training, focusing on filters that may lag behind. By mixing activation maps with relatively lower norms, Catch-up Mix promotes the development of more diverse representations and reduces reliance on a small subset of filters. Experimental results demonstrate the superiority of our method in various vision classification datasets, providing enhanced robustness.
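
The mixing step described in the TL;DR and abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not give the formula for RFI or for the mask, so `relative_influence` (a channel's $\ell_{2}$ norm as a share of the sample's total), the rule "take the channel from whichever sample has the lower relative influence", and mixing the labels by the fraction of channels kept are all assumptions made for illustration. The random choice of layer is omitted, and activation maps are represented as plain lists of flattened channels.

```python
import math

def channel_l2_norms(act):
    """L2 norm per channel; `act` is a list of C channels, each a flat list of floats."""
    return [math.sqrt(sum(v * v for v in ch)) for ch in act]

def relative_influence(act):
    """Hypothetical stand-in for RFI: each channel's norm divided by the sum of norms."""
    norms = channel_l2_norms(act)
    total = sum(norms) or 1.0  # guard against an all-zero activation map
    return [n / total for n in norms]

def low_norm_mask(act_a, act_b):
    """Binary channel mask: 1 keeps sample A's channel, 0 takes sample B's.

    Assumed rule: per channel, keep the sample whose relative influence is lower.
    """
    ra, rb = relative_influence(act_a), relative_influence(act_b)
    return [1 if a <= b else 0 for a, b in zip(ra, rb)]

def catch_up_mix_sketch(act_a, act_b, label_a, label_b):
    """Mix two activation maps channel-wise and mix the labels to match (assumed)."""
    mask = low_norm_mask(act_a, act_b)
    mixed = [list(ca) if m else list(cb) for m, ca, cb in zip(mask, act_a, act_b)]
    lam = sum(mask) / len(mask)  # fraction of channels taken from sample A
    label = [lam * ya + (1.0 - lam) * yb for ya, yb in zip(label_a, label_b)]
    return mixed, label, mask

# Toy example: 3 channels, 2 spatial positions each, one-hot labels.
act_a = [[3.0, 4.0], [0.0, 1.0], [1.0, 0.0]]
act_b = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
mixed, label, mask = catch_up_mix_sketch(act_a, act_b, [1.0, 0.0], [0.0, 1.0])
```

In this toy example, channel 0 dominates sample A, so the mask takes that channel from sample B and keeps A's two weaker channels. The mixed map is therefore built only from channels that are weak relative to their partner, which is the sense in which lower-norm activation maps are the ones being mixed.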

Paper Structure (33 sections, 8 equations, 8 figures, 13 tables, 1 algorithm)

Figures (8)

  • Figure 1: Visualizations of activation maps, $\ell_{2}$ norms, and gradients that show how filters behave during training. (a) shows images, saliency maps, and activation maps. The activation maps are taken from the output of the third layer block of ResNet-18 while training on CUB-200 at 120 epochs. (b) shows how the distributions of activation norm (x-axis) and gradient norm (y-axis) change over epochs. For example, in the 'Baseline' rows, the gradient scale decreases significantly after 60 epochs, which makes it difficult to update the weights properly for the rest of training. (c) depicts the model's accuracy as latent-vector dimensions are dropped sequentially in order of their value, highlighting that our method prompts the model to use diverse features.
  • Figure 2: The overall framework and augmentation process of Catch-up Mix. (a) represents a general deep learning architecture, and (b) outlines the procedure of Catch-up Mix. First, we compare the magnitudes of activation maps using their $\ell_{2}$ norms. A mask $\mathbb{M}$ is generated to mix the activation maps from the selected pair with relatively low $\ell_{2}$ norms.
  • Figure 3: Loss landscapes of mixup methods on Tiny-ImageNet using PreActResNet-18.
  • Figure 4: Top-1 accuracy (%), validation loss, and training loss curves of mixup baselines on CIFAR using PreActResNet-18 and on CUB using ResNet-18.
  • Figure 5: Correlation between feature reduction and model accuracy. After removing the dimensions of the latent vector with the highest or the lowest values, we measure top-1 accuracy by making predictions with the remaining features.
  • ...and 3 more figures
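
The feature-reduction evaluation described in the Figure 5 caption can be sketched as follows. This is an illustrative reading of the caption rather than the authors' evaluation code: "removing" a dimension is assumed here to mean setting it to zero, and the linear classifier `weights` and the helper names are hypothetical.

```python
def drop_features(latent, k, highest=True):
    """Return a copy of `latent` with the k highest (or lowest) valued dimensions zeroed."""
    order = sorted(range(len(latent)), key=lambda i: latent[i], reverse=highest)
    dropped = set(order[:k])
    return [0.0 if i in dropped else v for i, v in enumerate(latent)]

def predict(latent, weights):
    """Argmax of a linear classifier; `weights` holds one weight row per class."""
    scores = [sum(w * v for w, v in zip(row, latent)) for row in weights]
    return max(range(len(scores)), key=lambda c: scores[c])

def accuracy_after_drop(latents, labels, weights, k, highest=True):
    """Top-1 accuracy when each latent vector loses its k highest/lowest dimensions."""
    hits = sum(
        predict(drop_features(z, k, highest), weights) == y
        for z, y in zip(latents, labels)
    )
    return hits / len(labels)

# Toy data: two samples, three latent dimensions, two classes.
latents = [[5.0, 1.0, 0.5], [0.2, 3.0, 0.1]]
labels = [0, 1]
weights = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0]]

acc_full = accuracy_after_drop(latents, labels, weights, k=0)
acc_drop_high = accuracy_after_drop(latents, labels, weights, k=1, highest=True)
acc_drop_low = accuracy_after_drop(latents, labels, weights, k=1, highest=False)
```

In this toy setup each prediction hinges on a single strong dimension, so dropping the highest-valued dimension flips both predictions while dropping the lowest-valued one changes nothing. A model that spreads its evidence over more features would be expected to degrade more gradually as dimensions are removed.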