Multi-stage feature decorrelation constraints for improving CNN classification performance

Qiuyu Zhu; Hao Wang; Xuewen Zu; Chengfei Liu

TL;DR

This work tackles the problem that deep CNN losses primarily applied to the final layer underutilize front-layer representations, which can harbor redundant information. It introduces Multi-stage Feature Decorrelation Loss (MFD Loss), which uses Pearson correlation to decorrelate features across multiple early stages, and combines it with the Softmax loss to guide training toward more discriminative front-layer features. Empirical results across ResNet50, ResNeXt50, DenseNet121, and MobileNetV2 on CIFAR10/100, Tiny ImageNet, and FaceScrub show consistent accuracy gains, accompanied by reduced inter-feature correlations and illustrative feature-map changes. The method also generalizes to other loss functions for face recognition (Center Loss, AM-Softmax, ArcFace), and is proposed as a basis for future work in Transformer architectures and model-reduction scenarios.

Abstract

For convolutional neural networks (CNNs) used for pattern classification, the training loss function is usually applied only to the final output of the network, apart from some regularization constraints on the network parameters. However, as the number of layers increases, the influence of the loss function on the front layers gradually decreases, and the network parameters tend to fall into local optima. At the same time, trained networks are found to carry significant information redundancy in the features at all stages, which reduces the effectiveness of the feature mapping at each stage and hinders the subsequent parameters from moving toward the optimum. It is therefore possible to obtain a better solution and further improve classification accuracy by designing a loss function that constrains the front-stage features and eliminates their information redundancy. For CNNs, this article proposes a multi-stage feature decorrelation loss (MFD Loss), which refines effective features and eliminates information redundancy by constraining the correlation of features at all stages. Because a CNN has many layers, MFD Loss is applied, on the basis of experimental comparison and analysis, to multiple front layers of the CNN; it constrains the output features of each layer and each channel and is trained jointly with the classification loss function. Experiments with several typical CNNs on several commonly used datasets show that Softmax Loss + MFD Loss achieves significantly better classification performance than supervision with Softmax Loss alone. Comparison experiments before and after combining MFD Loss with other typical loss functions further verify its generality.
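To make the idea in the abstract concrete, the following is a minimal sketch of a Pearson-correlation penalty on the channels of several stages, added to a classification loss. It is not the paper's exact MFD Loss, whose formulation is not reproduced on this page. The function names (`pearson`, `decorrelation_penalty`, `combined_loss`), the choice of averaging absolute pairwise correlations, and the `weight` parameter are all assumptions made for illustration; each channel is represented as a flat list of activations.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A constant channel has zero variance; treat it as uncorrelated.
    return cov / denom if denom > 0 else 0.0


def decorrelation_penalty(channels):
    """Mean absolute Pearson correlation over all distinct channel pairs.

    Assumption: this aggregation is illustrative, not taken from the paper.
    """
    pairs = list(combinations(channels, 2))
    if not pairs:
        return 0.0
    return sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)


def combined_loss(classification_loss, stages, weight=0.1):
    """Classification loss plus a weighted penalty summed over stages.

    `weight` is a hypothetical hyperparameter balancing the two terms.
    """
    penalty = sum(decorrelation_penalty(channels) for channels in stages)
    return classification_loss + weight * penalty


# Toy stages: one with two linearly dependent channels, one without.
redundant_stage = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
distinct_stage = [[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0]]
```

In this sketch the redundant stage is penalized heavily because its channels are perfectly correlated, while the stage with uncorrelated channels contributes nothing, which is the behavior a decorrelation constraint is meant to encourage during joint training. A real implementation would compute the same quantity on batched feature tensors inside an automatic-differentiation framework.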

Paper Structure (14 sections, 7 equations, 6 figures, 7 tables)

This paper contains 14 sections, 7 equations, 6 figures, 7 tables.

Introduction
Related Work
Method
Correlation between Features
Pearson Correlation Coefficient
Multi-stage Feature Decorrelation Loss
Experiments and Results
Experimental Datasets
Experimental Results
Experimental Results on ResNet50
Experimental Results on Other CNNs
Effect of Combining with Loss Functions Dedicated to Face Recognition
Comparison of other performance indexes
Conclusion

Figures (6)

Figure 1: Visualization of some feature maps generated after the first convolution group in ResNet50, where similar feature maps are annotated with boxes of the same color.
Figure 2: MFD Loss and Softmax Loss in CNN.
Figure 3: ResNet50 network structure and the corresponding MFD loss function and classification loss function.
Figure 4: Visualization of some feature maps generated by Stage $0$ in ResNet50 after full training, where similar feature maps are annotated with boxes of the same color.
Figure 5: Visualization of some feature maps generated by Stage $1$ in ResNet50 after full training, where similar feature maps are annotated with boxes of the same color.
...and 1 more figure
