Feature Map Similarity Reduction in Convolutional Neural Networks
Zakariae Belmekki, Jun Li, Patrick Reuter, David Antonio Gómez Jáuregui, Karl Jenkins
TL;DR
This work addresses redundancy in CNN feature maps and shows that kernel orthogonality does not guarantee reduced feature-map similarity. It derives the Convolutional Similarity loss $L_{CS}$, a kernel-based objective whose minimization reduces cross-kernel similarity and thereby drives feature maps toward orthogonality. The key relation is $\langle F_1, F_2\rangle = \langle (K_1 \circledast K_2), (X \circledast X)_{[1-N, N-1]}\rangle$, where $F_1$ and $F_2$ are the feature maps produced by kernels $K_1$ and $K_2$ on input $X$. Empirical results on shallow CNNs and a ResNet18 show that minimizing $L_{CS}$ improves accuracy, accelerates convergence, and lets much smaller models reach comparable performance. The approach is a computationally efficient alternative to explicit feature-map decorrelation and to kernel-only regularization, though it presents challenges when coupled with momentum-based optimizers. Future work will explore combining iterative initialization with momentum dynamics and extending CS to generative frameworks.
Abstract
Convolutional Neural Networks (CNNs) have been observed to suffer from redundancy in their feature maps, which leads to inefficient use of model capacity. Efforts to address this issue have largely focused on kernel orthogonality methods. In this work, we demonstrate theoretically and empirically that kernel orthogonality does not necessarily reduce feature map redundancy. Based on this analysis, we propose the Convolutional Similarity method, which reduces feature map similarity independently of the CNN's input. Convolutional Similarity can be minimized either as a regularization term or as an iterative initialization method. Experimental results show that minimizing Convolutional Similarity not only improves classification accuracy but also accelerates convergence. Furthermore, our method enables significantly smaller models to achieve the same level of performance, promoting a more efficient use of model capacity. Future work will focus on coupling the iterative initialization method with the optimization momentum term and on examining the method's impact on generative frameworks.
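
The claim that orthogonal kernels can still yield similar feature maps, and the relation $\langle F_1, F_2\rangle = \langle (K_1 \circledast K_2), (X \circledast X)_{[1-N, N-1]}\rangle$, can be checked numerically on a toy case. The sketch below is only an illustration under stated assumptions, not the paper's implementation: it assumes a 1D input, reads $\circledast$ as a zero-padded ("full") cross-correlation, and takes $N$ to be the kernel length. The helper names `full_xcorr`, `dot`, and `feature_map_inner_products` are hypothetical.

```python
def full_xcorr(a, v):
    """Full cross-correlation c[d] = sum_n a[n + d] * v[n], zero outside a.

    Lags d run from -(len(v) - 1) to len(a) - 1, so index 0 of the result
    is the most negative lag. (Assumed reading of the paper's operator.)
    """
    out = []
    for d in range(-(len(v) - 1), len(a)):
        out.append(sum(a[n + d] * v[n]
                       for n in range(len(v)) if 0 <= n + d < len(a)))
    return out


def dot(u, w):
    """Plain inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(u, w))


def feature_map_inner_products(x, k1, k2):
    """Return <F1, F2> computed directly and via the kernel/input correlations."""
    m, n = len(x), len(k1)
    # Direct route: build both feature maps, then take their inner product.
    f1 = full_xcorr(x, k1)
    f2 = full_xcorr(x, k2)
    direct = dot(f1, f2)
    # Indirect route: cross-correlate the kernels (lags 1-N .. N-1) and pair
    # them with the input autocorrelation restricted to the same lags.
    kk = full_xcorr(k2, k1)
    xx = full_xcorr(x, x)              # lags 1-M .. M-1, lag 0 at index M-1
    xx_window = xx[m - n:m + n - 1]    # keep lags 1-N .. N-1 only
    via_kernels = dot(kk, xx_window)
    return direct, via_kernels


# Two orthogonal kernels: <K1, K2> = 0, yet their feature maps are just
# shifted copies of the input and overlap strongly.
x = [1, 2, 3, 4]
k1, k2 = [1, 0], [0, 1]
direct, via_kernels = feature_map_inner_products(x, k1, k2)
print(dot(k1, k2))          # 0
print(direct, via_kernels)  # 20 20
```

In this toy case the kernel inner product is zero while the feature-map inner product equals the lag-1 autocorrelation of the input, which is why the nonzero-lag terms of the kernel cross-correlation matter and not only $\langle K_1, K_2\rangle$. With other boundary conventions (for example "valid" correlation) the two routes would agree only approximately.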
