Reliability of CKA as a Similarity Measure in Deep Learning
MohammadReza Davari, Stefan Horoi, Amine Natik, Guillaume Lajoie, Guy Wolf, Eugene Belilovsky
TL;DR
The paper investigates the reliability of Centered Kernel Alignment (CKA), particularly its linear variant, as a representation similarity measure in deep learning. It provides a theoretical characterization showing that linear CKA (CKA_lin) can be highly sensitive to translations of a subset of representations, as well as to outliers and directions of high variance, and that its value can decrease even when a transformation preserves linear separability. Empirically, the work demonstrates that CKA values and CKA maps can be manipulated with minimal impact on network functionality, which challenges prior conclusions drawn from CKA heatmaps. The authors advocate using multiple similarity metrics and visual analyses to interpret neural representations, especially when comparing diverse architectures or open pretrained models.
Abstract
Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has recently become a popular approach and has been widely used to compare representations of a network's different layers, of architecturally similar networks trained differently, or of models with different architectures trained on the same data. A wide variety of conclusions about similarity and dissimilarity of these various representations have been made using CKA. In this work we present analysis that formally characterizes CKA sensitivity to a large class of simple transformations, which can naturally occur in the context of modern machine learning. This provides a concrete explanation of CKA sensitivity to outliers, which has been observed in past works, and to transformations that preserve the linear separability of the data, an important generalization attribute. We empirically investigate several weaknesses of the CKA similarity metric, demonstrating situations in which it gives unexpected or counter-intuitive results. Finally we study approaches for modifying representations to maintain functional behaviour while changing the CKA value. Our results illustrate that, in many cases, the CKA value can be easily manipulated without substantial changes to the functional behaviour of the models, and call for caution when leveraging activation alignment metrics.
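To make the sensitivity described above concrete, the following is a minimal sketch rather than the authors' code. It uses the standard feature-space form of linear CKA, ||Y^T X||_F^2 / (||X^T X||_F ||Y^T Y||_F) on column-centered matrices, and translates a small subset of points along a single direction. The function name `linear_cka`, the synthetic Gaussian data, the 5% subset size, and the shift magnitude are all illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between two representation matrices (one row per example)."""
    # Center each feature (column) so the Gram matrices are centered.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Feature-space form: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F).
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return cross / (norm_x * norm_y)


# Synthetic "representations": 1000 examples, 64 features (assumed sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))

# Copy X and translate 5% of the points far along one coordinate direction.
direction = np.zeros(64)
direction[0] = 1.0
Y = X.copy()
Y[:50] += 100.0 * direction

cka_same = linear_cka(X, X)     # identical representations
cka_shifted = linear_cka(X, Y)  # same points, one subset translated

print(f"CKA(X, X) = {cka_same:.4f}")
print(f"CKA(X, Y) = {cka_shifted:.4f}")
```

In this sketch, 95% of the points are left untouched and the remaining 5% keep their relative geometry, yet the CKA value between the original and shifted representations falls far below 1. This is the kind of behaviour that motivates the call for caution when reading CKA values in isolation.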
