Similarity of Neural Architectures using Adversarial Attack Transferability

Jaehui Hwang, Dongyoon Han, Byeongho Heo, Song Park, Sanghyuk Chun, Jong-Seok Lee

TL;DR

This work quantifies the similarity of neural architectures beyond predictive accuracy. It introduces SAT (Similarity by Attack Transferability), a metric based on adversarial attack transferability and formalized as $SAT(A,B)=\log[\max\{\varepsilon_s, 100 \times \frac{1}{2|X_{AB}|}\sum_{x \in X_{AB}} (\mathbb{I}(A(x_B)\neq y) + \mathbb{I}(B(x_A)\neq y))\}]$, to compare how similarly two architectures respond to adversarial perturbations. In a large-scale study of 69 ImageNet models, the authors find that macroscopic architectural choices, especially base architecture and stem design, primarily drive SAT. Each model is described by 13 architectural components for feature-based analysis, and spectral clustering on SAT reveals coherent model groups. The authors further show that, under specific conditions, model diversity as measured by SAT improves ensemble and knowledge-distillation performance, and that the method also applies to other datasets (e.g., Flowers-102). Overall, SAT offers a scalable, architecture-agnostic way to compare models and to select diverse ones.
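The SAT formula in the TL;DR can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that $x_A$ and $x_B$ denote adversarial examples crafted against models A and B, that $X_{AB}$ is the evaluation set shared by both models, and that the logarithm is natural. The function name `sat_score`, its boolean-list inputs, and the default value of `eps_s` are hypothetical choices made for this example; the summary does not state the value of $\varepsilon_s$.

```python
import math


def sat_score(a_fooled_by_xb, b_fooled_by_xa, eps_s=1e-3):
    """Compute SAT(A, B) from per-sample attack outcomes.

    a_fooled_by_xb[i] is True when A(x_B) != y for the i-th sample of X_AB.
    b_fooled_by_xa[i] is True when B(x_A) != y for the same sample.
    eps_s is a small floor that keeps the logarithm finite; its default
    here is an assumption, not a value taken from the paper.
    """
    if len(a_fooled_by_xb) != len(b_fooled_by_xa):
        raise ValueError("both outcome lists must cover the same samples")
    n = len(a_fooled_by_xb)
    if n == 0:
        raise ValueError("X_AB must contain at least one sample")

    # Sum of the two indicator terms over all samples in X_AB.
    fooled = sum(bool(v) for v in a_fooled_by_xb) + sum(
        bool(v) for v in b_fooled_by_xa
    )

    # Average transfer success rate over both directions, as a percentage.
    transfer_pct = 100.0 * fooled / (2 * n)

    # Floor at eps_s so that zero transferability does not give log(0).
    return math.log(max(eps_s, transfer_pct))


if __name__ == "__main__":
    # Toy outcomes for four samples: 4 of 8 transferred attacks succeed.
    score = sat_score([True, False, True, False], [True, True, False, False])
    print(score)
```

Under these assumptions, a higher score means that attacks transfer more readily between the two models, which SAT interprets as greater similarity. In practice the boolean inputs would come from running an adversarial attack against each model and evaluating the other model on the resulting images.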

Abstract

In recent years, many deep neural architectures have been developed for image classification. Whether they are similar or dissimilar, and what factors contribute to their (dis)similarities, remain open questions. To address them, we aim to design a quantitative and scalable similarity measure between neural architectures. We propose Similarity by Attack Transferability (SAT), motivated by the observation that adversarial attack transferability contains information related to input gradients and decision boundaries, both of which are widely used to understand model behaviors. We conduct a large-scale analysis of 69 state-of-the-art ImageNet classifiers using our proposed similarity function. Moreover, using model similarity, we observe that model diversity can lead to better performance in model ensembles and knowledge distillation under specific conditions. Our results provide insights into why developing diverse neural architectures with distinct components is necessary.

Paper Structure

This paper contains 43 sections, 1 equation, 17 figures, and 11 tables.

Figures (17)

  • Figure 1: t-SNE plot showing 10 clusters of 69 neural networks using our similarity function, SAT.
  • Figure 2: How SAT works. A conceptual illustration of SAT through the lens of decision boundaries. Each line denotes the decision boundary of a binary classification model, and each dot denotes an individual prediction for a given input.
  • Figure 2: SAT within the same architecture. We compare the average similarity among models that share an architecture but were trained with different procedures. "All" denotes the average similarity of 69 architectures.
  • Figure 3: Importance of architectural components to network similarity. The 13 components are sorted by their contribution to similarity. A larger feature importance means the component contributes more to network similarity.
  • Figure 4: Pairwise distances of spectral features. Rows and columns are sorted by the clustering index. More details are described in the appendix subsection on spectral clustering.
  • ...and 12 more figures