Exploring and Exploiting the Asymmetric Valley of Deep Neural Networks

Xin-Chun Li, Jin-Lin Tang, Bo Zhang, Lan Li, De-Chuan Zhan

TL;DR

This study systematically examines the factors that affect the symmetry of DNN valleys: (1) the dataset, network architecture, initialization, and hyperparameters, which influence the convergence point, and (2) the magnitude and direction of the noise used for 1D visualization. It shows that the degree of sign consistency between the noise and the convergence point is a critical indicator of valley symmetry.

Abstract

Exploring the loss landscape offers insights into the inherent principles of deep neural networks (DNNs). Recent work has identified asymmetric valleys in addition to flat and sharp ones, but has not thoroughly examined their causes or implications. Our study systematically explores the factors affecting the symmetry of DNN valleys, encompassing (1) the dataset, network architecture, initialization, and hyperparameters that influence the convergence point; and (2) the magnitude and direction of the noise used for 1D visualization. Our major observation is that the degree of sign consistency between the noise and the convergence point is a critical indicator of valley symmetry. Theoretical insights into the ReLU activation and the softmax function could explain this phenomenon. Our findings lead to new understanding and applications in model fusion: (1) the efficacy of interpolating separate models correlates significantly with their sign consistency ratio, and (2) imposing sign alignment during federated learning emerges as an innovative approach to model parameter alignment.
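The abstract does not spell out how the sign consistency ratio is computed. A natural reading, assumed here, is the fraction of parameter coordinates whose signs agree between two flattened parameter vectors. The sketch below computes that fraction and also shows plain linear interpolation, the simplest form of model fusion. The function names are hypothetical and this is not the authors' code.

```python
import numpy as np


def sign_consistency_ratio(theta_a: np.ndarray, theta_b: np.ndarray) -> float:
    """Fraction of coordinates on which two parameter vectors share the same sign.

    Assumed definition: np.sign is compared elementwise, so an exact zero
    only agrees with another exact zero.
    """
    a = np.ravel(theta_a)
    b = np.ravel(theta_b)
    if a.shape != b.shape:
        raise ValueError("parameter vectors must have the same number of elements")
    return float(np.mean(np.sign(a) == np.sign(b)))


def interpolate(theta_a: np.ndarray, theta_b: np.ndarray, alpha: float) -> np.ndarray:
    """Linear interpolation (1 - alpha) * theta_a + alpha * theta_b between two models."""
    return (1.0 - alpha) * theta_a + alpha * theta_b


# Toy flattened "models"; real use would concatenate all network parameters.
a = np.array([0.5, -1.0, 2.0, -0.3])
b = np.array([0.4, 1.2, 1.5, -0.1])
print(sign_consistency_ratio(a, b))  # 0.75: signs agree on 3 of 4 coordinates
print(interpolate(a, b, 0.5))
```

Under this reading, the abstract's claim would be probed by computing the ratio for pairs of independently trained models and comparing it with the loss or accuracy of `interpolate(theta_a, theta_b, 0.5)`.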

Paper Structure (33 sections, 2 equations, 29 figures, 1 table, 1 algorithm)

Figures (29)

  • Figure 1: Illustration of the investigated factors that could affect valley symmetry. The noise $\epsilon$ matters a lot.
  • Figure 2: Illustration of different methods for 1D visualization. Norm-scaled noise unifies the magnitude of various noise types without changing their directions.
  • Figure 3: The valleys under 7 common noise types. The second row shows the results of replacing the sign of the noise with that of $\theta_f$, which leads to asymmetric valleys. (VGG16 with BN on CIFAR10)
  • Figure 4: The impact of manually constructed Gaussian noise with different levels of sign consistency.
  • Figure 5: The valley shape under 6 special noise types. (VGG16 with BN on CIFAR10)
  • ...and 24 more figures
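The figure captions describe three noise manipulations for 1D visualization: norm scaling (Figure 2), replacing the sign of the noise with that of $\theta_f$ (Figure 3), and Gaussian noise with a chosen level of sign consistency (Figure 4). The sketch below gives hypothetical NumPy versions of each, assuming the 1D profile is the loss along $\theta_f + \lambda \epsilon$ and that norm scaling matches $\|\epsilon\|$ to $\|\theta_f\|$; neither detail is confirmed by this summary, and all function names are illustrative.

```python
import numpy as np


def norm_scaled(eps: np.ndarray, theta_f: np.ndarray) -> np.ndarray:
    """Rescale eps to the same L2 norm as theta_f, keeping its direction.

    Assumption: ||theta_f|| is one plausible normalization target; the
    paper's exact scaling rule is not given in this summary.
    """
    return eps * (np.linalg.norm(theta_f) / np.linalg.norm(eps))


def sign_replaced(eps: np.ndarray, theta_f: np.ndarray) -> np.ndarray:
    """Keep the magnitudes of eps but copy the signs of theta_f (cf. Figure 3, second row)."""
    return np.abs(eps) * np.sign(theta_f)


def noise_with_sign_consistency(theta_f: np.ndarray, ratio: float,
                                rng: np.random.Generator) -> np.ndarray:
    """Gaussian-magnitude noise whose sign matches theta_f on round(ratio * n) coordinates.

    Hypothetical construction; coordinates where theta_f == 0 receive zero noise.
    """
    n = theta_f.size
    k = int(round(ratio * n))
    agree = np.zeros(n, dtype=bool)
    agree[rng.permutation(n)[:k]] = True  # pick k coordinates that agree in sign
    signs = np.where(agree, 1.0, -1.0) * np.sign(np.ravel(theta_f))
    magnitude = np.abs(rng.standard_normal(n))
    return (magnitude * signs).reshape(theta_f.shape)


def loss_curve(loss_fn, theta_f: np.ndarray, eps: np.ndarray, lambdas) -> np.ndarray:
    """1D valley profile: loss_fn(theta_f + lam * eps) for each step size lam."""
    return np.array([loss_fn(theta_f + lam * eps) for lam in lambdas])


def toy_loss(w: np.ndarray) -> float:
    """Toy quadratic centered at theta_f; a stand-in, not a DNN training loss."""
    return float(np.mean((w - theta_f) ** 2))


rng = np.random.default_rng(0)
theta_f = rng.standard_normal(1000)  # stand-in for a converged parameter vector
eps = norm_scaled(noise_with_sign_consistency(theta_f, 0.8, rng), theta_f)
curve = loss_curve(toy_loss, theta_f, eps, np.linspace(-1.0, 1.0, 5))
print(np.mean(np.sign(eps) == np.sign(theta_f)))  # 0.8 by construction
```

The toy quadratic is symmetric by construction, so its curve cannot show asymmetry; it only exercises the plumbing. Probing the effect described in the captions would require replacing `toy_loss` with a trained network's loss evaluated at the perturbed parameters.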