Axiomatization of Gradient Smoothing in Neural Networks

Linjiang Zhou, Xiaochuan Shi, Chao Ma, Zepeng Wang

TL;DR

This work addresses the lack of theoretical grounding for gradient smoothing in neural networks by formulating Monte Carlo Gradient Mollification, anchored in function mollification and Monte Carlo integration. It shows that SmoothGrad, NoiseGrad, and FusionGrad are special cases within a unified axiomatic framework and introduces new kernel-based smoothing methods, guided by convergence and hyperparameter analysis. The authors provide mathematical proofs of unbiasedness and consistency for the Monte Carlo estimator and offer practical guidance on kernel design and parameter settings. Experimental results across multiple datasets and models demonstrate how kernel choice and smoothing mode affect explainability metrics, underscoring the framework's potential to improve gradient-based explanations with principled methods.
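As a rough illustration of the idea summarized above, the following sketch averages gradients evaluated at randomly perturbed inputs, which is the SmoothGrad-style special case with a Gaussian kernel. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `smoothed_gradient`, `grad_fn`, `sigma`, and `toy_grad` are hypothetical, and a toy function with a discontinuous gradient stands in for a neural network.

```python
import numpy as np


def smoothed_gradient(grad_fn, x, sigma=0.1, n_samples=50, rng=None):
    """Monte Carlo estimate of a Gaussian-smoothed gradient at x.

    grad_fn   : callable returning the gradient at a point (hypothetical stand-in
                for a network's input gradient)
    sigma     : standard deviation of the Gaussian kernel (assumed parameterization)
    n_samples : number of Monte Carlo samples N
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    # Draw N perturbations from the (symmetric) Gaussian kernel.
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    # Evaluate the raw gradient at each perturbed input and average.
    grads = np.stack([grad_fn(x + t) for t in noise])
    return grads.mean(axis=0)


def toy_grad(x):
    # Gradient of f(x) = sum(|x|): sign(x), which jumps at 0.
    return np.sign(x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.array([0.0, 1.0])
    print("raw gradient:     ", toy_grad(x))
    print("smoothed gradient:", smoothed_gradient(toy_grad, x, sigma=0.1,
                                                   n_samples=2000, rng=rng))
```

Near the jump at 0 the averaged gradient settles close to 0, while far from the jump it stays at the raw value. Other kernels would, in this sketch, only change the line that draws `noise`.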

Abstract

Gradients play a pivotal role in explaining neural networks. Because neural networks are high-dimensional and structurally complex, their raw gradients contain a significant amount of noise. While several approaches have been proposed to reduce this noise through smoothing, there has been little discussion of the rationale behind smoothing gradients in neural networks. In this work, we propose a theoretical framework for gradient smoothing in neural networks based on function mollification and Monte Carlo integration. The framework axiomatizes gradient smoothing and reveals the rationale behind existing methods. Furthermore, we provide an approach for designing new smoothing methods derived from the framework. Through experimental evaluation of several newly designed smoothing methods, we demonstrate the research potential of our framework.
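The link between mollification and Monte Carlo integration can be written out in a few lines. The notation here is an assumption for illustration and may differ from the paper's: $K_\epsilon$ denotes a smoothing kernel that is also a probability density, and $t_1,\dots,t_N$ are independent samples drawn from it.

$$
f_\epsilon(x) = (f * K_\epsilon)(x) = \int f(x - t)\, K_\epsilon(t)\, dt
$$

$$
\nabla f_\epsilon(x) = \int \nabla f(x - t)\, K_\epsilon(t)\, dt = \mathbb{E}_{t \sim K_\epsilon}\big[\nabla f(x - t)\big] \approx \frac{1}{N} \sum_{i=1}^{N} \nabla f(x - t_i)
$$

The second line assumes that differentiation and integration can be exchanged. Under that assumption the sample average is an unbiased estimate of the mollified gradient, and for a symmetric kernel such as the Gaussian, sampling $x - t_i$ or $x + t_i$ gives the same distribution.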

Paper Structure (24 sections, 6 theorems, 52 equations, 8 figures, 4 tables)

This paper contains 24 sections, 6 theorems, 52 equations, 8 figures, 4 tables.

Key Result

Lemma 3.2

If $f*g$ exists,

Figures (8)

  • Figure 1: An example of a noisy original gradient in the image classification model VGG16.
  • Figure 2: An example of mollification. A simple piecewise function $f(x):\mathbb{R}\to\mathbb{R}$ has a discontinuous and noisy boundary. It is mollified by a Gaussian kernel with different values of $\epsilon$. Details of the example can be found in the appendix.
  • Figure 3: Visualization of the Gaussian, Poisson, Hyperbolic, Sigmoid, and Rect kernels with $\epsilon=1$.
  • Figure 4: Visual explanation for a random image labeled as dog using SmoothGrad with the Gaussian kernel and the Sigmoid kernel. Visualization quality improves as the number of samples increases and is convincing by $N=50$.
  • Figure 5: Visual explanation for a random image labeled as cat using SmoothGrad with the Gaussian kernel and the Hyperbolic kernel. Visualization quality varies with $\alpha$: at $\alpha=0.9$ the gradient map is of decent quality, while at $\alpha=0.95$ it loses some detail along the object's edges.
  • ...and 3 more figures

Theorems & Definitions (21)

  • Definition 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Lemma 3.4
  • Definition 3.5
  • Lemma 3.6
  • Lemma 3.7
  • Definition 3.8
  • Definition 3.9
  • Definition 3.10
  • ...and 11 more