Full Bayesian Significance Testing for Neural Networks

Zehua Liu; Zimeng Li; Jingyuan Wang; Yue He

Full Bayesian Significance Testing for Neural Networks

Zehua Liu, Zimeng Li, Jingyuan Wang, Yue He

TL;DR

This work addresses the challenge of significance testing in nonlinear, high-dimensional settings by introducing nFBST, a Full Bayesian Significance Testing framework for neural networks. It leverages Bayesian neural networks to fit the data-generating process and derives the posterior of a testing statistic $\eta(\theta)$, using KDE and Monte Carlo to compute the Bayesian evidence $Ev(H_0)$ for $H_0: \eta(\theta)=0$, without requiring a fixed form for $f_0$. The framework supports global, local, and instance-wise significance through multiple testing statistics (Grad-, LRP-, DeepLIFT-, LIME-nFBST) and a quantile-based global test (Q-GS). Across toy, simulation, and real-data experiments, nFBST consistently identifies significant features and provides interpretable instance-wise insights, outperforming classical methods. This approach offers a general, extensible paradigm for rigorous significance testing in nonlinear neural models with practical implications for scientific inference and model interpretability.

Abstract

Significance testing aims to determine whether a proposition about the population distribution is the truth or not given observations. However, traditional significance testing often needs to derive the distribution of the testing statistic, failing to deal with complex nonlinear relationships. In this paper, we propose to conduct Full Bayesian Significance Testing for neural networks, called \textit{n}FBST, to overcome the limitation in relationship characterization of traditional approaches. A Bayesian neural network is utilized to fit the nonlinear and multi-dimensional relationships with small errors and avoid hard theoretical derivation by computing the evidence value. Besides, \textit{n}FBST can test not only global significance but also local and instance-wise significance, which previous testing methods don't focus on. Moreover, \textit{n}FBST is a general framework that can be extended based on the measures selected, such as Grad-\textit{n}FBST, LRP-\textit{n}FBST, DeepLIFT-\textit{n}FBST, LIME-\textit{n}FBST. A range of experiments on both simulated and real data are conducted to show the advantages of our method.

Full Bayesian Significance Testing for Neural Networks

TL;DR

, using KDE and Monte Carlo to compute the Bayesian evidence

for

, without requiring a fixed form for

. The framework supports global, local, and instance-wise significance through multiple testing statistics (Grad-, LRP-, DeepLIFT-, LIME-nFBST) and a quantile-based global test (Q-GS). Across toy, simulation, and real-data experiments, nFBST consistently identifies significant features and provides interpretable instance-wise insights, outperforming classical methods. This approach offers a general, extensible paradigm for rigorous significance testing in nonlinear neural models with practical implications for scientific inference and model interpretability.

Abstract

Paper Structure (18 sections, 18 equations, 5 figures, 2 tables)

This paper contains 18 sections, 18 equations, 5 figures, 2 tables.

Introduction
Theoretical Method
Classical Frequentist Significance Test
Full Bayesian Significance Test
Approximate the Distribution of Testing Statistics
Implementation Approach
Calculate the Distribution of Testing Statistics
Design of Testing Statistics
Experiments
Toy Example
Simulation Experiments
Data Generation Process
Test the Global Significance
Test the Instance-wise Significance
Real World Experiments
...and 3 more sections

Figures (5)

Figure 1: Bayesian evidence calculated based on the distribution of $\eta(\theta)$.
Figure 2: Bayesian evidence of $x_3$ obtained by Grad-nFBST under different intervals on the toy example.
Figure 3: Average AUC of all features for instance-wise significance before and after nFBST under different eps.
Figure 4: Histograms of gradient distributions for different values of $x_8$ on energy efficiency dataset.
Figure 5: Visualization of scores calculated by different methods for the target class.

Full Bayesian Significance Testing for Neural Networks

TL;DR

Abstract

Full Bayesian Significance Testing for Neural Networks

Authors

TL;DR

Abstract

Table of Contents

Figures (5)