Probabilistic computation and uncertainty quantification with emerging covariance

Hengyuan Ma; Yang Qi; Li Zhang; Wenlian Lu; Jianfeng Feng

TL;DR

This work introduces the moment neural network (MNN) framework, which performs probabilistic computation by propagating only the first two cumulants (mean and covariance) through neural layers, enabling uncertainty quantification without explicit covariance supervision. Central to the approach is the supervised mean unsupervised covariance (SMUC) training rule, which backpropagates only through the mean and lets the covariance emerge from the nonlinear mean–covariance coupling. The theory ties SMUC to stochastic Riemannian gradient descent and establishes convergence and mean/variance learning. Empirically, MNNs accurately match the mean and covariance of stochastic networks and provide an informative uncertainty measure (entropy) for in-distribution, out-of-distribution, and adversarial scenarios on MNIST and CIFAR-10, with mixed MNNs offering lower computational cost while preserving uncertainty signaling. The framework suggests a scalable, brain-inspired path to robust, uncertainty-aware AI systems, supported by explicit moment activations for common neuron models, including ReLU, Heaviside, and LIF implementations.
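To make the idea of a moment activation concrete, the following is a minimal sketch of the mean and variance of a ReLU unit, assuming its input is Gaussian with known mean and standard deviation. The function names are illustrative and are not taken from the paper; the closed forms are the standard moments of a rectified Gaussian, which may differ in presentation from the paper's ReLU moment activation.

```python
import math

def std_normal_pdf(z):
    # Density of the standard normal distribution.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    # Cumulative distribution function of the standard normal distribution.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def relu_moments(mu, sigma):
    """Mean and variance of max(0, x) for x ~ N(mu, sigma^2), with sigma > 0.

    Assumption: the input is exactly Gaussian (illustrative, not the paper's code).
    """
    z = mu / sigma
    pdf, cdf = std_normal_pdf(z), std_normal_cdf(z)
    mean = mu * cdf + sigma * pdf
    second_moment = (mu * mu + sigma * sigma) * cdf + mu * sigma * pdf
    # Clamp tiny negative values caused by floating-point cancellation.
    var = max(second_moment - mean * mean, 0.0)
    return mean, var
```

Calling `relu_moments(-1.0, 1.0)` and `relu_moments(1.0, 1.0)` returns different variances even though the input variance is the same, which is one simple way to see the mean–covariance coupling the summary refers to.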

Abstract

Building robust, interpretable, and secure AI systems requires quantifying and representing uncertainty from a probabilistic perspective to mimic human cognitive abilities. However, probabilistic computation presents significant challenges for most conventional artificial neural networks, as they are essentially implemented in a deterministic manner. In this paper, we develop an efficient probabilistic computation framework by truncating the probabilistic representation of neural activation up to its mean and covariance, and we construct a moment neural network that encapsulates the nonlinear coupling between the mean and covariance of the underlying stochastic network. We reveal that when only the mean but not the covariance is supervised during gradient-based learning, the unsupervised covariance spontaneously emerges from its nonlinear coupling with the mean and faithfully captures the uncertainty associated with model predictions. Our findings highlight the inherent simplicity of probabilistic computation by seamlessly incorporating uncertainty into model prediction, paving the way for integrating it into large-scale AI systems.
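The learning scheme described in the abstract can be illustrated with a scalar toy problem. The sketch below is an assumption-laden illustration rather than the authors' implementation: a single weight `w` maps a Gaussian input to a ReLU unit, the loss supervises only the output mean, and the backward pass treats the pre-activation standard deviation as a constant. All names (`smuc_step`, `run_toy`) and hyperparameters are hypothetical.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def relu_mean(mu, sigma):
    # Mean of max(0, x) for x ~ N(mu, sigma^2); assumes sigma > 0.
    z = mu / sigma
    return mu * std_normal_cdf(z) + sigma * std_normal_pdf(z)

def smuc_step(w, x_mean, x_std, target, lr):
    # Forward pass: propagate mean and standard deviation through y = w * x.
    pre_mean = w * x_mean
    pre_std = abs(w) * x_std  # requires w != 0 and x_std > 0
    out_mean = relu_mean(pre_mean, pre_std)
    # Backward pass (assumed SMUC-style rule): pre_std is held constant, so only
    # the path w -> pre_mean -> out_mean contributes to the gradient. With sigma
    # fixed, d relu_mean / d mu equals the Gaussian CDF evaluated at mu / sigma.
    d_loss = out_mean - target  # gradient of 0.5 * (out_mean - target) ** 2
    grad_w = d_loss * std_normal_cdf(pre_mean / pre_std) * x_mean
    return w - lr * grad_w, out_mean, pre_std

def run_toy(steps=200):
    # Hypothetical settings chosen only to keep the toy well behaved.
    w = 0.5
    out_mean = pre_std = None
    for _ in range(steps):
        w, out_mean, pre_std = smuc_step(w, x_mean=1.0, x_std=0.5, target=2.0, lr=0.5)
    return w, out_mean, pre_std
```

In this toy, the output mean is driven to the target while the standard deviation is never given a training signal; it changes only because it is recomputed from the updated weight on every forward pass.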

Paper Structure (29 sections, 3 theorems, 94 equations, 13 figures, 5 tables)

Introduction
Probabilistic computation framework for supervised learning
Training stochastic networks with emerging covariance
Moment activations (MAs) and moment neural networks (MNNs)
Supervised mean unsupervised covariance (SMUC)
Theoretical analysis of the SMUC
Instantialization of the MNNs
Numerical verification
Emerging covariance faithfully captures uncertainty
In-distribution prediction uncertainty
Out-of-distribution prediction uncertainty
Comparison with other uncertainty quantification approaches
Further analysis
Mechanism of uncertainty representation by emerging covariance
Effect of the covariance
...and 14 more sections

Key Result

Theorem 1

If the learning rate schedule $\{\gamma_t\}$ satisfies a modified version of the usual condition (Eq. \ref{eq:usu_cond}, Supplementary Information), then the parameter $\theta$ converges almost surely to a point at which the gradient vanishes.
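For orientation, the classical step-size condition from stochastic approximation, which "usual condition" presumably refers to, is shown below. This is an assumption: the modified form actually required by Theorem 1 is given in the paper's Supplementary Information and is not reproduced here.

$$\sum_{t=1}^{\infty} \gamma_t = \infty, \qquad \sum_{t=1}^{\infty} \gamma_t^2 < \infty .$$

A schedule such as $\gamma_t = \gamma_0 / t$ with $\gamma_0 > 0$ satisfies both requirements: the steps shrink fast enough for the noise to average out, but not so fast that the iterates stall before reaching a stationary point.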

Figures (13)

Figure 1: Schematic of probabilistic computation and learning with unsupervised covariance. a, Schematic diagram of a stochastic neural network (left) (Eq. \ref{eq:stoc_network}, Appendix) and the corresponding moment neural network (MNN, right) (Eq. \ref{eq:mnn_feedforward}). b, Comparison of conventional neural activation (top) and moment activations (MAs) (bottom). We illustrate the ReLU and ReLU MAs as examples. c, We train MNNs using the supervised mean unsupervised covariance (SMUC) scheme, where back-propagation only involves the mean while the covariances are treated as constants. d, Schematic of a mixed MNN consisting of shallow layers using conventional activations and deep layers using MAs. The rectangles represent the layers, and their heights indicate the respective layer dimensions.
Figure 2: Emerging covariance faithfully captures prediction uncertainty. A ReLU MNN and a mixed ReLU MNN are trained on the MNIST and CIFAR-10 datasets, respectively, for image classification. a, Upper panel: the test accuracy during training. Shades indicate half a standard deviation across trials; lower panel: the entropy for correctly classified (orange) and misclassified (green) inputs on the test set as learning progresses. Shades indicate half a standard deviation across input samples. b, The entropy of the network state in the last two layers of the network for correctly classified and misclassified inputs. c, Separability of entropy between correctly classified and misclassified inputs increases with the layer index of the MNN. d, Separability and test accuracy during training are positively correlated. Dashed lines represent linear regression, with coefficients of determination of 0.6803 for MNIST and 0.6068 for CIFAR-10.
Figure 3: Emerging covariance of the leaky integrate-and-fire (LIF) MNN faithfully captures the prediction uncertainty. An LIF MNN is trained on MNIST for image classification. a, Training of the MNN on MNIST, showing the accuracy (red line) and the entropy (orange and green for correctly classified and misclassified inputs, respectively) on the test set. b, The entropy of correctly classified and misclassified inputs in the last two layers of the network on MNIST. The misclassified inputs result in relatively higher entropy in both layers. c, The entropy separability of correctly classified and misclassified inputs on MNIST increases as the layer index increases. d, The relationship between the separability and test accuracy during training on MNIST, analyzed using linear regression, with a coefficient of determination of 0.7760.
Figure 4: Out-of-distribution detection and adversarial attack awareness in the MNN. a, Distribution of entropy, maximum softmax probability (MSP), and softmax entropy (SE) for correctly classified, misclassified, and out-of-distribution inputs. b, The separability of the three indicators on the correctly classified, misclassified, and out-of-distribution (OoD) samples. A negative separability value implies that the MSP and SE incorrectly indicate higher uncertainty for misclassified samples than for out-of-distribution samples. c, The test accuracy of different models under various strengths $\epsilon$ of FGSM adversarial attacks. d, Box plot of the entropy of the model prediction for samples in the test set, for the MNN with $\sigma_1,\sigma_2>0$ under various strengths $\epsilon$ of FGSM attacks. Orange line: median; box: upper and lower quartiles.
Figure 5: Mechanism of uncertainty representation by emerging covariance. a, Illustration of a binary classification dataset, where a sample in $\mathbb{R}^2$ belongs to class A (or class B) if the signs of its two coordinates $x_1, x_2$ are the same (or different). Solid lines indicate the class boundary of the dataset. b, Scatter plot of training samples randomly drawn from a Gaussian distribution. The misclassified samples are typically located close to the decision boundary. The size of the bubbles indicates the magnitude of the output entropy. c, A negative correlation is found between the distance to the decision boundary and the output entropy, with a coefficient of determination of 0.7196.
...and 8 more figures
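The baseline indicators and the attack named in the Figure 4 caption have standard definitions, sketched below in plain Python. This is a minimal illustration under stated assumptions, not the evaluation code used in the paper: the function names are hypothetical, the gradient passed to `fgsm_perturb` is assumed to be supplied by the caller, and clipping to the range [0, 1] is a common convention for image inputs that the paper may or may not follow.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def max_softmax_probability(logits):
    # MSP: confidence of the most likely class; lower values suggest more uncertainty.
    return max(softmax(logits))

def softmax_entropy(logits):
    # SE: Shannon entropy (in nats) of the softmax distribution.
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)

def fgsm_perturb(x, grad, eps, lo=0.0, hi=1.0):
    # FGSM: move each input coordinate by eps in the direction of the sign of the
    # loss gradient, then clip to the assumed valid input range [lo, hi].
    def sign(g):
        return (g > 0) - (g < 0)
    return [min(hi, max(lo, xi + eps * sign(gi))) for xi, gi in zip(x, grad)]
```

For uniform logits the MSP equals one over the number of classes and the softmax entropy equals the logarithm of the number of classes, which are the most uncertain values these two indicators can take.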

Theorems & Definitions (6)

Theorem 1
Theorem 2
Theorem 3
proof
proof
proof
