Pretraining with random noise for uncertainty calibration

Jeonghwan Cheon, Se-Bum Paik

TL;DR

The paper addresses uncertainty calibration in deep neural networks, where models are overconfident and unreliable on unseen data. It proposes a simple, biologically inspired approach: pretrain networks with random noise and random labels to pre-calibrate their uncertainty before standard data training. The results show that this random noise pretraining reduces overconfidence, aligns predicted confidence with actual accuracy, and improves out-of-distribution detection without extra processing. The work suggests a universal initialization strategy with practical implications for safer, more robust AI systems and offers insights into prenatal learning processes.

Abstract

Uncertainty calibration is crucial for various machine learning applications, yet it remains challenging. Many models exhibit hallucinations (confident yet inaccurate responses) due to miscalibrated confidence. Here, we show that the common practice of random initialization in deep learning, often considered a standard technique, is an underlying cause of this miscalibration, leading to excessively high confidence in untrained networks. Our method, inspired by developmental neuroscience, addresses this issue by simply pretraining networks with random noise and labels, reducing overconfidence and bringing initial confidence levels closer to chance. This ensures optimal calibration, aligning confidence with accuracy during subsequent data training, without the need for additional pre- or post-processing. Pre-calibrated networks excel at identifying "unknown data," showing low confidence for out-of-distribution inputs, thereby resolving confidence miscalibration.
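The abstract's core mechanism can be illustrated with a deliberately small, dependency-free sketch. It assumes a single-layer SoftMax classifier (not the paper's six-layer network), He-style Gaussian initialization, a 32-dimensional input instead of CIFAR-10 images, and plain per-sample SGD. Following the pretraining description in the Figure 2 caption, each step draws a fresh Gaussian noise input and a label from a uniform distribution, independent of that input. `NUM_CLASSES`, `INPUT_DIM`, `pretrain_on_noise`, the learning rate and the step count are hypothetical choices, not the authors' settings. Because the labels carry no information about the inputs, cross-entropy on noise is minimized by a uniform output. Pretraining should therefore pull the average confidence (max SoftMax probability) on noise from well above chance down toward 1/K.

```python
import math
import random

# Hypothetical toy settings, not the paper's configuration.
NUM_CLASSES = 10   # K output classes, as in CIFAR-10
INPUT_DIM = 32     # toy input size (the paper uses 32x32x3 images)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(W, b, x):
    """SoftMax probabilities of a single-layer linear classifier."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def init_params(rng):
    """He-style Gaussian initialization: std = sqrt(2 / fan_in), zero bias."""
    std = math.sqrt(2.0 / INPUT_DIM)
    W = [[rng.gauss(0.0, std) for _ in range(INPUT_DIM)] for _ in range(NUM_CLASSES)]
    return W, [0.0] * NUM_CLASSES


def gaussian_noise(rng):
    return [rng.gauss(0.0, 1.0) for _ in range(INPUT_DIM)]


def mean_confidence(W, b, rng, n=500):
    """Average max SoftMax probability over n fresh noise inputs."""
    return sum(max(forward(W, b, gaussian_noise(rng))) for _ in range(n)) / n


def pretrain_on_noise(W, b, rng, steps=6000, lr=0.005):
    """Per-sample SGD on cross-entropy with Gaussian inputs and uniform random labels."""
    for _ in range(steps):
        x = gaussian_noise(rng)           # input: fresh Gaussian noise
        y = rng.randrange(NUM_CLASSES)    # label: uniform, independent of x
        p = forward(W, b, x)
        for k in range(NUM_CLASSES):
            g = p[k] - (1.0 if k == y else 0.0)  # d(cross-entropy)/d(logit_k)
            b[k] -= lr * g
            row = W[k]
            for j in range(INPUT_DIM):
                row[j] -= lr * g * x[j]


rng = random.Random(0)
W, b = init_params(rng)
conf_before = mean_confidence(W, b, random.Random(1))
pretrain_on_noise(W, b, rng)
conf_after = mean_confidence(W, b, random.Random(1))  # same held-out noise

print(f"chance level:            {1 / NUM_CLASSES:.3f}")
print(f"confidence before noise: {conf_before:.3f}")
print(f"confidence after noise:  {conf_after:.3f}")
```

Both measurements use the same held-out noise (seed 1), so the printed values are directly comparable. In the paper, ordinary data training would continue from the pre-calibrated weights. The sketch stops before that stage.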

Paper Structure

This paper contains 15 sections, 4 equations, 11 figures.

Figures (11)

  • Figure 1: Confidence miscalibration in artificial neural networks. (a) The illustration depicts how self-driving cars detect objects in their environment and make decisions based on these detections. Both the calibrated and miscalibrated models predict the same label, but their confidence levels differ significantly. Miscalibrated confidence can lead to incorrect decisions, even when the model’s accuracy is high. (b) The predicted answer is the class with the highest probability in the SoftMax output, and its confidence is that probability. (c) Reliability diagram for a six-layer feedforward neural network trained on a subset of the CIFAR-10 dataset (training data size = 4,000). Test-set predictions are binned by confidence, and accuracy is calculated in each bin. The diagonal line represents ideal calibration, where confidence perfectly matches the expected accuracy. The Expected Calibration Error (ECE) is the average absolute difference between predicted confidence and actual accuracy across bins, weighted by the number of samples in each bin. (d) Calibration error across various model complexities and training data sizes. Each color represents the ECE: blue and red indicate low and high calibration error, respectively.
  • Figure 1: Measuring confidence calibration of a model network through a reliability diagram. (a) Sample images were presented to a trained neural network, and its predictions were recorded ($n_{\text{trial}} = 10000$). Specifically, the confidence and predicted answer were measured in each trial. (b) Confidence histogram. Each trial is binned according to its confidence level. (c) In each batch of predictions binned by confidence, the fraction of correct predictions was measured. The diagram shows two example batches with prediction trials at different confidence levels. In an ideally calibrated model, the fraction correct in each batch should match its confidence level. (d) Reliability diagram. This diagram shows the model’s accuracy as a function of confidence level for each batch. In an ideally calibrated model (gray bars), accuracy matches confidence across all predictions, represented by a diagonal line.
  • Figure 2: Pretraining with random noise enables confidence calibration in neural networks. (a) Prenatal learning through spontaneous neural activity prior to sensory input in a fetal rat, before birth (left) and after eye opening (right) (galli1988). (b) Spontaneous neural activity in the developing visual and auditory areas (martini2021). (c) Schematic of the pretraining algorithm with random noise, inspired by the developing brain before sensory experience. During pretraining, the network is trained on randomly sampled inputs from a Gaussian distribution and unpaired labels from a uniform distribution. (d) Test loss during random noise pretraining and data training. (Inset) Accuracy during data training in networks with (blue) and without (orange) pretraining. (e) Histogram of confidence for a network trained only with data (orange) and a network pretrained with random noise (blue). Averaged accuracy and confidence are indicated by vertical lines. (Right) The difference between averaged confidence and accuracy of predictions. (f) Reliability diagram showing the expected accuracy for samples binned by confidence. (Inset) Expected Calibration Error. (g) The effect of random noise pretraining under varying conditions of training data size and network complexity on calibration error. The marker indicates the parameters used in (d-f), where the six-layer networks are trained with a dataset size of 4,000.
  • Figure 2: Learning curves in random noise pretraining and the subsequent data training stage. (a) The pretraining process, where neural networks are trained on random noise before encountering real data. (b) After pretraining with random noise, the network is subsequently trained with real data. (c-d) Loss curves during random noise pretraining: (c) Training loss. (d) Test loss. (e-f) Loss curves during subsequent data training: (e) Training loss. (f) Test loss. (g-h) Accuracy curves during random noise pretraining: (g) Training accuracy. (h) Test accuracy. Note that the accuracy remains at chance level, as there is no explicit correlation between the input (random noise) and the output labels. (i-j) Accuracy curves during subsequent data training: (i) Training accuracy. (j) Test accuracy. In each learning curve, the orange line represents the network trained solely with data (without random noise pretraining), while the blue line represents the network pretrained with random noise.
  • Figure 3: Random noise pre-calibrates neural network uncertainty over input space. (a) Diagram illustrating training on random noise in a toy-model network with a two-dimensional input space and binary output space. The network receives random noise with two input features and randomly assigned binary labels, and is trained using Binary Cross-Entropy (BCE) loss. (b) Visualization of confidence over the input space. (Left) Confidence map of the untrained network. (Right) Confidence map after random noise pretraining. (c) Confidence distribution of the network for two-dimensional noise. (d) Class bias of the network for two-dimensional noise, showing the extent of bias in class prediction. (e) Diagram of training on random noise in a multi-layer perceptron with a high-dimensional input space, designed to classify the CIFAR-10 dataset. The network receives 32×32×3 input images and outputs SoftMax probabilities for ten classes. (f) Visualization of SoftMax output probabilities for random noise inputs in the readout neurons. (Left) Untrained network. (Right) Network after random noise pretraining. The dashed line represents the chance level of classification. (g) Confidence distribution of the network. (h) Class bias of the network, showing the bias towards specific classes.
  • ...and 6 more figures
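The reliability diagrams and ECE values described in the Figure 1 captions above can be computed from per-prediction confidences and correctness flags. This sketch assumes the common binned definition $\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n}\,\lvert \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \rvert$ with $M$ equal-width bins on $[0, 1]$. Confidence is taken as the max SoftMax probability. Using 10 bins is an assumption, since this summary does not state the paper's bin count. `reliability_bins` and `expected_calibration_error` are hypothetical helper names.

```python
def reliability_bins(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins over [0, 1].

    Returns one (count, mean_confidence, accuracy) tuple per non-empty bin,
    i.e. the data behind one bar of a reliability diagram.
    """
    stats = [[0, 0.0, 0] for _ in range(n_bins)]  # count, sum of conf, n correct
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        stats[idx][0] += 1
        stats[idx][1] += conf
        stats[idx][2] += int(ok)
    return [(n, s / n, c / n) for n, s, c in stats if n > 0]


def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted mean of |accuracy - mean confidence| over the bins."""
    total = len(confidences)
    return sum(n / total * abs(acc - mean_conf)
               for n, mean_conf, acc in reliability_bins(confidences, correct, n_bins))


# Toy example: four predictions made with 95% confidence, only three correct.
confs = [0.95, 0.95, 0.95, 0.95]
hits = [True, True, True, False]
print(reliability_bins(confs, hits))            # a single bin with accuracy 0.75
print(expected_calibration_error(confs, hits))  # approximately 0.2 (overconfident)
```

An ideally calibrated model lies on the diagonal of the reliability diagram, with accuracy equal to mean confidence in every bin, which gives an ECE of zero.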