
Slumbering to Precision: Enhancing Artificial Neural Network Calibration Through Sleep-like Processes

Jean Erik Delanois, Aditya Ahuja, Giri P. Krishnan, Maxim Bazhenov

TL;DR

Sleep Replay Consolidation is a post-training, sleep-like phase that selectively replays internal representations to update network weights and improve calibration without supervised retraining, and is competitive with and complementary to standard approaches such as temperature scaling.

Abstract

Artificial neural networks are often overconfident, undermining trust because their predicted probabilities do not match actual accuracy. Inspired by biological sleep and the role of spontaneous replay in memory and learning, we introduce Sleep Replay Consolidation (SRC), a novel calibration approach. SRC is a post-training, sleep-like phase that selectively replays internal representations to update network weights and improve calibration without supervised retraining. Across multiple experiments, SRC is competitive with and complementary to standard approaches such as temperature scaling. Combining SRC with temperature scaling achieves the best Brier score and entropy trade-offs for AlexNet and VGG19. These results show that SRC provides a fundamentally novel approach to improving neural network calibration. SRC-based calibration offers a practical path toward more trustworthy confidence estimates and narrows the gap between human-like uncertainty handling and modern deep networks.
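The abstract benchmarks SRC against temperature scaling (TS) and reports Brier score and entropy. As a reference point only, here is a minimal NumPy sketch of the standard post-hoc TS baseline and those two metrics. It does not implement SRC. Several choices are assumptions rather than the paper's setup: T is fitted by a coarse grid search over validation negative log-likelihood (gradient-based fitting is more common), the Brier score is the multi-class form (sum over classes, mean over samples), entropy is measured in nats, and the data are synthetic. The helper names are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits divided by a scalar temperature."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # shift by the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    probs = softmax(logits, temperature)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))

def fit_temperature(val_logits, val_labels, grid=None):
    """Temperature scaling: choose the single scalar T that minimizes validation NLL.
    A coarse grid search is used here for simplicity instead of gradient descent."""
    if grid is None:
        grid = np.linspace(0.5, 5.0, 451)
    losses = [nll(val_logits, val_labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

def brier_score(probs, labels):
    """Multi-class Brier score: mean squared distance to the one-hot label."""
    onehot = np.eye(probs.shape[1])[labels]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))

def mean_entropy(probs):
    """Average predictive entropy in nats."""
    return float(np.mean(-np.sum(probs * np.log(probs + 1e-12), axis=1)))

# Synthetic demo: labels are drawn from softmax(true_logits), but the "network"
# reports logits 3x too large, so it is overconfident by construction.
rng = np.random.default_rng(0)
n, k = 5000, 10
true_logits = rng.normal(0.0, 1.5, size=(n, k))
cdf = softmax(true_logits).cumsum(axis=1)
labels = np.minimum((cdf < rng.random((n, 1))).sum(axis=1), k - 1)  # inverse-CDF sampling
logits = 3.0 * true_logits

# Fit T on a validation half, then evaluate on the held-out test half.
val, test = slice(0, n // 2), slice(n // 2, n)
T = fit_temperature(logits[val], labels[val])
before, after = softmax(logits[test]), softmax(logits[test], T)
print(f"T = {T:.2f}")
print(f"Brier   {brier_score(before, labels[test]):.3f} -> {brier_score(after, labels[test]):.3f}")
print(f"entropy {mean_entropy(before):.3f} -> {mean_entropy(after):.3f}")
```

Because dividing logits by a positive T preserves their order, TS never changes the predicted class or the accuracy. When the fitted T is greater than 1, it flattens every softmax, so each sample's top confidence can only stay the same or fall. This matches the Figure 2 observation that TS only maintains or reduces confidence.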

Paper Structure

This paper contains 31 sections, 4 equations, 5 figures, 3 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Reliability diagrams for ResNet-152 trained on CIFAR-100 after initial Baseline training (left) and after SRC (right), showing the resulting reduction in expected calibration error (ECE). After SRC, confidence more accurately reflects accuracy (blue bins lie closer to the diagonal). Red bars indicate the difference between ideal and actual accuracy. Test-set confidence and accuracy are shown.
  • Figure 2: Two-dimensional histograms of ResNet-152 confidences on CIFAR-100, plotting baseline confidence (horizontal axis) against the confidence after each method (vertical axis); color encodes sample density, and the red diagonal marks no change. SRC (left) can increase or decrease individual confidences, whereas temperature scaling (TS, right) only maintains or reduces them.
  • Figure 3: Histograms of FF feature magnitudes for ResNet-152 on CIFAR-100: Baseline/TS (blue), SRC (orange), and LS (green). The Baseline model has the widest distribution (0–8), LS is more constrained (0–4.5), and SRC shifts the Baseline distribution to a lower maximum (5.5), moving it toward LS. This suggests that SRC brings feature representations closer to those of the well-calibrated LS model without full retraining.
  • Figure 4: Distributions of the fraction of nonzero FF-layer elements per test sample in ResNet-152 on CIFAR-100 for Baseline/TS (blue), SRC (orange), and LS (green). Baseline representations are the densest (82–90% nonzero); LS is moderately sparser, with partial overlap. SRC yields substantially sparser representations (67–81% nonzero).
  • Figure 5: Weight changes from SRC for ResNet-152 on CIFAR-100. Most weights decreased, leading to smaller feature magnitudes (similar to LS) and sparser representations.
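Figures 1, 3 and 4 rest on a few quantities that can be computed directly from model outputs and activations. The NumPy sketch below shows one way to do this and is not the paper's code. Its assumptions are: 15 equal-width, half-open confidence bins for ECE (the bin count used in the paper is not stated here), and "FF layer" activations available as a (samples × units) array. The function names and toy data are hypothetical.

```python
import numpy as np

def reliability_bins(probs, labels, n_bins=15):
    """Per-bin (mean confidence, accuracy, count) for a reliability diagram.
    Confidence is the top softmax probability; bins are [b/n_bins, (b+1)/n_bins)."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    # Bin index in 0..n_bins-1; a confidence of exactly 1.0 goes into the last bin.
    idx = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    bins = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            bins.append((conf[mask].mean(), correct[mask].mean(), int(mask.sum())))
    return bins

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE: count-weighted average of |accuracy - confidence| over non-empty bins."""
    n = len(labels)
    return float(sum(cnt / n * abs(acc - c)
                     for c, acc, cnt in reliability_bins(probs, labels, n_bins)))

def nonzero_fraction(features):
    """Fraction of nonzero elements per sample (rows = samples, columns = units)."""
    return np.count_nonzero(features, axis=1) / features.shape[1]

# Toy check: ten samples at 90% confidence with 9 of 10 correct are calibrated
# (ECE close to 0); reporting 99% confidence instead gives an ECE close to 0.09.
labels = np.array([0] * 9 + [1])
calibrated = np.tile([0.9, 0.1], (10, 1))
overconfident = np.tile([0.99, 0.01], (10, 1))
print(expected_calibration_error(calibrated, labels))
print(expected_calibration_error(overconfident, labels))

# Hypothetical post-ReLU FF activations (2 samples x 4 units): per-sample
# sparsity as in Figure 4 and the largest magnitude as in Figure 3.
feats = np.array([[0.0, 1.2, 0.0, 3.0], [0.5, 0.1, 0.2, 0.0]])
print(nonzero_fraction(feats), np.abs(feats).max())
```

The red gap bars in a reliability diagram like Figure 1 correspond to the per-bin |accuracy − confidence| terms that ECE averages. A histogram of `np.abs(feats).ravel()` would give the kind of magnitude distribution plotted in Figure 3.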