Improving deep neural network performance through sampling

Lakshmi A. Ghantasala, Ming-Che Li, Risi Jaiswal, Behtash Behin-Aein, Joseph Makin, Shreyas Sen, Supriyo Datta

TL;DR

This work addresses the rising energy costs of generative AI by exploring sampling with probabilistic bits (p-bits) in deep networks. It introduces a universal energy framework based on a building-block energy $\epsilon_{EO}$ that accounts for weight memory reads, activation memory reads/writes, synapse computation, and neuron output, and shows that drawing multiple samples ($T$) from a single weight read can amortize the cost of that read. The paper demonstrates two practical approaches to probabilistic DNNs (PDNNs): sample-aware training, which uses $s$ samples to improve accuracy with minimal energy impact, and noise-based sampling, which yields gains without retraining. Both are validated on CIFAR10 and CelebA, with additional FPGA and 65 nm ASIC analyses. Key findings include memory-energy dominance in DNN inference, substantial energy savings from 1-bit activations with multiple samples, and the ability to match higher-bit quantization performance at lower energy by tuning $T$. The results imply that energy-efficient, sampling-based DNNs could enable scalable, adaptable AI systems, including potential benefits for large language models, when integrated with in-memory compute and run-time sampling strategies.
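The amortization argument can be illustrated with a small sketch. This is not the paper's energy expression, whose exact form is not reproduced in this summary; it assumes a simple additive model in which the weight read is paid once and the remaining three components are paid for every sample. The function name, argument names, and numeric values are hypothetical placeholders in arbitrary units, not measured results.

```python
def energy_per_sample(e_weight_read, e_activation, e_synapse, e_neuron, T):
    """Per-sample energy under an ASSUMED additive model.

    The weight read is paid once and shared across T samples; the
    activation-memory, synapse, and neuron costs are paid per sample.
    """
    if T < 1:
        raise ValueError("T must be at least 1")
    return e_weight_read / T + e_activation + e_synapse + e_neuron


# Hypothetical component energies (arbitrary units), chosen so that the
# weight read dominates, as the summary says memory energy does.
for T in (1, 4, 16):
    print(T, energy_per_sample(8.0, 1.0, 0.5, 0.5, T))
```

Under this assumption the per-sample cost falls toward the sum of the per-sample terms as $T$ grows, so the benefit of reusing a weight read is largest when that read dominates the budget.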

Abstract

Energy-efficient sampling with probabilistic neurons, or p-bits, has been demonstrated in the context of Boltzmann machines, and it is natural to ask whether these approaches can be extended to the field of generative AI, where energy costs have become prohibitively large. However, this very active field is dominated by feedforward deep neural networks (DNNs), which primarily use multi-bit deterministic neurons with no role for sampling. In this paper we first show that it is feasible to obtain superior accuracy through the use of multiple samples generated by probabilistic networks. This possibility raises the question of which option is energetically preferable for improving accuracy: generating more samples, or adding more bits to a single deterministic sample. We provide a simple expression that can be used to estimate these energy tradeoffs and illustrate it with results for different algorithms and architectures.
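The tradeoff between more samples and more bits can be made concrete with a minimal sketch of a single p-bit neuron. It assumes the common p-bit convention in which the binary output is +1 with probability $(1 + \tanh x)/2$ and -1 otherwise, so that averaging $T$ one-bit samples approaches the deterministic $\tanh$ activation. The function name and the NumPy implementation are illustrative and are not taken from the paper.

```python
import numpy as np


def pbit_samples(x, T, rng):
    """Draw T binary (+1/-1) samples per input.

    Assumed p-bit rule: output +1 when tanh(x) exceeds a uniform random
    number in [-1, 1), else -1, so the expected output equals tanh(x).
    """
    x = np.asarray(x, dtype=float)
    r = rng.uniform(-1.0, 1.0, size=(T,) + x.shape)
    return np.where(np.tanh(x) > r, 1.0, -1.0)


rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 5)
for T in (1, 16, 1024):
    mean = pbit_samples(x, T, rng).mean(axis=0)
    # Largest deviation of the T-sample average from the multi-bit tanh value.
    print(T, np.abs(mean - np.tanh(x)).max())
```

The deviation shrinks roughly as $1/\sqrt{T}$, which is why the number of samples can stand in for activation bit width when accuracy is traded against energy.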

Paper Structure

This paper contains 23 sections, 9 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: (a) Building block of p-circuits, (b) System architecture used to implement the building block, based on which the energy $\varepsilon$ per elementary operation (Eq. 1) is written.
  • Figure 2: (a) Standard DNNs are implemented with activation functions like tanh or ReLU, and (b) are generally trained following backpropagation. PDNNs can come in different forms, broadly referring to DNNs with stochasticity infused into each forward pass. (c) shows a model that integrates stochasticity via noisy activations, while (d) shows a model that replaces the activations with p-bits. (e) shows the sample-aware training scheme that dramatically improves performance for PDNNs. (f) specifies the activation functions.
  • Figure 3: Images generated via a traditional variational autoencoder compared with various probabilistic alternatives of the same architecture. (a) standard DNN inference. (b) model with p-bits replacing all activations. (c) model that is retrained with p-bit activations. (d) model that is retrained with p-bit activations and an analog last layer. (e) proposed sample-aware training scheme employed on (d).
  • Figure 4: (a) Comparison of accuracy for CIFAR10 classification. (b) Comparison of FID metrics for a variational autoencoder trained on the CelebA dataset. (c) Accuracy on CIFAR10 of a deterministic baseline compared with a model with noisy activations.
  • Figure 5: Energy cost of the component operations of the building block for QMC, from the 65 nm ASIC implementation reported in Ref. li_122_2025.
  • ...and 6 more figures