Improving deep neural network performance through sampling
Lakshmi A. Ghantasala, Ming-Che Li, Risi Jaiswal, Behtash Behin-Aein, Joseph Makin, Shreyas Sen, Supriyo Datta
TL;DR
This work addresses the rising energy costs of generative AI by exploring sampling with probabilistic bits (p-bits) in deep networks. It introduces a universal energy framework built on a building-block energy $\epsilon_{EO}$ that accounts for weight memory reads, activation memory reads and writes, synapse computation, and neuron output, and it shows that drawing $T$ samples from a single weight read can amortize the cost of that read. The paper demonstrates two practical PDNN approaches: sample-aware training, which uses $s$ samples to improve accuracy with minimal energy impact, and noise-based sampling, which yields gains without retraining. Both are validated on CIFAR-10 and CelebA, with additional FPGA and 65 nm ASIC analyses. Key findings include the dominance of memory energy in DNN inference, substantial energy savings from 1-bit activations with multiple samples, and the ability to match higher-bit quantization performance at lower energy by tuning $T$. The results suggest that energy-efficient, sampling-based DNNs could enable scalable, adaptable AI systems, with potential benefits for large language models, when integrated with in-memory compute and run-time sampling strategies.
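The amortization idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that the building-block energy is a plain sum of the four components named above and that only the weight read is shared across the $T$ samples. The class `OpEnergy`, the function names, and the numeric values are hypothetical placeholders, not the paper's expression or measurements.

```python
from dataclasses import dataclass


@dataclass
class OpEnergy:
    """Hypothetical per-operation energy components (arbitrary units)."""
    weight_read: float    # weight memory read
    activation_rw: float  # activation memory read/write
    synapse: float        # synapse computation
    neuron: float         # neuron output


def energy_per_sample(e: OpEnergy, T: int) -> float:
    """Energy per sample when one weight read serves T samples.

    Assumption: only the weight read is amortized; the other
    components are paid once per sample.
    """
    if T < 1:
        raise ValueError("T must be a positive integer")
    return e.weight_read / T + e.activation_rw + e.synapse + e.neuron


def total_energy(e: OpEnergy, T: int) -> float:
    """Total energy for T samples under the same assumption."""
    return T * energy_per_sample(e, T)


# Placeholder numbers chosen so that the weight read dominates.
example = OpEnergy(weight_read=10.0, activation_rw=2.0, synapse=1.0, neuron=1.0)
for T in (1, 2, 4, 8):
    print(T, energy_per_sample(example, T), total_energy(example, T))
```

Under these placeholder values, the per-sample energy falls as $T$ grows because the dominant weight-read term is divided by $T$, while the total energy still rises with the per-sample terms that are not shared.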
Abstract
Energy-efficient sampling with probabilistic neurons, or p-bits, has been demonstrated in the context of Boltzmann machines, and it is natural to ask whether these approaches can be extended to the field of generative AI, where energy costs have become prohibitively large. However, this very active field is dominated by feedforward deep neural networks (DNNs), which primarily use multi-bit deterministic neurons with no role for sampling. In this paper, we first show that it is feasible to obtain superior accuracy through the use of multiple samples generated by probabilistic networks. This possibility raises the question of which option is energetically preferable for improving accuracy: generating more samples, or adding more bits to a single deterministic sample. We provide a simple expression that can be used to estimate these energy tradeoffs and illustrate it with results for different algorithms and architectures.
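The tradeoff between more samples and more bits can be made concrete with a small sketch of a single stochastic neuron. It assumes a generic binary neuron that outputs 1 with probability equal to the logistic sigmoid of its input, which is a common textbook model and not necessarily the neuron used in this paper. The function names `pbit_sample` and `averaged_activation` are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic sigmoid, the assumed firing probability of the neuron."""
    return 1.0 / (1.0 + math.exp(-x))


def pbit_sample(x: float, rng: random.Random) -> int:
    """Draw one 1-bit sample: 1 with probability sigmoid(x), else 0."""
    return 1 if rng.random() < sigmoid(x) else 0


def averaged_activation(x: float, s: int, rng: random.Random) -> float:
    """Average s independent 1-bit samples of the same neuron.

    The mean approaches sigmoid(x) as s grows, so s plays a role
    loosely similar to bit precision in a deterministic neuron.
    """
    if s < 1:
        raise ValueError("s must be a positive integer")
    return sum(pbit_sample(x, rng) for _ in range(s)) / s


rng = random.Random(0)  # fixed seed so runs are repeatable
x = 0.5
for s in (1, 10, 100, 1000):
    estimate = averaged_activation(x, s, rng)
    print(s, estimate, abs(estimate - sigmoid(x)))
```

Because each sample is a Bernoulli draw, the standard deviation of the average shrinks roughly as $1/\sqrt{s}$, which is why a modest number of 1-bit samples can stand in for a higher-precision deterministic activation in this toy setting.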
