Resource-Efficient and Robust Inference of Deep and Bayesian Neural Networks on Embedded and Analog Computing Platforms

Bernhard Klein

TL;DR

The work investigates resource-efficient and trustworthy neural inference by uniting algorithmic compression, robust training, compiler-driven deployment, and novel analog and photonic hardware. It introduces Galen for hardware-aware automatic compression (per-layer pruning/quantization guided by on-device latency and sensitivity), develops robust training strategies for analog accelerators (Walking Noise and VANT), and advances Bayesian inference on embedded systems via the Probabilistic Forward Pass (PFP) and ensemble methods (RLLE). It further explores analog and photonic hardware as native probabilistic substrates, demonstrating robust training and uncertainty estimation under realistic hardware conditions. The combined algorithm–hardware co-design yields practical pathways for deploying both deterministic and Bayesian neural networks on resource-constrained devices while maintaining calibrated uncertainty estimates and energy efficiency. These contributions collectively chart a roadmap for scalable, uncertainty-aware ML on next-generation hardware, with implications for on-device AI, safety-critical systems, and energy-conscious computing.

Abstract

While modern machine learning has transformed numerous application domains, its growing computational demands increasingly constrain scalability and efficiency, particularly on embedded and resource-limited platforms. In practice, neural networks must not only operate efficiently but also provide reliable predictions under distributional shifts or unseen data. Bayesian neural networks offer a principled framework for quantifying uncertainty, yet their computational overhead further compounds these challenges. This work advances resource-efficient and robust inference for both conventional and Bayesian neural networks through the joint pursuit of algorithmic and hardware efficiency. The former reduces computation through model compression and approximate Bayesian inference, while the latter optimizes deployment on digital accelerators and explores analog hardware, bridging algorithmic design and physical realization. The first contribution, Galen, performs automatic layer-specific compression guided by sensitivity analysis and hardware-in-the-loop feedback. Analog accelerators offer efficiency gains at the cost of noise; this work models device imperfections and extends noisy training to nonstationary conditions, improving robustness and stability. A second line of work advances probabilistic inference, developing analytic and ensemble approximations that replace costly sampling, integrate into a compiler stack, and optimize embedded inference. Finally, probabilistic photonic computing introduces a paradigm where controlled analog noise acts as an intrinsic entropy source, enabling fast, energy-efficient probabilistic inference directly in hardware. Together, these studies demonstrate how efficiency and reliability can be advanced jointly through algorithm-hardware co-design, laying the foundation for the next generation of trustworthy, energy-efficient machine-learning systems.

Paper Structure

This paper contains 172 sections, 46 equations, 66 figures, and 19 tables.

Figures (66)

  • Figure 1: Thesis contributions organized along hardware technologies (vertical: digital vs. analog) and modeling paradigms (horizontal: deterministic vs. probabilistic).
  • Figure 2: Systematic perspective on resource-efficient machine learning. Representational efficiency, computational efficiency, and prediction quality form the triad that structures the discussion in this chapter. Reproduced with permission from roth2024jmlr.
  • Figure 3: Schematic illustration of quantization with the straight-through estimator (STE). The forward pass applies quantization to weights and activations, while the backward pass approximates the gradient of the quantizer as the identity function. Reproduced with permission from roth2024jmlr.
  • Figure 4: Comparison of several popular quantization methods using the DenseNet-BC-100 architecture on the CIFAR-100 dataset. Test error is shown as a function of bit-width for weight and activation quantization. As expected, lower bit-widths lead to larger errors: weight-only quantization degrades accuracy moderately, activation quantization incurs larger losses, and quantizing both amplifies the effect. Reproduced with permission from roth2024jmlr.
  • Figure 5: Throughput–accuracy trade-offs of compressed models on CIFAR-10 across embedded hardware platforms: (a) quantized and pruned WRN models on an ARM CPU, (b) quantized VGG (Simonyan15) models on an FPGA data-flow architecture using FINN (Umuroglu2017), and (c) different pruning methods on an embedded GPU. Reproduced with permission from roth2024jmlr.
  • ...and 61 more figures
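
The Figure 3 caption describes quantization-aware training in two halves: the forward pass quantizes, and the backward pass treats the quantizer's gradient as the identity. A minimal sketch of that idea for a single scalar weight follows. The uniform symmetric quantizer, the squared-error loss, and the names `quantize`, `ste_grad`, and `train_step` are illustrative assumptions, not the implementation used in the thesis or in roth2024jmlr.

```python
def quantize(x, bits=4, x_max=1.0):
    """Assumed uniform symmetric quantizer: clip to [-x_max, x_max], snap to a signed grid."""
    levels = 2 ** (bits - 1) - 1            # number of positive grid steps (7 for 4 bits)
    scale = x_max / levels                  # distance between neighbouring grid points
    clipped = max(-x_max, min(x_max, x))
    return round(clipped / scale) * scale


def ste_grad(upstream_grad):
    """Straight-through estimator: treat d quantize(w) / dw as 1 in the backward pass."""
    return upstream_grad


def train_step(w, x, y, lr=0.1, bits=4):
    """One SGD step on loss = 0.5 * (quantize(w) * x - y) ** 2 for a scalar weight."""
    w_q = quantize(w, bits)                 # forward pass uses the quantized weight
    err = w_q * x - y
    grad_wq = err * x                       # dL/dw_q, computed exactly
    grad_w = ste_grad(grad_wq)              # gradient passes through the quantizer unchanged
    return w - lr * grad_w                  # update the latent full-precision weight


# Hypothetical toy problem: fit y = 0.5 * x with a 4-bit weight.
w = 0.0
for _ in range(200):
    w = train_step(w, x=1.0, y=0.5)
```

In this sketch the latent weight `w` stays in full precision and only its quantized copy enters the forward computation. Without the identity approximation, the true gradient of the rounding step would be zero almost everywhere and the weight would never move.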