Resource-Efficient and Robust Inference of Deep and Bayesian Neural Networks on Embedded and Analog Computing Platforms
Bernhard Klein
TL;DR
The work investigates resource-efficient and trustworthy neural inference by uniting algorithmic compression, robust training, compiler-driven deployment, and novel analog and photonic hardware. It introduces Galen for hardware-aware automatic compression (per-layer pruning/quantization guided by on-device latency and sensitivity), develops robust training strategies for analog accelerators (Walking Noise and VANT), and advances Bayesian inference on embedded systems via the Probabilistic Forward Pass (PFP) and ensemble methods (RLLE). It further explores analog and photonic hardware as native probabilistic substrates, demonstrating robust training and uncertainty estimation under realistic hardware conditions. The combined algorithm–hardware co-design yields practical pathways for deploying both deterministic and Bayesian neural networks on resource-constrained devices while maintaining calibrated uncertainty estimates and energy efficiency. These contributions collectively chart a roadmap for scalable, uncertainty-aware ML on next-generation hardware, with implications for on-device AI, safety-critical systems, and energy-conscious computing.
Abstract
While modern machine learning has transformed numerous application domains, its growing computational demands increasingly constrain scalability and efficiency, particularly on embedded and resource-limited platforms. In practice, neural networks must not only operate efficiently but also provide reliable predictions under distributional shift and on unseen data. Bayesian neural networks offer a principled framework for quantifying uncertainty, yet their computational overhead further compounds these challenges. This work advances resource-efficient and robust inference for both conventional and Bayesian neural networks through the joint pursuit of algorithmic and hardware efficiency. The former reduces computation through model compression and approximate Bayesian inference, while the latter optimizes deployment on digital accelerators and explores analog hardware, bridging algorithmic design and physical realization. The first contribution, Galen, performs automatic layer-specific compression guided by sensitivity analysis and hardware-in-the-loop feedback. Analog accelerators offer efficiency gains at the cost of noise; this work models device imperfections and extends noisy training to nonstationary conditions, improving robustness and stability. A second line of work advances probabilistic inference, developing analytic and ensemble approximations that replace costly sampling, integrate into a compiler stack, and optimize embedded inference. Finally, probabilistic photonic computing introduces a paradigm where controlled analog noise acts as an intrinsic entropy source, enabling fast, energy-efficient probabilistic inference directly in hardware. Together, these studies demonstrate how efficiency and reliability can be advanced jointly through algorithm-hardware co-design, laying the foundation for the next generation of trustworthy, energy-efficient machine-learning systems.
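To make the idea of an analytic approximation that replaces sampling concrete, the following is a minimal sketch of moment propagation through a single linear layer. It is not taken from this work: the function name `pfp_linear`, the argument layout, and the assumption that weights and inputs are mutually independent are all illustrative choices, and the actual Probabilistic Forward Pass may differ in formulation and in how it handles biases and activations.

```python
import numpy as np


def pfp_linear(mean_in, var_in, w_mean, w_var):
    """Propagate means and variances through y = W x (hypothetical helper).

    Assumes every weight and every input is independent of the others.
    mean_in, var_in: arrays of shape (n_in,)
    w_mean, w_var:   arrays of shape (n_out, n_in)
    Returns (mean_out, var_out), each of shape (n_out,).
    """
    # E[y_j] = sum_i E[w_ji] * E[x_i]
    mean_out = w_mean @ mean_in
    # For independent w and x:
    #   Var[w * x] = Var[w] * (E[x]^2 + Var[x]) + E[w]^2 * Var[x]
    # and variances of independent summands add up.
    var_out = w_var @ (mean_in**2 + var_in) + (w_mean**2) @ var_in
    return mean_out, var_out


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from the source.
    mean_in = np.array([1.0, 2.0])
    var_in = np.array([0.0, 0.5])
    w_mean = np.array([[1.0, -1.0]])
    w_var = np.array([[0.1, 0.2]])
    print(pfp_linear(mean_in, var_in, w_mean, w_var))
```

A single deterministic pass of this kind yields an output mean and variance per neuron, whereas a sampling-based approach would need many stochastic forward passes to estimate the same two moments.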
