Adaptive Inference: Theoretical Limits and Unexplored Opportunities
Soheil Hor, Ying Qian, Mert Pilanci, Amin Arbabian
TL;DR
This work formalizes adaptive inference by modeling inference pipelines as adaptation state machines and introducing Adaptive Oracles to bound the achievable efficiency and accuracy. It provides both exact bounds and constant-$\alpha$ approximate bounds, and validates them with ImageNet and HellaSwag experiments that show substantial room for efficiency gains (roughly 10-100x) without sacrificing performance. The results yield practical guidelines for choosing the state space (its range, size, and granularity) to maximize adaptation potential, and show that near-optimal gains are achievable even with small state spaces. The framework lays a foundation for the systematic, quantifiable design of adaptive inference in computer vision (CV) and natural language processing (NLP), with implications for both edge deployment and cloud-scale efficiency.
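To make the Adaptive Oracle idea concrete, here is a minimal sketch built on our own simplifying assumptions rather than the paper's exact definitions. Each state s of the adaptation state machine is assumed to have a fixed per-input cost `costs[s]`, and a correctness table `correct[i][s]` (whether input i is handled correctly in state s) is assumed to come from offline evaluation. With hindsight, the oracle routes each input to the cheapest state that gets it right. The function name `oracle_bounds` and its arguments are hypothetical.

```python
from typing import Sequence


def oracle_bounds(
    correct: Sequence[Sequence[bool]], costs: Sequence[float]
) -> tuple[float, float, float]:
    """Hypothetical oracle bound over a finite state space.

    correct[i][s]: True if input i is handled correctly in state s (assumed given).
    costs[s]: positive per-input cost of running state s (assumed known).
    Returns (oracle_accuracy, static_accuracy, efficiency_gain), where the
    static baseline always runs the most expensive state.
    """
    if not correct:
        raise ValueError("need at least one input")
    n_states = len(costs)
    if any(len(row) != n_states for row in correct):
        raise ValueError("each row of `correct` needs one entry per state")

    baseline = max(range(n_states), key=lambda s: costs[s])
    cheapest = min(range(n_states), key=lambda s: costs[s])

    total_cost = 0.0
    oracle_hits = 0
    static_hits = 0
    for row in correct:
        good_states = [s for s in range(n_states) if row[s]]
        if good_states:
            # Oracle: cheapest state that still handles this input correctly.
            chosen = min(good_states, key=lambda s: costs[s])
            oracle_hits += 1
        else:
            # No state is correct, so accuracy is lost anyway; spend the least.
            chosen = cheapest
        total_cost += costs[chosen]
        static_hits += int(row[baseline])

    n = len(correct)
    mean_cost = total_cost / n
    return oracle_hits / n, static_hits / n, costs[baseline] / mean_cost


if __name__ == "__main__":
    table = [
        [True, True, True],     # easy input: cheapest state suffices
        [False, True, True],    # needs the middle state
        [False, False, True],   # needs the largest state
        [False, False, False],  # no state is correct
    ]
    print(oracle_bounds(table, [1.0, 4.0, 16.0]))
```

In this toy table the oracle matches the static baseline's 75% accuracy at an average cost of 5.5 instead of 16, an efficiency gain of about 2.9x. Because the oracle is correct on every input that the most expensive state handles correctly, its accuracy can never fall below the static baseline, which is what makes it usable as an upper bound for any real adaptive policy over the same states.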
Abstract
This paper introduces the first theoretical framework for quantifying the size of the opportunity for efficiency and performance gains in adaptive inference algorithms. We derive new approximate and exact bounds on the achievable efficiency and performance gains, supported by empirical evidence of potential 10-100x efficiency improvements in both Computer Vision and Natural Language Processing tasks without any performance penalty. We also offer insights into increasing the achievable efficiency gains through the optimal selection and design of adaptive inference state spaces.
