Three Quantization Regimes for ReLU Networks
Weigutian Ou, Philipp Schenkel, Helmut Bölcskei
TL;DR
This work establishes nonasymptotic minimax limits for approximating Lipschitz functions on [0,1] by deep ReLU networks with finite-precision weights. It identifies three quantization regimes, namely under-, proper-, and over-quantization, characterized by exponential, polynomial, and constant error behavior, respectively, and proves memory-optimality in the proper-quantization regime. The upper bound is constructive: the authors first build unquantized approximants and then quantize them with a refined bit-extraction technique, attaining memory-optimal performance. They further establish a depth-precision tradeoff that converts high-precision networks into functionally equivalent deeper networks with low-precision weights while preserving accuracy. Complementary lower bounds, derived from memory requirements, VC-dimension, and numerical precision, yield a tight three-regime characterization and guide network design under fixed memory budgets. Collectively, the results advance the theory of ReLU network approximation under finite precision and offer practical insights for hardware-aware neural network quantization and depth-width tradeoffs.
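To make the bit-extraction idea concrete, here is a minimal Python sketch of the classic ReLU bit-extraction step that refined variants build on; it is not the authors' refined technique. It assumes the input is a dyadic rational x in [0,1) with at most n fractional bits and uses exact rational arithmetic (fractions.Fraction) to avoid floating-point rounding; the function name extract_bits is hypothetical. Each step recovers one binary digit with the two-ReLU step σ(z+1) - σ(z), where z = 2^n (x - 1/2), which equals 1 for x ≥ 1/2 and 0 for x ≤ 1/2 - 2^{-n}, and then shifts the remainder left by one bit.

```python
from fractions import Fraction


def relu(z):
    """Rectified linear unit on exact rationals."""
    return max(z, 0)


def extract_bits(x, n):
    """Classic ReLU bit extraction (a baseline sketch, not the paper's refinement).

    Assumes x is a dyadic rational in [0, 1) with at most n fractional bits,
    i.e. x = sum_{k=1}^{n} b_k 2^{-k}. Returns [b_1, ..., b_n].
    """
    scale = 2 ** n  # steep slope separating the grid points 1/2 - 2^-n and 1/2
    bits = []
    r = Fraction(x)
    for _ in range(n):
        # Two ReLUs realize a step: 1 if r >= 1/2, 0 if r <= 1/2 - 2^-n.
        # No point of the 2^-n grid lies strictly between these thresholds.
        z = scale * (r - Fraction(1, 2))
        b = relu(z + 1) - relu(z)
        bits.append(int(b))
        # Remove the extracted bit and shift left; r stays on the 2^-n grid.
        r = 2 * r - b
    return bits
```

For example, extract_bits(Fraction(5, 8), 3) returns [1, 0, 1], the binary digits of 0.101. In this sketch every extracted bit costs a constant number of ReLUs applied in sequence, so extracting n bits requires depth proportional to n.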
Abstract
We establish the fundamental limits in the approximation of Lipschitz functions by deep ReLU neural networks with finite-precision weights. Specifically, we identify three regimes, namely under-, over-, and proper quantization, according to how the minimax approximation error behaves as a function of network weight precision. This is accomplished by deriving nonasymptotic tight lower and upper bounds on the minimax approximation error. Notably, in the proper-quantization regime, neural networks exhibit memory-optimality in the approximation of Lipschitz functions. Deep networks have an inherent advantage over shallow networks in achieving memory-optimality. We also develop the notion of a depth-precision tradeoff, showing that networks with high-precision weights can be converted into functionally equivalent deeper networks with low-precision weights, while preserving memory-optimality. This idea is reminiscent of sigma-delta analog-to-digital conversion, where the oversampling rate is traded for resolution in the quantization of signal samples. We improve upon the best-known ReLU network approximation results for Lipschitz functions and describe a refinement of the bit extraction technique that may be of independent interest.
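The depth-precision tradeoff can be illustrated with a toy example. This is a hypothetical Horner-scheme sketch, not the construction from the paper, and it assumes a single scalar weight and nonnegative inputs, so that every ReLU acts as the identity. A weight w in [0,1) with m·s fractional bits is split into base-2^s digits, w = Σ_k d_k 2^{-ks}, and the product w·x is then computed by m layers whose weights (1, 2^{-s}, and d_k 2^{-s}) each need only s fractional bits. The helper names split_weight and deep_low_precision_multiply are illustrative.

```python
from fractions import Fraction


def relu(z):
    """Rectified linear unit on exact rationals."""
    return max(z, 0)


def split_weight(w, s, m):
    """Write w in [0, 1) with at most m*s fractional bits as
    w = sum_{k=1}^{m} d_k 2^{-k s}, each d_k an integer in [0, 2^s).
    Returns [d_1, ..., d_m], most significant digit first."""
    base = 2 ** s
    n = Fraction(w) * base ** m
    if n.denominator != 1 or not 0 <= n < base ** m:
        raise ValueError("w must lie in [0, 1) with at most m*s fractional bits")
    n = int(n)
    digits = []
    for _ in range(m):
        digits.append(n % base)  # least significant base-2^s digit first
        n //= base
    return digits[::-1]


def deep_low_precision_multiply(x, digits, s):
    """Compute w*x for x >= 0 with m layers of s-bit weights (Horner's scheme).

    Layer k maps (x, acc) -> (ReLU(x), ReLU(2^-s * acc + (d_k * 2^-s) * x)),
    applied for k = m, ..., 1. Since all values stay nonnegative, each ReLU
    is the identity and the output equals sum_k d_k 2^{-k s} x exactly.
    """
    step = Fraction(1, 2 ** s)
    acc = Fraction(0)
    for d in reversed(digits):
        # Weights used in this layer: 1, 2^-s and d * 2^-s, each with
        # at most s fractional bits because 0 <= d < 2^s.
        x, acc = relu(x), relu(step * acc + (d * step) * x)
    return acc
```

With s = 4 and m = 3, the 12-bit weight 0b101101110011 / 2^12 splits into digits [11, 7, 3], and the three-layer network with 4-bit weights reproduces w·x exactly. Depth grows by a factor of m while per-weight precision shrinks by the same factor, loosely mirroring how a sigma-delta converter trades sampling rate for resolution.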
