On Expressive Power of Quantized Neural Networks under Fixed-Point Arithmetic
Yeachan Park, Sejun Park, Geonho Hwang
TL;DR
This work studies the expressive power of quantized neural networks that operate under fixed-point arithmetic. It gives a necessary condition and a sufficient condition for universal representation, and shows that rounded versions of many practical activations (e.g., Sigmoid, ReLU, ELU, SoftPlus, SiLU, Mish, GELU) enable universal representation under appropriate fixed-point schemes. The identity activation is also useful, since rounding itself is nonlinear. The authors extend these results to networks with binary weights, analyze the size of the networks needed to represent fixed-point functions, compare their fixed-point results with existing floating-point results, and discuss the approximation of continuous functions within this framework. The central technique is to construct indicator functions over quantized cubes and combine them into a universal representer, while quantifying the parameter counts and the conditions under which the construction succeeds. Overall, the paper clarifies when and how quantized networks can represent arbitrary fixed-point mappings, connecting theory with practical low-precision neural network implementations.
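The indicator-based idea mentioned above can be illustrated with a small sketch. It is not the paper's construction: the paper builds indicators out of quantized network layers, whereas here the indicator is a plain Python comparison, and the names `fixed_point_grid`, `indicator`, and `represent` as well as the grid parameters are assumptions chosen for illustration. The sketch only shows why indicators over a finite fixed-point grid are enough to reproduce any function on that grid.

```python
from itertools import product

def fixed_point_grid(frac_bits=1, max_abs=1.0):
    """All multiples of 2**-frac_bits in [-max_abs, max_abs] (assumed format)."""
    scale = 2 ** frac_bits
    n = int(max_abs * scale)
    return [k / scale for k in range(-n, n + 1)]

def indicator(center):
    """Return a function that is 1.0 at `center` and 0.0 elsewhere."""
    center = tuple(center)
    return lambda x: 1.0 if tuple(x) == center else 0.0

def represent(target, dim, frac_bits=1, max_abs=1.0):
    """Rebuild `target` on the grid as a sum of value-weighted indicators."""
    grid = fixed_point_grid(frac_bits, max_abs)
    terms = [(target(c), indicator(c)) for c in product(grid, repeat=dim)]
    return lambda x: sum(value * ind(x) for value, ind in terms)

# Hypothetical target function on 2-dimensional fixed-point inputs.
target = lambda c: max(c)
rebuilt = represent(target, dim=2)
```

Because the input domain is finite, exactly one indicator fires for each input, so `rebuilt` agrees with `target` at every grid point. The number of terms grows with the size of the grid, which is why the paper's analysis of network size matters.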
Abstract
Existing works on the expressive power of neural networks typically assume real parameters and exact operations. In this work, we study the expressive power of quantized networks under discrete fixed-point parameters and inexact fixed-point operations with round-off errors. We first provide a necessary condition and a sufficient condition on fixed-point arithmetic and activation functions for quantized networks to represent all fixed-point functions from fixed-point vectors to fixed-point numbers. Then, we show that various popular activation functions satisfy our sufficient condition, e.g., Sigmoid, ReLU, ELU, SoftPlus, SiLU, Mish, and GELU. In other words, networks using those activation functions are capable of representing all fixed-point functions. We further show that our necessary condition and sufficient condition coincide under a mild condition on activation functions: e.g., for an activation function $σ$, there exists a fixed-point number $x$ such that $σ(x)=0$. Namely, we find a necessary and sufficient condition for a large class of activation functions. We lastly show that even quantized networks using binary weights in $\{-1,1\}$ can also represent all fixed-point functions for practical activation functions.
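To make the setting in the abstract concrete, the following is a minimal sketch of a single quantized neuron in which every intermediate result is rounded back to a fixed-point number. The rounding rule (round to nearest with ties rounded up, then saturate), the order of rounding, and the names `fx_round` and `quantized_neuron` are assumptions for illustration and may differ from the paper's exact definitions.

```python
import math

def fx_round(x, frac_bits=2, max_abs=4.0):
    """Round x to the nearest multiple of 2**-frac_bits (ties up), then saturate."""
    scale = 2 ** frac_bits
    q = math.floor(x * scale + 0.5) / scale
    return max(-max_abs, min(max_abs, q))

def quantized_neuron(xs, ws, b, activation, frac_bits=2, max_abs=4.0):
    """One neuron where each product, each sum, and the activation are rounded."""
    acc = b
    for x, w in zip(xs, ws):
        prod = fx_round(w * x, frac_bits, max_abs)      # inexact multiplication
        acc = fx_round(acc + prod, frac_bits, max_abs)  # inexact accumulation
    return fx_round(activation(acc), frac_bits, max_abs)  # rounded activation

identity = lambda z: z
sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

# With the identity activation the neuron is still not linear in its input:
f_quarter = quantized_neuron([0.25], [0.25], 0.0, identity)  # 0.0625 rounds to 0.0
f_half = quantized_neuron([0.5], [0.25], 0.0, identity)      # 0.125 rounds to 0.25
```

Here `f_half` differs from `2 * f_quarter`, so rounding alone breaks linearity even without a nonlinear activation. Passing `sigmoid` instead of `identity` gives a rounded version of the activation in the same way.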
