The informativeness of the gradient revisited
Rustem Takhanov
TL;DR
The paper investigates the limits of gradient-based learning when targets come from almost pairwise independent function classes. It introduces an Integral Probability Metric-based measure of almost pairwise independence and proves a general bound on the gradient variance, $ \mathrm{Var}_h[\nabla C_h(w)] = \tilde{\mathcal{O}}(\varepsilon+e^{-\frac{1}{2}\mathcal{E}_c}) $, that ties together target independence, input collision entropy, and model regularity. Applying the bound to Learning with Errors (LWE) shows that uniform input distributions render gradient-based attacks ineffective because the variance is exponentially small, while non-uniform inputs with low collision entropy can make such attacks more feasible; this is corroborated by empirical analysis of sparse secret LWE variants. The work also analyzes high-frequency targets, showing that the informativeness of gradients decays with the frequency parameter R, yielding barren plateaus unless inputs are tuned to boost informative signals. Overall, the framework provides both theoretical limits and practical guidance for evaluating cryptographic primitives against gradient-based techniques and highlights open questions about constructing favorable input distributions from less informative samples.
Abstract
In the past decade, gradient-based deep learning has revolutionized several applications. However, this rapid advancement has highlighted the need for a deeper theoretical understanding of its limitations. Research has shown that, in many practical learning tasks, the information contained in the gradient is so minimal that gradient-based methods require an exceedingly large number of iterations to achieve success. The informativeness of the gradient is typically measured by its variance with respect to the random selection of a target function from a hypothesis class. We use this framework and give a general bound on the variance in terms of a parameter related to the pairwise independence of the target function class and the collision entropy of the input distribution. Our bound scales as $ \tilde{\mathcal{O}}(\varepsilon+e^{-\frac{1}{2}\mathcal{E}_c}) $, where $ \tilde{\mathcal{O}} $ hides factors related to the regularity of the learning model and the loss function, $ \varepsilon $ measures the pairwise independence of the target function class, and $\mathcal{E}_c$ is the collision entropy of the input distribution. To demonstrate the practical utility of our bound, we apply it to the class of Learning with Errors (LWE) mappings and high-frequency functions. In addition to the theoretical analysis, we present experiments to better understand the nature of recent deep learning-based attacks on LWE.
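To make the role of the collision entropy concrete, the following minimal sketch evaluates the $ e^{-\frac{1}{2}\mathcal{E}_c} $ term for two discrete input distributions. It assumes the standard definition of collision entropy as the Rényi entropy of order 2 with natural logarithm, $ \mathcal{E}_c = -\ln \sum_x p(x)^2 $; the paper's exact normalization may differ. The function names and the skewed distribution are hypothetical and chosen only for illustration.

```python
import math

def collision_entropy(probs):
    """Renyi entropy of order 2 in nats: -ln(sum_x p(x)^2).
    Assumed definition; the paper may normalize differently."""
    collision_prob = sum(p * p for p in probs)
    return -math.log(collision_prob)

def entropy_term(probs):
    """The e^{-E_c/2} term of the bound, equal to sqrt(sum_x p(x)^2)."""
    return math.exp(-0.5 * collision_entropy(probs))

n = 1024

# Uniform inputs over n points: E_c = ln(n), so the term is 1/sqrt(n).
uniform = [1.0 / n] * n

# Hypothetical skewed inputs: half the mass on one point, the rest spread evenly.
skewed = [0.5] + [0.5 / (n - 1)] * (n - 1)

uniform_term = entropy_term(uniform)
skewed_term = entropy_term(skewed)

print(f"uniform: E_c = {collision_entropy(uniform):.3f}, term = {uniform_term:.5f}")
print(f"skewed:  E_c = {collision_entropy(skewed):.3f}, term = {skewed_term:.5f}")
```

Under this definition the uniform distribution over $n$ points gives a term of $1/\sqrt{n}$, which shrinks exponentially in the input dimension when $n$ grows exponentially, while the skewed distribution keeps the term close to $1/2$. This only illustrates the entropy term in isolation; it does not compute $ \varepsilon $ or the regularity factors hidden by $ \tilde{\mathcal{O}} $.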
