The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets
Yujun Kim, Chaewon Moon, Chulhee Yun
TL;DR
This work characterizes the parameter cost of robust memorization in ReLU networks as a function of the robustness ratio $\rho=\mu/\epsilon_D$, giving tighter upper and lower bounds across the full interval $(0,1)$. The picture is regime-dependent: for $N$ training points in input dimension $d$, robust memorization costs the same as classical memorization, $\tilde{\Theta}(\sqrt{N})$ parameters, when $\rho$ is small, but the required parameter count increases as $\rho$ grows, up to $\tilde{O}(Nd^2)$ in the largest-robustness regime. The authors develop two tools, separation-preserving dimensionality reduction (a strengthened Johnson-Lindenstrauss lemma) and a grid-lattice mapping approach, to construct compact robust memorization schemes, and they extend the analysis to general $\ell_p$ norms. The results tie the required network size directly to the robustness ratio and include sublinear-parameter constructions in certain $\rho$ regimes. Together, the bounds close substantial gaps in the known $\rho$-dependent parameter scaling.
Abstract
We study the parameter complexity of robust memorization for $\mathrm{ReLU}$ networks: the number of parameters required to interpolate any given dataset with $\epsilon$-separation between differently labeled points, while ensuring predictions remain consistent within a $\mu$-ball around each training sample. We establish upper and lower bounds on the parameter count as a function of the robustness ratio $\rho = \mu/\epsilon$. Unlike prior work, we provide a fine-grained analysis across the entire range $\rho \in (0,1)$ and obtain tighter upper and lower bounds that improve upon existing results. Our findings reveal that the parameter complexity of robust memorization matches that of non-robust memorization when $\rho$ is small, but grows with increasing $\rho$.
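To make the quantities in the abstract concrete, the following minimal sketch computes a separation constant and the robustness ratio $\rho = \mu/\epsilon$ for a toy dataset. The function names `separation` and `robustness_ratio` are hypothetical, and the sketch assumes that $\epsilon$ is half the smallest Euclidean distance between differently labeled points. The abstract does not spell out this convention; it is chosen here because it makes $\rho < 1$ equivalent to the $\mu$-balls around differently labeled points being disjoint.

```python
import math
from itertools import combinations


def separation(points, labels):
    """Half the smallest Euclidean distance between differently labeled points.

    Assumption: the half-distance convention is not stated in the abstract;
    it is used here so that rho < 1 means the mu-balls do not overlap.
    """
    dists = [
        math.dist(p, q)
        for (p, yp), (q, yq) in combinations(zip(points, labels), 2)
        if yp != yq
    ]
    if not dists:
        raise ValueError("need at least two points with different labels")
    return 0.5 * min(dists)


def robustness_ratio(mu, points, labels):
    """Robustness ratio rho = mu / epsilon for a robustness radius mu."""
    return mu / separation(points, labels)


# Toy dataset: three points in the plane with binary labels.
points = [(0.0, 0.0), (0.0, 4.0), (1.0, 0.0)]
labels = [0, 1, 0]

eps = separation(points, labels)             # closest differently labeled pair is 4 apart
rho = robustness_ratio(0.5, points, labels)  # mu = 0.5
print(eps, rho)
```

In this toy example the closest differently labeled pair is at distance 4, so under the stated convention $\epsilon = 2$ and a radius of $\mu = 0.5$ gives $\rho = 0.25$.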
