Tight Generalization Bounds for Large-Margin Halfspaces
Kasper Green Larsen, Natascha Schalburg
TL;DR
The paper resolves the long-standing question of how tightly large-margin halfspaces generalize by deriving a bound that is asymptotically tight in the margin $\gamma$, the empirical margin loss $\mathcal{L}^{\gamma}_{\mathbf{S}}(w)$, the failure probability $\delta$, and the sample size $n$. Extending the SVMbest framework, the authors combine a refined random discretization and an infinite sequence of grids with a contraction-based Rademacher analysis to control the gap between the true risk and the empirical margin loss. The key technical step is showing that the distribution of the discretized inner products can be expressed as a function of the original margin, which enables a Lipschitz-based contraction with terms that decay exponentially in the discretization dimension $k$. Balancing these discretization errors with refined bounding techniques (including meet-in-the-middle arguments and a careful union bound over sub-tasks) yields a bound of the form
$$\mathcal{L}_{\mathcal{D}}(w) \le \mathcal{L}^{\gamma}_{\mathbf{S}}(w) + c \left( \sqrt{ \mathcal{L}^{\gamma}_{\mathbf{S}}(w) \left( \frac{\ln(e/\mathcal{L}^{\gamma}_{\mathbf{S}}(w))}{\gamma^2 n} + \frac{\ln(e/\delta)}{n} \right) } + \frac{\ln(e\gamma^2 n)}{\gamma^2 n} + \frac{\ln(e/\delta)}{n} \right),$$
where $c$ is a constant. This matches the corresponding lower bound up to constants. The result sharpens theoretical guarantees for margin-based classifiers such as SVMs in high-dimensional settings and clarifies how margin, complexity, and generalization relate for linear predictors. More broadly, it shows that careful handling of discretization, Lipschitz constants, and concentration can close the gap between upper and lower bounds in classical models.
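To get a feel for how the terms of the bound above trade off, its right-hand side can be evaluated numerically. The following Python sketch is illustrative only and is not code from the paper. The function names are hypothetical, the constant $c$ is unspecified in the summary and is set to 1 purely as a placeholder, and the empirical margin loss is assumed here to be the fraction of training points whose margin $y\langle w, x\rangle$ is at most $\gamma$.

```python
import math


def empirical_margin_loss(w, xs, ys, gamma):
    """Fraction of samples with margin y * <w, x> at most gamma.

    Assumption: this "<= gamma" convention is chosen for illustration;
    the summary itself does not spell out the definition.
    """
    if not xs:
        raise ValueError("need at least one sample")
    low_margin = 0
    for x, y in zip(xs, ys):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin <= gamma:
            low_margin += 1
    return low_margin / len(xs)


def margin_bound_rhs(emp_loss, gamma, n, delta, c=1.0):
    """Right-hand side of the stated bound for a chosen constant c.

    Assumption: c = 1.0 is a placeholder; the true constant is not given.
    """
    if not 0.0 <= emp_loss <= 1.0:
        raise ValueError("emp_loss must lie in [0, 1]")
    if gamma <= 0 or n <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need gamma > 0, n > 0 and 0 < delta < 1")

    g2n = gamma ** 2 * n
    conf_term = math.log(math.e / delta) / n

    # L * ln(e / L) tends to 0 as L -> 0, so the square-root term vanishes there.
    if emp_loss == 0.0:
        sqrt_term = 0.0
    else:
        inner = math.log(math.e / emp_loss) / g2n + conf_term
        sqrt_term = math.sqrt(emp_loss * inner)

    return emp_loss + c * (sqrt_term + math.log(math.e * g2n) / g2n + conf_term)


if __name__ == "__main__":
    # Toy data: three points in the plane and a unit-norm halfspace normal.
    w = [1.0, 0.0]
    xs = [[0.5, 0.0], [0.05, 0.0], [-0.3, 0.1]]
    ys = [1, 1, -1]
    loss = empirical_margin_loss(w, xs, ys, gamma=0.1)
    print(margin_bound_rhs(loss, gamma=0.1, n=10_000, delta=0.05))
```

The value is only informative when $\gamma^2 n$ is well above 1. For small $\gamma^2 n$ the expression exceeds 1 or, below $1/e$, the $\ln(e\gamma^2 n)$ term even turns negative, so the output should not be read as a meaningful guarantee in that regime.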
Abstract
We prove the first generalization bound for large-margin halfspaces that is asymptotically tight in the tradeoff between the margin, the fraction of training points with the given margin, the failure probability and the number of training points.
