Impact of Positional Encoding: Clean and Adversarial Rademacher Complexity for Transformers under In-Context Regression
Weiyi He, Yue Xing
TL;DR
This work develops a Rademacher-complexity (RC) framework to quantify how completely learnable positional encodings (PE) affect the generalization and robustness of a one-layer Transformer in in-context regression. It derives clean RC bounds showing that the generalization gap grows with the PE-induced parameter space, and extends them to the adversarial setting via Adversarial Rademacher Complexity (ARC), where attacks magnify the PE-related gap. A surrogate-loss framework and a detailed analysis of attack-induced changes to the solution space reveal a factor Phi(eps, t, d) that governs the adversarial effects and their dependence on context length. Simulations corroborate the theory: gaps are larger with PE, longer contexts mitigate but do not eliminate vulnerability to adversarial prompts, and RoPE behaves similarly to the no-PE baseline.
Abstract
Positional encoding (PE) is a core architectural component of Transformers, yet its impact on generalization and robustness remains unclear. In this work, we provide the first generalization analysis for a single-layer Transformer under in-context regression that explicitly accounts for a completely trainable PE module. Our result shows that PE systematically enlarges the generalization gap. Extending to the adversarial setting, we derive an adversarial Rademacher generalization bound. We find that the gap between models with and without PE is magnified under attack, demonstrating that PE amplifies model vulnerability. Our bounds are empirically validated by a simulation study. Together, these results establish a new framework for understanding clean and adversarial generalization in in-context learning (ICL) with PE.
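For orientation, the quantities named above can be written in their standard textbook form. This is a sketch under assumptions: the symbols (function class F, sample S of size n, loss ℓ, perturbation radius ε, norm ‖·‖) are generic notation introduced here for illustration, and the paper's exact definitions for the in-context regression setting may differ.

```latex
% Empirical Rademacher complexity of a function class F on a sample
% S = (z_1, ..., z_n); the sigma_i are i.i.d. uniform on {-1, +1}.
\[
  \widehat{\mathfrak{R}}_S(\mathcal{F})
  = \mathbb{E}_{\sigma}\!\left[\,
      \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i\, f(z_i)
    \right]
\]

% Monotonicity: enlarging the class cannot decrease the complexity.
\[
  \mathcal{F} \subseteq \mathcal{G}
  \;\Longrightarrow\;
  \widehat{\mathfrak{R}}_S(\mathcal{F}) \le \widehat{\mathfrak{R}}_S(\mathcal{G})
\]

% Adversarial variant (assumed form): the complexity is taken over the
% class of worst-case losses within an epsilon-ball around each input.
\[
  \tilde{\ell}(f; x, y) = \max_{\|x' - x\| \le \epsilon} \ell\bigl(f(x'), y\bigr),
  \qquad
  \widetilde{\mathcal{L}}_{\mathcal{F}}
  = \bigl\{ (x, y) \mapsto \tilde{\ell}(f; x, y) : f \in \mathcal{F} \bigr\}
\]
```

The monotonicity line gives the basic intuition behind the clean result: a model with trainable PE has a parameter space that contains the no-PE model as a special case, so its complexity, and hence the resulting bound on the generalization gap, cannot be smaller.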
