Position Encoding with Random Float Sampling Enhances Length Generalization of Transformers
Atsushi Shimizu, Shohei Taniguchi, Yutaka Matsuo
TL;DR
This paper tackles the challenge of length generalization in Transformers by introducing Random Float Sampling (RFS), a position indexing method that samples continuous indices from a shared range during both training and inference. By replacing fixed discrete position indices with randomly drawn continuous ones, RFS reduces out-of-distribution issues on unseen input lengths, and it can be plugged into existing position encodings such as the absolute sinusoidal encoding, RoPE, and ALiBi. Experiments show clear gains on length generalization tasks over baselines such as simple extension and random integer sampling, along with competitive zero-shot commonsense reasoning performance. The findings suggest that exposing the model to a diverse set of position distances during training improves its ability to handle longer contexts. Because RFS only changes how indices are assigned, it offers a practical, easy-to-deploy way to improve robustness in language modeling and other sequence tasks.
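The contrast between random float sampling and the random integer sampling baseline can be sketched as follows. This is a minimal illustration under assumptions that the summary does not spell out: that RFS draws one index per token uniformly over a shared range [0, max_pos], sorts the draws so token order is preserved, and uses the same range at training and inference. The function names and the `max_pos` parameter are hypothetical.

```python
import random

def random_float_positions(seq_len, max_pos, rng=random):
    """Hypothetical RFS sketch: draw seq_len continuous indices uniformly
    over the shared range [0, max_pos] and sort them to keep token order."""
    return sorted(rng.uniform(0.0, max_pos) for _ in range(seq_len))

def random_integer_positions(seq_len, max_pos, rng=random):
    """Random integer sampling baseline: draw seq_len distinct integers
    from {0, ..., max_pos - 1} and sort them to keep token order."""
    return sorted(rng.sample(range(max_pos), seq_len))

rng = random.Random(0)
# A short training sequence and a longer evaluation sequence draw their
# indices from the same range, so neither sees indices outside it.
train_pos = random_float_positions(8, 64, rng)
eval_pos = random_float_positions(32, 64, rng)
int_pos = random_integer_positions(8, 64, rng)
```

The main design difference is that a float index can land anywhere in the range, so the gaps between neighboring tokens vary continuously instead of taking only whole-number values. This is one concrete way to read "exposing the model to a diverse set of position distances."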
Abstract
Length generalization is the ability of language models to maintain performance on inputs longer than those seen during pretraining. In this work, we introduce a simple yet powerful position encoding (PE) strategy, Random Float Sampling (RFS), that generalizes well to lengths unseen during pretraining or fine-tuning. In particular, instead of selecting position indices from a predefined discrete set, RFS uses randomly sampled continuous values, thereby avoiding out-of-distribution (OOD) issues on unseen lengths by exposing the model to diverse indices during training. Since assigning indices to tokens is a fundamental step shared by widely used PEs, RFS can easily be incorporated into them, for instance the absolute sinusoidal encoding, RoPE, and ALiBi. Experiments corroborate its effectiveness, showing that RFS yields superior performance on length generalization tasks as well as on zero-shot commonsense reasoning benchmarks.
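Why RFS plugs into existing PEs can be illustrated with their standard formulas. The sinusoidal encoding and RoPE use a position only as a numeric value, and ALiBi uses only the difference between two positions, so a continuous index can be passed in without changing the formulas. The sketch below assumes the common textbook forms (frequency base 10000, causal ALiBi with a caller-supplied per-head slope). It is an illustration under those assumptions, not the authors' implementation.

```python
import math

def sinusoidal_encoding(pos, d_model):
    """Absolute sinusoidal encoding at a (possibly non-integer) position.
    Assumes d_model is even."""
    enc = []
    for i in range(0, d_model, 2):
        freq = 1.0 / (10000 ** (i / d_model))
        enc.append(math.sin(pos * freq))
        enc.append(math.cos(pos * freq))
    return enc

def rope_rotate(vec, pos, base=10000.0):
    """RoPE: rotate each pair (vec[i], vec[i+1]) by angle pos * base^(-i/d).
    A float pos simply gives a fractional rotation angle."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        x, y = vec[i], vec[i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out

def alibi_bias(pos_q, pos_k, slope):
    """Causal ALiBi bias, assuming pos_k <= pos_q: a linear penalty on the
    query-key distance, which may now be any non-negative float."""
    return -slope * (pos_q - pos_k)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]
# With RoPE, the query-key score depends only on the position difference,
# so shifting both float positions by 1.0 leaves the score unchanged.
score_a = dot(rope_rotate(q, 1.3), rope_rotate(k, 0.5))
score_b = dot(rope_rotate(q, 2.3), rope_rotate(k, 1.5))
```

Because none of these functions assumes an integer argument, the only change RFS needs under this reading is in how the `pos` values are produced.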
