SELF: Self-Extend the Context Length With Logistic Growth Function
Phat Thanh Dang, Saahil Thoppay, Wang Yang, Qifan Wang, Vipin Chaudhary, Xiaotian Han
TL;DR
This work tackles the deterioration of long-context reasoning in LLMs caused by relative-position encoding limitations by proposing SELF, a dynamic token-grouping scheme guided by a logistic growth function. SELF blends neighbor attention for nearby tokens with gradually expanding group sizes for distant tokens, enabling longer effective context without retraining. The authors provide a concrete formulation for the logistic grouping, an efficient parallel implementation, and empirical results showing perplexity and long-context task performance improvements across multiple models and benchmarks, notably Llama-2-7B and Qwen-7B on LEval and LongBench. While benefits are substantial in many settings, some models exhibit variability, highlighting the importance of model-specific behavior and computational trade-offs in applying SELF. Overall, SELF offers a practical pathway to extend context lengths while preserving short-context performance, with direct implications for scalability of long-context reasoning in real-world applications.
Abstract
Large language models degrade when operated on contexts longer than their training context length, a limitation that stems from the standard position encoding of tokens in the attention layer. Tokens that are far apart rarely influence each other, and long prompts yield unexpected results. To solve this problem, we propose SELF (Self-Extend the Context Length With Logistic Growth Function), which groups consecutive tokens at varying group sizes using a logistic capacity equation, combined with a constant group size at smaller relative distances. On LEval, our method improves performance by up to 12% over the LongLM extension method (specifically on the Qwen model). On summarization-related tasks in LongBench, it performs up to 6.4% better than LongLM (specifically on the Llama-2-7b model). On reading comprehension tasks from LEval, it performs up to 5.4% better than LongLM. Our code is available at https://github.com/alexeipc/SELF-LLM.
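The grouping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the exact logistic formulation is not given in this section, so the function names, the parameters `capacity`, `rate`, `midpoint`, and `neighbor_window`, and their default values are all hypothetical. The sketch assumes that relative distances inside a small neighbor window keep their exact positions, and that more distant tokens are merged into groups whose size follows a logistic curve.

```python
import math


def logistic_group_size(k, capacity=8.0, rate=0.5, midpoint=6.0):
    """Size of the k-th group beyond the neighbor window.

    Hypothetical parameterization: the size follows a logistic curve that
    saturates at `capacity`, and is never smaller than one token.
    """
    return max(1, round(capacity / (1.0 + math.exp(-rate * (k - midpoint)))))


def grouped_positions(max_distance, neighbor_window=4, **logistic_kwargs):
    """Map each relative distance 0..max_distance to a compressed position."""
    # Nearby tokens keep their exact relative positions (group size of 1).
    mapped = list(range(min(neighbor_window, max_distance + 1)))

    # Distant tokens share one position per group; groups grow logistically.
    k = 0
    while len(mapped) <= max_distance:
        size = logistic_group_size(k, **logistic_kwargs)
        remaining = max_distance + 1 - len(mapped)
        mapped.extend([neighbor_window + k] * min(size, remaining))
        k += 1
    return mapped


if __name__ == "__main__":
    positions = grouped_positions(200)
    # The largest mapped position is far smaller than the raw distance of 200.
    print(positions[:12], positions[-1])
```

Under these assumptions, the largest position the model sees grows much more slowly than the raw relative distance, which is the property that lets a fixed position range cover a longer context.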
