On the Generalizability of Transformer Models to Code Completions of Different Lengths

Nathan Cooper, Rosalia Tufano, Gabriele Bavota, Denys Poshyvanyk

TL;DR

The paper investigates how encoder-decoder transformer models generalize code completion to input lengths unseen during training, comparing the Sinusoidal, xPOS, ALiBi, and T5 positional schemes across Java and Python. Through a controlled, large-scale empirical study with short, medium, long, and mixed-length datasets, it shows that none of the schemes generalizes reliably to unseen lengths and that performance degrades when the test length differs from the training length. A key finding is that training on a mix of lengths often brings robustness closer to that of length-specific models while mitigating training costs, though the improvement varies by language and metric. Overall, the work provides a practical guideline for training code completion models under length-variation constraints and highlights the need for architectural innovations beyond current positional-encoding approaches.

Abstract

The programming landscape is being reshaped by the advent of Large Language Models (LLMs) able to automate code-related tasks concerning code implementation (e.g., code completion) and comprehension (e.g., code summarization). Such a paradigm shift comes with a number of implications for how software will be written, maintained, and evolved. These LLMs are also extremely expensive to train, raising questions about their sustainability over time. Given their training cost, their ability to generalize, namely to work on task instances different from those on which they have been trained, is worth investigating. Previous work has already shown that transformer models can successfully support code completion in a cross-project setting. However, it is unclear whether LLMs are able to generalize to inputs having lengths not seen during training. For example, it is known that training a model on short instances substantially reduces the training cost. However, the extent to which such a model would perform well on sequences having lengths not seen during training is not known. Many recent works in Natural Language Processing (NLP) have tackled this problem in the context of decoder-only LLMs, proposing solutions such as xPOS and ALiBi. To assess whether these solutions extend to the encoder-decoder LLMs usually adopted in code-related tasks, we present a large empirical study evaluating this generalization property of these and other encoding schemes proposed in the literature, namely Sinusoidal, xPOS, ALiBi, and T5. We found that none of these solutions successfully generalizes to unseen lengths and that the only safe solution is to ensure that all lengths likely to be encountered at inference time are represented in the training set.

Paper Structure

This paper contains 14 sections, 9 equations, 2 figures, and 7 tables.

Figures (2)

  • Figure 1: Sequence-to-sequence transformer overview, from the original paper (Vaswani et al., 2017). The left part is the encoder and the right part is the decoder.
  • Figure 2: ALiBi overview, from the original paper (Press et al., 2022).
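
As a rough illustration of the ALiBi scheme shown in Figure 2, the sketch below builds the distance-proportional bias that ALiBi adds to attention scores before the softmax. The function names `alibi_slopes` and `alibi_bias` are hypothetical, the slopes follow the geometric sequence described in the ALiBi paper for a power-of-two number of heads, and the symmetric `abs(i - j)` distance is an assumption made here for a non-causal setting; it is not necessarily the exact variant evaluated in this study.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence starting at 2^(-8/n) with ratio 2^(-8/n).

    Assumption of this sketch: num_heads is a power of two.
    """
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch assumes a power-of-two number of heads")
    ratio = 2.0 ** (-8.0 / num_heads)
    return [ratio ** (h + 1) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Bias matrix for one head: entry [i][j] is -slope * |i - j|.

    The bias is added to the raw attention score of query i and key j,
    so more distant keys are penalized linearly. In a causal decoder the
    entries with j > i would additionally be masked out.
    """
    return [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    slopes = alibi_slopes(8)
    for row in alibi_bias(4, slopes[0]):
        print(row)
```

Because the bias depends only on the relative distance between positions, no learned or fixed position embedding is tied to an absolute index, which is the property that motivated ALiBi as a candidate for length extrapolation.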