
Transformers for Program Termination

Yoav Alon, Cristina David

Abstract

Determining whether a program terminates is a core challenge in program analysis with direct implications for correctness, verification, and security. We investigate whether transformer architectures can recognise termination patterns directly from source code and how their strengths can be amplified through ensembles. To overcome the extreme scarcity of non-terminating examples, we design an ensemble framework of compact transformer encoders, systematically trained with a suite of imbalance-aware loss functions and class-aware sampling techniques. By combining models trained with distinct loss functions, our ensembles achieve substantially stronger performance than any single transformer, outperforming both powerful off-the-shelf LLMs and graph-based methods. Finally, we introduce an attribution pipeline that produces syntax-aware explanations for the termination estimation.
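The abstract refers to a suite of imbalance-aware loss functions used to cope with the scarcity of non-terminating examples. As one concrete instance, here is a minimal sketch of the binary focal loss for a single example; the hyperparameter values and the label convention (1 = non-terminating) are illustrative, not the paper's actual configuration.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example (illustrative sketch).

    p: predicted probability of the positive (non-terminating) class
    y: true label, 1 = non-terminating, 0 = terminating
    gamma: focusing parameter; alpha: class-balance weight
    """
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor shrinks toward 0 as p_t -> 1,
    # down-weighting easy, well-classified examples so training
    # focuses on the rare, hard (non-terminating) cases.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 and alpha = 1 this reduces to plain cross-entropy; raising gamma increasingly suppresses the loss contribution of confidently classified examples.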

Paper Structure

This paper contains 41 sections, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Source code is evaluated by a multi-transformer ensemble using dynamic aggregation to yield a final binary prediction. In parallel, token-level Shapley values are computed and mapped to the source spans of an Abstract Syntax Tree (AST). The resulting alignment produces a structurally attributed AST, localizing the ensemble's predictive attention to specific syntactic nodes.
  • Figure 2: Example attribution graph produced by our pipeline. Larger nodes denote stronger influence; thicker edges emphasize structurally important subtrees.
  • Figure 3: Performance scaling of various model architectures on the TPDB dataset relative to their parameter count (logarithmic scale). For the base transformer models, the mAP reflects the average performance across the BCE-effnum, Focal, and LDAM loss functions to provide a robust baseline.
  • Figure 4: 125M HumanEval example illustrating how attribution isolates the source of non-termination. The original program (right) terminates, while a single modification to the loop condition (left) produces non-termination. The modified loop node and its parent receive the strongest red attribution, clearly identifying the structural cause of the behavioral change.
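Figures 1 and 2 describe mapping token-level Shapley values onto AST nodes via their source spans. A minimal sketch of that span-alignment step, assuming token attributions are already computed (the function name and data layout are hypothetical, not the paper's API):

```python
import ast

def attribute_ast(source, token_spans, token_scores):
    """Map token-level attribution scores onto AST nodes by source span.

    source: program text (Python used here purely for illustration)
    token_spans: (start, end) character offsets, one per token
    token_scores: one attribution value (e.g. a Shapley value) per token
    Returns {(node_type, start_offset): score} for each located node.
    """
    tree = ast.parse(source)
    # Character offset at which each source line begins
    offsets = [0]
    for line in source.splitlines(keepends=True):
        offsets.append(offsets[-1] + len(line))

    node_scores = {}
    for node in ast.walk(tree):
        if not hasattr(node, "lineno"):
            continue  # e.g. Module carries no source location
        start = offsets[node.lineno - 1] + node.col_offset
        end = offsets[node.end_lineno - 1] + node.end_col_offset
        # A node's score aggregates the attributions of all tokens
        # whose spans fall entirely inside the node's span, so scores
        # accumulate up the tree toward structurally important subtrees.
        score = sum(s for (ts, te), s in zip(token_spans, token_scores)
                    if ts >= start and te <= end)
        node_scores[(type(node).__name__, start)] = score
    return node_scores
```

Because a parent's span contains its children's spans, loop headers such as the modified `while` condition in Figure 4 naturally receive the combined attribution of their subtree, which is what makes the structural cause of non-termination stand out.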