Guiding Language Models of Code with Global Context using Monitors

Lakshya A Agrawal, Aditya Kanade, Navin Goyal, Shuvendu K. Lahiri, Sriram K. Rajamani

TL;DR

This work addresses the gap where language models struggle with global repository context when generating code. By introducing monitor-guided decoding (MGD), which couples a frozen LM with static analysis as a monitor, the approach enforces type-safety and API protocol constraints at decode-time without retraining. Through the PragmaticCode dataset and DotPrompts, the authors demonstrate that MGD improves compilation rates and ground-truth alignment across model scales, with small models achieving competitive performance against larger LMs. The method is generalized via MGDMicroBench to multiple languages and semantic analyses, highlighting practical impact for IDE-assisted code generation and potential privacy and cost benefits by enabling smaller models to perform effectively.

Abstract

Language models of code (LMs) work well when the surrounding code provides sufficient context. This is not true when it becomes necessary to use types, functionality or APIs defined elsewhere in the repository or a linked library, especially those not seen during training. LMs suffer from limited awareness of such global context and end up hallucinating. Integrated development environments (IDEs) assist developers in understanding repository context using static analysis. We extend this assistance, enjoyed by developers, to LMs. We propose monitor-guided decoding (MGD) where a monitor uses static analysis to guide the decoding. We construct a repository-level dataset PragmaticCode for method-completion in Java and evaluate MGD on it. On models of varying parameter scale, by monitoring for type-consistent object dereferences, MGD consistently improves compilation rates and agreement with ground truth. Further, LMs with fewer parameters, when augmented with MGD, can outperform larger LMs. With MGD, SantaCoder-1.1B achieves better compilation rate and next-identifier match than the much larger text-davinci-003 model. We also conduct a generalizability study to evaluate the ability of MGD to generalize to multiple programming languages (Java, C# and Rust), coding scenarios (e.g., correct number of arguments to method calls), and to enforce richer semantic constraints (e.g., stateful API protocols). Our data and implementation are available at https://github.com/microsoft/monitors4codegen .
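The core idea of the abstract, a monitor that restricts decoding to type-consistent identifiers, can be illustrated with a toy sketch. This is not the authors' implementation: the token vocabulary, the logit values, and the names `allowed_tokens`, `mask_logits`, and `decode` are all hypothetical, and the set `candidates` stands in for whatever a real static analysis would report as valid members at the dereference point. The sketch also assumes, for simplicity, that decoding stops as soon as a complete candidate identifier has been produced.

```python
import math

def allowed_tokens(vocab, candidates, partial):
    """Return tokens that keep `partial + token` a prefix of some candidate.

    `candidates` is a hypothetical stand-in for the identifiers a static
    analysis would suggest at an object dereference.
    """
    return {tok for tok in vocab
            if any(c.startswith(partial + tok) for c in candidates)}

def mask_logits(logits, allowed):
    """Set the score of every token outside `allowed` to negative infinity."""
    return {tok: (score if tok in allowed else -math.inf)
            for tok, score in logits.items()}

def decode(step_logits, candidates=None):
    """Greedy decoding over precomputed per-step logits (a toy frozen LM).

    With `candidates=None` no monitor is applied. Otherwise each step's
    logits are masked so that only type-consistent continuations survive.
    """
    partial = ""
    for logits in step_logits:
        if candidates is not None:
            if partial in candidates:
                break  # toy simplification: stop at the first complete identifier
            logits = mask_logits(logits, allowed_tokens(logits, candidates, partial))
        best = max(logits, key=logits.get)
        if logits[best] == -math.inf:
            break  # the monitor rejected every token at this step
        partial += best
    return partial

# Hypothetical data: the object's type exposes only getName and getAge.
candidates = {"getName", "getAge"}
steps = [
    {"get": 1.0, "set": 2.0, "name": 0.5},   # scores after the "." token
    {"Name": 0.3, "Id": 1.5, "Age": 0.2},    # scores for the next token
]

unmonitored = decode(steps)             # follows raw scores: "setId"
monitored = decode(steps, candidates)   # masked by the monitor: "getName"
```

In this toy run the unmonitored decoder produces `setId`, an identifier that does not exist on the type, while the monitored decoder is steered to `getName` without any change to the underlying scores. That mirrors the claim that guidance happens at decode time with the LM left frozen.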

Paper Structure

This paper contains 25 sections, 3 equations, 12 figures, and 3 tables.

Figures (12)

  • Figure 1: Motivating example to illustrate monitor-guided decoding (MGD).
  • Figure 2: score@k for models with MGD and Standard prompt compared against base models. The values of $k \in [1,6]$ are marked on the X-axis.
  • Figure 3: score@k for models with MGD and prompt augmentation compared against base models.
  • Figure 4: Monitor to guide generation of type-consistent identifiers for the code in Figure 1.
  • Figure 5: score@k for models with MGD and FIM compared against base models.
  • ...and 7 more figures