GuardRails: Automated Suggestions for Clarifying Ambiguous Purpose Statements

Mrigank Pawagi, Viraj Kumar

TL;DR

The paper addresses ambiguity in function purpose statements by introducing GuardRails, a heuristic that uses LLMs to surface inputs that expose those ambiguities. It combines multiple LLM-generated implementations with mutation testing, property-based testing via Hypothesis, and partial doctests to produce concrete ambiguous inputs that the programmer can use to clarify the specification. The heuristic is compared against GitHub Copilot Chat on an open dataset of 15 functions. GuardRails is released as an open-source VSCode extension for Python, and the evaluation shows competitive or superior detection of ambiguous inputs, particularly as problem detail increases. The work is aimed at CS education and novice programmers, and it suggests integration opportunities with professional developer tooling to improve specification clarity and code quality.
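The core idea of comparing several plausible implementations can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two candidate implementations are hand-written stand-ins for LLM-generated ones, the purpose statement "return the first non-zero number in the list" is hypothetical, and exhaustive enumeration of small lists replaces the Hypothesis-based search that the summary describes.

```python
from itertools import product


def first_nonzero_a(nums):
    """Hypothetical candidate 1: return None if there is no non-zero element."""
    for x in nums:
        if x != 0:
            return x
    return None


def first_nonzero_b(nums):
    """Hypothetical candidate 2: return 0 if there is no non-zero element."""
    for x in nums:
        if x != 0:
            return x
    return 0


def find_disagreements(f, g, values=(0, 1, -1), max_len=2):
    """Return every small input list on which f and g produce different results.

    This brute-force enumeration is a stand-in for property-based testing;
    it is not the search strategy used by GuardRails itself.
    """
    disagreements = []
    for n in range(max_len + 1):
        for combo in product(values, repeat=n):
            nums = list(combo)
            if f(nums) != g(nums):
                disagreements.append(nums)
    return disagreements


ambiguous_inputs = find_disagreements(first_nonzero_a, first_nonzero_b)
print(ambiguous_inputs)  # [[], [0], [0, 0]]
```

Both candidates are plausible readings of the purpose statement, yet they differ on every list that has no non-zero element. Those inputs are the kind of suggestion a programmer could use to decide which behaviour is intended.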

Abstract

Before implementing a function, programmers are encouraged to write a purpose statement, i.e., a short, natural-language explanation of what the function computes. A purpose statement may be ambiguous, i.e., it may fail to specify the intended behaviour when two or more inequivalent computations are plausible on certain inputs. Our paper makes four contributions. First, we propose a novel heuristic that suggests such inputs using Large Language Models (LLMs). Using these suggestions, the programmer may choose to clarify the purpose statement (e.g., by providing a functional example that specifies the intended behaviour on such an input). Second, to assess the quality of inputs suggested by our heuristic, and to facilitate future research, we create an open dataset of purpose statements with known ambiguities. Third, we compare our heuristic against GitHub Copilot's Chat feature, which can suggest similar inputs when prompted to generate unit tests. Fourth, we provide an open-source implementation of our heuristic as an extension to Visual Studio Code for the Python programming language, where purpose statements and functional examples are specified as docstrings and doctests, respectively. We believe that this tool will be particularly helpful to novice programmers and instructors.
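As a rough illustration of how a functional example can resolve an ambiguity, the sketch below writes a purpose statement as a docstring and pins down the unclear cases with doctests. The wording of the purpose statement, the choice of None as the intended result, and the helper check_examples are all assumptions introduced for this example; they are not taken from the paper or its dataset.

```python
import doctest


def first_nonzero(nums):
    """Return the first non-zero number in nums.

    The examples below clarify the otherwise unspecified case where
    nums contains no non-zero number (assumed here to yield None).

    >>> first_nonzero([0, 3, 0, 5])
    3
    >>> first_nonzero([0, 0]) is None
    True
    >>> first_nonzero([]) is None
    True
    """
    for x in nums:
        if x != 0:
            return x
    return None


def check_examples(func):
    """Run the doctests in func's docstring and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


failed, attempted = check_examples(first_nonzero)
print(failed, attempted)  # 0 3
```

With the two extra doctests in place, an implementation that returned 0 for an all-zero list would fail the examples, so the docstring no longer admits both readings.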

Paper Structure (13 sections, 4 figures)

This paper contains 13 sections, 4 figures.

Figures (4)

  • Figure 1: When GitHub Copilot Chat is prompted to generate unit tests, it suggests examples from only one of the two Ambiguous Input Classes (AICs) for this function. For each of these examples (highlighted), Copilot Chat assumes that the return value is None.
  • Figure 2: An illustration of our heuristic and implementation for the first_nonzero() function.
  • Figure 3: Differences in the percentage of AICs (for each variant of all 15 questions) caught by GuardRails and GitHub Copilot Chat (top@5).
  • Figure 4: The percentage of AICs (averaged over all 15 questions) found by GitHub Copilot Chat vs. GuardRails (top@5).