Probeable Problems for Beginner-level Programming-with-AI Contests

Mrigank Pawagi, Viraj Kumar

TL;DR

Probeable Problems address the challenge of beginner-level programming tasks in AI-enabled contest environments by deliberately omitting specification details and providing an automated oracle for clarifications. The authors design and deploy a 2-hour contest with six Probeable Problems and study how participants and AI code generators respond to incomplete specifications. They find that contestants deduce only a minority of the missing details (about 20%), even with verifiers that reveal partial feedback; some teams use the verifiers to correct errors, but overall improvement is limited under time pressure. The work suggests that structured ambiguity combined with automated feedback can foster learning of requirements elicitation and testing, with potential to extend to more complex CS tasks and to inform future AI-assisted programming education.

Abstract

To broaden participation, competitive programming contests may include beginner-level problems that do not require knowledge of advanced Computer Science concepts (e.g., algorithms and data structures). However, since most participants have easy access to AI code-generation tools, these problems often become trivial to solve. For beginner-friendly programming contests that do not prohibit the use of AI tools, we propose Probeable Problems: code-writing tasks that provide (1) a problem specification that deliberately omits certain details, and (2) a mechanism to probe for these details by asking clarifying questions and receiving immediate feedback. To evaluate our proposal, we conducted a 2-hour programming contest for undergraduate Computer Science students from multiple institutions, where each student was an active member of their institution's computing club. The contest comprised six Probeable Problems for which a popular code-generation tool (GitHub Copilot) was unable to generate accurate solutions due to the absence of details. Students were permitted to work individually or in groups, and were free to use AI tools. We obtained consent from 26 groups (67 students) to use their submissions for research. We analyze the extent to which the code submitted by these groups identifies missing details and identify ways in which Probeable Problems can support learning in formal and informal CS educational contexts.
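
The probing mechanism described in the abstract can be pictured with a minimal sketch: a contestant submits an input as a clarifying question, and an oracle answers with the desired output on that input. Everything below is hypothetical and for illustration only; the names `make_oracle`, `_hidden_reference`, and `probe`, as well as the toy task, are assumptions and are not taken from the contest or its tooling.

```python
from typing import Callable, List


def make_oracle(reference: Callable[[List[int]], int]) -> Callable[[List[int]], int]:
    """Wrap a hidden reference solution so that only its outputs are observable."""
    def probe(test_input: List[int]) -> int:
        # Pass a copy so the reference cannot mutate the contestant's input.
        return reference(list(test_input))
    return probe


def _hidden_reference(values: List[int]) -> int:
    """Hypothetical hidden task: index of the largest value in a non-empty list.

    Omitted detail (assumed for this sketch): ties resolve to the LAST occurrence.
    """
    best = 0
    for i, v in enumerate(values):
        if v >= values[best]:  # ">=" makes later equal values win the tie
            best = i
    return best


probe = make_oracle(_hidden_reference)

# A clarifying "question" is just an input chosen to expose the missing detail:
# the value 7 appears at indices 1 and 2, so the answer reveals the tie-break rule.
answer = probe([3, 7, 7, 1])
print(answer)
```

The design choice worth noting is that the oracle never reveals the specification itself; the contestant has to pick inputs that discriminate between plausible interpretations.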

Paper Structure (26 sections, 9 figures)

Figures (9)

  • Figure 1: (Left, full specification) A screenshot showing a correct solution suggested by GitHub Copilot (grey font, after Line 7). (Right, after trimming away details) GitHub Copilot suggests a functionally inequivalent solution (after Line 3), since it is forced to make "reasonable" decisions about the omitted details.
  • Figure 2: (Left) A Probeable Problem with omitted specification details, and a contestant's clarifying question attempting to resolve one such detail. (Right) The automated feedback generated by an 'oracle' specifies the desired output on this input.
  • Figure 3: Practice Problem. The given task specification deliberately omits the definition of "palindrome" (highlighted in grey). The omitted definition is non-standard (shown in the grey box). CodeCheck: https://codecheck.io/files/23122107005thqqy15kp8cl3u9pdekwrs01, https://codecheck.io/files/23122107523v0rdcv2x6bpktj02n5ke7ovw.
  • Figure 4: Problem 1. The task specification refers to "the index" (highlighted in cyan) without clarifying how to break ties, if any. The tie-break mechanism "largest" (shown in the cyan box) is deliberately omitted. The value to be returned when the list has no positive integers (shown in the red box) is also deliberately omitted. CodeCheck: https://codecheck.io/files/2306111033cnnmzafkxveg0ap6i7blj01f0, https://codecheck.io/files/2312210759einuagsjv9d9v447safopvrs7
  • Figure 5: Problem 2. The task specification deliberately omits the return type (shown in the blue box) and a definition for the grey-highlighted term "first positive integer". This definition is non-trivial, and is shown in the grey box. The value to be returned when the string has no such positive integer (shown in the red box) is also deliberately omitted. CodeCheck: https://codecheck.io/files/2306111051595nfjvjxiu7a73md5cn4saj9, https://codecheck.io/files/23122108113rh4yb9fq33jmta2vlozb9nxe
  • ...and 4 more figures
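
The captions above describe two kinds of omitted detail: a tie-break rule and the value to return when no valid answer exists. The sketch below shows how a handful of probes could narrow down candidate interpretations. It is an illustration under stated assumptions, not the authors' problem or verifier: the task ("index of the smallest positive integer"), the sentinel `-1`, and every function name are hypothetical, since the actual task statements and omitted return values are not reproduced in this text.

```python
from typing import Callable, Dict, List, Optional

Solution = Callable[[List[int]], Optional[int]]


def index_of_min_positive(values: List[int], prefer_last: bool,
                          empty_value: Optional[int]) -> Optional[int]:
    """Hypothetical task: index of the smallest positive integer in `values`."""
    best: Optional[int] = None
    for i, v in enumerate(values):
        if v <= 0:
            continue  # only positive integers are eligible
        if best is None or v < values[best] or (prefer_last and v == values[best]):
            best = i
    return empty_value if best is None else best


def consistent(candidates: Dict[str, Solution], probes: List[List[int]],
               oracle: Solution) -> List[str]:
    """Return the names of candidates that agree with the oracle on every probe."""
    return [name for name, solve in candidates.items()
            if all(solve(list(p)) == oracle(list(p)) for p in probes)]


# Assumed hidden behaviour: largest index on ties, -1 when nothing is positive.
def oracle(values: List[int]) -> Optional[int]:
    return index_of_min_positive(values, prefer_last=True, empty_value=-1)


candidates: Dict[str, Solution] = {
    "first/-1": lambda v: index_of_min_positive(v, False, -1),
    "last/-1": lambda v: index_of_min_positive(v, True, -1),
    "first/None": lambda v: index_of_min_positive(v, False, None),
    "last/None": lambda v: index_of_min_positive(v, True, None),
}

# One probe contains a tie; the other contains no positive integers.
probes = [[2, 1, 1, 5], [0, -3]]
surviving = consistent(candidates, probes, oracle)
print(surviving)
```

Each probe targets exactly one omitted detail, so two well-chosen inputs are enough to single out one interpretation among the four considered here.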