Learning Task Decomposition to Assist Humans in Competitive Programming

Jiaxin Wen, Ruiqi Zhong, Pei Ke, Zhihong Shao, Hongning Wang, Minlie Huang

TL;DR

The paper tackles scalable oversight by introducing AssistV, a measure of how feasibly and quickly humans can repair decomposed LM solutions. It presents a three-stage, human-informed framework (critique, refine, rank) for generating high-AssistV decompositions and validates the approach in competitive programming, where decomposition-enabled supervision lets non-experts become competitive with experts. Key findings include significant gains in repair speed and solution quality for humans, improved AI self-supervision and weak-to-strong supervision, and effective knowledge transfer via distillation. The work demonstrates that learning task decomposition from human repair experiences can substantially enhance both human and AI oversight in complex problem solving, with broad implications for scalable supervision of LMs.

Abstract

When using language models (LMs) to solve complex problems, humans might struggle to understand the LM-generated solutions and repair the flawed ones. To assist humans in repairing them, we propose to automatically decompose complex solutions into multiple simpler pieces that correspond to specific subtasks. We introduce a novel objective for learning task decomposition, termed assistive value (AssistV), which measures the feasibility and speed for humans to repair the decomposed solution. We collect a dataset of human repair experiences on different decomposed solutions. Utilizing the collected data as in-context examples, we then learn to critique, refine, and rank decomposed solutions to improve AssistV. We validate our method on competitive programming problems: in 177 hours of human study, our method enables non-experts to solve 33.3% more problems, speeds them up by 3.3x, and empowers them to match unassisted experts.
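
The data-collection idea in the abstract (recording human repair experiences on different decomposed solutions and reusing them as in-context examples) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the paper: the `RepairRecord` fields, the ordering in `assist_key` (a successful repair outranks a failed one, and among equal outcomes the faster repair ranks higher), and the `build_pairs` helper. The paper's actual AssistV definition may differ.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass
class RepairRecord:
    """One human repair attempt on one decomposition (hypothetical schema)."""
    decomposition: str         # text of the decomposed solution shown to the human
    repaired: bool             # whether the human managed to repair the solution
    minutes: Optional[float]   # time spent on the attempt; None if not recorded


def assist_key(record: RepairRecord) -> Tuple[int, float]:
    """Assumed ordering: feasibility first, then speed (less time is better)."""
    time_spent = record.minutes if record.minutes is not None else float("inf")
    return (1 if record.repaired else 0, -time_spent)


def build_pairs(records: List[RepairRecord]) -> List[Tuple[RepairRecord, RepairRecord]]:
    """Return (lower, higher) pairs for decompositions of the same solution.

    Pairs whose keys tie are skipped because they carry no preference signal.
    """
    pairs = []
    for a, b in combinations(records, 2):
        key_a, key_b = assist_key(a), assist_key(b)
        if key_a == key_b:
            continue
        pairs.append((a, b) if key_a < key_b else (b, a))
    return pairs


if __name__ == "__main__":
    records = [
        RepairRecord("A: one large if-statement", repaired=False, minutes=30.0),
        RepairRecord("B: if-statement split into two subtasks", repaired=True, minutes=12.0),
        RepairRecord("C: split by input parsing only", repaired=True, minutes=20.0),
    ]
    for low, high in build_pairs(records):
        print(f"lower: {low.decomposition!r}  higher: {high.decomposition!r}")
```

Pairs of this kind could then be formatted as demonstrations in a prompt; how the critique, refine, and rank models actually consume them is not specified in this summary, so that step is left out.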

Paper Structure (67 sections, 2 equations, 18 figures, 10 tables)

Figures (18)

  • Figure 1: Decompositions can assist humans in supervising models to solve complex problems. Left: To solve a problem, an LM would first propose an initial solution; our goal is to decompose the initial solution into multiple simpler pieces such that humans can repair it more easily. (Sub)Task descriptions are truncated for brevity. Right: The assistive value (AssistV) of a decomposition measures the feasibility and speed of humans to repair the decomposed solution in the actual problem-solving process. For example, Decomposition B has a higher AssistV value than A in practice, as it further decomposes the complex if-statement into two simpler subtasks, which effectively assists humans in identifying a missing condition.
  • Figure 2: Method overview. Left: we sample multiple decompositions from LMs and evaluate them based on assistive value $\eta$ and critique $C$. We then construct pair-wise decompositions to demonstrate the difference between low- and high-AssistV decompositions. Right: Starting from a vanilla decomposition generated by naively prompting LMs, we use the collected pair-wise data as in-context demonstrations to learn three models to critique, refine, and rank decompositions to better assist humans.
  • Figure 3: Humans provide higher-quality labels with decomposition. Dark color denotes strict accuracy of human-repaired programs; light color denotes test case average accuracy.
  • Figure 4: Decomposition improves human efficiency. We plot the relationship between the human-repaired program's test case average accuracy and human time cost.
  • Figure 5: Decomposition brings more benefits to human labelers on hard problems.
  • ...and 13 more figures
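
The Figure 3 caption distinguishes strict accuracy from test case average accuracy without defining them here. The sketch below assumes the convention common in program-synthesis benchmarks: a program counts toward strict accuracy only if it passes every test case, while test case average accuracy averages the fraction of test cases passed per program. The function names and the boolean result layout are illustrative assumptions, not the paper's code.

```python
from typing import List


def strict_accuracy(results: List[List[bool]]) -> float:
    """Fraction of programs that pass all of their test cases.

    `results[i][j]` is True if program i passed test case j.
    Assumes at least one program and at least one test case per program.
    """
    return sum(all(cases) for cases in results) / len(results)


def average_case_accuracy(results: List[List[bool]]) -> float:
    """Mean over programs of the fraction of test cases each program passes."""
    return sum(sum(cases) / len(cases) for cases in results) / len(results)


if __name__ == "__main__":
    # Three repaired programs: fully correct, one failing case, fully wrong.
    results = [
        [True, True, True, True],
        [True, False, True, True],
        [False, False, False, False],
    ]
    print(f"strict accuracy:            {strict_accuracy(results):.3f}")
    print(f"test case average accuracy: {average_case_accuracy(results):.3f}")
```

Under this assumed convention, strict accuracy never exceeds test case average accuracy, since a program that passes every case contributes 1 to both and every other program contributes 0 to the strict measure.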