Amplifying human performance in combinatorial competitive programming

Petar Veličković, Alex Vitvitskyi, Larisa Markeeva, Borja Ibarz, Lars Buesing, Matej Balog, Alexander Novikov

TL;DR

The paper investigates boosting human performance in combinatorial competitive programming by pairing human-written heuristic backbones with scoring functions evolved by FunSearch. It validates the approach on all eight Hash Code online qualification rounds and on a held-out AtCoder contest. The evolved scoring functions lift the backbones into the top percentile in every Hash Code round and outperform the top-scoring human team in five of them (2015, 2018, 2020, 2021 and 2022). The approach remains effective when evolution is limited to two hours, which suggests that human-AI collaboration is a practical route to strong results on NP-hard optimization problems with real contest data.

Abstract

Recent years have seen a significant surge in complex AI systems for competitive programming, capable of performing at admirable levels against human competitors. While steady progress has been made, the highest percentiles still remain out of reach for these methods on standard competition platforms such as Codeforces. Here we instead focus on combinatorial competitive programming, where the target is to find as-good-as-possible solutions to otherwise computationally intractable problems, over specific given inputs. We hypothesise that this scenario offers a unique testbed for human-AI synergy, as human programmers can write a backbone of a heuristic solution, after which AI can be used to optimise the scoring function used by the heuristic. We deploy our approach on previous iterations of Hash Code, a global team programming competition inspired by NP-hard software engineering problems at Google, and we leverage FunSearch to evolve our scoring functions. Our evolved solutions significantly improve the attained scores from their baseline, successfully breaking into the top percentile on all previous Hash Code online qualification rounds, and outperforming the top human teams on several. Our method is also performant on an optimisation problem that featured in a recent held-out AtCoder contest.
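The division of labour described in the abstract can be illustrated with a minimal sketch. Everything here is hypothetical: the toy selection problem, the names `greedy_backbone`, `base_score`, `candidate_score` and `fitness`, and the hand-written candidate scoring function are illustrative assumptions, not the paper's code. In the paper the replacement scoring functions are produced by FunSearch, which this sketch does not attempt to reproduce.

```python
from typing import Callable, List, Tuple

# Hypothetical toy problem: each item is (value, weight), with weight > 0.
Item = Tuple[int, int]
ScoreFn = Callable[[Item, int], float]


def greedy_backbone(items: List[Item], capacity: int, score: ScoreFn) -> List[Item]:
    """Human-written backbone: repeatedly take the best-scoring item that still fits."""
    remaining = list(items)
    chosen: List[Item] = []
    while True:
        feasible = [it for it in remaining if it[1] <= capacity]
        if not feasible:
            break
        best = max(feasible, key=lambda it: score(it, capacity))
        chosen.append(best)
        remaining.remove(best)
        capacity -= best[1]
    return chosen


def fitness(chosen: List[Item]) -> int:
    """Evaluation function: total value of the selected items."""
    return sum(value for value, _ in chosen)


def base_score(item: Item, capacity: int) -> float:
    """Baseline scoring function: prefer the most valuable item."""
    return float(item[0])


def candidate_score(item: Item, capacity: int) -> float:
    """Stand-in for an evolved scoring function: prefer value per unit weight."""
    return item[0] / item[1]


items = [(10, 10), (6, 5), (6, 5), (1, 1)]
base_fitness = fitness(greedy_backbone(items, 10, base_score))
candidate_fitness = fitness(greedy_backbone(items, 10, candidate_score))
print(base_fitness, candidate_fitness)
```

The backbone and the evaluation function stay fixed, and only the scoring function is swapped. That is the single choice point an automated search would rewrite and then rank by fitness.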

Paper Structure

This paper contains 19 sections, 8 figures.

Figures (8)

  • Figure 1: High-level overview of the collaborative competitor + AI approach explored in our work.
  • Figure 2: The base scoring function used within one of the backbones for the Hash Code 2022 Qualification Round (Mentorship and Teamwork). Note the split on the rate_project variable in order to enable two different choice points to be evolved within the same scoring function.
  • Figure 3: Rankings and scores of our backbone solutions with base scoring functions, and solutions evolved by FunSearch, across all eight Hash Code online qualification rounds. We plot the Hash Code fitness scores obtained by human competitor teams (normalised to the $[0, 1]$ range by dividing by the best team's score per contest) against the teams' rank in the contest. We then compute the fitness scores obtained by the backbone with base scoring function, as well as the best fitness we were able to achieve after evolving (as "$\infty$") and the fitness scores obtained after no more than two hours of evolving (as "$2$h"). We report these scores on the ranking axis, and compare them against the ranks required to qualify into the finals. Our evolved solutions are consistently ranked in the top percentile, and outperform the top-scoring human team in five iterations (2015, 2018, 2020, 2021 and 2022).
  • Figure 4: The improvement in score obtained by FunSearch on the hill-climbing dataset we used for AHC 039, over the first $30,000$ programs. Each line corresponds to a particular cell size.
  • Figure 5: The input parsing function and the greedy algorithm backbone for the 2015 Hash Code online qualification (Optimizing a Data Center). Note that the backbone calls score_greedy (the scoring function to optimise) and get_guaranteed_capacity (the evaluation function).
  • ...and 3 more figures