Challenge design roadmap
Hugo Jair Escalante Balderas, Isabelle Guyon, Addison Howard, Walter Reade, Sebastien Treguer
TL;DR
The challenge design roadmap provides structured guidance for designing AI competitions and benchmarks. It has four parts: pre-start considerations, a formal proposal template, an illustrative example of a successful proposal, and practical conclusions on execution. To ensure scientific rigor, the work emphasizes data freshness, rigorous metric design, fairness, and risk management (including data leakage and licensing). It also promotes multi-phase evaluation to combat leaderboard overfitting, and it advocates starting kits, baselines, and templates to improve reproducibility and make challenges accessible to participants of all skill levels.
Abstract
Challenges can be seen as a type of game that motivates participants to solve serious tasks. As a result, competition organizers must develop effective game rules. However, these rules have multiple objectives beyond making the game enjoyable for participants. These objectives may include solving real-world problems, advancing scientific or technical areas, making scientific discoveries, and educating the public. In many ways, creating a challenge is similar to launching a product. It requires the same level of excitement and rigorous testing, and the goal is to attract "customers" in the form of participants. The process begins with a solid plan, such as a competition proposal that will eventually be submitted to an international conference and subjected to peer review. Although peer review does not guarantee quality, it does force organizers to consider the impact of their challenge, identify potential oversights, and generally improve its quality. This chapter provides guidelines for creating a strong plan for a challenge. The material draws on the preparation guidelines from organizations such as Kaggle, ChaLearn, and Tailor, as well as the NeurIPS proposal template, to which some of the authors contributed.
