Towards impactful challenges: post-challenge paper, benchmarks and other dissemination actions

Antoine Marot, David Rousseau, Zhen Xu

TL;DR

The chapter argues that AI/ML challenges must be followed by structured post-challenge activities to realize long-term impact. It proposes a taxonomy of post-challenge outputs, a comprehensive paper template, and processes to transform challenges into enduring benchmarks (e.g., via Codabench), reinforced by post-challenge workshops. Key contributions include guidance on organizing raw outputs, performing deep analyses of submissions, and communicating scientific outcomes and organizational lessons. Collectively, these recommendations aim to sustain community engagement, improve reproducibility, and enable continuous advancement beyond the initial competition.

Abstract

The conclusion of an AI challenge is not the end of its lifecycle: a long-lasting impact must be deliberately organised through meticulous post-challenge activities. This chapter covers the activities that take place after the challenge has formally finished. It identifies the target audiences for post-challenge initiatives, lists the multiple outputs of a challenge, and outlines methods for collecting and organising them. The central part of the chapter is a template for a typical post-challenge paper, including suggested graphs and advice on how to turn the challenge into a long-lasting benchmark.

Paper Structure

This paper contains 22 sections, 9 figures, and 1 table.

Figures (9)

  • Figure 1: Evolution of participants' best scores as a function of days into the competition [TrackMLAccuracy2019].
  • Figure 2: Evolution of participants' scores: the horizontal axis is accuracy and the vertical axis is inference speed. The total score, a function of both variables, is shown as grey contours. Each colour/marker type corresponds to one contributor, and the lines trace each contributor's score evolution [TrackMLThroughput2021].
  • Figure 3: A high-level retrospective timeline of the competition period helps readers understand how the competition was organised, through its phases and noteworthy events such as adjustments to the competition, peaks of activity, performance breakthroughs, and collaborative periods. Such a timeline supports the competition narrative [marot2021learning].
  • Figure 4: Overfitting and ranking-stability plot from the AutoSeries challenge [autoseries]. In a two-phase challenge, the feedback (or public) phase comes first and the private phase second. Plotting each submission's performance in the two phases against each other, one phase per axis, reveals algorithms that overfit the feedback phase. The rectangle around each team shows its ranking stability and is computed from the average ranks over multiple reproductions of the submissions' performances.
  • Figure 5: Approximate Median Significance (AMS), a figure of merit for a classifier in a particle-physics discovery context, as a function of the decision threshold for different submissions to the HiggsML challenge [higgsml]. A submission's score is the maximum of its curve. Panels (a), (b) and (c) compare the public and private curves of three participants, Gabor, Choko and Lubozs, with dots marking their maxima; panel (d) shows the private curves of different participants. The private curves are smoother because they are evaluated on more examples.
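
A best-score curve like the one in Figure 1 can be derived from a submission log by tracking each participant's running maximum over time. The sketch below is illustrative only: it assumes a hypothetical log of (participant, day, score) tuples in which higher scores are better, and the function name best_score_evolution is not taken from the chapter. Only improvements are recorded, which is what a step plot of "best score so far" needs.

```python
from collections import defaultdict


def best_score_evolution(submissions):
    """Return {participant: [(day, best_score_so_far), ...]}.

    `submissions` is an iterable of (participant, day, score) tuples
    (a hypothetical log format); higher scores are assumed to be better.
    Only the days on which a participant improved are recorded.
    """
    best = {}
    curves = defaultdict(list)
    # Sort by day so the running maximum is computed in chronological order.
    for participant, day, score in sorted(submissions, key=lambda s: s[1]):
        if participant not in best or score > best[participant]:
            best[participant] = score
            curves[participant].append((day, score))
    return dict(curves)


if __name__ == "__main__":
    log = [("alice", 1, 0.50), ("bob", 1, 0.40), ("alice", 2, 0.45), ("alice", 3, 0.70)]
    print(best_score_evolution(log))
    # {'alice': [(1, 0.5), (3, 0.7)], 'bob': [(1, 0.4)]}
```

Each returned list can be drawn directly as a step function, one line per participant.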
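
The two ingredients of Figure 4, public-versus-private ranks and ranking stability across reproductions, can be computed along the following lines. This is a minimal sketch under stated assumptions: scores are given as hypothetical team-to-score dictionaries, higher is better, ties are broken by team name, and the rectangle extent is taken to be the min/max rank a team obtains across reproductions, centred on its average rank. That last point is our interpretation of the caption, not the chapter's exact procedure, and all function names are illustrative.

```python
from statistics import mean


def rank(scores):
    """Map team -> rank (1 = best), assuming higher scores are better.
    Ties are broken by team name so the result is deterministic."""
    ordered = sorted(scores, key=lambda team: (-scores[team], team))
    return {team: i + 1 for i, team in enumerate(ordered)}


def overfitting_points(public, private):
    """Pair each team's public-phase rank with its private-phase rank.
    Points far from the diagonal flag teams whose public ranking
    did not carry over to the private phase."""
    pub, priv = rank(public), rank(private)
    return {team: (pub[team], priv[team]) for team in public}


def rank_stability(reproductions):
    """Summarise ranks over repeated evaluations of the same submissions.
    Returns team -> (mean_rank, min_rank, max_rank); min/max give the
    assumed extent of the stability rectangle along one axis."""
    ranks_per_team = {}
    for scores in reproductions:
        for team, r in rank(scores).items():
            ranks_per_team.setdefault(team, []).append(r)
    return {team: (mean(rs), min(rs), max(rs)) for team, rs in ranks_per_team.items()}


if __name__ == "__main__":
    public = {"A": 0.9, "B": 0.8, "C": 0.7}
    private = {"A": 0.6, "B": 0.85, "C": 0.65}
    print(overfitting_points(public, private))
    # {'A': (1, 3), 'B': (2, 1), 'C': (3, 2)}
```

Team A in the example ranks first publicly but last privately, the pattern that Figure 4 is designed to expose.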
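
The curves in Figure 5 follow from the AMS definition used in the HiggsML challenge, AMS = sqrt(2((s + b + b_r) ln(1 + s/(b + b_r)) - s)), where s and b are the weighted sums of selected signal and background events and b_r = 10 is a regularisation term. The sketch below assumes each event carries a classifier score, a binary label (1 = signal) and a weight, and that events scoring at or above the threshold are selected; following the caption, the submission score is the maximum of the curve. Function names are illustrative, not the challenge's official scoring code.

```python
import math


def ams(s, b, b_r=10.0):
    """Approximate Median Significance as defined for HiggsML.
    s, b: weighted sums of selected signal and background events;
    b_r: regularisation term (10 in HiggsML)."""
    radicand = 2.0 * ((s + b + b_r) * math.log1p(s / (b + b_r)) - s)
    # The radicand is mathematically >= 0; clamp tiny negative rounding errors.
    return math.sqrt(max(0.0, radicand))


def ams_curve(scores, labels, weights, thresholds):
    """AMS for each threshold; events with score >= threshold are selected."""
    curve = []
    for t in thresholds:
        s = sum(w for x, y, w in zip(scores, labels, weights) if x >= t and y == 1)
        b = sum(w for x, y, w in zip(scores, labels, weights) if x >= t and y == 0)
        curve.append(ams(s, b))
    return curve


def submission_score(curve):
    """Following the Figure 5 caption: the score is the curve's maximum."""
    return max(curve)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.2]
    labels = [1, 1, 0, 0]
    weights = [1.0, 1.0, 1.0, 1.0]
    curve = ams_curve(scores, labels, weights, [0.1, 0.5, 0.95])
    print(curve, submission_score(curve))
```

Calling ams_curve separately on the public and private subsets of the test events yields the pairs of curves compared in panels (a), (b) and (c); the larger private subset gives the smoother curve.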