Enumerating the k-fold configurations in multi-class classification problems

Attila Fazekas, Gyorgy Kovacs

TL;DR

The paper tackles reproducibility in $k$-fold cross-validation for multi-class classification by enumerating all standardized fold configurations consistent with the observed class distribution. It generalizes a prior binary-case enumeration method through a recursive, generator-based approach that decomposes the problem into 2-class subproblems (Partition22, Partition2M, PartitionKM) and enforces lexicographic ordering to avoid duplicates. This enables exact consistency tests of reported CV scores and provides a quantitative view of the CV configuration space, demonstrated on small datasets (for example, $N=90$, $m=3$, $k=5$ yielding $2846$ configurations). The work highlights scalability limits and points to future directions, including sampling strategies and asymptotic estimates, to inform the choice of $k$ in practical, imbalanced, small-sample settings ($k$ in the range $5$–$10$).

Abstract

K-fold cross-validation is a widely used tool for assessing classifier performance. The reproducibility crisis faced by artificial intelligence partly results from the irreproducibility of reported k-fold cross-validation-based performance scores. Recently, we introduced numerical techniques to test the consistency of claimed performance scores and experimental setups. In a crucial use case, the method relies on the combinatorial enumeration of all k-fold configurations, for which we proposed an algorithm in the binary classification case.
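The kind of enumeration the abstract refers to can be illustrated with a small brute-force sketch. This is not the paper's recursive Partition22/Partition2M/PartitionKM procedure. It assumes a simplified definition in which a configuration is a $k \times m$ table of non-negative class counts, fold sizes are $\lceil N/k \rceil$ or $\lfloor N/k \rfloor$, column sums equal the class counts, and folds of equal size are kept in lexicographic order so that permuted duplicates are skipped. The function names `compositions` and `fold_configurations` are hypothetical, and counts obtained under this assumed definition need not match those reported in the paper.

```python
from typing import Iterator, Optional, Sequence, Tuple

Row = Tuple[int, ...]


def compositions(total: int, caps: Sequence[int]) -> Iterator[Row]:
    """Yield tuples c with sum(c) == total and 0 <= c[i] <= caps[i],
    in ascending lexicographic order."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in compositions(total - first, caps[1:]):
            yield (first,) + rest


def fold_configurations(class_counts: Sequence[int], k: int) -> Iterator[Tuple[Row, ...]]:
    """Yield every k-fold configuration under the simplified definition
    assumed in the lead-in (not necessarily the paper's definition)."""
    n = sum(class_counts)
    base, extra = divmod(n, k)
    # Assumed fold sizes: the first `extra` folds hold one more record.
    sizes = [base + 1] * extra + [base] * (k - extra)

    def extend(i: int, remaining: Row, prev: Optional[Row]) -> Iterator[Tuple[Row, ...]]:
        if i == k:
            # All records are assigned here, because the fold sizes sum to n.
            yield ()
            return
        for row in compositions(sizes[i], remaining):
            # Folds of equal size must be non-decreasing in lexicographic
            # order, which keeps one representative per set of permutations.
            if prev is not None and sizes[i] == sizes[i - 1] and row < prev:
                continue
            left = tuple(r - c for r, c in zip(remaining, row))
            for tail in extend(i + 1, left, row):
                yield (row,) + tail

    yield from extend(0, tuple(class_counts), None)


if __name__ == "__main__":
    # Toy example: 3 classes with 3, 2 and 1 records, split into 2 folds.
    for config in fold_configurations((3, 2, 1), 2):
        print(config)
```

Because the function is a generator, configurations can be consumed one at a time without storing the whole set. The brute-force search is only practical for very small inputs.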

Paper Structure (5 sections, 1 figure, 1 algorithm)


Figures (1)

  • Figure 1: Experimental results on the number of standardized fold configurations for a problem with 100 records and various class label distributions. As subfigure (a) shows, the number of configurations drops steeply as the number of folds reaches the cardinality of the smallest class. The number of configurations can also behave irregularly depending on whether the class cardinalities are divisible by the number of folds, as observable for the class distribution (20, 54, 26) with 4 folds in subfigure (b).