Enumerating the k-fold configurations in multi-class classification problems
Attila Fazekas, Gyorgy Kovacs
TL;DR
The paper tackles reproducibility in $k$-fold cross-validation for multi-class classification by enumerating all standardized fold configurations consistent with the observed class distribution. It generalizes a prior enumeration method for the binary case through a recursive, generator-based approach that decomposes the problem into 2-class subproblems (Partition22, Partition2M, PartitionKM) and enforces lexicographic ordering to avoid duplicates. This enables exact consistency tests of reported CV scores and provides a quantitative view of the CV configuration space, demonstrated on small datasets (for example, $N=90$ samples, $m=3$ classes, and $k=5$ folds yield $2846$ configurations). The work highlights scalability limits and points to future directions, including sampling strategies and asymptotic estimates, to inform the choice of $k$ in practical, imbalanced, small-sample settings, where $k$ typically ranges from $5$ to $10$.
Abstract
K-fold cross-validation is a widely used tool for assessing classifier performance. The reproducibility crisis faced by artificial intelligence partly results from the irreproducibility of reported k-fold cross-validation-based performance scores. Recently, we introduced numerical techniques to test the consistency of claimed performance scores and experimental setups. In a crucial use case, the method relies on the combinatorial enumeration of all k-fold configurations, for which we proposed an algorithm in the binary classification case.
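To make the notion of enumerating fold configurations concrete, the following is a minimal brute-force sketch, not the algorithm proposed in the paper. It rests on assumptions that the text above does not state: a configuration is taken to be the per-fold vector of class counts, fold sizes are as balanced as possible, and folds of equal size are interchangeable, so only the lexicographically sorted representative is kept. The names `fold_sizes`, `compositions`, and `enumerate_configurations` are hypothetical.

```python
from typing import Iterator, List, Optional, Sequence, Tuple

Fold = Tuple[int, ...]  # class counts inside one fold


def fold_sizes(n_samples: int, k: int) -> List[int]:
    """Assumed fold sizes: as balanced as possible, larger folds first."""
    base, extra = divmod(n_samples, k)
    return [base + 1] * extra + [base] * (k - extra)


def compositions(total: int, caps: Fold) -> Iterator[Fold]:
    """Yield count vectors summing to `total` with entry i <= caps[i],
    in lexicographic order."""
    if not caps:
        if total == 0:
            yield ()
        return
    for first in range(min(total, caps[0]) + 1):
        for rest in compositions(total - first, caps[1:]):
            yield (first,) + rest


def enumerate_configurations(
    class_counts: Sequence[int], k: int
) -> Iterator[Tuple[Fold, ...]]:
    """Lazily yield every distinct assignment of class counts to k folds."""
    sizes = fold_sizes(sum(class_counts), k)

    def recurse(
        idx: int, remaining: Fold, prev: Optional[Fold]
    ) -> Iterator[Tuple[Fold, ...]]:
        if idx == k:
            # All samples are used up here because the fold sizes sum to N.
            yield ()
            return
        for fold in compositions(sizes[idx], remaining):
            # Symmetry breaking: equally sized folds must appear in
            # non-decreasing lexicographic order, so no permutation of the
            # same configuration is generated twice.
            same_size = idx > 0 and sizes[idx] == sizes[idx - 1]
            if same_size and prev is not None and fold < prev:
                continue
            left = tuple(r - f for r, f in zip(remaining, fold))
            for tail in recurse(idx + 1, left, fold):
                yield (fold,) + tail

    yield from recurse(0, tuple(class_counts), None)
```

Under these assumptions, two classes with two samples each and $k=2$ give exactly two configurations: `((0, 2), (2, 0))` and `((1, 1), (1, 1))`. Because the function is a generator, configurations can be consumed one at a time without holding the whole set in memory, although the number of configurations still grows quickly with $N$, $m$, and $k$.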
