Efficient Large-Scale Learning of Minimax Risk Classifiers

Kartheek Bondugula, Santiago Mazuelas, Aritz Pérez

TL;DR

The paper tackles learning minimax risk classifiers (MRCs) for large-scale, multi-class tasks where the objective is the worst-case risk over an uncertainty set. It introduces a constraint-and-column generation algorithm (MRC-CCG) that solves the LP formulation of MRCs by iteratively updating small subsets of constraints and features, with greedy primal/dual updates and efficient constraint evaluation. Theoretical results establish monotone convergence and bounds on the solution quality under violation thresholds, while empirical results show up to 10x speedups on large-scale data and up to 100x speedups as the number of classes grows, without compromising accuracy. This approach enables robust, scalable MRC learning for real-world, high-dimensional, multi-class problems, offering practical performance guarantees and significant computational benefits.
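The constraint-generation half of this idea can be illustrated on a toy problem. The sketch below is not the authors' MRC-CCG algorithm: it omits column generation and constraint removal, and it replaces the MRC linear program with a one-dimensional minimax problem, minimizing $\max_i (a_i x + b_i)$ over an interval. The function names `solve_restricted` and `constraint_generation` and the example data are hypothetical, chosen only for illustration. It shows how solving a restricted problem and adding the most violated constraint yields a nondecreasing sequence of optimal values that reaches the full optimum while using only a subset of the constraints.

```python
from itertools import combinations


def solve_restricted(lines, lo, hi):
    """Minimize max_i(a_i * x + b_i) over x in [lo, hi] for a small set of lines.

    The objective is convex and piecewise linear, so its minimum lies at an
    interval endpoint or at an intersection of two of the lines.
    """
    candidates = [lo, hi]
    for (a1, b1), (a2, b2) in combinations(lines, 2):
        if a1 != a2:
            x = (b2 - b1) / (a1 - a2)
            if lo <= x <= hi:
                candidates.append(x)
    best_x = min(candidates, key=lambda x: max(a * x + b for a, b in lines))
    return best_x, max(a * best_x + b for a, b in lines)


def constraint_generation(all_lines, lo, hi, tol=1e-9):
    """Toy constraint generation: grow the active set by the most violated line."""
    active = [all_lines[0]]  # start from a single constraint (arbitrary choice)
    history = []             # optimal value of each restricted problem
    while True:
        x, t = solve_restricted(active, lo, hi)
        history.append(t)
        # Evaluate every constraint at the current solution and pick the worst.
        worst = max(all_lines, key=lambda ab: ab[0] * x + ab[1])
        violation = worst[0] * x + worst[1] - t
        if violation <= tol:
            return x, t, history, active
        active.append(worst)


# Hypothetical example data: five constraints, only three are ever needed.
lines = [(1.0, 0.0), (-1.0, 2.0), (0.5, 0.2), (-2.0, 1.0), (0.0, 0.5)]
x_opt, t_opt, history, active = constraint_generation(lines, -5.0, 5.0)
print(x_opt, t_opt, history, len(active))
```

Each restricted problem is a relaxation of the full one, so its optimal value is a lower bound that can only grow as constraints are added. This mirrors, in a much simpler setting, the monotone behavior the paper states for its sequence of optimal values.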

Abstract

Supervised learning with large-scale data usually leads to complex optimization problems, especially for classification tasks with multiple classes. Stochastic subgradient methods can enable efficient learning with a large number of samples for classification techniques that minimize the average loss over the training samples. However, recent techniques, such as minimax risk classifiers (MRCs), minimize the maximum expected loss and are not amenable to stochastic subgradient methods. In this paper, we present a learning algorithm based on the combination of constraint and column generation that enables efficient learning of MRCs with large-scale data for classification tasks with multiple classes. Experiments on multiple benchmark datasets show that the proposed algorithm provides up to a 10x speedup for general large-scale data and around a 100x speedup with a sizeable number of classes.

Paper Structure

This paper contains 23 sections, 2 theorems, 20 equations, 3 figures, 6 tables, 2 algorithms.

Key Result

Theorem 1

Let $\mathrm{R}^*$ be the worst-case error probability obtained by solving eq:mrc_linear_primal using all the constraints and features, and let $\mathrm{R}^{k}$, $k=1,2,\ldots$, be the sequence of optimal values obtained by adding and removing constraints along the iterations of the proposed algorithm. Then Moreover, if $\hat{\epsilon}_1$ is the largest violation in the constraints of the primal at iterat

Figures (3)

  • Figure 1: Convergence of the worst-case error probability $\mathrm{R}^k$ over time. The figures correspond to different scenarios of large-scale learning and demonstrate that MRC-CCG achieves a fast convergence in comparison with state-of-the-art learning methods for MRC.
  • Figure 2: Illustration of scalability for multiple scenarios of large-scale learning. The figures demonstrate that MRC-CCG achieves an improved scalability in comparison with state-of-the-art learning methods for MRC.
  • Figure 3: Effect of hyper-parameters $n_\text{max}$ and $m_\text{max}$ on training time.

Theorems & Definitions (6)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • proof
  • proof