MRCpy: A Library for Minimax Risk Classifiers

Kartheek Bondugula, Verónica Álvarez, José I. Segovia-Martín, Aritz Pérez, Santiago Mazuelas

TL;DR

MRCpy delivers a Python-based, scikit-learn–friendly implementation of minimax risk classifiers built on robust risk minimization, enabling worst-case performance guarantees under distribution shifts. The framework defines uncertainty sets via moment constraints on a feature map $\boldsymbol{\Phi}$ and supports $0$-$1$ and log losses, with convex optimization and $L_1$ regularization driving sparse solutions in high dimensions. It extends to concept drift via AMRC and to general covariate shift via DW-GCS, offering efficient solvers (subgradient, CG, SGD/Adam, constraint generation) and a modular, extensible architecture. Empirical results demonstrate faster hyper-parameter tuning using the classifiers' upper bounds, competitive accuracy on high-dimensional biological data, and effective adaptation to concept drift and covariate shift, underscoring the practical impact for robust, distribution-aware classification in complex domains.
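
As a rough sketch of what "uncertainty sets via moment constraints" can look like, the formulation below follows the usual presentation of minimax risk classifiers. The symbols $\boldsymbol{\tau}$ (an estimate of the expected feature map), $\boldsymbol{\lambda}$ (a confidence vector), $\mathcal{U}$, and $\ell$ are assumptions introduced here for illustration and do not appear in the summary above.

$$
\mathcal{U} = \left\{ \mathrm{p} \in \Delta(\mathcal{X} \times \mathcal{Y}) \,:\, \left| \mathbb{E}_{\mathrm{p}}\!\left[\boldsymbol{\Phi}(x, y)\right] - \boldsymbol{\tau} \right| \preceq \boldsymbol{\lambda} \right\},
\qquad
\mathrm{h}^{\mathcal{U}} \in \arg\min_{\mathrm{h}} \, \max_{\mathrm{p} \in \mathcal{U}} \, \ell(\mathrm{h}, \mathrm{p})
$$

Under this reading, the classifier $\mathrm{h}^{\mathcal{U}}$ minimizes the worst-case expected loss over all distributions in $\mathcal{U}$, and the optimal value of the minimax problem upper-bounds the expected loss whenever the true distribution lies in $\mathcal{U}$.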

Abstract

Libraries for supervised classification have enabled the widespread usage of machine learning methods. Existing libraries, such as scikit-learn, caret, and mlpack, implement techniques based on the classical empirical risk minimization (ERM) approach. We present a Python library, MRCpy, that implements minimax risk classifiers (MRCs) based on the robust risk minimization (RRM) approach. The library offers multiple variants of MRCs that can provide performance guarantees, enable efficient learning in high dimensions, and adapt to distribution shifts. MRCpy follows an object-oriented approach and adheres to the standards of popular Python libraries, such as scikit-learn, facilitating readability and ease of use together with seamless integration with other libraries. The source code is available under the GPL-3.0 license at https://github.com/MachineLearningBCAM/MRCpy.

Paper Structure

This paper contains 30 sections, 9 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: The figure shows the functionalities of MRCpy for standard classification problems, in comparison with scikit-learn, together with the additional functionalities provided by MRCpy.
  • Figure 2: Training times using different solvers for the "prostate" data set.
  • Figure 3: Results on the "Usenet2" data set show the evolution of accumulated mistake bounds and accumulated mistakes over the number of steps.
  • Figure 4: Results on four binary classification tasks using the "20 Newsgroups" data set show that the DW-GCS method implemented in the library can adapt to general covariate shift.