A Unified Framework for Rank-based Loss Minimization

Rufeng Xiao, Yuze Ge, Rujun Jiang, Yifan Yan

TL;DR

This paper introduces a unified framework for optimizing rank-based losses with a proximal alternating direction method of multipliers, and establishes the convergence and convergence rate of the proposed algorithm under mild conditions.

Abstract

The empirical loss, commonly referred to as the average loss, is widely used for training machine learning models. To meet the diverse performance requirements of such models, however, the rank-based loss is often used in its place. The rank-based loss is a weighted sum of sorted individual losses. It covers both convex losses, such as the spectral risk (which includes the empirical risk and the conditional value-at-risk), and nonconvex losses, such as the human-aligned risk and the sum of the ranked range loss. In this paper, we introduce a unified framework for optimizing the rank-based loss with a proximal alternating direction method of multipliers. We establish the convergence and convergence rate of the proposed algorithm under mild conditions. Experiments on synthetic and real datasets illustrate the effectiveness and efficiency of the proposed algorithm.
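
To make the phrase "weighted sum of sorted individual losses" concrete, here is a minimal Python sketch. The function names and the two weight vectors are illustrative assumptions, not notation from the paper: uniform weights recover the average loss, and weights concentrated on the k largest losses give a top-k average, which corresponds to a conditional value-at-risk when the tail covers a whole number of samples.

```python
def rank_based_loss(losses, sigma):
    """Return sum_i sigma[i] * l_[i], where l_[1] <= ... <= l_[n] are the sorted losses."""
    if len(losses) != len(sigma):
        raise ValueError("losses and sigma must have the same length")
    return sum(s * l for s, l in zip(sigma, sorted(losses)))


def erm_weights(n):
    """Uniform weights: the rank-based loss reduces to the average loss."""
    return [1.0 / n] * n


def top_k_weights(n, k):
    """Weights 1/k on the k largest losses and 0 elsewhere (hypothetical helper)."""
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    return [0.0] * (n - k) + [1.0 / k] * k


losses = [0.5, 2.0, 1.0, 4.0]
average = rank_based_loss(losses, erm_weights(4))       # mean of all four losses
tail_mean = rank_based_loss(losses, top_k_weights(4, 2))  # mean of the two largest
```

Both weight vectors above are nondecreasing in the rank, which is the convex (spectral risk) case mentioned in the abstract; weight vectors that are not monotone fall into the nonconvex case.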

Paper Structure (35 sections, 10 theorems, 67 equations, 3 figures, 15 tables, 2 algorithms)

Key Result

Proposition 1

For constant $\sigma_i$, suppose that $v_{[m,n]} > v_{[n+1,p]}$, so that $[m,n]$ and $[n+1,p]$ are consecutive out-of-order blocks. If these two blocks are merged into $[m,p]$, then the optimal value of the merged block, denoted by $v_{[m,p]}$, satisfies $v_{[n+1,p]} \leq v_{[m,p]} \leq v_{[m,n]}.$
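
The inequality in Proposition 1 reads like the merge step of a pool-adjacent-violators scheme. The sketch below illustrates it under a loud assumption: each block minimizes a weighted least-squares objective, so its optimal value is a weighted mean. That objective, the function name `pava`, and the block representation are illustrative choices, not the subproblem or notation of the paper.

```python
def pava(y, w=None):
    """Nondecreasing least-squares fit of y by pool-adjacent-violators.

    Assumes strictly positive weights. Each block stores
    [optimal value, total weight, number of points].
    """
    if w is None:
        w = [1.0] * len(y)
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge while the last two blocks are out of order (left value > right value).
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v_left, w_left, c_left = blocks[-2]
            v_right, w_right, c_right = blocks[-1]
            merged = (w_left * v_left + w_right * v_right) / (w_left + w_right)
            # Sandwich property: the merged value lies between the two block
            # values (a small tolerance absorbs floating-point rounding).
            assert v_right - 1e-12 <= merged <= v_left + 1e-12
            blocks[-2:] = [[merged, w_left + w_right, c_left + c_right]]
    fit = []
    for value, _, count in blocks:
        fit.extend([value] * count)
    return fit


fit = pava([3.0, 1.0, 2.0, 5.0, 4.0])
```

In this quadratic setting the sandwich property is immediate, because the merged value is a convex combination of the two block values. It is also what makes the merge loop sufficient: a merge can only lower the value of the left block, so only the boundary with the preceding block needs to be rechecked.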

Figures (3)

  • Figure 1: Time vs. Sub-optimality gap in synthetic datasets with ERM framework. (a-d) for $\ell_2$ regularization, and (e-f) for $\ell_1$ regularization. Sub-optimality is defined as $F^k-F^*$, where $F^k$ represents the objective function value at the $k$-th iteration or epoch and $F^*$ denotes the minimum value obtained by all algorithms. Plots are truncated when $F^k-F^*<10^{-8}$.
  • Figure 2: Time vs. Sub-optimality gap in synthetic datasets with superquantile framework. (a-d) for $\ell_2$ regularization, and (e-f) for $\ell_1$ regularization. Sub-optimality is defined as $F^k-F^*$, where $F^k$ represents the objective function value at the $k$-th iteration or epoch and $F^*$ denotes the minimum value obtained by all algorithms. Plots are truncated when $F^k-F^*<10^{-8}$.
  • Figure 3: Time vs. Sub-optimality gap in synthetic dataset with ERM framework and $\ell_1$ regularization. The datasets with the same number of samples are generated by different random number seeds. Sub-optimality is defined as $F^k-F^*$, where $F^k$ represents the objective function value at the $k$-th iteration or epoch and $F^*$ denotes the minimum value obtained by all algorithms. Plots are truncated when $F^k-F^*<10^{-8}$.
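
The sub-optimality gap used in all three captions can be computed as in the following sketch. It assumes that each algorithm's objective values are stored in a list, and it reads "truncated" as cutting each curve at the first iterate whose gap falls below the threshold; the function name and the data layout are hypothetical.

```python
def suboptimality_curves(histories, tol=1e-8):
    """Map each algorithm name to its gaps F^k - F^*, truncated below tol.

    histories: dict mapping an algorithm name to its objective values F^k.
    F^* is the smallest objective value reached by any algorithm.
    """
    f_star = min(min(values) for values in histories.values())
    curves = {}
    for name, values in histories.items():
        gaps = []
        for f_k in values:
            gap = f_k - f_star
            if gap < tol:
                break  # stop plotting once the gap drops below the threshold
            gaps.append(gap)
        curves[name] = gaps
    return curves


# Made-up objective values for two hypothetical algorithms.
curves = suboptimality_curves({
    "method_a": [1.0, 0.5, 0.25],
    "method_b": [1.0, 0.3, 0.25 + 1e-10],
})
```

Because $F^*$ is the best value found by any of the compared algorithms rather than a known optimum, the gap is relative to the set of methods in the comparison, and at least one curve always reaches a gap of exactly zero.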

Theorems & Definitions (10)

  • Proposition 1
  • Theorem 1
  • Proposition 2
  • Lemma 1
  • Theorem 2
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Proposition 3