Of Dice and Games: A Theory of Generalized Boosting

Marco Bressan, Nataly Brukhim, Nicolò Cesa-Bianchi, Emmanuel Esposito, Yishay Mansour, Shay Moran, Maximilian Thiessen

TL;DR

This work develops a unified theory of boosting for generalized losses that are cost-sensitive and multi-objective. By casting boosting as a Blackwell-approachability style game, it derives a sharp threshold—captured by the game value $V(w)$—that separates boostable from trivial weak learners in binary cost-sensitive settings, and extends to a richer multiclass landscape with multiple thresholds $V_J(w)$. A key contribution is establishing an equivalence between cost-sensitive and multi-objective losses via convex combinations and duality, enabling transfer of booster guarantees between the two perspectives. The results include constructive boosting algorithms with provable sample complexity, a geometric interpretation of the loss regions, and lower bounds that characterize the limits of boostability in both binary and multiclass regimes, with extensions to list-based multiclass PAC learning. The framework offers a principled foundation for designing boosting procedures under realistic, application-driven loss structures.

Abstract

Cost-sensitive loss functions are crucial in many real-world prediction problems, where different types of errors are penalized differently; for example, in medical diagnosis, a false negative prediction can lead to worse consequences than a false positive prediction. However, traditional PAC learning theory has mostly focused on the symmetric 0-1 loss, leaving cost-sensitive losses largely unaddressed. In this work, we extend the celebrated theory of boosting to incorporate both cost-sensitive and multi-objective losses. Cost-sensitive losses assign costs to the entries of a confusion matrix, and are used to control the sum of prediction errors accounting for the cost of each error type. Multi-objective losses, on the other hand, simultaneously track multiple cost-sensitive losses, and are useful when the goal is to satisfy several criteria at once (e.g., minimizing false positives while keeping false negatives below a critical threshold). We develop a comprehensive theory of cost-sensitive and multi-objective boosting, providing a taxonomy of weak learning guarantees that distinguishes which guarantees are trivial (i.e., can always be achieved), which ones are boostable (i.e., imply strong learning), and which ones are intermediate, implying non-trivial yet not arbitrarily accurate learning. For binary classification, we establish a dichotomy: a weak learning guarantee is either trivial or boostable. In the multiclass setting, we describe a more intricate landscape of intermediate weak learning guarantees. Our characterization relies on a geometric interpretation of boosting, revealing a surprising equivalence between cost-sensitive and multi-objective losses.

Paper Structure

This paper contains 29 sections, 29 theorems, 92 equations, 3 figures, 3 algorithms.

Key Result

Theorem A

Let ${\mathcal{Y}}=\{-1,+1\}$ and let $w = (w_+, w_-) \in (0,1]^2$ be a cost. Then, for all $z \ge 0$, exactly one of the following holds: $(w,z)$ is boostable, or $(w,z)$ is trivial. Moreover, $(w,z)$ is boostable if and only if $z < \operatorname{V}(w)$, where $\operatorname{V}(w) = \frac{w_+ w_-}{w_+ + w_-}$.
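As a quick illustration of the threshold in Theorem A, the following minimal Python sketch evaluates the closed form $\operatorname{V}(w)$ and tests whether a guarantee $(w,z)$ falls below it. The function names are hypothetical. The grid search in `coin_value_grid` rests on an assumption that is not stated in this summary: it reads the "game value" as the best worst-case expected cost of a label-independent biased coin. It is included only as a sanity check against the closed form.

```python
import math


def game_value(w_plus: float, w_minus: float) -> float:
    """Closed form V(w) = w_+ * w_- / (w_+ + w_-) from Theorem A."""
    if not (0 < w_plus <= 1 and 0 < w_minus <= 1):
        raise ValueError("costs must lie in (0, 1]")
    return w_plus * w_minus / (w_plus + w_minus)


def is_boostable(w_plus: float, w_minus: float, z: float) -> bool:
    """Theorem A: (w, z) is boostable iff z < V(w); otherwise it is trivial."""
    if z < 0:
        raise ValueError("z must be non-negative")
    return z < game_value(w_plus, w_minus)


def coin_value_grid(w_plus: float, w_minus: float, steps: int = 10001) -> float:
    """ASSUMPTION (not from the source): approximate V(w) as
    min over coin biases p of max(w_+ * p, w_- * (1 - p)).
    Which error type each cost is attached to does not change the
    result, because the closed form is symmetric in (w_+, w_-)."""
    best = math.inf
    for i in range(steps):
        p = i / (steps - 1)
        worst_case = max(w_plus * p, w_minus * (1 - p))
        best = min(best, worst_case)
    return best
```

For the symmetric cost $w = (1,1)$ the closed form gives $1/2$, which agrees with the classic 0-1 loss threshold quoted in the caption of Figure 1.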

Figures (3)

  • Figure 1: Boostability thresholds. Binary. For classic 0-1 loss binary boosting, it is well-known that the boostability threshold is $1/2$: any value below it can be boosted, while any value above it is trivially attainable by non-boostable learners. For any cost $w$, the boostability threshold is $\operatorname{V}(w)$ (see \ref{eq:val_of_game}, \ref{thm:intro_binary_boost}). For the multi-objective loss, the threshold is determined by the boundary of the coin-attainable region, denoted $C(\boldsymbol{w})$ (see \ref{def:coin_attainability}, \ref{thm:binary_MO_boost}), as illustrated in the plot on the right; each point in the plane corresponds to false-positive and false-negative errors $(z_+,z_-)$. The two colored regions in the plot correspond to (a) the coin-attainable points $C(\boldsymbol{w})$ (in red) and (b) the boostable points $[0,1]^2\setminus C(\boldsymbol{w})$ (in blue). See the discussion below \ref{thm:binary_MO_boost}. Multiclass. A similar pattern holds for multiclass boosting. For 0-1 loss, boostability is known to be determined by $k-1$ thresholds \cite{Brukhim23simple}. For any cost $w$, the boostability thresholds are $v_{n}(w)$ (see \ref{eq:multiclass_critical_thresholds}). For the multi-objective loss, thresholds are determined by the boundaries of the dice-attainable regions $D_J(\boldsymbol{w})$ (see \ref{subsec:main_results_multiclass} for further details).
  • Figure 2: In all plots, each point $e = (e_+,e_-)$ in the plane corresponds to false-positive and false-negative errors. (Left) Cost-sensitive vs. multi-objective. The leftmost figure corresponds to a cost-sensitive guarantee $(w,z)$, where the blue line is given by $\langle e, w\rangle = z$. The shaded area is the feasible region of points $e$ satisfying the guarantee. The second figure corresponds to a multi-objective guarantee $(\boldsymbol{w},\boldsymbol{z})$ with $r=3$, which gives 3 different lines, each of the form $\langle e, w_i\rangle = z_i$. The shaded area corresponds to all points satisfying all guarantees, i.e., attaining $(\boldsymbol{w},\boldsymbol{z})$. (Right) Envelope of the coin-attainable region. The rightmost figure presents many different lines; each line corresponds to a guarantee $(w,\operatorname{V}(w))$ and is of the form $\langle e, w\rangle = \operatorname{V}(w)$. The coin-attainable boundary curve in the case of false-positive and false-negative costs (i.e., $\boldsymbol{w} = (w_p,w_n)$) is given by $\sqrt{e_+} + \sqrt{e_-} = 1$ (as shown in \ref{sec:binary}). Furthermore, this boundary of the coin-attainable points is the same curve as the envelope of the different lines. This ties the value $\operatorname{V}(w)$ to the coin-attainable region $C(\boldsymbol{w})$ (see the discussion below \ref{thm:binary_MO_boost}).
  • Figure 3: Duality in a picture. For a multi-objective cost $\boldsymbol{w}:{\mathcal{Y}}^2\to\mathbb{R}_{\ge 0}^2$, the set $C(\boldsymbol{w})$ of all coin-attainable vectors $\boldsymbol{z}$ is the intersection of all halfspaces of the form $H(\boldsymbol{\alpha})=\{{\boldsymbol{x}} \in \mathbb{R}^2 : \boldsymbol{\alpha} \cdot {\boldsymbol{x}} \ge \operatorname{V}(\boldsymbol{\alpha} \cdot \boldsymbol{w})\}$. As a consequence, the complement of $C(\boldsymbol{w})$ is the set of all boostable vectors $\boldsymbol{z}$. This is a special case of a more general result, stated in \ref{sec:multiclass}, that holds for any $k \ge 2$ and any multi-objective cost $\boldsymbol{w} : {\mathcal{Y}}^2 \to \mathbb{R}_{\ge 0}^r$. The straight blue line is the boundary of a specific $H(\boldsymbol{\alpha})$: for any $\boldsymbol{z}$ in it, every $(\boldsymbol{w},\boldsymbol{z})$-learner is boostable w.r.t. the cost $\boldsymbol{\alpha} \cdot \boldsymbol{w}$.
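The envelope claim in the caption of Figure 2 can be checked numerically. The sketch below is an illustration, not the paper's construction. It parametrizes the curve $\sqrt{e_+} + \sqrt{e_-} = 1$ as $e(t) = (t^2, (1-t)^2)$ and verifies that every line $\langle e, w\rangle = \operatorname{V}(w)$ touches the curve at $t = w_-/(w_+ + w_-)$ while the rest of the curve stays on the side $\langle e, w\rangle \ge \operatorname{V}(w)$. The parametrization, the tangency point, and all function names are our own choices. The region test `in_coin_attainable_region` assumes the special case $\boldsymbol{w} = (w_p, w_n)$ from that caption and takes the region to be the side of the curve containing the halfspaces of Figure 3.

```python
import math


def game_value(w_plus: float, w_minus: float) -> float:
    """V(w) = w_+ * w_- / (w_+ + w_-), as in Theorem A."""
    return w_plus * w_minus / (w_plus + w_minus)


def curve_point(t: float) -> tuple[float, float]:
    """Point on sqrt(e_+) + sqrt(e_-) = 1 for t in [0, 1] (our parametrization)."""
    return (t * t, (1.0 - t) ** 2)


def line_gap(e: tuple[float, float], w_plus: float, w_minus: float) -> float:
    """<e, w> - V(w): zero on the line, non-negative on the halfspace side."""
    return e[0] * w_plus + e[1] * w_minus - game_value(w_plus, w_minus)


def check_envelope(costs, samples: int = 1001, tol: float = 1e-12) -> bool:
    """Check that each line <e, w> = V(w) is tangent to the curve."""
    for w_plus, w_minus in costs:
        # Candidate tangency point, derived by minimizing <e(t), w> over t.
        touch = curve_point(w_minus / (w_plus + w_minus))
        if abs(line_gap(touch, w_plus, w_minus)) > tol:
            return False
        # The whole curve must lie on the side <e, w> >= V(w).
        for i in range(samples):
            e = curve_point(i / (samples - 1))
            if line_gap(e, w_plus, w_minus) < -tol:
                return False
    return True


def in_coin_attainable_region(e: tuple[float, float]) -> bool:
    """ASSUMPTION: for w = (w_p, w_n), the region is sqrt(e_+) + sqrt(e_-) >= 1."""
    return math.sqrt(e[0]) + math.sqrt(e[1]) >= 1.0
```

Under these assumptions, a point such as $(0.1, 0.1)$ falls outside the region and is therefore on the boostable side, while $(0.25, 0.25)$ sits exactly on the boundary curve.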

Theorems & Definitions (62)

  • Definition 1: $(w, z)$-learner
  • Definition 2: Random guess
  • Theorem A: Cost-sensitive boosting, binary case
  • Definition 3: $(\boldsymbol{w}, \boldsymbol{z})$-learner
  • Definition 4: Coin attainability
  • Definition 5: Trivial learner
  • Theorem B: Multi-objective boosting, binary case
  • Theorem 6: Equivalence of learners
  • Theorem 7: Equivalence of trivial guarantees
  • Theorem C: Cost-sensitive boosting, multiclass case
  • ...and 52 more