Consistency Conditions for Differentiable Surrogate Losses

Drona Khurana, Anish Thilagar, Dhamma Kimpara, Rafael Frongillo

TL;DR

This work analyzes the statistical consistency of differentiable surrogate losses for discrete prediction by connecting calibration to indirect elicitation (IE). It proves that, under mild conditions, IE is equivalent to calibration for convex differentiable surrogates in one dimension ($d=1$), but that this equivalence fails in higher dimensions. This motivates a stronger notion, strong indirect elicitation (strong IE), which is as easy to verify as IE and implies calibration for differentiable surrogates; for strongly convex, differentiable surrogates, strong IE is both necessary and sufficient for calibration. The results yield constructive link-function designs and enable efficient construction of 1D calibrated surrogates for orderable targets, including ordinal regression, extending the toolkit for designing consistent surrogates beyond the polyhedral case. Overall, the paper provides geometric criteria (IE and strong IE) that simplify calibration verification and guide surrogate design with consistency guarantees.

Abstract

The statistical consistency of surrogate losses for discrete prediction tasks is often checked via the condition of calibration. However, directly verifying calibration can be arduous. Recent work shows that for polyhedral surrogates, a less arduous condition, indirect elicitation (IE), is still equivalent to calibration. We give the first results of this type for non-polyhedral surrogates, specifically the class of convex differentiable losses. We first prove that under mild conditions, IE and calibration are equivalent for one-dimensional losses in this class. We construct a counter-example that shows that this equivalence fails in higher dimensions. This motivates the introduction of strong IE, a strengthened form of IE that is equally easy to verify. We establish that strong IE implies calibration for differentiable surrogates and is both necessary and sufficient for strongly convex, differentiable surrogates. Finally, we apply these results to a range of problems to demonstrate the power of IE and strong IE for designing and analyzing consistent differentiable surrogates.

Paper Structure

This paper contains 22 sections, 50 theorems, 62 equations, 7 figures, 1 table.

Key Result

Theorem 1

Let $L: \mathbb{R} \to \mathbb{R}^{n}$ be a convex, differentiable surrogate that indirectly elicits $\ell$. Under Assumption 1, $L$ is calibrated with respect to $\ell$.
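
As a rough illustration of what the IE hypothesis in Theorem 1 asks for, the sketch below checks it numerically for a one-dimensional surrogate with $n = 2$ outcomes. The logistic loss, the sign link, and the 0-1 target are illustrative assumptions chosen here, not examples taken from the paper, and all function names are hypothetical. A finite grid check is only a sanity test, not a proof, and Assumption 1 is not examined.

```python
import math

def surrogate(u):
    """L(u) in R^2: logistic loss on outcomes y=+1 and y=-1 (illustrative choice)."""
    return (math.log1p(math.exp(-u)), math.log1p(math.exp(u)))

def expected_surrogate(u, p):
    """Expected surrogate loss <(p, 1-p), L(u)>, where p = Pr[y=+1]."""
    l_pos, l_neg = surrogate(u)
    return p * l_pos + (1 - p) * l_neg

def surrogate_minimizer(p):
    """Unique minimizer of the expected logistic loss for p in (0, 1): the logit."""
    return math.log(p / (1 - p))

def link(u):
    """Assumed sign link from surrogate reports to target reports (ties go to +1)."""
    return 1 if u >= 0 else -1

def target_minimizers(p):
    """Reports minimizing expected 0-1 loss: +1 costs 1-p, -1 costs p."""
    if p > 0.5:
        return {1}
    if p < 0.5:
        return {-1}
    return {1, -1}

def satisfies_ie_on_grid(grid):
    """True if every surrogate minimizer on the grid links to a target minimizer."""
    return all(link(surrogate_minimizer(p)) in target_minimizers(p) for p in grid)

grid = [k / 100 for k in range(1, 100)]
print(satisfies_ie_on_grid(grid))
```

The check only looks at exact surrogate minimizers, which is what makes IE simpler to test than calibration: no reasoning about sequences approaching the minimizer is needed.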

Figures (7)

  • Figure 1: Calibration vs. Indirect Elicitation. Calibration requires that surrogate minimizers, as well as all sequences converging to surrogate minimizers, link to target minimizers. In general, it is not trivial to choose a universal threshold past which the sequences link as desired. Determining such a threshold requires careful reasoning about the relative positions of surrogate minimizers across different outcome distributions, i.e., $\Gamma(p)$ relative to $\Gamma(q)$ for $q \neq p$. IE is analytically easier to verify, as it only requires that surrogate minimizers link to target minimizers. Moreover, IE can be viewed as a geometric condition on the probability simplex, which can directly lead to design insights (§\ref{sec:applications}).
  • Figure 2: The expected loss for two surrogates for abstain loss. Left: For $L_{\text{cusp}}$ at $p=0.5$, it is clear that no link yields calibration for the abstain target (Example \ref{example:cusp}). Right: A smoothed version of $L_{\text{cusp}}$ that is calibrated, again depicted at $p=0.5$ (Example \ref{ex:cusp-smoothed}, Appendix \ref{appendix:examples}).
  • Figure 3: Let $\mathcal{Y} = \{1, 2, 3\}$, $\mathcal{R} = \{1, 2\}$. Three candidate target losses $\ell_1, \ell_2, \ell_3 : \mathcal{R} \to \mathbb{R}^{3}$ for which $L_{\text{CE}}$ (Example \ref{example:IEC-counterexample}) could be a surrogate. For each $i \in \{1,2,3\}$, $\ell_i(1) = (1, 1, 1)$, whereas $\ell_{1}(2) = (5/2, 5/4, 0)$, $\ell_{2}(2) = (2, 1, 0)$ and $\ell_{3}(2) = (5/3, 5/6, 0)$. The target boundary elicited by $\ell_{1}$ (resp. $\ell_{3}$) is the red line segment joining $q_{1}^{a}$ and $q_{1}^{b}$ (resp. $q_{3}^{a}$ and $q_{3}^{b}$). The target boundary elicited by $\ell_{2}$ is the red line segment joining $p$ and $(0,1,0)$. The level sets of $L_{\text{CE}}$ are the blue points. All level sets of $L_{\text{CE}}$ are single points, except $\Gamma_{(0,0)}$, which is the entire segment from $p = (1/2, 0, 1/2)$ to $(0, 1/2, 1/2)$ (blue line segment). Left: no IE. The segment level set crosses the target boundary, so $L_{\text{CE}}$ cannot indirectly elicit $\ell_1$. Center: IE. The segment level set does not cross the target boundary but only touches it, so IE holds but strong IE does not. Right: strong IE. The segment level set lies entirely within the target cell, so strong IE holds. Note: $q_{1}^{a} = (0, 0.8, 0.2), q_{1}^{b} = (0.4, 0, 0.6)$, $q_{3}^{a} = (0.2, 0.8, 0), q_{3}^{b} = (0.6, 0, 0.4)$.
  • Figure 4: Let $\mathcal{Y} = \mathcal{R} = \{1,2,3\}$. The solid peach lines depict the target boundaries elicited by the ordinal regression loss $\ell^{\text{ord}}: \mathcal{R} \to \mathbb{R}^{3}$. The dotted blue lines depict the level sets of the surrogate $L_{H}: \mathbb{R} \to \mathbb{R}^{3}$ defined in Example \ref{example: ordinal-surrogate}. Since no level set of $L_{H}$ crosses from one target cell to another, IE holds. By Theorem \ref{thm:IEC-1d}, calibration follows.
  • Figure 5: The figure on the left plots the superprediction set, $\{L(u): u \in \mathbb{R}_+\}$, of a surrogate loss that satisfies IE but is not calibrated. To see non-calibration, notice that there is only one possible link. Fix $p = [0.5, 0.5]^\top$. The optimal expected loss is achieved by $\mathop{\mathrm{arg\,inf}}\limits_{v \in \{L(u): u \in \mathbb{R}_+\}} \left\langle v, p \right\rangle$. Thus the expected loss is optimized by $v_\bot$, which must link to the abstain report $\bot$. However, consider the points to the left of $v_\bot$, which link to $+1$. The infimum of the expected loss over these points under $p$ equals the expected loss of $v_\bot$, which violates calibration.
  • ...and 2 more figures
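
The three cases in Figure 3 can be reproduced with exact arithmetic from the numbers in the caption alone. The sketch below is a minimal check, not the paper's procedure: it compares the expected loss of report 2 against report 1 at the two endpoints of the segment level set $\Gamma_{(0,0)}$. Because expected loss is affine in the distribution, the endpoint signs determine whether the segment crosses the target boundary, only touches it, or stays strictly on one side. The labels "crosses", "touches", and "strictly inside" paraphrase the caption's Left, Center, and Right panels; the function names are hypothetical.

```python
from fractions import Fraction as F

# Target losses from the Figure 3 caption: ell_i(1) = (1, 1, 1) for every i.
ELL_REPORT_1 = (F(1), F(1), F(1))
ELL_REPORT_2 = {
    "ell_1": (F(5, 2), F(5, 4), F(0)),
    "ell_2": (F(2), F(1), F(0)),
    "ell_3": (F(5, 3), F(5, 6), F(0)),
}

# Endpoints of the segment level set Gamma_(0,0) from the caption.
SEG_START = (F(1, 2), F(0), F(1, 2))
SEG_END = (F(0), F(1, 2), F(1, 2))

def expected(loss, q):
    """Expected loss <q, loss> under distribution q."""
    return sum(l * qi for l, qi in zip(loss, q))

def gap(loss_report_2, q):
    """Expected loss of report 2 minus report 1; zero exactly on the target boundary."""
    return expected(loss_report_2, q) - expected(ELL_REPORT_1, q)

def classify(loss_report_2):
    """Position of the segment relative to the target boundary."""
    g0, g1 = gap(loss_report_2, SEG_START), gap(loss_report_2, SEG_END)
    if g0 * g1 < 0:
        return "crosses"          # optimal report changes along the segment
    if g0 == 0 or g1 == 0:
        return "touches"          # an endpoint lies on the boundary
    return "strictly inside"      # one report is strictly optimal throughout

results = {name: classify(loss) for name, loss in ELL_REPORT_2.items()}
for name, verdict in results.items():
    print(name, verdict)
```

Under the caption's reading, "crosses" corresponds to no IE, "touches" to IE without strong IE, and "strictly inside" to strong IE.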

Theorems & Definitions (110)

  • Definition 1: Target Property, Elicits, Level Sets
  • Definition 2: Surrogate Property, Elicits, Level Sets
  • Definition 3: Calibration
  • Definition 4: Indirect Elicitation
  • Example 1: Cusp
  • Definition 5: Orderable \cite{lambert2011elicitation}
  • Theorem 1
  • Proof: Proof sketch
  • Example 2: Counterexample: IE without calibration
  • Definition 6: Strong Indirect Elicitation
  • ...and 100 more