Gini Score under Ties and Case Weights

Alexej Brauer, Mario V. Wüthrich

TL;DR

This paper addresses how to compute and interpret the Gini score for real-valued responses when ties occur and case weights (exposures) are present. It develops a robust framework based on Lorenz curves and the cumulative accuracy profile (CAP), introducing a mid-CAP approach to handle ties without bias from granularity differences, and extending the construction to weighted observations via exposure weights. Theoretical results clarify the relationship between CAP and Lorenz-based Gini, and practical guidance is provided through a real data example (MTPL) that demonstrates how weighting can alter risk-ranking assessments. The work offers concrete implementations and emphasizes fair model comparison, ensuring that Gini-based selection reflects true ranking concordance rather than artefacts of sampling or aggregation.

Abstract

The Gini score is a popular tool in statistical modeling and machine learning for model validation and model selection. It is a purely rank based score that allows one to assess risk rankings. The Gini score for statistical modeling has mainly been used in a binary context, in which it has many equivalent reformulations such as the receiver operating characteristic (ROC) or the area under the curve (AUC). In the actuarial literature, this rank based score for binary responses has been extended to general real-valued random variables using Lorenz curves and concentration curves. While these initial concepts assume that the risk ranking is generated by a continuous distribution function, we discuss in this paper how the Gini score can be used in the case of ties in the risk ranking. Moreover, we adapt the Gini score to the common actuarial situation of having case weights.
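The tie and case-weight handling described above can be illustrated with a small sketch. It is not the paper's implementation. It assumes that the response `y` is measured per unit of exposure (so observation $i$ contributes mass $w_i y_i$), that responses are non-negative with a positive total, and that tied predictions are handled by interpolating the CAP linearly across each tie group, which makes the result independent of the ordering within ties. Whether this coincides exactly with the paper's mid-CAP construction is an assumption, and the names `cap_area` and `gini_ratio` are hypothetical.

```python
import numpy as np

def cap_area(y, score, w):
    """Area under a piecewise-linear CAP built on tie groups of `score`.

    Observations are ranked by decreasing score. The curve only has corners
    at the end of each tie group, so the order inside a tie does not matter.
    """
    order = np.argsort(-score, kind="stable")
    y, score, w = y[order], score[order], w[order]
    # True at the last observation of every group of equal scores.
    is_last = np.append(score[1:] != score[:-1], True)
    cum_w = np.cumsum(w)[is_last]        # cumulative exposure
    cum_y = np.cumsum(w * y)[is_last]    # cumulative response mass (assumed w * y)
    x = np.concatenate(([0.0], cum_w / cum_w[-1]))
    c = np.concatenate(([0.0], cum_y / cum_y[-1]))
    # Trapezoidal rule over the corner points.
    return float(np.sum(np.diff(x) * (c[1:] + c[:-1]) / 2.0))

def gini_ratio(y, score, w=None):
    """Area between model CAP and diagonal, relative to the same area
    for a ranking by the observed responses themselves."""
    y = np.asarray(y, dtype=float)
    score = np.asarray(score, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    model = cap_area(y, score, w) - 0.5
    perfect = cap_area(y, y, w) - 0.5
    return model / perfect

# Illustrative data only: three distinct prediction levels, hence many ties.
rng = np.random.default_rng(1)
exposure = rng.uniform(0.1, 1.0, size=500)
pred = rng.choice([0.05, 0.10, 0.20], size=500)
claims_per_exposure = rng.poisson(pred * exposure) / exposure
print(gini_ratio(claims_per_exposure, pred, exposure))
```

Because corners are placed only at tie-group boundaries, splitting one observation of weight 2 into two tied observations of weight 1 leaves the value unchanged in this sketch, which is the kind of granularity independence the summary refers to.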

Paper Structure

This paper contains 13 sections, 2 theorems, 56 equations, 7 figures, 1 table.

Key Result

Lemma 3.3

Assume $g$ is a strictly increasing function. Then $C_{Y, g(\widehat{\mu})}(\alpha)=C_{Y, \widehat{\mu}}(\alpha)$ for all $\alpha\in [0,1]$.
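The invariance in Lemma 3.3 can be checked numerically on an empirical analogue. The sketch below is a sanity check, not a proof. It assumes that the empirical concentration curve accumulates the responses in increasing order of the prediction and is evaluated only at the ends of tie groups. The helper name `empirical_concentration_curve` and the simulated data are hypothetical.

```python
import numpy as np

def empirical_concentration_curve(y, mu_hat):
    """Corner points (alpha, share of total response) of an empirical
    concentration curve, ranking by increasing prediction mu_hat."""
    y = np.asarray(y, dtype=float)
    mu_hat = np.asarray(mu_hat, dtype=float)
    order = np.argsort(mu_hat, kind="stable")
    y_sorted, mu_sorted = y[order], mu_hat[order]
    # Keep only the last observation of each group of tied predictions.
    is_last = np.append(mu_sorted[1:] != mu_sorted[:-1], True)
    alpha = np.arange(1, len(y) + 1)[is_last] / len(y)
    share = np.cumsum(y_sorted)[is_last] / y_sorted.sum()
    return alpha, share

# Illustrative data: predictions take four values, so ties are present.
rng = np.random.default_rng(0)
mu_hat = rng.choice([0.5, 1.0, 2.0, 4.0], size=200)
y = rng.poisson(mu_hat).astype(float)

# g = log is strictly increasing on the positive predictions used here.
alpha_raw, share_raw = empirical_concentration_curve(y, mu_hat)
alpha_log, share_log = empirical_concentration_curve(y, np.log(mu_hat))

print(np.array_equal(alpha_raw, alpha_log), np.allclose(share_raw, share_log))
```

Strictness matters in this check: a transformation that is only non-decreasing could map distinct predictions to the same value, merging tie groups and changing the corner points of the curve.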

Figures (7)

  • Figure 1: Lorenz curve: (lhs) log-normal case with $\sigma=1$; (rhs) discrete example.
  • Figure 2: Empirical Lorenz curve for sample sizes $n=10,30$: (lhs) empirical log-normal case; (rhs) empirical discrete case.
  • Figure 3: Modified (linearly interpolated) empirical Lorenz curve $\widehat{L}^+_n$ for sample sizes $n=10, 30$: (lhs) log-normal case; (rhs) discrete case.
  • Figure 4: Modified empirical Lorenz curve $\widehat{L}^+_n$ in the discrete case: (lhs) constructed on the aggregated order statistics $(Y^\star_{(k)})_{k=1}^K$ with corner set ${\cal B}^\star$, and (rhs) on the non-aggregated order statistics $(Y_{(i)})_{i=1}^n$ with corner set ${\cal B}$; the two red areas are identical.
  • Figure 5: Modified empirical Lorenz curve $\widehat{L}^+_n$ showing the area $B$ that is enclosed in the convex set between the diagonal dotted line and the modified empirical Lorenz curve $\widehat{L}^+_n$ for sample sizes $n=10,30$: (lhs) log-normal case, (rhs) discrete case.
  • ...and 2 more figures

Theorems & Definitions (8)

  • Example 2.1: continuous example
  • Example 2.2: discrete example
  • Example 2.3: continuous example: empirical version
  • Example 2.4: discrete example: empirical version
  • Lemma 3.3
  • Remark 3.4
  • Lemma 3.6
  • Example 3.7: Gini score under ties