Gini Score under Ties and Case Weights
Alexej Brauer, Mario V. Wüthrich
TL;DR
This paper addresses how to compute and interpret the Gini score for real-valued responses when the risk ranking contains ties and when case weights (exposures) are present. It builds a framework on Lorenz curves and the cumulative accuracy profile (CAP), introduces a mid-CAP approach that handles ties without bias from differences in granularity, and extends the construction to weighted observations via exposure weights. Theoretical results clarify the relationship between the CAP-based and the Lorenz-based Gini score, and a real data example on motor third-party liability (MTPL) insurance shows how weighting can alter the assessment of risk rankings. The work offers concrete implementations and emphasizes fair model comparison, so that Gini-based model selection reflects ranking concordance rather than artefacts of sampling or aggregation.
Abstract
The Gini score is a popular tool in statistical modeling and machine learning for model validation and model selection. It is a purely rank-based score that allows one to assess risk rankings. The Gini score for statistical modeling has mainly been used in a binary context, in which it has many equivalent reformulations such as the receiver operating characteristic (ROC) or the area under the curve (AUC). In the actuarial literature, this rank-based score for binary responses has been extended to general real-valued random variables using Lorenz curves and concentration curves. While these initial concepts assume that the risk ranking is generated by a continuous distribution function, we discuss in this paper how the Gini score can be used in the case of ties in the risk ranking. Moreover, we adapt the Gini score to the common actuarial situation of having case weights.
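To make the two issues named above concrete, the following is a minimal Python sketch of a CAP-based Gini score with ties and case weights. It is an illustration under stated assumptions, not the paper's implementation: the function names `cap_area` and `gini_ratio` are hypothetical, the response `y` is assumed to be a non-negative rate per unit of exposure (so each case contributes `w * y` to the total), tied predictions are merged into one linear segment of the CAP, and the score is normalized by the CAP of the response's own ranking. The paper's mid-CAP construction may differ in detail.

```python
import numpy as np


def cap_area(score, y, w):
    """Area under a weighted CAP curve, highest score first.

    Assumption: cases with equal scores are merged into one linear
    segment, so the result does not depend on how ties are ordered.
    """
    score = np.asarray(score, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)

    # Group cases by distinct score level (levels come back ascending).
    levels, inverse = np.unique(score, return_inverse=True)
    group_w = np.bincount(inverse, weights=w, minlength=len(levels))
    group_wy = np.bincount(inverse, weights=w * y, minlength=len(levels))

    # Reverse so that the highest score level comes first.
    group_w = group_w[::-1]
    group_wy = group_wy[::-1]

    # Cumulative exposure share (x) and cumulative response share (c).
    x = np.concatenate(([0.0], np.cumsum(group_w) / group_w.sum()))
    c = np.concatenate(([0.0], np.cumsum(group_wy) / group_wy.sum()))

    # Trapezoidal rule over the piecewise linear curve.
    return float(np.sum(np.diff(x) * (c[1:] + c[:-1]) / 2.0))


def gini_ratio(pred, y, w=None):
    """Gini score of `pred` relative to the ranking given by `y` itself.

    Assumption: normalization by the response's own CAP; this raises
    ZeroDivisionError if all responses are equal.
    """
    y = np.asarray(y, dtype=float)
    if w is None:
        w = np.ones_like(y)
    model = cap_area(pred, y, w) - 0.5
    perfect = cap_area(y, y, w) - 0.5
    return model / perfect
```

Because tied cases are aggregated before the curve is built, splitting one case into several identical cases with the same total exposure leaves this score unchanged, which is the kind of granularity invariance the text refers to. A constant prediction yields a score of zero, and a prediction that ranks the cases exactly as the response does yields a score of one.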
