Geometric Insights into Focal Loss: Reducing Curvature for Enhanced Model Calibration
Masanari Kimura, Hiroki Naganuma
TL;DR
The paper addresses the calibration problem in neural networks by analyzing focal loss through a geometric lens. It argues that focal loss effectively reduces the curvature of the loss surface, and it frames this effect as an entropy-constrained optimization problem linked to a Maxwell–Boltzmann posterior, with further support from a PAC-Bayes perspective. Empirically, curvature measures such as the maximum eigenvalue and the trace of the Hessian decline as the focal parameter gamma increases, and this decline correlates with improved calibration (lower expected calibration error, ECE). Adding explicit Hessian-trace regularization improves calibration further. Overall, the work suggests curvature control as a practical, general mechanism for achieving well-calibrated predictions and motivates curvature-aware design of calibration techniques.
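The TL;DR reports calibration in terms of ECE (expected calibration error), which compares a model's stated confidence with its actual accuracy. A minimal sketch follows, assuming the common convention of equal-width bins over top-1 (max-probability) confidence with 15 bins. The bin count and the function name `expected_calibration_error` are our assumptions, not taken from the paper.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """Equal-width-bin ECE over top-1 confidence.

    Assumption: 15 bins and max-probability confidence, a common
    convention; the paper's exact binning may differ.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidences = probs.max(axis=1)      # model's stated confidence
    predictions = probs.argmax(axis=1)   # predicted class
    correct = (predictions == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(labels)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |accuracy - mean confidence| in this bin, weighted by bin size
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.sum() / n * gap
    return float(ece)


if __name__ == "__main__":
    # Overconfident toy model: reports 0.9 confidence but is right half the time.
    probs = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]])
    labels = np.array([0, 1, 1, 0])
    print(expected_calibration_error(probs, labels))  # ≈ 0.4
```

A perfectly calibrated model would have accuracy equal to mean confidence in every bin, giving an ECE of zero; the toy model above is overconfident by 0.4.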
Abstract
When machine learning models are used in decision-making settings, what matters is not only their accuracy but also their confidence. In classification, a model's confidence is often taken, for convenience, to be the output vector of a softmax function. However, these values are known to deviate significantly from the probability that the model's predictions are actually correct. Correcting this mismatch is known as model calibration, a problem that has been studied extensively. One of the simplest techniques for this task is focal loss, a generalization of cross-entropy that introduces a single positive parameter. Although the simplicity of the idea and its formalization has inspired many related studies, the theoretical analysis of its behavior is still insufficient. In this study, we aim to understand the behavior of focal loss by reinterpreting it geometrically. Our analysis suggests that focal loss reduces the curvature of the loss surface during training. This indicates that curvature may be one of the essential factors in achieving model calibration. We design numerical experiments that support this conjecture, revealing the behavior of focal loss and the relationship between calibration performance and curvature.
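In its standard form, focal loss rescales the true-class cross-entropy term by $(1 - p_t)^\gamma$, where $p_t$ is the softmax probability of the true class, so $\gamma = 0$ recovers cross-entropy. The sketch below implements that definition in Python with NumPy and, as a toy proxy for the curvature measures discussed here, estimates the Hessian trace and largest eigenvalue of the loss for a single example with respect to its logits via central finite differences. The logit-space proxy and the helper names (`focal_loss`, `logit_hessian`) are our assumptions; a full experiment would measure curvature with respect to network parameters during training, which this toy does not do.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = np.asarray(z, dtype=float)
    shifted = z - z.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def focal_loss(logits, label, gamma):
    """Focal loss -(1 - p_t)^gamma * log p_t for one example; gamma = 0 gives cross-entropy."""
    log_pt = log_softmax(logits)[label]
    pt = np.exp(log_pt)
    return float(-((1.0 - pt) ** gamma) * log_pt)

def logit_hessian(logits, label, gamma, eps=1e-4):
    """Central finite-difference Hessian of focal_loss w.r.t. the logits (hypothetical helper)."""
    z = np.asarray(logits, dtype=float)
    k = z.size
    H = np.zeros((k, k))
    basis = np.eye(k) * eps
    for i in range(k):
        for j in range(k):
            # Mixed central difference: [f(++) - f(+-) - f(-+) + f(--)] / (4 eps^2)
            H[i, j] = (
                focal_loss(z + basis[i] + basis[j], label, gamma)
                - focal_loss(z + basis[i] - basis[j], label, gamma)
                - focal_loss(z - basis[i] + basis[j], label, gamma)
                + focal_loss(z - basis[i] - basis[j], label, gamma)
            ) / (4.0 * eps ** 2)
    return 0.5 * (H + H.T)  # symmetrize away round-off

if __name__ == "__main__":
    logits = np.array([2.0, 0.5, -1.0])
    for gamma in (0.0, 1.0, 2.0, 3.0):
        H = logit_hessian(logits, label=0, gamma=gamma)
        print(f"gamma={gamma}: loss={focal_loss(logits, 0, gamma):.4f}, "
              f"trace={np.trace(H):.4f}, max eig={np.linalg.eigvalsh(H).max():.4f}")
```

For $\gamma = 0$ the finite-difference Hessian should match the known cross-entropy Hessian with respect to the logits, $\operatorname{diag}(p) - p p^\top$, which is a convenient sanity check. The loop only illustrates how the two curvature summaries are computed for one example; the curvature trend reported above comes from the paper's experiments, not from this toy.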
