
Tilt and Average : Geometric Adjustment of the Last Layer for Recalibration

Gyusang Cho, Chan-Hyun Youn

TL;DR

This work tackles neural network miscalibration by proposing Tilt and Average (Tna), a geometric recalibration method that operates on the last-layer weights rather than on calibration maps. It generates tilted class vectors through an $n$-dimensional rotation built from Givens rotations, controlled by the mean Rotation over Classes (mRC), and then averages multiple tilted weights to maintain accuracy. The approach improves calibration (ECE and AdaECE) while leaving accuracy nearly unchanged, and it can complement traditional calibration maps. It is demonstrated on CIFAR and ImageNet with multiple architectures, and code is released for reproducibility. Overall, Tna offers a data-efficient, plug-in recalibration option that expands the space of post-hoc calibration techniques by leveraging the angular geometry of the final layer.
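The tilt described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: each Givens rotation uses a fixed angle $\theta_s$ on a uniformly random coordinate plane $(p, q)$, and mRC is taken to be the mean over classes of $\angle(\mathbf{w}_i, R\mathbf{w}_i)$ in degrees. The helper names `tilt_rows` and `mrc_degrees` are hypothetical, and the paper's plane schedule and its Definition 3.1 may differ.

```python
import numpy as np

def tilt_rows(W, n_r, theta_s, rng):
    """Hypothetical tilt: apply n_r Givens rotations, each by theta_s on a
    random coordinate plane (p, q), to every class vector (row) of W.
    Updating two columns at a time avoids forming a d x d rotation matrix."""
    W = np.array(W, dtype=float)  # work on a copy
    c, s = np.cos(theta_s), np.sin(theta_s)
    for _ in range(n_r):
        p, q = rng.choice(W.shape[1], size=2, replace=False)
        wp, wq = W[:, p].copy(), W[:, q].copy()
        W[:, p] = c * wp - s * wq
        W[:, q] = s * wp + c * wq
    return W

def mrc_degrees(W, W_tilted):
    """Assumed mRC: mean over classes of the angle between w_i and its tilted copy."""
    cos = np.sum(W * W_tilted, axis=1) / (
        np.linalg.norm(W, axis=1) * np.linalg.norm(W_tilted, axis=1))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean())

rng = np.random.default_rng(0)
W = rng.normal(size=(100, 64))  # random stand-in for a trained last-layer weight
for n_r in (0, 10, 100, 1000):
    W_t = tilt_rows(W, n_r, np.deg2rad(5), rng)
    print(n_r, round(mrc_degrees(W, W_t), 2))
```

Sweeping `n_r` at a fixed `theta_s` mirrors the kind of mRC-versus-rotations plot the figures describe, but the printed values come from random stand-in weights and say nothing about trained models.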

Abstract

After the revelation that neural networks tend to produce overconfident predictions, the problem of calibration, which aims to align confidence with accuracy to enhance the reliability of predictions, has gained significant importance. Several solutions based on calibration maps have been proposed for recalibrating a trained classifier using additional datasets. In this paper, we offer an algorithm that, unlike calibration-map-based approaches, transforms the weights of the classifier's last layer. We focus on the geometry of the final linear layer, specifically its angular aspect, and adjust the weights of that layer. We name the method Tilt and Average (Tna) and validate its calibration effect empirically and theoretically. We further show that our approach, used alongside existing calibration-map-based techniques, can yield improved calibration performance. Code available: https://github.com/GYYYYYUUUUU/TNA_Angular_Scaling.
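Calibration here means matching confidence to accuracy, and the TL;DR reports it with ECE and AdaECE. A minimal sketch of both metrics as they are commonly defined, assuming 15 bins, right-closed equal-width bins for ECE, and equal-count bins for AdaECE; the exact binning used in the paper is not stated in this excerpt, and the function names are illustrative.

```python
import numpy as np

def ece(conf, correct, n_bins=15):
    """Expected Calibration Error with equal-width confidence bins (lo, hi]."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            # |accuracy - confidence| in the bin, weighted by the bin's share of samples
            total += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(total)

def ada_ece(conf, correct, n_bins=15):
    """Adaptive ECE: sort by confidence and use bins with (near-)equal sample counts."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    total = 0.0
    for idx in np.array_split(np.argsort(conf), n_bins):
        if idx.size:
            total += idx.size / conf.size * abs(correct[idx].mean() - conf[idx].mean())
    return float(total)

# Toy check: a model that is always 90% confident but right only 70% of the time.
conf = np.full(10, 0.9)
correct = np.array([1] * 7 + [0] * 3)
print(round(ece(conf, correct), 3))  # 0.2
```

An overconfident model shows a gap between mean confidence and accuracy; a recalibration method such as Tna aims to shrink that gap without changing which class is predicted.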

Paper Structure

This paper contains 25 sections, 2 theorems, 14 equations, 11 figures, 7 tables, and 1 algorithm.

Key Result

Theorem 3.2

(Class-wise Effect of Tilt.) Let $\mathbf{W}$ be the original weight and $R$ a rotation matrix, and let $\psi_i = \angle(\mathbf{w}_i, \mathbf{z})$. Suppose $R$ rotates the $i$-th class vector by $\theta$, i.e., $\angle(\mathbf{w}_i, R\mathbf{w}_i) = \theta$. We further …
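The theorem statement is cut off in this excerpt, so its conclusion is not reproduced here. Its setup does imply two elementary facts that can be checked numerically: a rotation preserves $\|\mathbf{w}_i\|$, so the logit $\mathbf{w}_i^\top\mathbf{z} = \|\mathbf{w}_i\|\|\mathbf{z}\|\cos\psi_i$ changes only through the angle, and by the triangle inequality for angles on the sphere, $\psi_i$ can move by at most $\theta$. The sketch below uses random vectors and assumes $\mathbf{z}$ is the feature fed into the last layer; it is an illustration, not the theorem or its proof.

```python
import numpy as np

def angle(u, v):
    """Angle (radians) between vectors u and v."""
    cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

rng = np.random.default_rng(1)
d = 16
w_i = rng.normal(size=d)  # random stand-in for the i-th class vector
z = rng.normal(size=d)    # random stand-in for the feature z

# Any proper rotation works here: take Q from a QR decomposition and
# flip one column if det(Q) = -1 so that det(R) = +1.
R, _ = np.linalg.qr(rng.normal(size=(d, d)))
if np.linalg.det(R) < 0:
    R[:, 0] = -R[:, 0]

psi = angle(w_i, z)            # psi_i before the tilt
theta = angle(w_i, R @ w_i)    # how far R rotates the i-th class vector
psi_new = angle(R @ w_i, z)    # psi_i after the tilt

# ||R w_i|| = ||w_i||, so only cos(psi_i) in the logit changes,
# and |psi_new - psi| <= theta by the spherical triangle inequality.
print(abs(psi_new - psi) <= theta + 1e-9)  # True
```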

Figures (11)

  • Figure 1: Overview of the proposed algorithm. We take the original weight $\mathbf{W}$ of the last linear layer (FC layer), generate multiple "Tilt"ed weights $\mathbf{W}^1, \mathbf{W}^2, \cdots, \mathbf{W}^{n_e}$ with relaxed confidence from the original weight, and "Average" the generated weights to compensate for possible accuracy loss. Details are given in Section \ref{sec:tilt_angles}.
  • Figure 2: Plots of mRC against the number of rotations ($n_r$) for different values of $\theta_s$. Each point corresponds to a single tilted weight. As the number of rotations increases, mRC increases as well. Each row corresponds to a different dataset-model pair: CIFAR10-WRN28x10 (upper), CIFAR100-MobileNetV2 (middle), ImageNet-ResNet50 (lower). Across datasets and models, the proposed algorithm can generate a rotation matrix with a target mRC by changing the number of rotations $n_r$.
  • Figure 3: Distribution of the angle $\angle(\mathbf{w}_i, \mathbf{z}_x)$ between class vectors and the pf of each data sample, using the predicted-class row vector of the original weight (Orig.) and of tilted weights with mRC of 30° and 45°. "False" denotes angles between the pf and class vectors that do not correspond to the respective class. As mRC increases, the angles shift toward 90°. Distributions for other dataset-architecture pairs are shown in appendix Figs. \ref{fig:data_distribution_supp_CF} and \ref{fig:data_distribution_supp_IN}. Best viewed in color.
  • Figure 4: Accuracy of the ensembled outputs as a function of the number of ensemble members ($n_e$) on ImageNet. Accuracy is well compensated at most angles (upper) and as $n_e$ increases (lower). The red line indicates the performance of the original weight. Results are averaged over 10 runs. Extended experiments are shown in appendix Fig. \ref{fig:accuracy_compensation_supp_CF}.
  • Figure 5: Data-efficiency comparison on three datasets (CIFAR10, CIFAR100, ImageNet), using WideResNet28x10, MobileNetV2, and ResNet50 respectively, with Tna applied to the original weight. We show that Tna is data-efficient, requiring a smaller additional calibration set.
  • ...and 6 more figures
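Figure 1 describes generating $n_e$ tilted weights and averaging them, while Figure 4 speaks of ensembled outputs. A minimal sketch of the tilt-then-average step, assuming a bias-free last layer, class vectors stored as rows of $\mathbf{W}$ (so a tilt maps $\mathbf{W} \mapsto \mathbf{W}R^\top$), and rotations built from $n_r$ Givens rotations by $\theta_s$ on random planes. The names `random_rotation` and `tilt_and_average` are hypothetical, and this is not the released code.

```python
import numpy as np

def random_rotation(d, n_r, theta_s, rng):
    """Hypothetical tilt generator: product of n_r Givens rotations by theta_s
    on randomly chosen coordinate planes (p, q)."""
    R = np.eye(d)
    c, s = np.cos(theta_s), np.sin(theta_s)
    for _ in range(n_r):
        p, q = rng.choice(d, size=2, replace=False)
        G = np.eye(d)
        G[p, p] = G[q, q] = c
        G[p, q], G[q, p] = -s, s
        R = G @ R
    return R

def tilt_and_average(W, n_e, n_r, theta_s, rng):
    """'Tilt' W into n_e rotated copies (row w_i -> R_k w_i), then 'Average' them."""
    d = W.shape[1]
    tilted = np.stack([W @ random_rotation(d, n_r, theta_s, rng).T for _ in range(n_e)])
    return tilted.mean(axis=0), tilted

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
C, d, N = 10, 32, 5
W = rng.normal(size=(C, d))  # random stand-in last-layer weight (no bias)
Z = rng.normal(size=(N, d))  # random stand-in input features
W_avg, tilted = tilt_and_average(W, n_e=8, n_r=50, theta_s=np.deg2rad(5), rng=rng)

# For a bias-free linear layer, averaging weights equals averaging logits.
avg_logits = np.mean([Z @ W_k.T for W_k in tilted], axis=0)
print(np.allclose(Z @ W_avg.T, avg_logits))  # True
probs = softmax(Z @ W_avg.T)
```

Because the layer is linear, averaging the tilted weights is the same as averaging their logits; averaging softmax probabilities would generally give a different result. Which of these the "ensembled outputs" in Figure 4 refers to is not stated in this excerpt.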

Theorems & Definitions (5)

  • Definition 3.1
  • Theorem 3.2
  • Proposition 3.3
  • proof
  • proof