Exploring the Impact of Temperature Scaling in Softmax for Classification and Adversarial Robustness
Hao Xuan, Bokai Yang, Xingyu Li
TL;DR
This work investigates the temperature parameter $\tau$ of the softmax function in image classification, combining gradient analysis with extensive experiments on CNNs and transformers across CIFAR-10/100 and Tiny-ImageNet. It shows that a moderate $\tau$ can improve overall accuracy. An elevated $\tau$ reshapes the gradient flow to promote more balanced learning and, unexpectedly, boosts robustness against common corruptions and untargeted PGD attacks, with potential benefits for adversarial training. The findings identify $\tau$ as a simple yet impactful hyperparameter for improving both performance and security in deep learning, although automatically selecting an optimal $\tau$ remains a limitation.
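For reference, the standard temperature-scaled softmax cross-entropy and its gradient with respect to the logits are written out below. The notation is assumed: logits $z \in \mathbb{R}^K$, true class $y$, temperature $\tau > 0$. This is the textbook derivation, and the paper's own gradient analysis may be formulated differently.

```latex
\begin{aligned}
p_i(\tau) &= \frac{\exp(z_i/\tau)}{\sum_{j=1}^{K}\exp(z_j/\tau)}, \qquad
\mathcal{L}(z, y; \tau) = -\log p_y(\tau) = -\frac{z_y}{\tau} + \log\sum_{j=1}^{K}\exp(z_j/\tau), \\
\frac{\partial \mathcal{L}}{\partial z_i} &= \frac{1}{\tau}\,p_i(\tau) - \frac{1}{\tau}\,\mathbb{1}[i = y]
= \frac{1}{\tau}\bigl(p_i(\tau) - \mathbb{1}[i = y]\bigr).
\end{aligned}
```

The temperature enters the gradient in two places. The factor $1/\tau$ rescales its magnitude. The distribution $p(\tau)$ sets its direction: as $\tau$ grows, $p(\tau)$ flattens toward uniform, so the gradient is spread more evenly across the non-target logits.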
Abstract
The softmax function is a fundamental component of deep learning. This study examines an often-overlooked parameter of the softmax function, the temperature, and provides new insights into the practical and theoretical aspects of temperature scaling for image classification. Our empirical studies, using convolutional neural networks and transformers on multiple benchmark datasets, reveal that moderate temperatures generally yield better overall performance. Through extensive experiments and rigorous theoretical analysis, we investigate the role of temperature scaling in model training and show that the temperature influences not only the learning step size but also the model's optimization direction. Moreover, we report for the first time a surprising benefit of elevated temperatures: enhanced model robustness against common corruptions, natural perturbations, and untargeted adversarial attacks such as Projected Gradient Descent (PGD). We extend these findings to adversarial training and demonstrate that, compared with the standard softmax function at its default temperature of 1, higher temperatures have the potential to enhance adversarial training. These insights open new avenues for improving model performance and security in deep learning applications.
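To make the step-size-versus-direction point concrete, here is a minimal NumPy sketch. It is not the authors' code. It assumes the standard cross-entropy loss on temperature-scaled logits, whose logit gradient is (softmax(z/τ) − onehot(y))/τ. The logit values and the helper names (`softmax_t`, `ce_grad_logits`, `non_target_shares`, `finite_diff_grad`) are hypothetical, chosen only for illustration.

```python
import numpy as np


def softmax_t(z, tau):
    """Temperature-scaled softmax: exp(z_i / tau) / sum_j exp(z_j / tau)."""
    s = z / tau
    s = s - s.max()  # subtract the max for numerical stability (result unchanged)
    e = np.exp(s)
    return e / e.sum()


def ce_grad_logits(z, y, tau):
    """Analytic gradient of -log softmax_t(z, tau)[y] with respect to z."""
    p = softmax_t(z, tau)
    onehot = np.zeros_like(z)
    onehot[y] = 1.0
    return (p - onehot) / tau  # 1/tau scales the step, p(tau) sets the direction


def non_target_shares(z, y, tau):
    """Fraction of the positive gradient mass assigned to each non-target logit."""
    g = ce_grad_logits(z, y, tau)
    mask = np.arange(z.size) != y
    return g[mask] / g[mask].sum()


def finite_diff_grad(z, y, tau, eps=1e-6):
    """Central-difference estimate of the same gradient, used as a sanity check."""
    g = np.zeros_like(z)
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        loss_plus = -np.log(softmax_t(z + dz, tau)[y])
        loss_minus = -np.log(softmax_t(z - dz, tau)[y])
        g[i] = (loss_plus - loss_minus) / (2 * eps)
    return g


if __name__ == "__main__":
    z = np.array([4.0, 1.0, 0.5, 0.0])  # hypothetical logits; class 0 is the target
    y = 0
    for tau in (0.5, 1.0, 4.0, 16.0):
        g = ce_grad_logits(z, y, tau)
        assert np.allclose(g, finite_diff_grad(z, y, tau), atol=1e-5)
        shares = np.round(non_target_shares(z, y, tau), 3)
        print(f"tau={tau:5.1f}  |grad|={np.linalg.norm(g):.4f}  non-target shares={shares}")
```

With these hypothetical logits, the non-target shares move toward uniform as τ grows. The gradient norm is not monotone in τ. Raising τ from 0.5 first desaturates the softmax on this already well-classified sample and enlarges the gradient. Beyond that point, the 1/τ factor dominates. This is why τ acts as more than a learning-rate multiplier. The toy example only illustrates the formula and does not reproduce the paper's experiments.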
