Deep Neural Networks Tend To Extrapolate Predictably
Katie Kang, Amrith Setlur, Claire Tomlin, Sergey Levine
TL;DR
This work revisits how neural networks extrapolate on out-of-distribution (OOD) inputs and finds a robust tendency for predictions to revert toward the optimal constant solution (OCS): the single input-independent prediction that minimizes average training loss. It combines experiments across vision and language tasks, several loss functions, and multiple architectures with empirical and theoretical analyses (including deep homogeneous ReLU networks) to explain why representations of OOD inputs shrink in norm and outputs become input-agnostic. The authors show that this reversion can be harnessed for risk-sensitive decision-making, notably selective classification, by aligning the OCS with the desired cautious behavior. Although the phenomenon is pervasive, the paper also discusses limitations (e.g., adversarial cases) and outlines directions for further research and practical safeguards.
Abstract
Conventional wisdom suggests that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs. Our work reassesses this assumption for neural networks with high-dimensional inputs. Rather than extrapolating in arbitrary ways, we observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD. Moreover, we find that this value often closely approximates the optimal constant solution (OCS), i.e., the prediction that minimizes the average loss over the training data without observing the input. We present results showing this phenomenon across 8 datasets with different distributional shifts (including CIFAR10-C, ImageNet-R, and ImageNet-Sketch), different loss functions (cross entropy, MSE, and Gaussian NLL), and different architectures (CNNs and transformers). Furthermore, we present an explanation for this behavior, which we first validate empirically and then study theoretically in a simplified setting involving deep homogeneous networks with ReLU activations. Finally, we show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
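To make the OCS definition concrete, here is a minimal sketch (not the authors' code) of the constant prediction that minimizes average training loss for each of the three losses named above. It relies on standard closed forms: the empirical label marginal for cross entropy, the target mean for MSE, and the target mean with the population variance for Gaussian NLL. All function names are hypothetical, and `l1_distance_to_ocs` is only one assumed way to measure how close a prediction is to the OCS.

```python
from collections import Counter
from statistics import fmean, pvariance


def ocs_cross_entropy(labels, num_classes):
    """Constant probability vector minimizing average cross entropy:
    the empirical marginal distribution of the training labels."""
    counts = Counter(labels)
    n = len(labels)
    return [counts.get(k, 0) / n for k in range(num_classes)]


def ocs_mse(targets):
    """Constant prediction minimizing average squared error: the target mean."""
    return fmean(targets)


def ocs_gaussian_nll(targets):
    """Constant (mean, variance) minimizing average Gaussian NLL:
    the target mean and the population (biased) variance."""
    return fmean(targets), pvariance(targets)


def l1_distance_to_ocs(prediction, ocs):
    """Hypothetical helper: L1 distance between one prediction vector and the OCS."""
    return sum(abs(p - q) for p, q in zip(prediction, ocs))


# Toy training labels (illustrative only): class 0 appears twice, classes 1 and 2 once.
train_labels = [0, 0, 1, 2]
ocs = ocs_cross_entropy(train_labels, num_classes=3)   # [0.5, 0.25, 0.25]

# A confident prediction is far from the OCS; a prediction equal to the marginal is at distance 0.
confident = [1.0, 0.0, 0.0]
print(l1_distance_to_ocs(confident, ocs))  # 1.0
print(l1_distance_to_ocs(ocs, ocs))        # 0.0
```

Under the abstract's claim, one would expect a distance of this kind, averaged over a dataset, to shrink as the evaluation data becomes increasingly OOD.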
