How to select an objective function using information theory

Timothy O. Hodson; Thomas M. Over; Tyler J. Smith; Lucy M. Marshall

TL;DR

The paper reframes objective-function selection as an information-theoretic problem: choose the objective that minimizes information loss (equivalently, maximizes information) by mapping objectives to log-likelihoods and comparing them with an AIC-based criterion. It derives log-likelihood equivalents for common objectives (e.g., MSE, MAE, NSE, MSLE) and demonstrates the comparison using Akaike weights on a large hydrological dataset, including zero-inflated and transformed objectives. It also highlights how improper likelihood choices can inflate information loss and uncertainty. In a benchmark with ≈14 million streamflow observations across 1,385 gages, the zero-inflated mean absolute log error (ZMALE) best minimizes information loss (lowest entropy), while traditional metrics like MSE and NSE incur larger penalties. The approach offers a general, principled framework for objective selection in multi-use Earth-system models, with implications for model evaluation, optimization, uncertainty quantification, and data compression, and points to future work on multivariate likelihoods and connections to Bayesian inference.

Abstract

In machine learning or scientific computing, model performance is measured with an objective function. But why choose one objective over another? Information theory gives one answer: To maximize the information in the model, select the objective function that represents the error in the fewest bits. To evaluate different objectives, transform them into likelihood functions. As likelihoods, their relative magnitude represents how strongly we should prefer one objective versus another, and the log of that relation represents the difference in their bit-length, as well as the difference in their uncertainty. In other words, prefer whichever objective minimizes the uncertainty. Under the information-theoretic paradigm, the ultimate objective is to maximize information (and minimize uncertainty), as opposed to any specific utility. We argue that this paradigm is well-suited to models that have many uses and no definite utility, like the large Earth system models used to understand the effects of climate change.
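As a rough illustration of the idea in the abstract, the sketch below turns two familiar objectives into log-likelihoods and compares them in bits. It assumes the standard correspondences (MSE with a zero-mean normal error model, MAE with a zero-mean Laplace error model), uses synthetic errors rather than the paper's streamflow data, and counts one fitted scale parameter per objective. The function names are hypothetical and this is not the authors' code.

```python
import numpy as np

def normal_loglik(errors):
    """Maximized log-likelihood (nats) of zero-mean normal errors.

    The maximum-likelihood variance is the MSE, so the result depends on MSE alone.
    """
    n = errors.size
    mse = np.mean(errors ** 2)
    return -0.5 * n * (np.log(2.0 * np.pi * mse) + 1.0)

def laplace_loglik(errors):
    """Maximized log-likelihood (nats) of zero-mean Laplace errors.

    The maximum-likelihood scale is the MAE, so the result depends on MAE alone.
    """
    n = errors.size
    mae = np.mean(np.abs(errors))
    return -n * (np.log(2.0 * mae) + 1.0)

def akaike_weights(logliks, n_params):
    """Akaike weights from log-likelihoods and parameter counts."""
    aic = 2.0 * np.asarray(n_params, dtype=float) - 2.0 * np.asarray(logliks, dtype=float)
    delta = aic - aic.min()          # AIC differences relative to the best candidate
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Synthetic model errors (an assumption for illustration): heavy-tailed Laplace noise.
rng = np.random.default_rng(0)
errors = rng.laplace(loc=0.0, scale=1.0, size=10_000)

ll_normal = normal_loglik(errors)    # MSE-based objective
ll_laplace = laplace_loglik(errors)  # MAE-based objective

# Difference in description length of the errors, converted from nats to bits.
bits_saved = (ll_laplace - ll_normal) / np.log(2.0)

# Each objective fits one scale parameter here.
weights = akaike_weights([ll_normal, ll_laplace], n_params=[1, 1])

print(f"bits saved by the MAE-based likelihood: {bits_saved:.1f}")
print(f"Akaike weights (MSE, MAE): {weights}")
```

Because the synthetic errors are Laplace-distributed, the MAE-based likelihood should encode them in fewer bits and receive nearly all of the Akaike weight; with real data the ranking depends on the actual error distribution.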

Paper Structure (16 sections, 34 equations, 3 figures, 1 table)


Introduction
The Experiment
Uncertainty to Information
Objectives to Log Likelihoods
Overfitting Bias
Akaike Weights
Benchmark Demonstration
Conclusions
Generalized Likelihood Uncertainty Estimation
Relation to Bayesian Inference
Adjusted Expectations
NSE Folk Wisdom
Kling–Gupta Efficiency Log-likelihood
Nash–Sutcliffe Efficiency Log-likelihood
Log-likelihood Derivations
...and 1 more sections

Figures (3)

Figure 1: Normal, Laplace, and uniform error distributions, respectively.
Figure 2: Location-wise correlation in conditional entropy among different objective functions. The higher-entropy objectives (MSPE, U, MSE, NSE) are correlated because they all measure error in absolute terms, whereas the lower-entropy objectives (MSLE, MALE, ZMSLE, ZMALE) all measure error in relative terms. The others fall between these two extremes. For example, MAE measures error in absolute terms but is less sensitive to outliers, which occur more frequently as the objective diverges from the error distribution.
Figure 3: The absolute error of the entropies of different objectives versus sample size ($n$), where $n$ is the number of observations taken at random from each of the 1,385 streamgages. The error for each sample was taken relative to the mean of the five largest samples. Lower-entropy objectives generally converge faster, except for NSE, which is unusually noisy from uncertainty about $\sigma_o$ (see the appendix on the Nash–Sutcliffe Efficiency log-likelihood). Also, note how the objectives on the bottom row appear to have negligible uncertainty relative to those on top. These symptoms, and more, are captured in the Akaike weights (Table 1).
