How to select an objective function using information theory
Timothy O. Hodson, Thomas M. Over, Tyler J. Smith, Lucy M. Marshall
TL;DR
The paper reframes objective-function selection as an information-theoretic problem: choose the objective that minimizes information loss (equivalently, maximizes information) by mapping each objective to a log-likelihood and comparing the candidates with an AIC-based criterion. It derives log-likelihood equivalents for common objectives (e.g., MSE, MAE, NSE, MSLE) and demonstrates the comparison with Akaike weights on a large hydrological dataset, including zero-inflated and transformed objectives. It also shows how an improper choice of likelihood can inflate information loss and uncertainty. In a benchmark with ≈14 million streamflow observations from 1,385 gages, the zero-inflated, log-transformed objective ZMALE minimizes information loss best (lowest entropy), while traditional metrics such as MSE and NSE incur larger penalties. The approach offers a general, principled framework for objective selection in multi-use Earth system models, with implications for model evaluation, optimization, uncertainty quantification, and data compression, and it points to future work on multivariate likelihoods and connections to Bayesian inference.
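A minimal sketch of the objective-to-likelihood mapping and the Akaike-weight comparison, assuming the standard correspondences: MSE corresponds to Gaussian errors, MAE to Laplace errors, and MSLE to Gaussian errors in log space, each with its scale parameter set to its maximum-likelihood estimate (so k = 1 per objective). The function names and toy data are hypothetical and illustrative, not the paper's code; the paper's derivations for NSE and the zero-inflated objectives are not reproduced. The log-space likelihood includes the change-of-variables (Jacobian) term -sum(ln y) so that it can be compared with the untransformed likelihoods on the original scale.

```python
import math
from typing import Dict, Sequence


def gaussian_loglik(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Log-likelihood implied by MSE: Gaussian errors, sigma^2 set to its MLE (the MSE).

    Assumes the errors are not all zero.
    """
    n = len(obs)
    mse = sum((o - s) ** 2 for o, s in zip(obs, sim)) / n
    # With sigma^2 = MSE: ln L = -n/2 * (ln(2*pi*sigma^2) + 1)
    return -0.5 * n * (math.log(2 * math.pi * mse) + 1)


def laplace_loglik(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Log-likelihood implied by MAE: Laplace errors, scale b set to its MLE (the MAE)."""
    n = len(obs)
    mae = sum(abs(o - s) for o, s in zip(obs, sim)) / n
    # With b = MAE: ln L = -n * (ln(2*b) + 1)
    return -n * (math.log(2 * mae) + 1)


def lognormal_loglik(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Log-likelihood implied by MSLE: Gaussian errors in log space.

    Requires strictly positive values (zeros would need a zero-inflated model,
    not shown). Subtracting sum(ln y) is the Jacobian of z = ln y, which puts
    the likelihood back on the original scale of the observations.
    """
    log_obs = [math.log(o) for o in obs]
    log_sim = [math.log(s) for s in sim]
    return gaussian_loglik(log_obs, log_sim) - sum(log_obs)


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * k - 2 * loglik


def akaike_weights(aics: Dict[str, float]) -> Dict[str, float]:
    """Akaike weights: w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2)."""
    best = min(aics.values())
    rel = {name: math.exp(-0.5 * (a - best)) for name, a in aics.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


if __name__ == "__main__":
    # Hypothetical toy streamflow values, for illustration only.
    obs = [1.2, 3.4, 0.8, 5.1, 2.2]
    sim = [1.0, 3.9, 1.1, 4.6, 2.5]
    lls = {
        "MSE (Gaussian)": gaussian_loglik(obs, sim),
        "MAE (Laplace)": laplace_loglik(obs, sim),
        "MSLE (log-normal)": lognormal_loglik(obs, sim),
    }
    # One fitted parameter (the error scale) per objective in this sketch.
    weights = akaike_weights({name: aic(ll, k=1) for name, ll in lls.items()})
    for name, w in weights.items():
        print(f"{name}: weight = {w:.3f}")
```

Because every candidate in this sketch has the same number of fitted parameters, the AIC differences reduce to log-likelihood differences. The 2k penalty matters when candidates differ in complexity, as a zero-inflated variant (which typically adds a parameter for the probability of zero flow) would.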
Abstract
In machine learning or scientific computing, model performance is measured with an objective function. But why choose one objective over another? Information theory gives one answer: to maximize the information in the model, select the objective function that represents the error in the fewest bits. To evaluate different objectives, transform them into likelihood functions. As likelihoods, their relative magnitude represents how strongly we should prefer one objective over another, and the log of their ratio represents the difference in their bit-length, as well as the difference in their uncertainty. In other words, prefer whichever objective minimizes the uncertainty. Under the information-theoretic paradigm, the ultimate objective is to maximize information (and minimize uncertainty), as opposed to any specific utility. We argue that this paradigm is well-suited to models that have many uses and no definite utility, like the large Earth system models used to understand the effects of climate change.
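The bit-length argument can be made concrete. The ideal code length of the errors is -log2 L bits, so the difference between two log-likelihoods (in nats), divided by ln 2, is the number of bits one objective saves relative to the other, and exponentiating the same difference gives their likelihood ratio. The sketch below is illustrative; the function names and the log-likelihood values are hypothetical. For continuous data the code length is defined only up to a discretization constant, which cancels when both likelihoods are evaluated on the same observations and scale.

```python
import math
from typing import Tuple


def code_length_bits(loglik_nats: float) -> float:
    """Ideal code length in bits for the errors: -log2 L = -ln L / ln 2."""
    return -loglik_nats / math.log(2)


def preference(loglik_a: float, loglik_b: float) -> Tuple[float, float]:
    """Compare objective a against objective b.

    Returns (likelihood ratio L_a / L_b, bits saved by a relative to b).
    A ratio above 1 (positive bits saved) favors a.
    """
    diff = loglik_a - loglik_b
    return math.exp(diff), diff / math.log(2)


if __name__ == "__main__":
    # Hypothetical log-likelihoods (nats) for two objectives on the same data.
    ll_a, ll_b = -1200.0, -1250.0
    ratio, bits = preference(ll_a, ll_b)
    print(f"L_a / L_b = {ratio:.3g}; a saves {bits:.1f} bits")
    print(f"code lengths: a = {code_length_bits(ll_a):.1f} bits, b = {code_length_bits(ll_b):.1f} bits")
```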
