Information FOMO: The unhealthy fear of missing out on information. A method for removing misleading data for healthier models

Ethan Pickering, Themistoklis P. Sapsis

TL;DR

This work introduces Information FOMO, a sequential data-selection method inspired by Bayesian experimental design. It identifies information-rich samples and discards data that harms model stability, addressing sample-wise double descent in fixed datasets. The method couples model and data, constructs a global surrogate output PDF, and ranks samples by the acquisition value $a(\mathbf{x})=w(\mathbf{x})\sigma^2(\mathbf{x})$ with weight $w(\mathbf{x})=p_{\mathbf{x}}(\mathbf{x})/p_\mu(\mu(\mathbf{x}))$, where $\mu(\mathbf{x})$ and $\sigma^2(\mathbf{x})$ are the surrogate's predictive mean and variance. This yields rapid, data-efficient convergence without traditional train/test/validation splits. The approach is demonstrated with Gaussian process regression on a 1D nonlinear map and with ensembles of deep neural networks (DNNs) on a high-dimensional dispersive wave model (MMT), showing reduced mean-squared and log-PDF errors and elimination of double descent while using roughly 1/20 of the data. Key contributions include a model-agnostic, data-driven acquisition strategy that dynamically adjusts the training set, the demonstration of convergence through a global output PDF, and practical guidance on shallow DNN ensembles for efficient uncertainty estimation. The results have broad implications for learning from small or fixed datasets and for compressing information in large-scale regimes without sacrificing predictive reliability.
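
As a rough illustration of the acquisition value $a(\mathbf{x})=w(\mathbf{x})\sigma^2(\mathbf{x})$, the sketch below scores a set of 1D candidate inputs. It is not the authors' implementation. The helper names (`gaussian_kde`, `acquisition_values`), the hand-rolled kernel density estimate of the output PDF $p_\mu$, and the toy `mu` and `var` functions are all assumptions made for illustration. It also assumes the candidates are drawn from $p_{\mathbf{x}}$, so that a density estimate over their predicted means approximates the surrogate output PDF.

```python
import math
import random


def gaussian_kde(samples, bandwidth=None):
    """Return a callable 1D Gaussian kernel density estimate of `samples`."""
    n = len(samples)
    if bandwidth is None:
        mean = sum(samples) / n
        std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
        # Normal-reference rule of thumb; the floor avoids a zero bandwidth.
        bandwidth = max(1.06 * std * n ** (-0.2), 1e-6)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(y):
        return norm * sum(
            math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in samples
        )

    return pdf


def acquisition_values(xs, p_x, mu, var, eps=1e-12):
    """Score candidates with a(x) = w(x) * var(x), w(x) = p_x(x) / p_mu(mu(x)).

    Assumes `xs` are samples from p_x, so a KDE over the predicted means
    approximates the surrogate output PDF p_mu.
    """
    means = [mu(x) for x in xs]
    p_mu = gaussian_kde(means)
    return [p_x(x) / max(p_mu(m), eps) * var(x) for x, m in zip(xs, means)]


# Toy setup (hypothetical): standard-normal inputs, a cubic surrogate mean,
# and a predictive variance that grows away from the origin.
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]
p_x = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
mu = lambda x: x ** 3
var = lambda x: 0.1 + 0.05 * x * x

a = acquisition_values(xs, p_x, mu, var)
best = max(range(len(xs)), key=lambda i: a[i])
print(f"highest-scoring candidate: x = {xs[best]:.3f}")
```

In practice `mu` and `var` would come from the fitted surrogate, for example a Gaussian process posterior or the mean and spread of a DNN ensemble. The weight grows where the input is plausible but the predicted output is rare, which is how the score emphasizes samples linked to heavy tails.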

Abstract

Misleading or unnecessary data can have outsized impacts on the health or accuracy of Machine Learning (ML) models. We present a Bayesian sequential selection method, akin to Bayesian experimental design, that identifies critically important information within a dataset, while ignoring data that is either misleading or brings unnecessary complexity to the surrogate model of choice. Our method improves sample-wise error convergence and eliminates instances where more data leads to worse performance and instabilities of the surrogate model, often termed sample-wise "double descent". We find these instabilities are a result of the complexity of the underlying map and linked to extreme events and heavy tails. Our approach has two key features. First, the selection algorithm dynamically couples the chosen model and data. Data is chosen based on its merits towards improving the selected model, rather than being compared strictly against other data. Second, a natural convergence of the method removes the need for dividing the data into training, testing, and validation sets. Instead, the selection metric inherently assesses testing and validation error through global statistics of the model. This ensures that key information is never wasted in testing or validation. The method is applied using both Gaussian process regression and deep neural network surrogate models.

Paper Structure (15 sections, 16 equations, 6 figures, 1 algorithm)

Figures (6)

  • Figure 1: Double Descent does not follow the expected descent of modern ML techniques. Modern ML expects test error to decrease with model complexity, training epochs, and training samples, yet, in practice, the descent is not monotonic.
  • Figure 2: (a) The true nonlinear solutions $y=f(x)$ with respect to the random variable $x$ for nonlinear coefficients $L=0,5,20,50$; (b) the Gaussian PDF of $x$; and (c) the non-Gaussian, heavy-tailed PDF of the response variable $y$ for each nonlinear case.
  • Figure 3: GP FOMO model improves error convergence, is superior to early stopping, and converges without testing or validation data. The mean normalized MSE and log-PDF errors (min and max values shaded) of 100 experiments with randomly chosen data samples (panels a and c) and with sequential selections (panels b and d) over four nonlinear coefficients $L=0,5,20,50$. Panel (e) compares the approximated and true solutions for the errors denoted in (a)-(d), and panel (f) shows the number of chosen data samples by iteration for 100 independent sequential searches from (b) and (d).
  • Figure 4: DNN FOMO model eliminates double descent, is superior to early stopping, and converges without testing or validation data. The mean normalized MSE and log-PDF errors (min and max values shaded) of 25 experiments with randomly chosen data points on a Latin Hypercube (panels a and c) and with sequential selections (panels b and d). Panel (e) compares the predicted and true output distributions for three errors denoted in (a)-(d), and panel (f) shows the number of chosen data samples by iteration for 25 independent sequential searches from (b) and (d).
  • Figure 5: Important and misleading/unnecessary datasets show clear separation. A representative example of the iterative selection process, where data is sequentially acquired. The acquisition front indicates the acquisition score of the 50th optimal point at each iteration. Those above are acquired and those already chosen remain in the training set.
  • ...and 1 more figure
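
The Figure 5 caption describes an acquisition front set by the score of the 50th best point, with points above it acquired and previously chosen points retained. One possible reading of that step is sketched below. The function names (`select_top_k`, `update_training_set`) and the union-based update are assumptions for illustration, not the paper's algorithm as written.

```python
def select_top_k(scores, k=50):
    """Return the indices of the k highest scores and the front value.

    The front is the score of the k-th best candidate; candidates at or
    above it are the ones acquired in this iteration.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = order[:k]
    front = scores[chosen[-1]] if chosen else float("-inf")
    return chosen, front


def update_training_set(train_idx, scores, k=50):
    """Keep previously chosen indices and add this iteration's top-k."""
    chosen, front = select_top_k(scores, k)
    return sorted(set(train_idx) | set(chosen)), front


# Tiny example with k=2 instead of 50.
scores = [0.1, 0.9, 0.5, 0.7]
train_idx, front = update_training_set([0], scores, k=2)
print(train_idx, front)  # [0, 1, 3] 0.7
```

Under this reading, the number of newly added samples per iteration lies between 0 and k, since some of the top-scoring points may already be in the training set.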