Neural Parameter Estimation with Incomplete Data

Matthew Sainsbury-Dale, Andrew Zammit-Mangion, Noel Cressie, Raphaël Huser

TL;DR

This research asks how AI can be improved by introducing Bayesian statistical thinking. It compares two approaches to handling missing data in neural parameter estimation, masking and a Monte Carlo EM approach, using simulated incomplete data from a variety of spatial models.

Abstract

Advances in artificial intelligence (AI) and deep learning have led to neural networks being used to generate lightning-speed answers to complex science questions, paintings in the style of Monet, or stories like those of Twain. Leveraging their computational speed and flexibility, neural networks are also being used to facilitate fast, likelihood-free statistical inference. However, it is not straightforward to use neural networks with data that for various reasons are incomplete, which precludes their use in many applications. A recently proposed approach to remedy this issue uses an appropriately padded data vector and a vector that encodes the missingness pattern as input to a neural network. While computationally efficient, this "masking" approach is not robust to the missingness mechanism and can result in statistically inefficient inferences. Here, we propose an alternative approach that is based on the Monte Carlo expectation-maximization (EM) algorithm. Our EM approach is likelihood-free, substantially faster than the conventional EM algorithm as it does not require numerical optimization at each iteration, and more statistically efficient than the masking approach. This research addresses a prototypical problem that asks how improvements could be made in AI by introducing Bayesian statistical thinking. We compare the two approaches to missingness using simulated incomplete data from a variety of spatial models. The utility of the methodology is shown on Arctic sea-ice data, analyzed using a novel hidden Potts model with an intractable likelihood.
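
The Monte Carlo EM idea described in the abstract can be illustrated with a minimal sketch on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the model (independent Gaussian data with unknown mean and unit variance), the use of NaN to mark missing entries, the function names, and in particular `stand_in_estimator`, which is a closed-form sample mean standing in for a trained neural estimator. The point is only the shape of the loop: complete the data by conditional simulation under the current estimate, then apply a fixed estimator to the completed replicates, with no numerical optimization inside the iteration.

```python
import numpy as np

def stand_in_estimator(replicates):
    """Hypothetical stand-in for a trained neural estimator.

    Here it is just the pooled sample mean of the completed replicates.
    """
    return float(np.mean(replicates))

def conditional_simulate(z, theta, m, rng):
    """Complete z (NaN = missing) m times, drawing missing entries given theta.

    In this toy i.i.d. N(theta, 1) model the missing entries do not depend on
    the observed ones; in a spatial model they generally would.
    """
    missing = np.isnan(z)
    completed = np.tile(z, (m, 1))  # one row per replicate
    completed[:, missing] = rng.normal(theta, 1.0, size=(m, int(missing.sum())))
    return completed

def em_sketch(z, theta0, m=50, n_iter=20, seed=0):
    """Iterate: simulate missing data under the current estimate, then re-estimate."""
    rng = np.random.default_rng(seed)
    theta = theta0
    for _ in range(n_iter):
        completed = conditional_simulate(z, theta, m, rng)
        theta = stand_in_estimator(completed)
    return theta

# Toy data: 100 draws from N(2, 1), roughly 30% removed at random.
data_rng = np.random.default_rng(1)
z = data_rng.normal(2.0, 1.0, size=100)
z[data_rng.random(100) < 0.3] = np.nan

theta_hat = em_sketch(z, theta0=0.0)
print(theta_hat, np.nanmean(z))
```

In this toy setting the iterates settle near the mean of the observed entries, up to Monte Carlo noise that shrinks as `m` grows.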

Paper Structure

This paper contains 26 sections, 2 theorems, 81 equations, 19 figures, 4 tables, 2 algorithms.

Key Result

Theorem 1

Let the complete data $\boldsymbol{Z} \in \mathbb{R}^n$ be distributed according to a family of probability distributions indexed by $\boldsymbol{\theta}$. Partition $\boldsymbol{Z}$ into the components $\boldsymbol{Z}_1$ and $\boldsymbol{Z}_2$ is a sufficient statistic for $\boldsymbol{\theta}$.

Figures (19)

  • Figure 1: The Estimation stage of Algorithm \ref{alg:one-hot}. Observed data $\boldsymbol{Z}_1$, and the associated indices $\mathcal{I}_1$ (here, implicit) that identify which elements of $\boldsymbol{Z}$ are observed, are used to construct $\boldsymbol{U}$, a masked version of the complete data $\boldsymbol{Z}$ with missing entries replaced by a constant $c \in \mathbb{R}$, and $\boldsymbol{W}$, a vector of indicator variables that encode the missingness pattern. The encoded data $\boldsymbol{U}$ and $\boldsymbol{W}$ are then input to a neural Bayes estimator (NBE) to obtain point estimates $\hat{\boldsymbol{\theta}}$ of a model parameter $\boldsymbol{\theta}$.
  • Figure 2: The Estimation stage of Algorithm \ref{alg:neuralEM}. Incomplete data $\boldsymbol{Z}_1$ with missing entries are completed by conditional simulation using the previous parameter estimate $\hat{\boldsymbol{\theta}}^{(l-1)}$ of model parameter $\boldsymbol{\theta}$. The $m$ conditionally independent replicates are then input to an NBE trained to approximate the MAP estimator. The parameter estimate $\hat{\boldsymbol{\theta}}^{(l)}$ is then used for conditional simulation in the next iteration of the algorithm.
  • Figure 3: Spatial data (first column) where the missingness is of type MCAR (first row) or MICB (second row) with missingness shown in gray, and corresponding empirical distributions (second and third columns) for three estimators of the parameters of the Gaussian process model (Section \ref{sec:GP}). True parameter values are shown as a dashed vertical line.
  • Figure 4: Spatial data (first column) simulated from the hidden Potts model of Section \ref{sec:Potts}, where the missingness is of type MCAR (first row) or MICB (second row); empirical distributions (second column) for two estimators of the parameter $\beta$, with $\beta = 0.8$ fixed (dashed vertical line); and estimates versus true values (third column) for many different values of $\beta$, with the critical parameter value $\beta_c = 1.005$ demarcated by a dashed vertical line.
  • Figure 5: Arctic sea-ice data from the first day of September for the years 1979, 1993, 1995, and 2023. Faint gray lines denote coastlines, with Greenland appearing at the bottom. The data are subject to both random sources of missingness (e.g., cloud cover) and more systematic sources of missingness due to remote-sensing limitations (e.g., the Arctic Pole Hole).
  • ...and 14 more figures
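
The masking encoding described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: missing entries are marked with NaN, the indicator takes the value 1 for observed entries (the caption does not fix the convention), and the two vectors are simply concatenated before being passed to an estimator. The function name `encode_missing` is hypothetical.

```python
import numpy as np

def encode_missing(z, c=0.0):
    """Build the masked data U and the indicator vector W from incomplete data.

    Assumptions: NaN marks a missing entry, and W is 1 where the entry is
    observed and 0 where it is missing.
    """
    z = np.asarray(z, dtype=float)
    missing = np.isnan(z)
    u = np.where(missing, c, z)      # missing entries replaced by the constant c
    w = (~missing).astype(float)     # encodes the missingness pattern
    return u, w

# Toy incomplete data vector with two missing entries.
z = np.array([1.2, np.nan, -0.7, np.nan])
u, w = encode_missing(z, c=0.0)

# One possible way to present both to an estimator: a single stacked input.
x = np.concatenate([u, w])
print(u, w, x.shape)
```

The indicator vector is what lets an estimator tell a padded value apart from an observed value that happens to equal `c`.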

Theorems & Definitions (5)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • proof