Extended Fiducial Inference: Toward an Automated Process of Statistical Inference

Faming Liang, Sehwan Kim, Yan Sun

TL;DR

EFI reframes statistical inference by treating unknown parameters as fixed yet propagating data uncertainty through a learned inverse mapping $G({\boldsymbol Y},{\boldsymbol X},{\boldsymbol Z})$ via a sparse DNN, while imputing latent errors ${\boldsymbol Z}_n$ with adaptive stochastic gradient MCMC. The framework delivers a conditional fiducial distribution that automates hypothesis testing and parameter estimation without priors, and extends naturally to semi-supervised learning and complex hypotheses. The EFI-DNN algorithm provides a scalable, end-to-end method with convergence guarantees for the learned inverse and latent imputation, yielding robust fidelity to data, especially in the presence of outliers. Across linear and nonlinear models, Behrens–Fisher settings, multivariate norms, SSL tasks, and mediation tests, EFI demonstrates competitive or superior uncertainty quantification, reduced CI widths, and automated inference without relying on asymptotic references. Overall, EFI offers a flexible, data-driven pathway toward automated statistical inference with broad applicability in modern data science.

Abstract

While fiducial inference was widely considered a big blunder by R.A. Fisher, the goal he initially set, "inferring the uncertainty of model parameters on the basis of observations", has been continually pursued by many statisticians. To this end, we develop a new statistical inference method called extended Fiducial inference (EFI). The new method achieves the goal of fiducial inference by leveraging advanced statistical computing techniques while remaining scalable for big data. EFI involves jointly imputing random errors realized in observations using stochastic gradient Markov chain Monte Carlo and estimating the inverse function using a sparse deep neural network (DNN). The consistency of the sparse DNN estimator ensures that the uncertainty embedded in observations is properly propagated to model parameters through the estimated inverse function, thereby validating downstream statistical inference. Compared to frequentist and Bayesian methods, EFI offers significant advantages in parameter estimation and hypothesis testing. Specifically, EFI provides higher fidelity in parameter estimation, especially when outliers are present in the observations, and eliminates the need for theoretical reference distributions in hypothesis testing, thereby automating the statistical inference process. EFI also provides an innovative framework for semi-supervised learning.
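
To make the idea of propagating uncertainty from imputed errors to a parameter concrete, the following is a minimal toy sketch, not the EFI-DNN algorithm. It assumes a location model $y_i = \theta + z_i$ with $z_i \sim N(0,1)$, for which the inverse map is available in closed form, so no sparse DNN or stochastic gradient MCMC is involved. The function name `fiducial_draws_location`, the model, and all parameter values are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def fiducial_draws_location(y, num_draws=5000, seed=0):
    """Toy fiducial draws for theta in y_i = theta + z_i, z_i ~ N(0, 1).

    Illustrative assumption: a closed-form inverse replaces the learned
    inverse function, and direct sampling replaces MCMC imputation.
    """
    rng = np.random.default_rng(seed)
    n = y.size
    # Errors consistent with the data satisfy z = y - theta for one common
    # theta. Restricting the N(0, I) error density to that set gives
    # theta ~ N(mean(y), 1/n).
    theta = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(n), size=num_draws)
    # Imputed errors implied by each draw of theta.
    z = y[None, :] - theta[:, None]
    # Closed-form inverse map G(y, z) = mean(y - z) recovers theta from (y, z).
    recovered = (y[None, :] - z).mean(axis=1)
    return theta, z, recovered


# Simulated data with a hypothetical true value theta = 2.0.
data_rng = np.random.default_rng(1)
y = 2.0 + data_rng.normal(size=200)

theta, z, recovered = fiducial_draws_location(y)
lower, upper = np.quantile(theta, [0.025, 0.975])
print(f"95% fiducial interval for theta: ({lower:.3f}, {upper:.3f})")
```

In this toy case the interval is read directly from the draws of $\theta$, with no reference distribution looked up by hand. The full method described in the abstract replaces the closed-form inverse with a sparse DNN estimate and the direct sampling with stochastic gradient MCMC.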

Paper Structure

This paper contains 55 sections, 14 theorems, 101 equations, 8 figures, 10 tables, 1 algorithm.

Key Result

Lemma 3.1

If Assumptions \ref{ass:existence}-\ref{ass:zero} hold, then the zero-energy set $\mathcal{Z}_n$ is invariant to the choice of $G(\cdot)$.

Figures (8)

  • Figure 1: Illustration of the source of uncertainty in model parameters: the space in which ${\boldsymbol \theta}$ can take values shrinks as the sample size increases.
  • Figure 2: Illustration of the EFI network, where the red nodes and links form a DNN (parameterized by the weights ${\boldsymbol w}$) to learn, the green node represents latent variables to impute, and the black lines represent deterministic functions.
  • Figure 3: Results of EFI (with the ReLU activation function) for one dataset simulated from (\ref{LinearEx1}) with $n=500$: (left) scatter plot of $\hat{{\boldsymbol z}}_n$ ($y$-axis) versus ${\boldsymbol z}_n$ ($x$-axis), (middle) Q-Q plot of $\hat{{\boldsymbol z}}_n$ and ${\boldsymbol z}_n$, (right) confidence intervals of $\beta_1$ produced by EFI and OLS.
  • Figure 4: Results of EFI for one dataset simulated from (\ref{BFproblem}) with $n_1=n_2=50$: (left) Q-Q plot of $\{\hat{\mu}_1^{(k)}: k=1,2,\ldots,\mathcal{M}\}$ ($x$-axis) and $\{\tilde{t}_1^{(k)}: k=1,2,\ldots,\mathcal{M}\}$ ($y$-axis); (right) Q-Q plot of $\{\hat{\mu}_2^{(k)}: k=1,2,\ldots,\mathcal{M}\}$ ($x$-axis) and $\{\tilde{t}_2^{(k)}: k=1,2,\ldots,\mathcal{M}\}$ ($y$-axis).
  • Figure 5: Fidelity of EFI in parameter estimation: (left) scatter plot of residuals: $z_i$ versus $\hat{z}_i$; (middle left) scatter plot of ordered residuals: $z_{(i)}$ versus $\hat{z}_{(i)}$; (middle right) EFI and OLS confidence intervals for $\beta_1$; (right) EFI and OLS confidence intervals for $\sigma^2$.
  • ...and 3 more figures

Theorems & Definitions (28)

  • Lemma 3.1
  • Theorem 3.1
  • Lemma 3.2
  • Definition 3.1
  • Remark 1
  • Theorem 3.2
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • ...and 18 more