Robust Bayesian Inference for Measurement Error Misspecification: The Berkson and Classical Cases

Charita Dellaporta, Theodoros Damoulas

TL;DR

The paper addresses measurement error (ME) in covariates, which can bias parameter inference, by introducing Robust-MEM, a Bayesian nonparametric learning framework that is robust to misspecification of the ME distribution and does not require replicate measurements. It places a Dirichlet Process (DP) prior on the conditional distribution $\mathbb{P}_{X|W}$ to accommodate both Berkson and Classical ME, and offers two loss-based instantiations: Total Least Squares (TLS) for Gaussian, linear settings and Maximum Mean Discrepancy (MMD) for flexible, non-Gaussian regression. The authors provide generalisation error bounds under the MMD loss, discuss prior elicitation via the concentration parameter $c$, and demonstrate robustness and practical performance on synthetic data and two real ME problems (California test scores and a mental health study). The framework scales through the Posterior Bootstrap, supports DP approximations without sacrificing asymptotic rates, and can incorporate additional information such as replicates or instrumental variables when available, making it a versatile tool for ME problems in applied statistics and econometrics.
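To make the claim that ME can bias parameter inference concrete, the following is a minimal simulation sketch and not the paper's method. It assumes the standard textbook formulations (Classical: the observed $W = X + N$ with $N$ independent of $X$; Berkson: the true $X = W + N$ with $N$ independent of $W$), a linear response, and Gaussian noise. The function names and all numerical settings are hypothetical choices made for illustration.

```python
import numpy as np


def ols_slope(x, y):
    """Slope of the least-squares line of y on x (intercept included)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def simulate_slopes(n=200_000, theta=2.0, sigma_n=1.0, seed=0):
    """Naive OLS slopes of Y on the observed W under Classical and Berkson ME.

    All settings (sample size, true slope theta, noise scales) are
    illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Classical ME: we observe W = X + N, with N independent of X.
    x = rng.normal(0.0, 1.0, n)
    w_classical = x + rng.normal(0.0, sigma_n, n)
    y = theta * x + rng.normal(0.0, 0.5, n)
    slope_classical = ols_slope(w_classical, y)

    # Berkson ME: the true X = W + N, with N independent of W.
    w_berkson = rng.normal(0.0, 1.0, n)
    x_b = w_berkson + rng.normal(0.0, sigma_n, n)
    y_b = theta * x_b + rng.normal(0.0, 0.5, n)
    slope_berkson = ols_slope(w_berkson, y_b)

    return slope_classical, slope_berkson


if __name__ == "__main__":
    s_classical, s_berkson = simulate_slopes()
    print(f"Classical ME slope: {s_classical:.3f}")
    print(f"Berkson ME slope:   {s_berkson:.3f}")
```

Under these assumptions the Classical slope is attenuated towards $\theta \cdot \mathrm{Var}(X) / (\mathrm{Var}(X) + \sigma_N^2)$, while the Berkson slope stays close to $\theta$. The latter holds in this linear setting only; Berkson error can still bias estimates in nonlinear models.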

Abstract

Measurement error occurs when a covariate influencing a response variable is corrupted by noise. This can lead to misleading inference outcomes, particularly in problems where accurately estimating the relationship between covariates and response variables is crucial, such as causal effect estimation. Existing methods for dealing with measurement error often rely on strong assumptions such as knowledge of the error distribution or its variance and availability of replicated measurements of the covariates. We propose a Bayesian Nonparametric Learning framework that is robust to misspecification of these assumptions and does not require replicate measurements. This approach gives rise to a general framework that is suitable for both Classical and Berkson error models via the appropriate specification of the prior centering measure of a Dirichlet Process (DP). Moreover, it offers flexibility in the choice of loss function depending on the type of regression model. We provide bounds on the generalisation error based on the Maximum Mean Discrepancy (MMD) loss which allows for generalisation to non-Gaussian distributed errors and nonlinear covariate-response relationships. We showcase the effectiveness of the proposed framework versus prior art in real-world problems containing either Berkson or Classical measurement errors.
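The MMD loss mentioned above compares two distributions through kernel evaluations on samples. The sketch below computes the standard biased (V-statistic) estimate of the squared MMD. It assumes a Gaussian kernel with a fixed lengthscale, which is an illustrative choice and not necessarily the kernel or estimator used in the paper; the function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, lengthscale=1.0):
    """Gaussian kernel matrix k(a_i, b_j) for arrays a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


def mmd_squared(x, y, lengthscale=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    kxx = gaussian_kernel(x, x, lengthscale).mean()
    kyy = gaussian_kernel(y, y, lengthscale).mean()
    kxy = gaussian_kernel(x, y, lengthscale).mean()
    return float(kxx + kyy - 2.0 * kxy)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    same_a = rng.normal(size=(300, 1))
    same_b = rng.normal(size=(300, 1))
    shifted = rng.normal(loc=3.0, size=(300, 1))
    print("same distribution:   ", mmd_squared(same_a, same_b))
    print("shifted distribution:", mmd_squared(same_a, shifted))
```

In a minimum-MMD regression, an estimate of this kind would be minimised over the model parameters, with one sample drawn from the model and the other taken from the data. The Gaussian kernel is both characteristic and translation invariant, the two kernel properties named in the definitions listed further down.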

Paper Structure

This paper contains 52 sections, 16 theorems, 90 equations, 10 figures, 11 tables, and 4 algorithms.

Key Result

Theorem 4.1

Let $k^2_X = k_X \otimes k_X$ and $\mathbb{P}^0 = \mathbb{P}_X^{0,n} \times \mathbb{P}_{Y \mid X}^0$. Then, under Assumptions asm:k-asm-lambda and with $\Lambda$ as in Assumption asm-lambda, we have: [...] where $\mathbb{P} := (\mathbb{P}^1, \dots,$ ...

Figures (10)

  • Figure 1: Graphical representation of Berkson and Classical ME. Directed arrows indicate conditional dependency. The regression function explaining the relationship of $Y \mid X$ is denoted by $g(\theta, \cdot)$. The two models differ in the dependency relationships between $N$, $W$ and $X$.
  • Figure 2: Summary of different Robust-MEM frameworks based on the choices of prior centering measures $\{\mathbb{F}_{w_i}\}_{i=1}^n$ and loss function $l$.
  • Figure 3: Average MSE in polynomial regression with Berkson, Gaussian ME over 50 replications. Left: Robust-MEM method with a well-specified prior. Right: Robust-MEM method with a misspecified (Student-t with 3 degrees of freedom) prior.
  • Figure 4: Fitted linear regression line over 100 simulations with a misspecified ME variance. The true line is shown along with our method (R-MEM (TLS)), nonlinear LS (Least Squares) and SIMEX.
  • Figure 5: Model fit for the sigmoid curve regression function with Berkson ME for an increasing ME variance over 100 simulations. The true model fit is shown along with our method (R-MEM (MMD)), nonlinear least squares (Least Squares) and minimum MMD estimator (MMD).
  • ...and 5 more figures
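
Total Least Squares, the loss behind the R-MEM (TLS) curve in Figure 4, fits a line by minimising orthogonal rather than vertical distances. The sketch below shows plain TLS for a single covariate via the singular value decomposition. It is not the Robust-MEM procedure itself, it assumes equal error variance in both coordinates, and the function name is hypothetical.

```python
import numpy as np


def tls_line(x, y):
    """Fit y = slope * x + intercept by total least squares (orthogonal regression).

    The right singular vector with the smallest singular value of the centred
    data matrix is the normal (a, b) of the fitted line, so slope = -a / b.
    Assumes the fitted line is not vertical (b != 0).
    """
    x_mean, y_mean = x.mean(), y.mean()
    centered = np.column_stack([x - x_mean, y - y_mean])
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    a, b = vt[-1]
    slope = -a / b
    intercept = y_mean - slope * x_mean
    return float(slope), float(intercept)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_true = np.linspace(0.0, 5.0, 500)
    # Noise in both coordinates, with equal variance (an assumption of plain TLS).
    x_obs = x_true + rng.normal(0.0, 0.3, x_true.size)
    y_obs = 2.0 * x_true + 1.0 + rng.normal(0.0, 0.3, x_true.size)
    print(tls_line(x_obs, y_obs))
```

Because TLS treats the covariate as noisy too, it avoids the attenuation that ordinary least squares suffers under Classical ME, provided the equal-variance assumption is reasonable.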

Theorems & Definitions (35)

  • Remark 3.1
  • Remark 3.2
  • Definition 4.0.1: Characteristic kernel (Sriperumbudur et al., 2011).
  • Definition 4.0.2: Translation invariant kernel.
  • Theorem 4.1: Berkson
  • Theorem 4.2: Classical
  • Corollary 4.2.1: Berkson
  • Corollary 4.2.2: Classical
  • Corollary 4.2.3: Berkson
  • Corollary 4.2.4: Classical
  • ...and 25 more