Robust Bayesian Inference for Measurement Error Misspecification: The Berkson and Classical Cases
Charita Dellaporta, Theodoros Damoulas
TL;DR
The paper addresses measurement error (ME) in covariates, which can bias parameter inference, by introducing Robust-MEM, a Bayesian nonparametric learning framework that is robust to misspecification of the ME distribution and does not require replicate measurements. It places a Dirichlet Process (DP) prior on the conditional distribution $\mathbb{P}_{X|W}$ to accommodate both Berkson and Classical ME, and offers two loss-based instantiations: Total Least Squares (TLS) for linear Gaussian settings and Maximum Mean Discrepancy (MMD) for flexible, non-Gaussian regression. The authors provide generalisation error bounds under the MMD loss, discuss prior elicitation via the concentration parameter $c$, and demonstrate robustness and practical performance on synthetic data and real ME problems (California test scores and a mental health study). The framework scales through the Posterior Bootstrap, supports DP approximations without sacrificing asymptotic rates, and can incorporate additional information such as replicates or instrumental variables when available, making it a versatile tool for ME problems in applied statistics and econometrics.
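For readers unfamiliar with the MMD loss named above, the following is a minimal sketch of a sample-based estimate of the squared MMD, $\mathbb{E}[k(X,X')] + \mathbb{E}[k(Y,Y')] - 2\,\mathbb{E}[k(X,Y)]$. It is not the authors' implementation: the Gaussian kernel, the lengthscale of 1, and the function names `gaussian_kernel` and `mmd_squared` are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(a, b, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


def mmd_squared(x, y, lengthscale=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, lengthscale)
    k_yy = gaussian_kernel(y, y, lengthscale)
    k_xy = gaussian_kernel(x, y, lengthscale)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Illustrative data (assumed, not from the paper): two samples from the same
# distribution and one sample from a mean-shifted distribution.
rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(500, 1))
same_b = rng.normal(0.0, 1.0, size=(500, 1))
shifted = rng.normal(2.0, 1.0, size=(500, 1))

mmd_same = mmd_squared(same_a, same_b)
mmd_shifted = mmd_squared(same_a, shifted)
print(f"same distribution:    {mmd_same:.4f}")
print(f"shifted distribution: {mmd_shifted:.4f}")
```

The estimate is close to zero when both samples come from the same distribution and grows as the distributions move apart, which is what makes it usable as a loss between model-generated and observed data.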
Abstract
Measurement error occurs when a covariate influencing a response variable is corrupted by noise. This can lead to misleading inference outcomes, particularly in problems where accurately estimating the relationship between covariates and response variables is crucial, such as causal effect estimation. Existing methods for dealing with measurement error often rely on strong assumptions such as knowledge of the error distribution or its variance and availability of replicated measurements of the covariates. We propose a Bayesian Nonparametric Learning framework that is robust to misspecification of these assumptions and does not require replicate measurements. This approach gives rise to a general framework that is suitable for both Classical and Berkson error models via the appropriate specification of the prior centering measure of a Dirichlet Process (DP). Moreover, it offers flexibility in the choice of loss function depending on the type of regression model. We provide bounds on the generalisation error based on the Maximum Mean Discrepancy (MMD) loss which allows for generalisation to non-Gaussian distributed errors and nonlinear covariate-response relationships. We showcase the effectiveness of the proposed framework versus prior art in real-world problems containing either Berkson or Classical measurement errors.
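To make the two error models concrete, here is a small simulation sketch. It assumes the usual textbook definitions, which the abstract does not spell out: with $X$ the true covariate and $W$ its observed version, Classical error is $W = X + N$ with $N$ independent of $X$, and Berkson error is $X = W + N$ with $N$ independent of $W$. The linear model, the noise scales, and the helper `ols_slope` are illustrative choices, and the naive least-squares fit shown here is the baseline whose bias motivates ME methods, not the proposed framework.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
beta = 2.0  # true slope of the response on the true covariate (assumed value)


def ols_slope(covariate, response):
    """Least-squares slope of response on covariate, with an intercept."""
    return np.cov(covariate, response)[0, 1] / np.var(covariate, ddof=1)


# Classical error: the observation W is the true covariate X plus noise.
x_classical = rng.normal(0.0, 1.0, n)
w_classical = x_classical + rng.normal(0.0, 1.0, n)
y_classical = beta * x_classical + rng.normal(0.0, 0.5, n)

# Berkson error: the true covariate X is the observation W plus noise.
w_berkson = rng.normal(0.0, 1.0, n)
x_berkson = w_berkson + rng.normal(0.0, 1.0, n)
y_berkson = beta * x_berkson + rng.normal(0.0, 0.5, n)

# Naive regression of the response on the observed covariate W.
slope_classical = ols_slope(w_classical, y_classical)
slope_berkson = ols_slope(w_berkson, y_berkson)
print(f"true slope:            {beta:.2f}")
print(f"naive slope, Classical: {slope_classical:.2f}")
print(f"naive slope, Berkson:   {slope_berkson:.2f}")
```

In this linear, additive setting the Classical slope is attenuated by the factor $\mathrm{Var}(X) / (\mathrm{Var}(X) + \mathrm{Var}(N))$, here one half, while the Berkson slope stays close to the true value but with inflated residual variance. With nonlinear covariate-response relationships, Berkson error can also bias the fit.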
