Clusterization in D-optimal designs: the case against linearization

Yair Daon

TL;DR

The paper analyzes Bayesian D-optimal designs for linear inverse problems in Hilbert spaces and shows that measurement clusterization is a generic outcome when there is no model error and the prior covariance has rapidly decaying eigenvalues. It provides a tractable analytic framework, proves that adding model error mitigates clustering, and characterizes D-optimal designs as focusing uncertainty reduction onto a small set of leading prior-eigenvectors. Through a nonlinear eigenvalue perspective and Carathéodory-type arguments, it explains clustering via the pigeonhole principle and offers convergence guarantees for posterior uncertainty. The work also demonstrates, with a toy 1D heat equation and numerical experiments, how clusterization arises and how correlated errors can alleviate it, highlighting implications for prior choice and model linearization in optimal design practice.

Abstract

Estimation of parameters in physical processes often demands costly measurements, prompting the pursuit of an optimal measurement strategy. Finding such a strategy is termed the problem of optimal experimental design, abbreviated as optimal design. Remarkably, optimal designs can yield tightly clustered measurement locations, leading researchers to fundamentally revise the design problem just to circumvent this issue. Some authors introduce correlations among error terms that are initially independent, while others restrict measurements to a finite set of locations. While both approaches may prevent clusterization, they also fundamentally alter the optimal design problem. In this study, we consider Bayesian D-optimal designs, i.e., designs that maximize the expected Kullback-Leibler divergence between posterior and prior. We propose an analytically tractable model for D-optimal designs over Hilbert spaces. In this framework, we make several key contributions: (a) We establish that measurement clusterization is a generic trait of D-optimal designs for linear inverse problems with independent Gaussian measurement errors and a Gaussian prior. (b) We prove that introducing correlations among measurement error terms mitigates clusterization. (c) We characterize D-optimal designs as reducing uncertainty across a subset of prior covariance eigenvectors. (d) We leverage this characterization to argue that measurement clusterization arises as a consequence of the pigeonhole principle: when more measurements are taken than there are locations where the selected eigenvectors are large and the others are small, clusterization occurs. Finally, we use our analysis to argue against the use of Gaussian priors with linearized physical models when seeking a D-optimal design.
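To make the D-optimality criterion concrete, here is a minimal sketch in Python. It is not the paper's model, only a finite-dimensional illustration. It relies on the standard fact that for a linear forward operator F, a Gaussian prior with covariance Γ, and independent Gaussian noise of variance σ², the expected Kullback-Leibler divergence between posterior and prior equals ½ log det(I + σ⁻² F Γ Fᵀ). Everything specific below is an assumption made for illustration: the function name `d_criterion`, the 1D heat equation on [0, π] with zero boundary values, the truncation to `n_modes` sine modes, the prior eigenvalue decay k⁻⁴, and the values of `t_final` and `sigma`.

```python
import numpy as np


def d_criterion(locations, n_modes=50, t_final=0.01, sigma=0.05):
    """Expected information gain for point measurements of a toy 1D heat equation.

    Assumed setup (illustrative only): the initial condition is expanded in the
    modes sqrt(2/pi) * sin(k x) on [0, pi], the prior covariance is diagonal in
    that basis with eigenvalues k**-4, and the solution is observed at time
    t_final at the given locations with i.i.d. N(0, sigma**2) noise.
    """
    x = np.asarray(locations, dtype=float)
    k = np.arange(1, n_modes + 1, dtype=float)

    prior_eigs = k ** -4.0               # assumed prior covariance spectrum
    decay = np.exp(-(k ** 2) * t_final)  # heat-equation damping of mode k

    # Forward operator: row i evaluates the damped modes at location x[i].
    forward = np.sqrt(2.0 / np.pi) * decay[None, :] * np.sin(np.outer(x, k))

    # I + sigma^-2 * F * Gamma * F^T, a small (n_obs x n_obs) matrix.
    gram = (forward * prior_eigs[None, :]) @ forward.T
    matrix = np.eye(x.size) + gram / sigma ** 2

    # slogdet is numerically safer than log(det(...)).
    _, logdet = np.linalg.slogdet(matrix)
    return 0.5 * logdet


# Compare a spread-out design with a clustered one (no outcome is asserted here).
print("spread:   ", d_criterion([1.0, 2.0]))
print("clustered:", d_criterion([1.5, 1.5]))
```

Evaluating `d_criterion` over a grid of candidate designs is one way to explore, in this toy setting, when repeated or nearly coincident locations score higher than spread-out ones. Which design wins depends on the assumed spectrum, noise level, and observation time.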
