Hypothesis Testing in Imaging Inverse Problems
Yiming Xi, Konstantinos Zygalakis, Marcelo Pereyra
TL;DR
The paper tackles semantic hypothesis testing in imaging inverse problems, where the observation $Y$ follows $Y \sim P(A x_\star)$ and hypotheses are formulated in natural language. It introduces three components: a noise-injection measurement splitting scheme that enables replication-like testing on the same data; a CLIP-based test statistic computed in a shared image-text embedding space; and a non-parametric test based on the e-value $E = \exp\{-t(Y_2)\}$, with null and alternative hypotheses $H_0: \mathbb{E}(E) \le 1$ vs $H_1: \mathbb{E}(E) > 1$ and Type I error controlled through Markov's inequality. The paper provides theoretical analysis for a linearized VLM encoder under Gaussian and exponential-family noise, along with numerical experiments on image-based phenotyping that show strong Type I error control and improved power over zero-shot CLIP baselines. The approach leverages self-supervised reconstruction and foundation-model priors to reduce reliance on ground-truth data while enabling rigorous quantitative inference in scientific imaging pipelines.
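As a rough illustration of two of these ingredients, the sketch below shows a standard noise-injection split for Gaussian noise and the Markov-inequality rejection rule for an e-value. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `split_measurement` and `reject_null`, the parameter `alpha`, and the specific split $Y_1 = Y + \alpha W$, $Y_2 = Y - W/\alpha$ with $W \sim \mathcal{N}(0, \sigma^2 I)$ are assumptions made here for illustration. For Gaussian $Y$ with noise variance $\sigma^2$, this split yields two independent measurements of the same signal. The rejection rule uses the fact that if $\mathbb{E}(E) \le 1$, then $P(E \ge 1/\text{level}) \le \text{level}$ by Markov's inequality.

```python
import numpy as np


def split_measurement(y, sigma, alpha=1.0, rng=None):
    """Split one Gaussian observation into two independent ones.

    Assumes y ~ N(A x, sigma^2 I). Injecting w ~ N(0, sigma^2 I) gives
    y1 = y + alpha * w and y2 = y - w / alpha, which are uncorrelated and
    jointly Gaussian, hence independent. (Illustrative assumption; the
    paper's exact scheme may differ.)
    """
    rng = np.random.default_rng() if rng is None else rng
    w = rng.normal(0.0, sigma, size=np.shape(y))
    y1 = y + alpha * w   # e.g. used to reconstruct and formulate hypotheses
    y2 = y - w / alpha   # held out for computing the test statistic
    return y1, y2


def reject_null(e_value, level=0.05):
    """Reject H0: E[E] <= 1 when the e-value reaches 1 / level.

    By Markov's inequality, P(E >= 1/level) <= level * E[E] <= level
    under H0, so the Type I error is at most `level`.
    """
    return bool(e_value >= 1.0 / level)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = 3.0 + rng.normal(0.0, 2.0, size=8)      # toy observation, sigma = 2
    y1, y2 = split_measurement(y, sigma=2.0, alpha=0.5, rng=rng)
    t = -4.0                                    # hypothetical statistic t(Y_2)
    e_value = np.exp(-t)                        # E = exp{-t(Y_2)}
    print(reject_null(e_value, level=0.05))
```

In this construction a smaller `alpha` keeps `y1` closer to the original observation at the cost of a noisier `y2`, so the choice trades reconstruction quality against test power.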
Abstract
This paper proposes a framework for semantic hypothesis testing tailored to imaging inverse problems. Modern imaging methods struggle to support hypothesis testing, a core component of the scientific method that is essential for the rigorous interpretation of experiments and robust interfacing with decision-making processes. There are three main reasons why image-based hypothesis testing is challenging. First, it is difficult to use a single observation to simultaneously reconstruct an image, formulate hypotheses, and quantify their statistical significance. Second, the hypotheses encountered in imaging are mostly semantic in nature, rather than quantitative statements about pixel values. Third, it is challenging to control test error probabilities because the null and alternative distributions are often unknown. Our proposed approach addresses these difficulties by leveraging concepts from self-supervised computational imaging, vision-language models, and non-parametric hypothesis testing with e-values. We demonstrate our proposed framework through numerical experiments related to image-based phenotyping, where we achieve excellent power while robustly controlling Type I errors.
