
Conformalized Method for Empirical Bayes Normal Mean Inference Problem with Heteroscedastic Variance

Kwangok Seo, Johan Lim

Abstract

We study the normal mean inference problem, which involves simultaneous testing of the means of many normal distributions. This problem has been extensively studied within the empirical Bayes (EB) framework. However, the reliability of most EB methods heavily depends on two key conditions: (i) the prior distribution is correctly specified, and (ii) it can be accurately estimated. In practice, both conditions are difficult to satisfy, and it is often unclear whether they hold in a given application. To overcome these limitations, we propose a new algorithm, called COIN (COnformal Inference for Normal mean inference problem). Unlike traditional empirical Bayes approaches, COIN produces decision rules whose validity does not depend on the correct specification or accurate estimation of the prior. We theoretically prove that COIN asymptotically controls the false discovery rate at the nominal level, even in the presence of prior misspecification or estimation errors. Since the COIN algorithm requires an external training dataset to estimate the prior distribution and conformity score function, we introduce two data-splitting strategies -- sample-splitting and feature-splitting -- for the case where such external data are unavailable. We provide theoretical guarantees for the data-splitting strategies and demonstrate their effectiveness through extensive numerical studies and three real data examples.
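The conformal construction the abstract describes can be illustrated with a generic sketch: conformity scores are computed on a calibration set of known nulls, and each test statistic receives a conformal p-value given by its rank among the calibration scores. The standardized |z|-score conformity function, the simulated data, and all parameter values below are illustrative assumptions for this sketch, not the paper's exact COIN construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative conformal p-values for the heteroscedastic normal mean
# problem. The |z|-score conformity function and the simulated data are
# assumptions for this sketch, not the paper's exact COIN algorithm.

# Calibration set: conformity scores from known null observations (mu_i = 0).
n_cal, n_test = 1000, 500
sigma_cal = rng.uniform(0.5, 2.0, n_cal)             # heteroscedastic SDs
cal_scores = np.abs(rng.normal(0.0, sigma_cal) / sigma_cal)

# Test set: a mixture of nulls (mu_i = 0) and non-nulls (mu_i = 3).
is_nonnull = rng.random(n_test) < 0.2
sigma_test = rng.uniform(0.5, 2.0, n_test)
x = rng.normal(np.where(is_nonnull, 3.0, 0.0), sigma_test)
test_scores = np.abs(x / sigma_test)

# Conformal p-value: (1 + rank among calibration null scores) / (n_cal + 1).
pvals = (1 + (cal_scores[None, :] >= test_scores[:, None]).sum(axis=1)) / (n_cal + 1)
```

Under exchangeability of the null test scores with the calibration scores, such p-values are valid no matter how the prior was specified or estimated, which is the property that lets a conformal procedure sidestep conditions (i) and (ii).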

Paper Structure

This paper contains 60 sections, 14 theorems, 147 equations, 13 figures, 1 table, 3 algorithms.

Key Result

Theorem 1

Suppose that the hierarchical model described in (hierar_model) and Assumption (assum_1) hold. Also assume that $\nu \geq 2$ and that there are no ties between $u_i$ and $\tilde{u}_i$ for all $i \in \mathcal{H}^{\mathcal{D}_1}$. Then, the FDR of the decision rule $\boldsymbol{\delta}$ obtained from COIN is asymptotically controlled at level $\alpha$, where $\alpha$ denotes a pre-specified FDR level. $\blacktriangleleft$
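A decision rule of this kind is typically realized through a data-adaptive threshold (cf. Remark 5): reject every hypothesis whose conformal p-value falls below the largest candidate threshold at which the estimated false discovery proportion stays below $\alpha$. The step-up construction below is a standard illustration of this idea; it is not necessarily COIN's exact rule.

```python
import numpy as np

def adaptive_threshold(pvals, alpha=0.1):
    """Largest candidate t with m * t / max(1, #{p_i <= t}) <= alpha.

    A standard step-up construction, shown only to illustrate a
    data-adaptive threshold; not necessarily COIN's exact rule.
    """
    m = len(pvals)
    tau = 0.0
    for t in np.sort(pvals):
        # Estimated false discovery proportion if we rejected at threshold t.
        fdp_hat = m * t / max(1, int(np.sum(pvals <= t)))
        if fdp_hat <= alpha:
            tau = t
    return tau

pvals = np.array([0.001, 0.002, 0.01, 0.2, 0.4, 0.6, 0.8, 0.9])
tau = adaptive_threshold(pvals, alpha=0.1)
reject = pvals <= tau  # rejects the three smallest p-values in this example
```

Because the threshold is chosen so the estimated FDP never exceeds $\alpha$, rules of this form inherit (asymptotic) FDR control of the kind stated in Theorem 1.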

Figures (13)

  • Figure 1: False discovery rates (FDRs) and true positive rates (TPRs) under Scenarios 1 and 2. Each panel compares the four methods LS, MixTwice, gg-Mix, and COIN-FS across varying non-null proportions $\pi$, with results averaged over 200 simulation replicates. The dashed horizontal line in the FDR panels indicates the target FDR level 0.1.
  • Figure 2: False discovery rates (FDRs) of five multiple testing methods---LS, MixTwice, gg-Mix, COIN-SS, and COIN-FS---are evaluated under various prior specifications. In all simulation settings, $\mu_i$ and $\sigma_i^2$ are assumed to be independent. Each row corresponds to a different distribution for the variance $\sigma^2$, while each column represents a different distribution for the non-zero effect $\mu$. The x-axis indicates the proportion of non-null hypotheses $\pi$, and each point represents the average over 200 replications. The dashed horizontal line marks the target FDR level of 0.1.
  • Figure 3: True positive rates (TPRs) of the five multiple testing methods. The simulation setup and layout are identical to those in Figure 2.
  • Figure 4: False discovery rates (FDRs) of five multiple testing methods---LS, MixTwice, gg-Mix, COIN-SS, and COIN-FS---are evaluated under various prior specifications. In all simulation settings, $\mu_i$ and $\sigma_i^2$ are assumed to be dependent. Each row corresponds to a different distribution for the variance $\sigma^2$, while each column represents a different distribution for the non-zero effect $\mu$. The x-axis indicates the proportion of non-null hypotheses $\pi$, and each point represents the average over 200 replications. The dashed horizontal line marks the target FDR level of 0.1.
  • Figure 5: True positive rates (TPRs) of the five multiple testing methods. The simulation setup and layout are identical to those in Figure 4.
  • ...and 8 more figures

Theorems & Definitions (39)

  • Remark 1: Availability of the External Training Dataset $\mathcal{D}_2$
  • Remark 2: Hierarchical Model
  • Remark 3: Practical Implementation of the NPMLE
  • Remark 4
  • Remark 5: Data-Adaptive Threshold $\tau$
  • Remark 6
  • Theorem 1
  • Remark 7
  • Remark 8
  • Remark 9
  • ...and 29 more