Inference for location and height of peaks of a standardized field after selection

Alden Green, Jonathan Taylor

TL;DR

The paper develops a post-selection framework for inferring the location and height of peaks in a smooth, standardized random field observed with Gaussian noise. It introduces a two-stage peak detection procedure based on the TG test, then constructs post-selection confidence regions for nearby true peaks, with both conditional and marginal coverage guarantees. Central to the theory is a second-order accurate local expansion of the peak intensity near true peaks, derived via the Kac-Rice formula, which shows how selection biases the observed height and location and yields pivots for inference. To address strong selection effects, the authors propose randomized peak detection and data carving, which improve coverage and interval lengths in simulations. The procedure controls miscoverage rates (PCMR) for peak height and location, and the theoretical claims are supported by proofs and simulations.

Abstract

Peak inference concerns the use of local maxima ("peaks") of a noisy random field to detect and localize regions where underlying signal is present. We propose a peak inference method that first subjects observed peaks to a significance test of the null hypothesis that no signal is present, and then uses the peaks that are declared significant to construct post-selectively valid confidence regions for the location and height of nearby true peaks. We analyze the performance of this method in a smooth signal plus constant variance noise model under a high-curvature asymptotic assumption, and prove that it asymptotically controls both the number of false discoveries, and the number of confidence regions that do not contain a true peak, relative to the number of points at which inference is conducted. An important intermediate theoretical result uses the Kac-Rice formula to derive a novel approximation to the intensity function of a point process that counts local maxima, which is second-order accurate under the alternative, nearby high-curvature true peaks.
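As a rough illustration of the setting the abstract describes, the following sketch simulates a one-dimensional smooth signal plus constant-variance noise, finds the observed local maxima, and keeps those above a pre-threshold. It is a toy under stated assumptions, not the paper's procedure: the function names, the Gaussian bump signal, the kernel-smoothed noise, and the threshold value `u = 2.0` are all hypothetical choices, and the significance test and confidence regions are not implemented.

```python
import numpy as np


def simulate_field(n=400, peak_loc=0.5, peak_height=4.0, bandwidth=0.05, seed=0):
    """Toy 1-D field: Gaussian bump signal plus smooth unit-variance noise (illustrative only)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n)
    signal = peak_height * np.exp(-0.5 * ((t - peak_loc) / bandwidth) ** 2)

    # Smooth noise: white noise convolved with a Gaussian kernel.
    # The kernel has unit L2 norm, so every noise value has variance 1.
    half = int(3 * bandwidth * n)
    grid = np.arange(-half, half + 1) / n
    kernel = np.exp(-0.5 * (grid / bandwidth) ** 2)
    kernel /= np.sqrt(np.sum(kernel ** 2))
    white = rng.standard_normal(n + 2 * half)
    noise = np.convolve(white, kernel, mode="valid")  # length n

    return t, signal, signal + noise


def local_maxima(y):
    """Indices of strict interior local maxima of a 1-D array."""
    interior = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])
    return np.flatnonzero(interior) + 1


def prethreshold(y, peaks, u):
    """Keep only the observed peaks whose height exceeds the threshold u."""
    return peaks[y[peaks] > u]


t, signal, y = simulate_field()
peaks = local_maxima(y)
kept = prethreshold(y, peaks, u=2.0)  # u = 2.0 is an arbitrary illustrative choice
```

In the paper's setting the surviving peaks would then be tested for significance and used to build confidence regions; here `kept` only marks which observed peaks pass the height threshold.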

Paper Structure

This paper contains 141 sections, 24 theorems, 405 equations, 9 figures, 3 algorithms.

Key Result

Theorem 1

Under Assumptions \ref{asmp:signal-holder}-\ref{asmp:hessian-curvatures}, for all $n \in \mathbb{N}$ sufficiently large the following statement holds: at all $t^* \in T^*$, $h \in B_{d}(0,\varepsilon_n)$, $y \in \bar{u}_{t^*} \pm \Delta_n$, where

Figures (9)

  • Figure 1: Illustration of our method for peak detection and localization. The top left, top right and bottom left panels show Steps 1-3 of our method, respectively. The bottom right panel superimposes confidence ellipsoids for location over the true signal, which is sparse with 3 true peaks. Out of $13$ observed peaks (colored grey), $7$ survive the initial pre-thresholding step (black), of which $4$ are declared significant (red). One of these is a false discovery, in the sense of not being within distance $\varepsilon_n$ (defined in Section \ref{subsec:peak-estimation}) of any true peak. Out of the remaining $3$ true discoveries, one of the corresponding confidence ellipsoids fails to cover the nearby true peak. Thus the miscoverage proportion for location -- the number of confidence regions that do not cover a true peak, out of the total number of points at which inference is conducted -- is $2/7$.
  • Figure 2: Distribution of candidate quantities for location (top row) and height (bottom row). Different columns correspond to different thresholds $u$. Details of experimental setup and takeaways are in the main text.
  • Figure 3: Comparing miscoverage and width of non-randomized, carve, and split methods for peak inference. Top two rows correspond to inference for location, bottom two rows to inference for height. "Width" of a confidence ellipse is the length of its largest semi-axis. Details of experimental setup and takeaways are in the main text.
  • Figure 4: Comparing power, miscoverage and width of non-randomized, carve, and split methods for peak inference, in the multipeak experiment described in Section \ref{subsec:experiment-3}. Peak heights are evenly spaced between $\mu_{t^*} = 3$ (peak-1) and $\mu_{t^*} = 6$ (peak-9). Top left: example field along with all peaks declared significant by TG test.
  • Figure 5: Distribution of candidate quantities for location (top row) and height (bottom row), when $\mu_0 = 3$. Different columns correspond to different thresholds $u$. Details of experimental setup and takeaways are in the main text.
  • ...and 4 more figures

Theorems & Definitions (31)

  • Theorem 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Theorem 2
  • Proposition 1
  • Proposition 2
  • Remark 5
  • Proposition 3
  • ...and 21 more