Detecting relevant dependencies under measurement error with applications to the analysis of planetary system evolution

Patrick Bastian, Nicolai Bissantz

TL;DR

The paper tackles reliable inference on correlations under additive measurement error by developing a deconvolution-based estimator for $U$-statistics and a bootstrap test for relevant correlations exceeding a threshold $\Delta$, complemented by bootstrap confidence intervals. It proves a central limit theorem for the deconvolution estimator and establishes bootstrap validity, with a data-driven mechanism to select a practically meaningful $\Delta$ via $\hat{\Delta}_{\min}$. Through simulations, the method demonstrates good finite-sample performance for Kendall's $\tau$ and Spearman's $\rho$ under various error structures and bandwidth choices. Applied to Hot Jupiters, accounting for measurement error reduces point estimates and yields rejection of $H_0(\Delta)$ only for very small $\Delta$, indicating no practically relevant correlation between stellar activity and planetary surface gravity. Overall, the framework provides reliable, threshold-aware inference under measurement error and cautions against spurious conclusions in astrophysical correlation studies.

Abstract

Exoplanets play an important role in understanding the mechanics of planetary system formation and orbital evolution. In this context, the correlations between different parameters of the planets and their host star are useful guides in the search for explanatory mechanisms. Based on a reanalysis of the data set from \cite{figueria14}, we study the still poorly understood correlation between planetary surface gravity and stellar activity of Hot Jupiters. Unfortunately, data collection often suffers from measurement errors due to complicated and indirect measurement setups, rendering standard inference techniques unreliable. We present new methods to estimate and test for correlations in a deconvolution framework and thereby improve the state-of-the-art analysis of the data in two directions. First, we are now able to account for additive measurement errors, which facilitates reliable inference. Second, we test for relevant changes, i.e. we test for correlations exceeding a certain threshold $\Delta$. This reflects the fact that small nonzero correlations are almost always to be expected in real-life data, so that standard statistical tests will always reject the null of no correlation given sufficient data. Our theory focuses on quantities that can be estimated by U-statistics, which include a variety of correlation measures. We propose a bootstrap test and establish its theoretical validity. As a by-product, we also obtain confidence intervals. Applying our methods to the Hot Jupiter data set from \cite{figueria14}, we observe that taking the measurement errors into account yields smaller point estimates, and the null of no relevant correlation is rejected only for very small $\Delta$. This demonstrates the importance of considering the impact of measurement errors to avoid misleading conclusions from the resulting statistical analysis.
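To make the idea of testing for a relevant correlation concrete, the following is a minimal sketch of a threshold test for Kendall's $\tau$. It is not the deconvolution-based bootstrap test of the paper: it ignores measurement error entirely and swaps in a plain percentile bootstrap on the observed pairs. The function names `kendall_tau` and `relevant_correlation_test`, the formalization of the null as $|\tau| \le \Delta$, and the defaults `alpha` and `n_boot` are assumptions made for illustration.

```python
import random


def kendall_tau(x, y):
    """Kendall's tau as the average sign of (x_i - x_j) * (y_i - y_j) over pairs i < j."""
    n = len(x)
    total = 0
    for i in range(n - 1):
        xi, yi = x[i], y[i]
        for j in range(i + 1, n):
            prod = (xi - x[j]) * (yi - y[j])
            total += (prod > 0) - (prod < 0)  # sign of the product; tied pairs count as 0
    return total / (n * (n - 1) / 2)


def relevant_correlation_test(x, y, delta, alpha=0.05, n_boot=200, seed=0):
    """Naive percentile-bootstrap check of the null |tau| <= delta (assumed formalization).

    Rejects when an approximate lower (1 - alpha) confidence bound for |tau|,
    taken as the alpha-quantile of the bootstrapped |tau*|, exceeds delta.
    Measurement error is NOT accounted for in this sketch.
    """
    rng = random.Random(seed)
    n = len(x)
    boot = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs with replacement
        boot.append(abs(kendall_tau([x[i] for i in idx], [y[i] for i in idx])))
    boot.sort()
    lower = boot[int(alpha * n_boot)]
    return {"tau": kendall_tau(x, y), "lower_bound": lower, "reject": lower > delta}
```

Calling `relevant_correlation_test(x, y, delta=0.1)` returns the point estimate, the bootstrap lower bound for $|\tau|$, and the decision. Scanning over a grid of `delta` values shows at which threshold the decision flips, which is the kind of threshold-aware reading of the data the abstract describes.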

Paper Structure

This paper contains 9 sections, 15 theorems, 91 equations, 2 figures, 5 tables.

Key Result

Theorem 2.1

Under assumptions (A1) to (A5) we have that where and $k_y(x)=\int_{\mathbb{R}^2}k(x,y)f(y)dy$.
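As a standard illustration of the kind of U-statistic the abstract refers to (textbook material, not a restatement of Theorem 2.1), Kendall's $\tau$ for directly observed bivariate data $X_1,\dots,X_n$ can be written with a symmetric kernel $k$. The component notation $x=(x_1,x_2)$ and the symbol $U_n$ are assumptions made for this sketch.

$$
U_n=\binom{n}{2}^{-1}\sum_{1\le i<j\le n}k(X_i,X_j),\qquad k(x,y)=\operatorname{sign}\big((x_1-y_1)(x_2-y_2)\big).
$$

Under the additive error model $Z_i=X_i+\varepsilon_i$ only the $Z_i$ are observed, so $U_n$ cannot be computed from the data directly, which is what motivates a deconvolution-based estimator.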

Figures (2)

  • Figure 1: Histogram of differences between Spearman correlations for bivariate data without and with additive error, where we observe $X_i$, resp. $Z_i = X_i + \varepsilon_i$ (see \ref{IntroModel}), with $p=2$, $X$ bivariate normal with correlation … and $\varepsilon_i$ either 0 or a bivariate Laplace distribution with variances equal to $0.05$ and uncorrelated marginals. We sampled 10000 times at sample size 100, calculated the Spearman correlation without and with error, i.e. for $X_i$ resp. $Z_i$, and recorded their difference.
  • Figure 2: Results for Model \ref{Model2} and $n=500$. We plot $D_i(h_i)$ and $\widehat{IMSE}(h_i)$ against $h_i$. The other graphs contain heatmaps of the estimated densities for the regularization parameters $\frac{j}{2}h_i^{\rm opt}, j=1,2,3$ where $h_i^{\rm opt}$ minimizes $(D_i)_{i=1,...,m}$.
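
The simulation described in the caption of Figure 1 can be sketched roughly as follows, using only the Python standard library. Several details are assumptions rather than the authors' setup: the correlation of $X$ is not recoverable from the caption, so `rho=0.5` is a hypothetical placeholder; the bivariate Laplace error with uncorrelated marginals is approximated by two independent Laplace components; and the function names are invented for this example.

```python
import math
import random


def ranks(values):
    """1-based ranks; ties are not expected for continuous data."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def simulate_differences(n=100, reps=10000, rho=0.5, err_var=0.05, seed=0):
    """Differences spearman(X) - spearman(Z) with Z = X + Laplace error.

    rho is a hypothetical value; independent Laplace components are an
    assumed stand-in for the bivariate Laplace error of the caption.
    """
    rng = random.Random(seed)
    b = math.sqrt(err_var / 2.0)  # Laplace(0, b) has variance 2 * b**2
    diffs = []
    for _ in range(reps):
        x1, x2, z1, z2 = [], [], [], []
        for _ in range(n):
            u, v = rng.gauss(0, 1), rng.gauss(0, 1)
            a, c = u, rho * u + math.sqrt(1 - rho ** 2) * v  # correlated normals
            # difference of two Exp(1) draws is a standard Laplace draw
            e1 = b * (rng.expovariate(1.0) - rng.expovariate(1.0))
            e2 = b * (rng.expovariate(1.0) - rng.expovariate(1.0))
            x1.append(a)
            x2.append(c)
            z1.append(a + e1)
            z2.append(c + e2)
        diffs.append(spearman(x1, x2) - spearman(z1, z2))
    return diffs
```

A histogram of `simulate_differences()` gives a picture comparable in spirit to Figure 1. Under these assumptions the differences tend to be positive on average, reflecting the attenuation of the rank correlation by additive error.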

Theorems & Definitions (29)

  • Theorem 2.1
  • Theorem 2.2
  • Corollary 2.3
  • proof
  • Lemma 5.1
  • proof
  • Lemma 5.2
  • proof
  • Lemma 5.3
  • proof
  • ...and 19 more