Maximal Ancillarity, Semiparametric Efficiency, and the Elimination of Nuisances

Marc Hallin, Bas J. M. Werker, Bo Zhou

Abstract

Restricting statistical experiments via nuisance-ancillary $\sigma$-fields yields nuisance-free experiments. However, a moot point with ancillarity is that maximal ancillary $\sigma$-fields are typically not unique. There are exceptions, though, among them the limiting experiments in a locally asymptotically normal (LAN) context. Building on this, we address the maximal ancillarity uniqueness problem by adopting a Hájek-Le Cam asymptotic perspective and define the concept of sequences of locally asymptotically maximal nuisance-ancillary $\sigma$-fields. We then show that any semiparametrically efficient procedure admits versions that are measurable with respect to such $\sigma$-fields while enjoying strict finite-sample nuisance-ancillarity, hence eliminating the nuisance without the hassle of estimating it. This is in sharp contrast with classical tangent space projections, which also achieve semiparametric efficiency but only enjoy asymptotic nuisance-ancillarity -- at the price, moreover, of adequately estimating the nuisance. When the nuisance is the density of some noise or innovation driving the data-generating process of a LAN experiment, we show that a sequence of locally asymptotically maximal nuisance-ancillary $\sigma$-fields is generated by the so-called center-outward residual ranks and signs based on measure transportation results. Restricting local experiments to such $\sigma$-fields yields sequences of finite-sample nuisance-free (here, distribution-free) restrictions of the original local LAN experiments that nevertheless achieve the semiparametric efficiency bounds of the original ones.

Paper Structure

This paper contains 16 sections, 11 theorems, 48 equations, 1 figure.

Key Result

Lemma 2.1

A sub-$\sigma$-field ${\cal B}^{(n)}_{\boldsymbol\theta}$ of ${\cal B}^{(n)}$ is ${\cal E}^{(n)}_{\boldsymbol\theta , \boldsymbol\vartheta}$-nuisance-ancillary at $\boldsymbol\tau = {\boldsymbol 0}$ if and only if it is ${\cal E}^{(n)}_{\text{\rm global}}$-nuisance-ancillary at $\boldsymbol\theta$.

Figures (1)

  • Figure 1: The grid $\mathfrak G^{(n)}$ for $d=2$ (left-hand panel) and the empirical quantile contours (right-hand panel) of order $\tau =$ 0.146, 0.268, 0.390, 0.634, and 0.878 for a sample of size $n=5,000$ from a mixture of three bivariate Gaussians, with $n_R=40$ and $n_S=125$. The smooth interpolation of ${\bf F}^{(n)}_\pm$ described in Section 3 of Hallin et al. (2021) was implemented; interpolated sign curves are shown in yellow.
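
As a rough illustration of the grid in Figure 1, the sketch below builds a regular grid of $n_R n_S$ points in the unit disk for $d=2$. It assumes (the caption does not spell this out) that the grid is the product of $n_R$ radii $r/(n_R+1)$, $r=1,\dots,n_R$, with $n_S$ equispaced unit vectors, and that no point sits at the origin since $n = n_R n_S$ here. The function name `center_outward_grid` is hypothetical. Under that assumption, the contour orders quoted in the caption coincide with $r/(n_R+1)$ for $r = 6, 11, 16, 26, 36$.

```python
import math


def center_outward_grid(n_R, n_S):
    """Regular grid of n_R * n_S points in the open unit disk (d = 2).

    Assumed construction (not stated in the caption): radii r / (n_R + 1)
    for r = 1..n_R, crossed with n_S equispaced directions on the circle.
    """
    grid = []
    for r in range(1, n_R + 1):
        radius = r / (n_R + 1)
        for s in range(n_S):
            angle = 2.0 * math.pi * s / n_S
            grid.append((radius * math.cos(angle), radius * math.sin(angle)))
    return grid


# Values taken from the Figure 1 caption.
n_R, n_S = 40, 125
grid = center_outward_grid(n_R, n_S)

# Contour orders r / (n_R + 1) for the (assumed) radial indices below.
orders = [round(r / (n_R + 1), 3) for r in (6, 11, 16, 26, 36)]
print(len(grid), orders)
```

The grid only fixes the target points; assigning sample points to grid points (the measure-transportation step that defines the center-outward ranks and signs) is not attempted here.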

Theorems & Definitions (15)

  • Example 1.1
  • Lemma 2.1
  • Proposition 2.1
  • Proposition 2.2
  • Definition 2.1
  • Definition 2.2
  • Lemma 2.2
  • Theorem 2.1
  • Corollary 2.1
  • Proposition 4.1
  • ...and 5 more