Quantiles and Quantile Regression on Riemannian Manifolds: a measure-transportation-based approach

Marc Hallin, Hang Liu

Abstract

Increased attention has been given recently to the statistical analysis of variables with values on nonlinear manifolds. A natural but nontrivial problem in that context is the definition of quantile concepts. We are proposing a solution for compact Riemannian manifolds without boundaries; typical examples are polyspheres, hyperspheres, and toroïdal manifolds equipped with their Riemannian metrics. Our concept of quantile function comes along with a concept of distribution function and, in the empirical case, ranks and signs. The absence of a canonical ordering is offset by resorting to the data-driven ordering induced by optimal transports. Theoretical properties, such as the uniform convergence of the empirical distribution and conditional (and unconditional) quantile functions and distribution-freeness of ranks and signs, are established. Statistical inference applications, from goodness-of-fit to distribution-free rank-based testing, are without number. Of particular importance is the case of quantile regression with directional or toroïdal multiple output, which is given special attention in this paper. Extensive simulations are carried out to illustrate these novel concepts.

Paper Structure

This paper contains 27 sections, 17 theorems, 98 equations, 10 figures.

Key Result

Proposition 1.1

Let ${\rm P}_1 \in \mathfrak{P}$ and ${\rm P}_2$ be two probability measures on $\mathcal{M}$. Then, letting $c(\mathbf{y}, \mathbf{z}) = d^2(\mathbf{y}, \mathbf{z})/2$, [...]. If, moreover, ${\rm P}_2 \in \mathfrak{P}$, then [...].
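
The cost in Proposition 1.1 is half the squared geodesic distance. As a minimal sketch of how such a cost induces a data-driven matching, the following assumes the unit sphere ${\cal S}^2$ with the great-circle distance and a three-point sample small enough for brute-force search. The function names and the toy points are hypothetical and only meant for illustration; this is not the authors' algorithm, and a practical implementation would use a proper assignment solver.

```python
import itertools
import math


def geodesic_distance(y, z):
    """Great-circle distance between two unit vectors on the sphere."""
    dot = sum(a * b for a, b in zip(y, z))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    return math.acos(max(-1.0, min(1.0, dot)))


def cost(y, z):
    """Transport cost c(y, z) = d^2(y, z) / 2."""
    return geodesic_distance(y, z) ** 2 / 2.0


def optimal_assignment(sample, grid):
    """Brute-force the permutation minimizing the total transport cost.

    Only feasible for a handful of points; illustration only.
    """
    n = len(sample)
    best_perm, best_total = None, math.inf
    for perm in itertools.permutations(range(n)):
        total = sum(cost(sample[i], grid[perm[i]]) for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total


# Hypothetical toy data: three unit vectors, each close to one coordinate axis.
grid = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
sample = [(0.0, 0.6, 0.8), (0.8, 0.0, 0.6), (0.6, 0.8, 0.0)]

# perm[i] is the index of the grid point assigned to sample[i].
perm, total = optimal_assignment(sample, grid)
print(perm, total)
```

Here each sample point is matched to a distinct grid point, and the ordering of the grid is carried over to the sample through the matching. The brute-force loop visits all $n!$ permutations, so it only serves to make the objective explicit.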

Figures (10)

  • Figure 1: Plots of ${\mathfrak{G}}^{(n)}_{\widehat{\cal M}_0^{(n)}}$, $n=121$, for the 2-torus ${\cal T}^2$ (flat square representation in the first column, ${\mathbb R}^3$-embedded representation in the second column) and the 2-sphere ${\cal S}^2$ (third column). The purple points represent the $n_0$ points on ${\widehat{\cal M}_0^{(n)}}$, and the $n_R=3$ contours ${\mathcal{C}}^{\rm U}_{\widehat{\cal M}^{(n)}_0} (r/(n_R + 1))$ for $r = 1, 2, 3$ are shown in red, blue, and green, respectively. The first row corresponds to the case where ${\cal M}_0$ is a singleton, with $n_0=1$, $n_R = 3$, and $n_S=40$. The second row illustrates the case where ${\cal M}_0$ is of dimension $(p-1)$, with $n_0=13$, $n_R = 3$, and $n_S=36$; ${\widehat{\cal M}^{(n)}_0}$ is shown as the purple dashed line/loop.
  • Figure 2: Empirical quantile contours ${\mathcal{C}}^{(n)} (r/(n_R + 1))$ ($r = 0, 5, 10, 20, 28$), with ${\cal M}_0$ being a singleton, for the 2-torus $\mathcal{T}^2$ (flat square representations in the left panels, ${\mathbb R}^3$-embedded representations in the central panels) and the 2-sphere ${\cal S}^2$ (right panels), computed from $n=4001$ i.i.d. observations with distributions (T1) (left and central top panels), (T2) (left and central middle panels), (T3) (left and central bottom panels), (S1) (right top panel), (S2) (middle right panel), and (S3) (bottom right panel); $n_0=1$, $n_R = 40$, and $n_S=100$.
  • Figure 3: Empirical quantile contours ${\mathcal{C}}^{(n)} (r/(n_R + 1))$ ($r = 0, 5, 9, 12, 16$), with ${\cal M}_0$ of dimension $(p-1)$, for the 2-torus $\mathcal{T}^2$ (flat square representations in the left panels, ${\mathbb R}^3$-embedded representations in the central panels) and the 2-sphere ${\cal S}^2$ (right panels), computed from $n=4001$ i.i.d. observations with distributions (Ta) (left and central top panels), (Tb) (left and central middle panels), (Tc) (left and central bottom panels), (Sa) (right top panel), (Sb) (middle right panel), and (Sc) (bottom right panel); $n_0=41$, $n_R = 20$, and $n_S=198$.
  • Figure 4: Plots of empirical (points) and population (solid lines) conditional quantile contours when ${\cal M}_0(\mathbf{x})$ is a singleton. 1st-2nd rows: (TS1) and (SS1) distributions; 3rd-4th rows: (TS2) and (SS2) distributions. $k$-NN (1st and 3rd rows) and Gaussian kernel (2nd and 4th rows) weights are used; $n=10000$, $N=2001$, $N_0=1$, $N_R = 20$, $N_S=100$. Quantile levels: $r/({N_R+1})$, $r=5$ (red), 8 (blue), 12 (green), and 16 (orange); $\mathbf{x}=(0.75, 0.65, \sqrt{0.015})^\top$ for $\mathbf{Y} \in \mathcal{T}^2$ and $\mathbf{x}=(0.6, 0.8)^\top$ for $\mathbf{Y} \in \mathcal{S}^2$.
  • Figure 5: Plots of strip-type empirical (points) and population (solid lines) conditional quantile contours for (TS3) when ${\cal M}_0(\mathbf{x})$ is of dimension $(p-1)$. Left column: $k$-NN weights; right column: Gaussian kernel weights. $n=10000$, $N=2024$, $N_0=24$, $N_R = 20$, $N_S=100$, and $\mathbf{x}=(0.75, 0.65, \sqrt{0.015})^\top$. Quantile levels: $r/({N_R+1})$, $r=0$ (purple), 5 (red), 8 (blue), 12 (green), and 16 (orange).
  • ...and 5 more figures
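
The grid sizes reported in the captions above are all consistent with a decomposition of the form $n = n_0 + n_R n_S$, that is, $n_0$ points on the reference set plus $n_R$ contours of $n_S$ points each. This reading is an inference from the reported numbers rather than a definition quoted from the paper. The short sketch below checks it against the caption values and lists the quantile levels $r/(n_R+1)$; the helper names are hypothetical.

```python
def grid_size(n_0, n_R, n_S):
    """Assumed grid size: n_0 reference points plus n_R contours of n_S points."""
    return n_0 + n_R * n_S


def quantile_levels(n_R, ranks):
    """Quantile levels r / (n_R + 1) for the given ranks r."""
    return [r / (n_R + 1) for r in ranks]


# (n_0, n_R, n_S, reported grid size), copied from the captions of Figures 1-5.
captions = {
    "Figure 1, first row": (1, 3, 40, 121),
    "Figure 1, second row": (13, 3, 36, 121),
    "Figure 2": (1, 40, 100, 4001),
    "Figure 3": (41, 20, 198, 4001),
    "Figure 4": (1, 20, 100, 2001),
    "Figure 5": (24, 20, 100, 2024),
}

# True for a caption when the assumed decomposition reproduces the reported size.
consistent = {
    name: grid_size(n_0, n_R, n_S) == reported
    for name, (n_0, n_R, n_S, reported) in captions.items()
}

# Levels of the three contours drawn in Figure 1 (n_R = 3, r = 1, 2, 3).
figure_1_levels = quantile_levels(3, [1, 2, 3])

print(consistent)
print(figure_1_levels)
```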

Theorems & Definitions (40)

  • Definition 1.1
  • Proposition 1.1
  • Definition 2.1
  • Remark 2.1
  • Example 2.1
  • Example 2.2
  • Example 2.3
  • Proposition 2.1
  • Definition 2.2
  • Proposition 2.2
  • ...and 30 more