
Fully distribution-free center-outward rank tests for multiple-output regression and MANOVA

Marc Hallin, Daniel Hlubinka, Šárka Hudecová

TL;DR

This work develops fully distribution-free center-outward rank tests for multivariate linear models, including two-sample location and MANOVA, by deriving a Hájek-type representation for linear center-outward rank statistics and proving their asymptotic normality. It constructs both elliptical (Mahalanobis-based) rank tests and center-outward rank tests based on spherical score functions, with local asymptotic normality established for general densities and, as a special case, for elliptical ones. With appropriately chosen scores, the proposed tests achieve parametric efficiency while remaining valid for all absolutely continuous error densities; simulations and a real-data example show their superior performance in nonelliptical settings. The results yield a practical, robust, and efficient toolbox for multivariate inference in regression and MANOVA without strong distributional assumptions.

Abstract

Extending rank-based inference to a multivariate setting such as multiple-output regression or MANOVA with unspecified d-dimensional error density has remained an open problem for more than half a century. None of the many solutions proposed so far enjoys the combination of distribution-freeness and efficiency that makes rank-based inference a successful tool in the univariate setting. A concept of center-outward multivariate ranks and signs based on measure transportation ideas has been introduced recently. Center-outward ranks and signs are not only distribution-free but also achieve, in dimension d > 1, the (essential) maximal ancillarity property of traditional univariate ranks, and hence carry all the "distribution-free information" available in the sample. We derive here the Hájek representation and asymptotic normality results required in the construction of center-outward rank tests for multiple-output regression and MANOVA. When based on appropriate spherical scores, these fully distribution-free tests achieve parametric efficiency in the corresponding models.
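The empirical center-outward ranks and signs mentioned in the abstract are obtained by optimally coupling the sample with a regular grid in the unit ball. A minimal sketch for d = 2 follows. It assumes n = n_R n_S (no grid points at the origin), a squared-Euclidean transport cost, and SciPy's assignment solver. The helper names `circular_grid` and `center_outward_ranks_signs` are hypothetical and do not come from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def circular_grid(n_r: int, n_s: int) -> np.ndarray:
    """Hypothetical helper: n_r * n_s points in the open unit disk, i.e.
    radii r/(n_r + 1), r = 1..n_r, times n_s equally spaced unit directions."""
    radii = np.arange(1, n_r + 1) / (n_r + 1)
    angles = 2 * np.pi * np.arange(n_s) / n_s
    directions = np.column_stack([np.cos(angles), np.sin(angles)])
    # Broadcast (n_r, 1, 1) * (1, n_s, 2) -> (n_r, n_s, 2), then flatten to (n, 2)
    return (radii[:, None, None] * directions[None, :, :]).reshape(-1, 2)


def center_outward_ranks_signs(z: np.ndarray, n_r: int, n_s: int):
    """Sketch (assumes d = 2 and n = n_r * n_s): empirical center-outward
    distribution function, ranks and signs of the bivariate sample z."""
    n = z.shape[0]
    if n != n_r * n_s:
        raise ValueError("this sketch assumes n = n_r * n_s")
    grid = circular_grid(n_r, n_s)
    # Optimal coupling: minimize the total squared Euclidean distance
    cost = cdist(z, grid, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)
    f_pm = np.empty_like(grid)
    f_pm[rows] = grid[cols]
    norms = np.linalg.norm(f_pm, axis=1)
    ranks = np.rint((n_r + 1) * norms).astype(int)  # integers in 1..n_r
    signs = f_pm / norms[:, None]                   # unit vectors
    return f_pm, ranks, signs
```

For example, with `z = np.random.default_rng(0).standard_t(3, size=(60, 2))` and `n_r=6, n_s=10`, each rank value from 1 to 6 is taken by exactly 10 observations, whatever the distribution of `z`. Only the assignment of ranks and signs to individual observations depends on the data.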

Paper Structure

This paper contains 30 sections, 8 theorems, 45 equations, 6 figures, 2 tables.

Key Result

Proposition 2.1

Let ${\bf F}_{\pm}$ denote the center-outward distribution function of ${\rm P}\in{\mathcal{P}}_d$. Then, … Let ${\bf Z}^{(n)}_1,\ldots ,{\bf Z}^{(n)}_n$ be i.i.d. with distribution ${\rm P}\in{\mathcal{P}}_d$ and center-outward distribution function ${\bf F}_{\pm}$. Then, … Assuming, moreover, that ${\rm P}\in{\mathcal{P}}_d^{+}$, …

Figures (6)

  • Figure 1: Wisconsin Diagnostic Breast Cancer (WDBC) data: bivariate scatterplots and univariate histograms for mean fractal dimension (V12), standard error of texture (V14), standard error of symmetry (V21), and standard error of fractal dimension (V22) in 212 malignant patients (triangles) and 357 benign patients (circles).
  • Figure 2: Empirical powers of two-sample location tests based on the Wilcoxon center-outward rank statistic (solid line), the Wilcoxon elliptical rank statistic (dashed line), and Hotelling's two-sample test (dotted line), as functions of the shift $\delta$ under bivariate normal and elliptical Student (1 and 3 degrees of freedom) error densities; sample sizes $n_1=n_2=50$ (red), $200$ (blue), and $450$ (black).
  • Figure 3: Empirical powers of two-sample location tests based on the Wilcoxon center-outward rank statistic (solid line), the Wilcoxon elliptical rank test statistic (dashed line), and Hotelling's two-sample test (dotted line), as functions of the shift $\delta$, for mixtures of two normal error densities (left panel) and of two $t_1$ error densities (right panel); sample sizes $n_1=n_2=50$ (red), $200$ (blue), and $450$ (black).
  • Figure 4: Empirical powers of two-sample location tests based on the Wilcoxon center-outward rank statistic (solid line), the Wilcoxon center-outward rank statistic computed from linearly sphericized residuals (dot-dashed line), the Wilcoxon elliptical rank test statistic (dashed line), and Hotelling's two-sample test (dotted line), as functions of the shift $\delta$ for the "U-shaped" (upper left panel) and the "S-shaped" (upper right panel) mixtures of three normal error densities, and skew-$t$ error densities with $\nu=1.1$ (bottom left panel) and $\nu=3$ (bottom right panel) degrees of freedom, respectively; sample sizes $n_1=n_2=50$ (red), $200$ (blue), and $450$ (black).
  • Figure 5: Empirical powers of MANOVA tests based on the Wilcoxon center-outward rank statistic (solid line), the Wilcoxon elliptical rank test statistic (dashed line), Pillai's test (dotted line), and Roy's test (dashed-dotted line) as functions of the shift $\delta$, for the normal distribution (left panel) and the U-shaped mixture of three normals (right panel); the sample sizes are $n_1=n_2=n_3=75$ (red) and $300$ (black).
  • ...and 1 more figure

Theorems & Definitions (8)

  • Proposition 2.1
  • Proposition 2.2
  • Proposition 3.1: Hájek representation
  • Proposition 3.2: Asymptotic normality
  • Proposition 4.1
  • Proposition 4.2
  • Proposition 5.1
  • Corollary 5.2