
A generalization of a U-statistics-based MCAR Test: Utilizing Partially Observed Variables

Danijel Aleksić

TL;DR

This work generalizes a U-statistics-based MCAR test to exploit partially observed variables. It extends the original $A_n$ framework by incorporating covariances between incomplete data and response indicators through the new statistics $T_{n,X}^{(u,v)}$, $T_{n,Y}^{(u,v)}$, and $\hat{T}_{n,Y}^{(u,v)}$. The expanded test statistic $A_n'$ combines these components with an estimated covariance matrix $\hat{\Lambda}$ and has an asymptotic $\chi^2_{pq+q(q-1)}$ distribution under MCAR, which enables detection of a broader class of alternatives. Extensive simulations show better calibration than Little's MCAR test and greater robustness to violations of the finite-fourth-moment assumption, as well as improved power relative to Little's test in most practical settings, especially for alternatives that the original test misses. The approach remains scalable to higher dimensions and avoids strict limitations of prior implementations, though scenarios with MNAR missingness alone remain challenging. Overall, the method provides a more flexible and powerful tool for assessing MCAR in datasets with partially observed variables, with practical benefits for complete-case analyses and missing-data inference.
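
To make the distributional claim concrete, here is a minimal standard-library sketch. It assumes, as the "2X3Y" figure captions suggest, that $p$ counts fully observed variables and $q$ partially observed ones, so the 2X3Y case would give $2 \cdot 3 + 3 \cdot 2 = 12$ degrees of freedom. It also assumes that $A_n'$ behaves like a Wald-type quadratic form $n\,T^\top \hat{\Lambda}^{-1} T$ in the stacked statistics $T$; that form and the helper names `degrees_of_freedom`, `chi2_sf`, `solve` and `quadratic_form_test` are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import List, Sequence, Tuple


def degrees_of_freedom(p: int, q: int) -> int:
    """pq + q(q - 1), assuming p fully observed (X) and q partially observed (Y) variables."""
    return p * q + q * (q - 1)


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability P(chi^2_df > stat), standard library only.

    Starts from the closed forms for 1 or 2 degrees of freedom and climbs with the
    recurrence Q(a + 1, y) = Q(a, y) + y^a e^{-y} / Gamma(a + 1), where Q is the
    regularized upper incomplete gamma function, a = df / 2 and y = stat / 2.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if stat <= 0.0:
        return 1.0
    y = stat / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-y)               # chi^2 with 2 df
    else:
        a, tail = 0.5, math.erfc(math.sqrt(y))    # chi^2 with 1 df
    while a < df / 2.0:
        tail += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return tail


def solve(matrix: Sequence[Sequence[float]], rhs: Sequence[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def quadratic_form_test(t: Sequence[float], lam: Sequence[Sequence[float]], n: int) -> Tuple[float, float]:
    """Hypothetical Wald-type statistic n * t^T lam^{-1} t and its chi-square p-value.

    Assumes sqrt(n) * t is asymptotically N(0, lam) with lam of full rank, so the
    degrees of freedom equal len(t).
    """
    stat = n * sum(ti * xi for ti, xi in zip(t, solve(lam, t)))
    return stat, chi2_sf(stat, len(t))
```

For example, `degrees_of_freedom(2, 3)` returns 12 for the 2X3Y case. Where SciPy is available, `scipy.stats.chi2.sf(stat, df)` should agree with `chi2_sf`; the hand-rolled version only keeps the sketch dependency-free.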

Abstract

In this paper, a generalized version of the U-statistics-based test for MCAR developed by Aleksić (2024) is presented. Like the original, the novel test assesses MCAR by calculating and combining the covariances between the response indicators and the data variables. Unlike the old test, however, it can utilize partially observed variables, which results in a significantly larger class of detectable alternatives. The novel test appears to be well calibrated, much better so than Little's MCAR test, which was used as a benchmark. For alternatives that were detectable by the old test, the novel test has power comparable to, although slightly lower than, that of the old test, but it still outperforms Little's test in all of the studied scenarios. For alternatives that were previously undetectable or barely detectable, the novel test performs best of the three. The novel test retains the assumption that the data have finite fourth moments, which is also required by Little's test. The results indicate that the novel test is more robust to violations of this assumption, although both tests have similar limitations.
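
A sample covariance between a data variable and a response indicator can itself be written as a U-statistic of order 2, which is one way to see how covariance-based MCAR tests connect to U-statistics. The sketch below illustrates this for a variable $Y_u$ and the response indicator of another variable $Y_v$. It does not reproduce the paper's estimators $T_{n,X}^{(u,v)}$, $T_{n,Y}^{(u,v)}$ or $\hat{T}_{n,Y}^{(u,v)}$; the function names and the choice to keep only rows where $Y_u$ is observed are assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Optional, Sequence


def cov_u_statistic(values: Sequence[float], indicator: Sequence[float]) -> float:
    """Order-2 U-statistic with kernel h = (x_i - x_j)(r_i - r_j) / 2 over pairs i < j.

    Averaging this kernel over all pairs gives exactly the unbiased sample covariance.
    """
    if len(values) != len(indicator) or len(values) < 2:
        raise ValueError("need two equally long sequences with at least 2 entries")
    pairs = list(combinations(range(len(values)), 2))
    total = sum((values[i] - values[j]) * (indicator[i] - indicator[j]) / 2.0 for i, j in pairs)
    return total / len(pairs)


def cov_with_response(y_u: Sequence[Optional[float]], y_v: Sequence[Optional[float]]) -> float:
    """Covariance between Y_u and the response indicator of Y_v (1 = observed, 0 = missing).

    Hypothetical helper: only rows where Y_u is observed are used.
    """
    rows = [i for i, val in enumerate(y_u) if val is not None]
    values: List[float] = [float(y_u[i]) for i in rows]
    indicator = [0.0 if y_v[i] is None else 1.0 for i in rows]
    return cov_u_statistic(values, indicator)
```

The pairwise loop is O(n²) and is only meant to expose the U-statistic structure; the same value follows in O(n) from the usual centered covariance formula. Under MCAR, such covariances should be close to zero.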

Paper Structure

This paper contains 8 sections, 16 equations, and 13 figures.

Figures (13)

  • Figure 1: Empirical test sizes for 2X3Y case, standard normal distribution
  • Figure 2: Empirical test sizes for 2X3Y case, Clayton copula with parameter 1 and $\mathcal{E}(1)$ margins
  • Figure 3: Empirical test sizes for 2X3Y case, Clayton copula with parameter 1 and $\chi^2_4$ margins
  • Figure 4: Empirical test powers for 2X3Y case, standard normal distribution, MAR 1 to 9 (var. 1 controls missingness in var. 3 and var. 5, var. 2 controls var. 4)
  • Figure 5: Empirical test powers for 2X3Y case, standard normal distribution, combination of MAR rank and MCAR (var. 3 controls missingness in var. 4 and var. 5, and then MCAR missingness is generated in var. 3)
  • ...and 8 more figures
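
The MAR designs in the figure captions can be mimicked with a small simulation. The sketch below is loosely modeled on Figure 4's caption (var. 1 controls missingness in var. 3 and var. 5, var. 2 controls var. 4) and on the "MAR rank" label in Figure 5. The rank-proportional deletion rule, the `miss_rate` parameter and the function name `simulate_2x3y_mar` are hypothetical, since the settings behind "MAR 1 to 9" are not described here.

```python
import random
from typing import Dict, List, Optional

# Hypothetical map, 0-based: target column -> controlling column.
# var. 3 and var. 5 depend on var. 1; var. 4 depends on var. 2 (as in Figure 4's caption).
CONTROLS: Dict[int, int] = {2: 0, 3: 1, 4: 0}


def simulate_2x3y_mar(n: int, miss_rate: float = 0.3, seed: int = 0) -> List[List[Optional[float]]]:
    """Standard normal 2X3Y data with a hypothetical rank-based MAR mechanism.

    A value in the target column is deleted with probability proportional to the
    rank of the controlling (fully observed) column, scaled so that the expected
    missingness rate equals miss_rate. This rule is an assumption, not the
    paper's MAR 1 to 9 specification.
    """
    if not 0.0 < miss_rate <= 0.5:
        raise ValueError("miss_rate must lie in (0, 0.5] so every probability stays below 1")
    rng = random.Random(seed)
    data: List[List[Optional[float]]] = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(n)]
    for target, source in CONTROLS.items():
        # Rank rows by the controlling variable (rank 1 = smallest value).
        order = sorted(range(n), key=lambda i: data[i][source])
        for rank, i in enumerate(order, start=1):
            # The mean of 2 * m * rank / (n + 1) over ranks 1..n is exactly m.
            if rng.random() < 2.0 * miss_rate * rank / (n + 1):
                data[i][target] = None
    return data
```

Rows with large values of the controlling variable lose their target values more often, so missingness depends only on fully observed data (MAR) rather than being completely at random.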

Theorems & Definitions (5)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5