A Test of Missing Completely at Random for Multivariate Data with Missing Values

TLDR

In multivariate data with missing values, a key question is whether the values are missing completely at random (MCAR). The usual approach compares the means of each variable's recorded values between groups defined by whether other variables are missing, but this produces many correlated statistics and raises multiple-comparison problems. This article proposes a single global test statistic for MCAR that uses all of the available data. It gives the asymptotic null distribution of the statistic, derives the small-sample null distribution for multivariate normal data with a monotone missing-data pattern, and shows that the test reduces to a standard t test when the data are bivariate and missing values are confined to a single variable. A limited simulation study of empirical sizes suggests that the test is conservative for small samples, for both normal and nonnormal data.

Abstract

A common concern when faced with multivariate data with missing values is whether the missing data are missing completely at random (MCAR); that is, whether missingness depends on the variables in the data set. One way of assessing this is to compare the means of recorded values of each variable between groups defined by whether other variables in the data set are missing or not. Although informative, this procedure yields potentially many correlated statistics for testing MCAR, resulting in multiple-comparison problems. This article proposes a single global test statistic for MCAR that uses all of the available data. The asymptotic null distribution is given, and the small-sample null distribution is derived for multivariate normal data with a monotone pattern of missing data. The test reduces to a standard t test when the data are bivariate with missing data confined to a single variable. A limited simulation study of empirical sizes for the test applied to normal and nonnormal data suggests that the test is conservative for small samples.
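
The bivariate special case mentioned above can be illustrated with a short sketch. This is not the article's global statistic, whose formula is not given here; it is only the mean comparison that the test is said to reduce to, written under assumptions of our own: the first variable is fully observed, missing values of the second are coded as `None`, and "standard t test" is read as the pooled-variance two-sample t test. The function name `mean_comparison_t` and the toy data are hypothetical.

```python
import math
from statistics import mean, variance


def mean_comparison_t(y1, y2):
    """Pooled two-sample t statistic for y1, grouped by whether y2 is missing.

    Assumptions (ours, not the article's): y1 is fully observed and
    missing entries of y2 are coded as None.
    Returns (t statistic, degrees of freedom).
    """
    # Split the recorded y1 values by the missingness of y2.
    observed = [a for a, b in zip(y1, y2) if b is not None]
    missing = [a for a, b in zip(y1, y2) if b is None]
    n_obs, n_mis = len(observed), len(missing)
    if n_obs < 2 or n_mis < 2:
        raise ValueError("each group needs at least two cases")

    # Pooled variance estimate across the two groups.
    df = n_obs + n_mis - 2
    pooled_var = (
        (n_obs - 1) * variance(observed) + (n_mis - 1) * variance(missing)
    ) / df

    # Standard error of the difference in group means.
    se = math.sqrt(pooled_var * (1 / n_obs + 1 / n_mis))
    return (mean(observed) - mean(missing)) / se, df


# Toy data: y2 is missing for every second case.
y1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y2 = [0.5, None, 1.5, None, 2.5, None]
t, df = mean_comparison_t(y1, y2)
print(f"t = {t:.3f}, df = {df}")  # prints "t = -0.612, df = 4"
```

A large absolute value of t relative to a t distribution with the returned degrees of freedom would count as evidence against MCAR in this two-variable setting. With more variables, one such comparison is needed for each pair of a variable and a missingness pattern, which is the source of the multiple-comparison problem that motivates a single global statistic.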