Estimation of Regression Coefficients When Some Regressors are not Always Observed

TLDR

In applied problems, models for the conditional mean of a response given regressors are common, but some regressors may be missing for certain subjects by design or happenstance. The article proposes a new class of semiparametric estimators, based on inverse probability weighted estimating equations, that consistently estimate the conditional mean parameters when regressors are missing at random and the missingness probabilities are known or parametrically modeled. The authors derive the efficient score and the semiparametric variance bound, and propose locally and globally adaptive efficient estimators that attain this bound. They show that each previously proposed estimator is asymptotically equivalent to some (typically inefficient) member of the new class, and they use a small simulation study to offer practical recommendations for choosing among estimators.

Abstract

In applied problems it is common to specify a model for the conditional mean of a response given a set of regressors. A subset of the regressors may be missing for some study subjects either by design or happenstance. In this article we propose a new class of semiparametric estimators, based on inverse probability weighted estimating equations, that are consistent for the parameter vector α0 of the conditional mean model when the data are missing at random in the sense of Rubin and the missingness probabilities are either known or can be parametrically modeled. We show that the asymptotic variance of the optimal estimator in our class attains the semiparametric variance bound for the model. To do so, we first show that our estimation problem is a special case of the general problem of parameter estimation in an arbitrary semiparametric model in which the data are missing at random and the probability of observing complete data is bounded away from 0. We then derive a representation for the efficient score, the semiparametric variance bound, and the influence function of any regular, asymptotically linear estimator in this more general estimation problem. Because the optimal estimator depends on the unknown probability law generating the data, we propose locally and globally adaptive semiparametric efficient estimators. We compare estimators in our class with previously proposed estimators. We show that each previous estimator is asymptotically equivalent to some, usually inefficient, estimator in our class. This equivalence is a consequence of a proposition stating that every regular, asymptotically linear estimator of α0 is asymptotically equivalent to some estimator in our class. We compare various estimators in a small simulation study and offer some practical recommendations.
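
To make the idea of an inverse probability weighted estimating equation concrete, here is a minimal sketch in Python. It is not the authors' estimator or their simulation design. It assumes a linear conditional mean model, a single regressor x that is sometimes missing, an always-observed regressor v, and known missingness probabilities that depend only on the always-observed data (y, v), so the data are missing at random. All names (simulate, ipw_linear, complete_case) and all numerical values are hypothetical choices made for illustration. The sketch uses the simplest member of the weighted class, weighting each complete case by the inverse of its probability of being observed, which is consistent but generally not efficient.

```python
import numpy as np

TRUE_ALPHA = np.array([1.0, 2.0, -1.0])  # hypothetical (intercept, x, v) coefficients

def simulate(n, rng):
    """Draw data where x is missing at random with known probabilities."""
    v = rng.normal(size=n)                      # always observed
    x = 0.5 * v + rng.normal(size=n)            # sometimes missing
    y = TRUE_ALPHA[0] + TRUE_ALPHA[1] * x + TRUE_ALPHA[2] * v + rng.normal(size=n)
    # The probability of observing x depends on y only (always observed),
    # and is clipped so that it stays bounded away from 0.
    pi = np.clip(1.0 / (1.0 + np.exp(-(0.5 + y))), 0.1, 1.0)
    r = rng.random(n) < pi                      # r = True means x is observed
    return x, v, y, pi, r

def ipw_linear(x, v, y, pi, r):
    """Solve sum_i (r_i / pi_i) * d_i * (y_i - d_i' alpha) = 0 for alpha."""
    d = np.column_stack([np.ones(r.sum()), x[r], v[r]])  # complete cases only
    w = 1.0 / pi[r]                                      # inverse probability weights
    lhs = d.T @ (w[:, None] * d)
    rhs = d.T @ (w * y[r])
    return np.linalg.solve(lhs, rhs)

def complete_case(x, v, y, r):
    """Unweighted least squares on complete cases, shown for comparison."""
    d = np.column_stack([np.ones(r.sum()), x[r], v[r]])
    return np.linalg.lstsq(d, y[r], rcond=None)[0]

rng = np.random.default_rng(0)
x, v, y, pi, r = simulate(200_000, rng)
alpha_ipw = ipw_linear(x, v, y, pi, r)
alpha_cc = complete_case(x, v, y, r)
print("IPW estimate:          ", alpha_ipw)
print("Complete-case estimate:", alpha_cc)
```

Because missingness here depends on the response, the unweighted complete-case fit is generally biased, while the weighted equation has mean zero at the true parameter. When the probabilities are not known, the abstract's setting allows replacing them with estimates from a parametric model; that step is omitted from this sketch.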
