Publication | Closed Access
Bounding the Effects of Proxy Variables on Regression Coefficients
Citations: 48
References: 3
Year: 1986
Topics: Engineering; Regression Analysis; Causal Inference; Simultaneous Equation Modeling; Proxy Variables; Economic Analysis; Estimation Theory; Proxy Regression; Statistics; Economics; Estimation Statistic; Econometric Method; Finance; True Regression; Econometric Model; Macroeconomics; Business; Econometrics; Statistical Inference
We consider a regression in which one of the observed variables is a proxy for some unobserved variable. Given a lower bound for the correlation between the proxy and the unobserved true variable for which it substitutes, we derive intervals in which the coefficients of the unobserved true regression must lie, regardless of any other correlations involving unobserved variables or disturbances. We present a simple solution for the important special case in which only the signs of the coefficients are of concern and one seeks the smallest correlation between the proxy and the true variable that guarantees the correctness of the signs of the coefficients in the proxy regression. We also present an algorithm for extending these results to the multiple-proxy problem.

ATTEMPTS TO CONFRONT economic theories with data are often hampered by the problem that some of the relevant variables are unavailable or even unobservable. In the case of regression models it is customary to substitute proxies for the unavailable variables. Though the coefficients in the proxy regression will not coincide with those in the regression of the dependent variable on the explanatory variables, it is clear by continuity that the differences will be small if each proxy is sufficiently highly correlated with the unobserved true variable for which it substitutes, even if its error is correlated with other errors or variables. Moreover, in many instances the theory predicts not the magnitudes but only the signs of the regression coefficients. Correct inferences will then be drawn from the model provided that the pairwise correlations between the proxies and the true variables are high enough to guarantee that the signs of the regression coefficients would be unchanged if the proxies were replaced by the true variables. In applications econometricians can generally assess those pairwise correlations subjectively, but how high do they need to be?
In this paper we present a complete answer to this question when only one variable is a proxy, and describe an algorithm for determining the required correlations when there are several proxies.

The effects of proxy variables have been extensively studied in the special case of errors-in-variables (EIV), in which the differences between the proxies and the corresponding true variables (the errors) are independent of the true variables, of each other, and of the disturbances in the true regression. We require no such independence. In the EIV case it is sometimes possible to guarantee correctness of the signs of the coefficients even without prior information about the variances of the measurement errors. For example, if the coefficient vector from the direct regression and the coefficient vectors from the reverse regressions (as one varies the left-side variable) all lie in the same orthant, then the coefficient vector in the true regression must also lie in that orthant (see
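
The orthant condition described for the EIV case can be checked numerically. The following is a minimal sketch, not the paper's own procedure: the function names `regression_family` and `same_orthant` are hypothetical, and the code assumes the usual construction in which each reverse regression (with one explanatory variable on the left) is re-solved for the dependent variable so that all coefficient vectors are comparable.

```python
import numpy as np


def regression_family(cov):
    """Direct and reverse regression coefficient vectors, each solved for y.

    `cov` is the (K+1) x (K+1) covariance matrix of [y, x_1, ..., x_K].
    Row 0 of the result is the direct regression of y on x_1..x_K.
    Row j (j >= 1) is the regression with x_j on the left-hand side,
    rearranged so that y is again expressed in terms of x_1..x_K.
    """
    cov = np.asarray(cov, dtype=float)
    precision = np.linalg.inv(cov)
    # Row j of the precision matrix encodes the regression of variable j on
    # all the others. Dividing by -precision[j, 0] renormalizes that row so
    # that y carries a unit coefficient. This is undefined when
    # precision[j, 0] == 0, i.e. when x_j has no partial correlation with y.
    return -precision[:, 1:] / precision[:, [0]]


def same_orthant(coefs, tol=1e-12):
    """True if every row of `coefs` has the same, strictly nonzero, sign pattern."""
    coefs = np.asarray(coefs, dtype=float)
    signs = np.sign(np.where(np.abs(coefs) < tol, 0.0, coefs))
    return bool(np.all(signs != 0) and np.all(signs == signs[0]))
```

With observed data `y` (length n) and `X` (n by K), the covariance matrix can be formed as `np.cov(np.column_stack([y, X]), rowvar=False)` and passed to `regression_family`; `same_orthant` then reports whether the direct and reverse regressions agree in sign. The check only illustrates the EIV condition quoted above, which relies on the independence assumptions that the paper itself does not require.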