Estimating Latent Structure Models with Categorical Variables: One-Step Versus Three-Step Estimators

TLDR

Latent structure models for categorical data comprise a measurement part (a latent class model) and a structural part (a system of logit equations). This study examines the properties of a three-step estimator for such models and proposes a simple correction for a common source of bias. The three-step approach first estimates a stand-alone measurement model, then computes each individual's predicted latent scores from the estimated parameters and that individual's observed indicator pattern, and finally treats those scores as observed variables in the structural part. Used naively, the predicted scores lead to a systematic underestimation of the strength of association in the structural part; the proposed correction can eliminate this bias, as illustrated on simulated and real data. Multiple imputation, which accounts for the randomness of the predicted latent variables, can provide standard errors for the structural parameters.

Abstract

We study the properties of a three-step approach to estimating the parameters of a latent structure model for categorical data and propose a simple correction for a common source of bias. Such models have a measurement part (essentially the latent class model) and a structural (causal) part (essentially a system of logit equations). In the three-step approach, a stand-alone measurement model is first defined and its parameters are estimated. Individual predicted scores on the latent variables are then computed from the parameter estimates of the measurement model and the individual observed scoring patterns on the indicators. Finally, these predicted scores are used in the causal part and treated as observed variables. We show that such a naive use of predicted latent scores cannot be recommended since it leads to a systematic underestimation of the strength of the association among the variables in the structural part of the models. However, a simple correction procedure can eliminate this systematic bias. This approach is illustrated on simulated and real data. A method that uses multiple imputation to account for the fact that the predicted latent variables are random variables can produce standard errors for the parameters in the structural part of the model.
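
The attenuation described above, and one way to undo it, can be sketched numerically. The example below is only an illustration under strong assumptions: a single binary latent variable X, three binary indicators, one binary outcome Z, and measurement parameters that are taken as known (so step 1 is skipped). All probabilities are hypothetical, and the calculation is done at the population level rather than on sampled data. The correction shown inverts the classification-error matrix P(W | X), which may differ in detail from the procedure proposed in the paper.

```python
import itertools
import numpy as np

# Hypothetical population values (assumptions, not taken from the paper).
p_x = np.array([0.5, 0.5])            # P(X = x) for the binary latent class X
p_y1_given_x = np.array([0.2, 0.8])   # P(Y_j = 1 | X = x), same for all 3 indicators
p_z1_given_x = np.array([0.3, 0.7])   # P(Z = 1 | X = x), the structural relation


def log_odds_ratio(table):
    """Log odds ratio of a 2x2 probability table."""
    return np.log(table[0, 0] * table[1, 1] / (table[0, 1] * table[1, 0]))


# True joint distribution P(X, Z); rows index X, columns index Z.
p_xz = p_x[:, None] * np.column_stack([1 - p_z1_given_x, p_z1_given_x])

# Step 2: assign every indicator pattern to its modal class W and accumulate
# the classification matrix D[x, w] = P(W = w | X = x).
D = np.zeros((2, 2))
for pattern in itertools.product([0, 1], repeat=3):
    y = np.array(pattern)
    # P(Y = y | X = x) under local independence; shape (2,)
    p_y_given_x = np.prod(
        np.where(y[:, None] == 1, p_y1_given_x, 1 - p_y1_given_x), axis=0
    )
    posterior = p_x * p_y_given_x
    posterior /= posterior.sum()
    w = int(np.argmax(posterior))      # predicted latent score (modal class)
    D[:, w] += p_y_given_x

# Step 3 (naive): treat W as if it were X. Because Z is independent of the
# indicators given X, P(W, Z) = sum_x P(W | X = x) P(X = x, Z).
p_wz = D.T @ p_xz

# Correction: solve the same linear system backwards to recover P(X, Z).
p_xz_corrected = np.linalg.solve(D.T, p_wz)

true_lor = log_odds_ratio(p_xz)
naive_lor = log_odds_ratio(p_wz)
corrected_lor = log_odds_ratio(p_xz_corrected)

print(f"true      log odds ratio: {true_lor:.4f}")
print(f"naive     log odds ratio: {naive_lor:.4f}")
print(f"corrected log odds ratio: {corrected_lor:.4f}")
```

In this setup the naive log odds ratio is closer to zero than the true one because misclassified cases dilute the X-Z association, while the corrected table reproduces the true association. With real data, D would itself be estimated from the step 1 measurement model, so the corrected estimates would carry additional sampling error.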
