Linear Equating for the NEAT Design: Parameter Substitution Models and Chained Linear Relationship Models
Citations: 12
References: 11
Year: 2009
Topics: Measurement Theory; Engineering; Generalizability Theory; Modeling Method; Education; Optimal Experimental Design; Psychometrics; Classical Test Theory; Test Scores; Psychology; Parameter Substitution Models; Test Derivation; Systems Engineering; Applied Measurement; Factor Analysis; Testability; Statistics; Structural Equation Modeling; Test Development; Design; Total Test Scores; Educational Testing; Validity Theory; Educational Measurement; Linear Equating; Model Transformation; NEAT Design; Anchor Test Scores; Model Building; Psychological Measurement; Model Analysis; Data Modeling
Abstract
This paper analyzes five linear equating models for the nonequivalent groups with anchor test (NEAT) design with internal anchors (i.e., the anchor test is part of the full test). The analysis employs a two-dimensional framework. The first dimension contrasts two general approaches to developing the equating relationship. Under a parameter substitution (PS) approach, estimates of the means and variances for the two tests for some target population are substituted into a generic equating formula; under a chained linear relationship (CLR) approach, expressions for the anchor test scores as functions of total test scores for each of the test forms are simply “equated” to each other. In order to implement either of these approaches, some relationships must be assumed invariant across the groups. The second dimension involves three different choices for the invariant relationships: the regressions of test scores (X or Y) on anchor scores (V), the regressions of anchor scores on test scores, or a basic scaling/equating relationship between anchor scores and test scores. If we adopt a scaling/equating relationship of Y with V and X with V as the invariant relationship, the resulting equating relationship is the same for the PS and CLR approaches. So, five distinct regression models yielding five different equating relationships are developed within the two-dimensional framework. The equating relationships for the Tucker, Chained Linear, and Levine Observed-score methods are derived under the PS approach. The equating relationships for the Levine True-score, Chained Linear, and a Tucker-like method (Angoff Design V) are derived under the CLR approach.
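The claim that the PS and CLR approaches coincide under an invariant scaling relationship can be checked numerically. The following is a minimal sketch, not the paper's derivation: it assumes the scaling lines take the usual mean/sigma form, uses group 1 as the target population, and relies on made-up summary statistics. The names `Line`, `mean_sigma`, and `equate_anchor_expressions` are hypothetical helpers introduced only for this illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """A straight line: value = intercept + slope * score."""
    slope: float
    intercept: float

    def __call__(self, score: float) -> float:
        return self.intercept + self.slope * score


def mean_sigma(mu_from, sd_from, mu_to, sd_to) -> Line:
    """Generic linear formula that matches means and standard deviations."""
    slope = sd_to / sd_from
    return Line(slope, mu_to - slope * mu_from)


def equate_anchor_expressions(v_from_x: Line, v_from_y: Line) -> Line:
    """CLR step: set v_from_x(x) equal to v_from_y(y) and solve for y."""
    slope = v_from_x.slope / v_from_y.slope
    intercept = (v_from_x.intercept - v_from_y.intercept) / v_from_y.slope
    return Line(slope, intercept)


# Hypothetical summary statistics (assumed for illustration, not from the paper).
MU1_X, SD1_X, MU1_V, SD1_V = 50.0, 10.0, 20.0, 4.0  # group 1 took form X
MU2_Y, SD2_Y, MU2_V, SD2_V = 48.0, 9.0, 18.0, 3.6   # group 2 took form Y

# Scaling lines from each total score to the anchor, assumed invariant across groups.
v_from_x = mean_sigma(MU1_X, SD1_X, MU1_V, SD1_V)
v_from_y = mean_sigma(MU2_Y, SD2_Y, MU2_V, SD2_V)

# CLR approach: equate the two expressions for the anchor score.
clr = equate_anchor_expressions(v_from_x, v_from_y)

# PS approach with group 1 as the target population. Group 1 never took Y,
# so its Y mean and SD are inferred by inverting the invariant Y-to-V line.
mu1_y = (MU1_V - v_from_y.intercept) / v_from_y.slope
sd1_y = SD1_V / v_from_y.slope
ps = mean_sigma(MU1_X, SD1_X, mu1_y, sd1_y)

agree = math.isclose(clr.slope, ps.slope) and math.isclose(clr.intercept, ps.intercept)
print(f"CLR: y = {clr.intercept:.3f} + {clr.slope:.3f} x")
print(f"PS:  y = {ps.intercept:.3f} + {ps.slope:.3f} x")
print("same line:", agree)
```

Under these assumptions both routes yield the same line, which corresponds to the chained linear method. Replacing the scaling lines with regression lines would break this agreement, which is why the framework produces five distinct relationships rather than six.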
Key words: Angoff model, chained linear, Levine models, linear equating, parameter substitution, regression

Notes

1. This scaling relationship has the same mathematical form as an equating relationship, but scaling is a more general term and calling this relationship a scaling relationship distinguishes it from, and hopefully avoids confusion with, the equating of X to Y. Throughout this paper we will refer to a relationship as scaling when it includes V and X (or Y) and as equating when discussing a relationship that includes X and Y.

2. In deriving the equating relationship for the Levine methods, we make no assumptions about true scores, and therefore our derivations are quite different from those proposed by Levine (1955) and those proposed by subsequent authors (Kolen & Brennan, 1987, 2004). This is especially true for the Levine True-score method. As discussed more fully by Mroch et al. (this issue), the possibility of deriving the Levine True-score method using only observed score assumptions (i.e., that the OLS regression of V on X and V on Y are invariant across groups) is based on the fact that OLS regression of one variable on another provides good estimates of the true-score relationship between the variables when the independent variable is essentially error free, and most of the random variation is in the dependent variable. Because the anchor test is typically much shorter than the full-length test forms, this condition holds, at least approximately, when V is regressed on X or Y.

3. The labels associated with the different methods in Table 1 ignore some variability in the specific models associated with a particular label (e.g., “Tucker”, “Levine”). For example, we derive the PS methods assuming a specific synthetic population (w1 = 1, w2 = 0); in practice, different synthetic populations (with different values for w1 and w2) may be assumed, leading to slightly different equating relationships. In practice, several of these methods may be used and the resulting equating lines may be averaged in some ways. As discussed later in this paper and in Suh et al. (this issue), the choice of values for synthetic population weights does not generally make any substantial difference. On a more fundamental level, most derivations of the “Levine” methods employ assumptions about “true scores”, while our derivations employ only observed-score statistics and make no assumptions about true scores. However, the assumptions in our derivations for the “Levine” models are closely related to those made by Levine, and they yield the same equating relationships as the Levine methods, so we refer to them as “Levine” methods.

4. For the value of ζ2 to be defined for the Levine methods, it is necessary to assume that the correlation is not equal to zero. Given that the anchor test is designed to be a shorter version of the full-length form, and is in fact a part of the two test forms, the correlation between X and V is likely to be fairly high. So this is a relatively safe assumption.

5. An anonymous reviewer suggested that all equating models should reduce to the mean/sigma model in Eq (2), if we have identical groups, with equal means and variances on the anchor test. Since this is not the case for the CLR model with V regressed on X and Y, in Eq (32), or for the CLR model with X and Y regressed on V, in Eq (37), unless ρ1(X, V) = ρ2(Y, V), the reviewer would not consider these two methods to be equating methods. This objection raises a number of interesting issues. First, given that the two CLR methods under consideration yield the equating relationships for the Levine True-score method and Angoff design V B, it would seem to suggest that these methods should not be considered equating methods. Second, the CLR methods are inherently symmetric, which is not necessarily true of the PS methods. And third, as indicated in Suh et al., the Levine True-score method functioned very well, particularly if the groups differed substantially in ability. Nevertheless, the term “equating” is generally reserved for cases in which the two tests to be equated have been built to have the same content and statistical properties and to measure the same construct. If X and Y were found to have substantially different correlations with V in a given population, it would probably be more appropriate to refer to the relationships in Eqs (32) and (37) as linking functions than as equating functions.
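The reviewer's point in Note 5 can be illustrated with slopes alone. The sketch below is a hedged illustration rather than a reproduction of Eqs (32) and (37): it assumes the standard OLS slope formula (correlation times the ratio of standard deviations), reads the CLR step as solving the two regression lines for a common anchor score, and uses made-up values. The function names are hypothetical.

```python
import math


def mean_sigma_slope(sd_x: float, sd_y: float) -> float:
    """Slope of the mean/sigma line from X to Y."""
    return sd_y / sd_x


def clr_slope_v_on_tests(sd_x, sd_y, sd1_v, sd2_v, rho1, rho2) -> float:
    """V regressed on X (group 1) and on Y (group 2); equate the predicted V."""
    b1 = rho1 * sd1_v / sd_x  # OLS slope of V on X in group 1
    b2 = rho2 * sd2_v / sd_y  # OLS slope of V on Y in group 2
    return b1 / b2            # from b1 * x + a1 = b2 * y + a2, solved for y


def clr_slope_tests_on_v(sd_x, sd_y, sd1_v, sd2_v, rho1, rho2) -> float:
    """X (group 1) and Y (group 2) regressed on V; match on a common V."""
    b1 = rho1 * sd_x / sd1_v  # OLS slope of X on V in group 1
    b2 = rho2 * sd_y / sd2_v  # OLS slope of Y on V in group 2
    return b2 / b1            # from (x - a1) / b1 = (y - a2) / b2, solved for y


# Hypothetical values: identical groups, so the anchor SD is the same in both.
SD_X, SD_Y, SD_V = 10.0, 9.0, 4.0

target = mean_sigma_slope(SD_X, SD_Y)

# Unequal correlations: neither CLR slope matches the mean/sigma slope.
unequal_a = clr_slope_v_on_tests(SD_X, SD_Y, SD_V, SD_V, rho1=0.90, rho2=0.80)
unequal_b = clr_slope_tests_on_v(SD_X, SD_Y, SD_V, SD_V, rho1=0.90, rho2=0.80)

# Equal correlations: both CLR slopes collapse to the mean/sigma slope.
equal_a = clr_slope_v_on_tests(SD_X, SD_Y, SD_V, SD_V, rho1=0.85, rho2=0.85)
equal_b = clr_slope_tests_on_v(SD_X, SD_Y, SD_V, SD_V, rho1=0.85, rho2=0.85)

print("mean/sigma slope:", round(target, 4))
print("unequal correlations:", round(unequal_a, 4), round(unequal_b, 4))
print("equal correlations:  ", round(equal_a, 4), round(equal_b, 4))
print("match when equal:", math.isclose(equal_a, target), math.isclose(equal_b, target))
```

In this setup the two regression-based slopes differ from the mean/sigma slope by the factors ρ1/ρ2 and ρ2/ρ1 respectively, so they move in opposite directions and coincide with it only when the two correlations are equal.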