The British Experience with "Authentic" Testing

Abstract

Observations on the use of standard assessment tasks in Britain have important lessons for Americans who might be considering the use of similar assessment techniques in a national testing system, Messrs. Madaus and Kellaghan suggest.
It is now widely accepted that when multiple-choice, norm-referenced, and criterion-referenced standardized tests were used as administrative devices to implement education policies during the last two decades, they adversely affected the system. Because of the high stakes associated with their use, these tests drove instruction and narrowed the curriculum. Further, the tests measured low-level knowledge and skills. Teaching to such tests eventually corrupted their validity as indicators of achievement, and test results gave the nation an incorrect picture of achievement levels and progress, as documented in the now infamous Lake Wobegon study.(1)
One reaction against the negative influences of traditional testing has been to call for the deployment of alternative or authentic testing, which will involve the direct assessment of complex performances. This kind of testing is increasingly seen as consisting of the three P's - performance, portfolios, and products. The use of such techniques, the argument goes, would have a positive influence on the education system in general and on instruction and learning in particular. Teachers would have clear models of acceptable outcomes, higher-order skills would be stressed, and students' thinking processes would be examined.(2) Further, the assessment techniques would better measure the kind of learning described in contemporary cognitive psychology, a process whereby students take in information, interpret it, connect it to what they already know, and, if necessary, reorganize their mental structures to accommodate new understandings.(3)
Such arguments have broad intuitive appeal. Nonetheless, we must evaluate them in the context of two realities.
First, there are many practical, technical, and infrastructural issues that must be resolved before such techniques can safely be deployed as policy instruments on a large scale in schools. Second, claims about the positive effects of such techniques have to be examined in light of how the results will be used. If high stakes are not associated with test use - and they were not in the traditional use of standardized tests(4) - then the tests are not likely to have much impact on instruction. The measurement-driven instruction of the 1970s and 1980s, on the other hand, was a direct outgrowth of the perception that test results were being used to make high-stakes decisions about individuals and institutions. The context of use is critical in evaluating the potential impact of any assessment technique, authentic or not.
At present, many uses are being proposed for a national testing system in the United States, some of which involve very high stakes. These proposed functions include improving instruction and learning, monitoring progress toward the national goals, holding institutions and individuals accountable, certifying the successful completion of a given level of education, assisting in decisions about college admission or entry-level employment, and motivating students by having real rewards for success and real consequences for failure.
How are the three P's likely to work when used in the high-stakes contexts of accountability and certification? Can they be made immune to the corruption historically associated with teaching to the test and measurement-driven instruction? How will issues related to these techniques - such as technical adequacy, efficiency, manageability, standardization, comparability, and costs - be influenced by the context in which they are used? The purpose of this article is to shed some light on these questions by reviewing recent experiences in Britain with new assessment techniques.
The techniques that have been tried there are much broader in scope than traditional written tests and are sometimes suggested in the current American debate as alternatives to standardized tests. …