Reflections on the First 2 Years of Milestone Implementation

Abstract

The Accreditation Council for Graduate Medical Education (ACGME) and the American Board of Medical Specialties (ABMS) collectively constitute the foundation of professional self-regulation in the United States. In February 1999, the 2 organizations approved 6 general competencies broadly relevant for all medical practice, followed by the official launch of the Outcomes Project in 2001.1 It was expected that the competencies would be an antidote to overspecification of accreditation standards, and that they would empower programs to create training programs grounded in meaningful outcomes in a developmental approach.1

As many programs can attest, the implementation of outcomes-based (eg, competency-based) medical education has been challenging. One reason has been the difficulty in implementing the competencies in both curriculum and assessment. Program leaders lacked shared mental models within their own training programs, accompanied by a lack of shared understanding nationally within disciplines. It is important to remember that 1 of the thorny problems the milestones were intended to address was the sources of unwanted and unwarranted variability in educational and, by extension, clinical outcomes. In addition, the community cannot improve at scale what cannot be measured, and prior frames and approaches to measurement were insufficient and ineffective. A key goal for milestones thus is to help improve the state and quality of measurement through better assessment in graduate medical education to facilitate the improved outcomes everyone desires.

Approximately 10 years ago, conversations began on how to more effectively and meaningfully operationalize the competencies to help improve the design of residency and fellowship programs through the use of a developmental framework.
In parallel, the ACGME began to explore mechanisms to move the accreditation system to a focus on outcomes using a continuous quality improvement philosophy.2 Developmental milestones, using narratives to describe in more descriptive terms the professional trajectories of residents, were seen as a way to move the Outcomes Project forward.3,4 Starting in 2007, the disciplines of internal medicine, pediatrics, and surgery began to create developmental milestones for the 6 competencies.4–6 Surgery would subsequently delay the development of its milestones, focusing first on the SCORE curriculum.7 The ACGME began to restructure its accreditation processes in 2009, and soon after, milestone groups were constituted for all specialties. Milestone writing groups were cosponsored by the ACGME and the ABMS member certification boards.4 Early groups had significant latitude in developing their subcompetencies and milestones; specialties that started the process after 2010 used a standard template. Each milestone set was subjected to review by the educational community in the specialty. Box 1 provides an overview of the purposes of the milestones across key stakeholders, and figure 1 provides an example of a key driver diagram of milestones as an educational and clinical intervention. As figure 1 highlights, milestones can potentially trigger a number of drivers, or mechanisms, to help enable changes in residency and fellowship education.

In 2013, the milestones were officially launched in 7 core specialties (emergency medicine, internal medicine, neurological surgery, orthopaedic surgery, pediatrics, diagnostic radiology, and urology) as a formative, continuous quality improvement component of the new accreditation system.4 The remaining core disciplines and the majority of subspecialties implemented the milestones starting in July 2014.
We have now reached an important "milestone" in the implementation process, and our commentary provides a high-level overview of the first 2 years of the milestone experience, including information from the 2 most recent reporting cycles, and a description of what is next in the evaluation of the milestone initiative.

Figure 2 provides an overview of how the milestones inform the graduate medical education system. At the program level, individual residents and fellows are assessed routinely through a combination of assessment tools, including direct observations; global evaluation; audits and review of clinical performance data; multisource feedback from team members, including peers, nurses, patients, and family; simulation; in-service training examination (ITE); self-assessment; and others. Assessment tools should be selected intentionally to allow routine, frequent, formative feedback to the resident or fellow to affirm areas of successful performance and to highlight competencies they need to improve.8 The clinical competency committee (CCC) should help to analyze and synthesize the assessment data, such as "quantitative" information from in-service examinations and clinical performance audits, as well as "qualitative" information from faculty, peers, and other raters through surveys and direct observation. Using the milestones, the CCC should reach a consensus judgment regarding each resident's or fellow's performance.9 The CCC provides those conclusions to the program director, who has the ultimate authority to determine residents' or fellows' milestone developmental level at least twice yearly. Milestones are used as a guiding framework and "blueprint" for individual learner performance and, aggregated to the program level, to assess the effectiveness of the curriculum and learning experiences.9

For the ACGME, the unit of analysis is the program, and this process uses the national data as a mechanism to help improve training overall.
Collectively, the goal of this system is to help the entire medical education enterprise be accountable to the public for honest assessments of resident and fellow performance, and for truthful verification of their readiness to progress to unsupervised practice. As shown in figure 2, while the ACGME is involved with the certification boards around research on the effectiveness of the milestones, milestone data are not used to determine eligibility for certification by the boards.

Participation in milestone-based assessment and reporting is critical to the long-term success of the milestone component of the new accreditation system (NAS). Without robust reporting, meaningful feedback to the specialties and evaluation research are not possible, and lack of participation might send a negative signal to policy makers and the public about the viability of self-regulation in graduate medical education. The good news is that reporting has been very robust, with data capture across the 4 milestone cycles to date reaching 99% to 100%. For the 2014–2015 academic year, 7498 programs reported on 117 548 residents and fellows at midyear (99.9%) and 7628 programs reported on 118 360 residents and fellows at end-of-year reporting (99.9%). Between the 2 reporting periods, data were lacking for just 31 residents and fellows. For the first time, the US graduate medical education system has formative national data to guide assessment and curricular innovation and change, and as noted below, this is already happening in some specialties.

While it is too early to perform a systematic review, several studies on the early experiences with milestones are worth noting as they provide a lens into needed ongoing evaluation research. One of the first national studies to find evidence of validity involved the first-year experience with the Emergency Medicine (EM) Milestones.
This study examined reliability and milestone judgment distributions by training year across all emergency medicine residency programs.10 An earlier mixed methods study involving program directors from 17 internal medicine programs found the milestones to be useful for formative assessment, but faculty development was recognized as an important need to operationalize the milestones.11 On the other hand, a group of internal medicine programs found only modest differences in perceived quality of feedback by residents after implementation of the milestone system.12

Regarding single institution studies, 1 program found the implementation of the first set of Internal Medicine Milestones improved faculty evaluations and feedback.13 Another study in a large internal medicine program found that transitioning to a milestone-based model produced a larger separation in the scores between postgraduate years (PGY) 1 to 3 and a wider use of a 5-point scale on an end-of-rotation evaluation form.14 Two studies determined that use of milestones was more effective than use of previous evaluation forms, and found better discrimination in ratings and a reduction in common rater errors.15,16 On the other hand, a study of a milestone "passport" intervention in an emergency medicine program found only modest increases in resident satisfaction with feedback.17 Another study reported that milestone-based assessments for end-of-shift evaluations led to grade inflation in an emergency medicine program.18 The use of information technology is an additional, growing theme of milestone research.
For example, a surgery program is using a smartphone application to complete a Zwisch scale immediately after a procedure and has linked this to milestones.19 The Foundation for Exxcellence in Women's Health Care has also built mobile assessment tools for milestones, and the work is ongoing.20 Collectively, these studies provide "early signals" and highlight the critical importance of ongoing, iterative, and rigorous research of the milestone initiative. We are truly only at the very beginning.

Now that the majority of specialties have completed their first year of implementation, the vital work of evaluating the milestones is picking up momentum. The milestones are not without their critics and concerns, and this early presumptive feedback will be important in framing the evaluation activities of the milestones.21–23 Evaluating the milestones will be a complex enterprise because in many respects the milestones represent a complex intervention. While there are a number of definitions of what constitutes a complex intervention, the milestones and the NAS meet a number of criteria for complexity:24,25

Any evaluation strategy will have to attend to these aspects of milestones and will require a mixed methods, comprehensive approach. From the quantitative (or psychometric) perspective, the use of a validity framework is crucial. Looking through the lens of the Messick validity framework as an example, there is a set of key issues for making inferences about the validity of the milestones (box 2).26

These are the areas in the validity domain that are beginning to inform the research agenda, as exemplified by the national EM Milestones study. However, a purely psychometric point of view will be insufficient in evaluating and understanding the milestones. Furthermore, milestones are not "static"; as programs continue to work with them, their understanding of them will change and, in turn, evaluative judgments and curriculum will also change and evolve.
The ACGME signaled from the beginning that the current milestones are "version 1.0," and with learning will come the need for revisions down the road.27 Therefore, evaluation of the milestones will also utilize lessons and guidance from the program evaluation field on evaluating complex interventions.25 Instead of just looking at milestones through an attribution lens, examining how milestones contribute to an outcome will be crucial.28

Too often we treat educational interventions and innovations as "therapeutic interventions" (eg, pills) that, if taken properly and in the appropriate dose, will produce a desired outcome. This biomedical model, traditionally focused on attribution (cause and effect question: Did the milestones cause the resident or fellow to be better in X competency[ies]?), has dominated research discourse in medical education for decades. However, milestones represent an educational intervention within a program (ie, embedded within the residency or fellowship curriculum) consisting of multiple interacting and interdependent components. Treating a complex programmatic intervention, such as a residency program and milestones, as a medical procedure or pill will be insufficient to address the complexity of this intervention.25,28,29

Newer programmatic evaluation models have increasingly moved away from purely cause-effect, linear models that heretofore were mostly concerned with making sure the inputs of an intervention were clear, standardized, and randomly assigned to subjects, and the outcomes were clearly defined, rigorously measurable, and meaningful. Much of the implementation activity in between was essentially a "black box," managed through randomization to ensure a determination of a mean effect. This model, however, has many shortcomings in evaluating complex interventions. First, the interactions, interdependencies, context, and quality of the implementation can and do have large effects on outcomes.
Failure to understand these aspects of the intervention can lead to misguided conclusions about effect and generalizability. While it is beyond the scope of this article to cover all the program evaluation strategies available to assess complex interventions, a few concepts warrant mention.

Investigators studying the milestones should ask the fundamental questions: What works for whom, in what circumstances, and why? These questions form the core of the realistic program evaluation strategy of Pawson and Tilley and of other program evaluation strategies that emphasize the need to look deep into the "black box" of implementation to understand the mechanisms of a specific intervention and how context affects the success or failure of the intervention.25,30 The concept of "partial solutions" as a major aspect of program interventions is also important. As Pawson points out, no intervention, however complex and comprehensive, is ever a complete solution to a problem or need. Some components will work better than others, and the key issue is to determine why so as to learn how to improve the next iteration of the intervention.25 Without this understanding of what works, for whom, and in what circumstances, it will be very hard to generalize lessons from milestone and graduate medical education research in a single site or a small group of programs to a national cohort. Furthermore, "failures" can be rich sources of learning that can be fed forward into the iterative cycle of milestone and residency program improvement and development.

The second concept moves programmatic evaluation away from a sole focus on attribution (just cause and effect) to one of contribution.
The fundamental question in a contribution analysis is "How much of a difference (or contribution) has the program made to the observed outcomes?"28 Central to all complex program evaluation strategies is developing a theory of change of how each component of the intervention contributes to the outcome, including interactions with the other components. The key driver diagram (figure 1) provides some, but likely not all, of the possible ways milestones can effect change in programs. In this case, a theory of change describes the hypothesized pathway to the desired outcome. The goal is to create a robust and credible contribution story. Building on the realist questions, for example, the story should describe how the interventions did or did not trigger the intended mechanisms, how well the interventions were implemented and functioned in specific contexts, and how, using the best evidence available, the intervention contributed to the outcomes measured. By now you have likely realized that no single research method will be sufficient. Mixed methods approaches, combining qualitative and quantitative methods, will be needed. The specific methods will depend on the questions and outcomes.

The ACGME milestones are intended to describe the educational and professional trajectory of a resident or fellow from the beginning of their education and training through the achievement of competency and the ability to enter into the unsupervised practice of medicine. The milestones are also designed to help address the thorny problem of identifying and addressing the sources of unwanted and unwarranted variability in educational and, by extension, clinical outcomes. Furthermore, prior frames and approaches to measurement have been ineffective and insufficient. A key goal is to improve the quality of curricula and assessment to facilitate the improved outcomes everyone in the graduate medical education system desires.
This year will mark the third year of milestone reporting for the first 7 core specialties and the beginning of reporting for the remaining subspecialties. Looking at the first 2 years of implementation, ongoing research, and new research being proposed, much more will be learned about the milestones, including how they should be used in programs, their effect on residents and fellows, and how they will improve graduate medical education.
