Understanding Developmental-Behavioral Screening Measures

Abstract

With nearly half of pediatricians still not using standardized screening tools, primary care clinicians should know that the periodic use of general developmental and social-emotional screening tools has been proven to identify promptly two to six times more children (age 0–5 years) with suspected delays than a clinician’s unstructured surveillance alone.

After completing this article, readers should be able to:

Developmental and behavioral problems are among the most common conditions of childhood and adolescence; 15% of children have a developmental disability and 21% have a mental health disorder. If intervention is instituted before kindergarten entrance, many problems can be prevented and the large majority can be ameliorated. In the United States, early detection depends heavily on primary care providers. Primary care clinicians should know that the periodic use of general developmental (1)(2)(3)(4) and social-emotional (5) screening tools has been proven to identify promptly two to six times more children (age 0–5 years) who have suspected delays than a clinician’s surveillance alone. (1)(2)(3)(4)(5) This difference in outcome extends to foster care populations, in which pediatricians already know that children are at higher risk for a developmental delay or social-emotional disorder. (4)(5)

Periodic screening leads to substantial increases in early intervention (EI) and early childhood special education (ECSE) eligibility rates. (1)(2) Interestingly, the percentage of pediatricians “who self-reported always/almost always using ≥1 screening tools” increased significantly, from 23.0% in 2002 to 47.7% in 2009. (6) With greater numbers of clinicians incorporating standardized screening tools into their practices, a better understanding of how to optimize the tools’ performance is needed.
With nearly half of pediatricians still not routinely using standardized tools, further education is warranted about which tools are feasible for use in primary care, with attention paid to which tools are psychometrically sound. Education also is warranted on how practitioners can better employ screens in a manner that is 1) safe, 2) equitable, 3) effective, 4) timely, 5) patient- and parent-centered, and 6) efficient. These are the Institute of Medicine’s six quality aims.

Surveillance is the process in which clinicians watch for signs of developmental or behavioral problems in the course of caring for children. Screening refers to the use of brief standardized tools to differentiate those who need a further evaluation for potential problems from those who probably do not. Although screening measures help to refine the pattern of delays in children who have detectable problems, they are meant primarily for children perceived to be asymptomatic by the clinician and for those in whom surveillance demonstrates risk.

Developmental-behavioral screens are different from many other medical screens, which seek to identify a strictly positive or negative condition (eg, cystic fibrosis, congenital hypothyroidism, phenylketonuria). In fact, research indicates that children who have false-positive screens (screening test positive/concerning but subsequently ineligible for EI/ECSE services) are a high-risk group in need of diligent monitoring and other community resources. (2)(7)

There are numerous categories of developmental-behavioral screening instruments (Table 1) used to identify a wide spectrum of developmental-behavioral conditions with variable prevalence rates at different ages (Table 2).
Parent-report screens (Figs 1–3) typically are designed to measure one of the first three components of developmental-behavioral surveillance: 1) eliciting and addressing parents’ concerns; 2) maintaining a developmental-behavioral history (reviewing milestones and behaviors); and 3) identifying developmental-behavioral risk and protective factors (biological and environmental).

Practitioner-administered screens (Fig 4), which directly elicit or make observations about the child’s skills and behavior, generally measure another component of surveillance: making accurate and informed observations about the child and parent–child interactions. Screens may be broad-band (general) (Figs 1 and 4), meaning they tap all or most developmental-behavioral domains (expressive speech and language, receptive language, gross motor, fine motor, cognitive/problem-solving, self-help and adaptive skills, social-emotional/behavior), or narrow-band (Fig 2), meaning domain-, disorder-, or disability-specific. Narrow-band screens may partially tap one or more domains but are not designed to assess the wide spectrum of developmental-behavioral conditions at various ages (Table 2) and, therefore, ideally should always be used in conjunction with a broad-band screen.

The American Academy of Pediatrics (AAP) currently recommends universal postpartum mood disorder screening in the first year after birth; general developmental screening at 9, 18, and 30 (or 24) months; autism-specific screening at 18 and 24 months; social-emotional screening whenever a general or autism-specific instrument is abnormal; “kindergarten readiness” screening at 4 years; social-emotional/mental health/psychosocial function screening at every health supervision visit from ages 5 to 18 years; and substance abuse–specific screening at every health supervision visit throughout adolescence.

In addition, according to the AAP, an appropriate screen should be administered whenever a clinician’s surveillance determines “risk.” All
of these screening recommendations have heightened the need for clinicians to implement office-based and community-based systems that are simultaneously effective and efficient, especially for those for whom research suggests interventions are most effective: the 25.7 million children in the United States who are age <5 years (projected population in 2012 per http://www.childstats.gov).

Figure 5 provides a collaborative model for screening and surveillance in a medical home setting and is a combination of the 2006 developmental, 2007 autism spectrum disorders (ASDs), and 2010 mental health AAP algorithms. (8) When a psychometrically sound screen raises concerns, the algorithm emphasizes system-wide referral care coordination and a secondary screening or assessment, which, in a primary care setting, is best accomplished with the help of an EI/ECSE agency and an early return (≤1 month) office visit.

In this algorithm, pediatric clinicians are encouraged to refer automatically when their clinical impression confidently detects a developmental delay or if the child possesses a biological or environmental condition associated with a sufficiently high probability of a delay; developmental-behavioral promotion has been formally incorporated as a component of surveillance to make the process safer, more effective and parent-centered; plus greater detail has been provided about the decisions and action steps that should reliably occur after a screening test is administered (ie, postscreening surveillance). (8)

“Structured surveillance” means that the process and decision-making are enhanced with periodic screening by using evidence-based measures. (8) “Unstructured surveillance” means that the process and decision-making are reliant on subjective impressions or casual observations.
(8) In a systematic review, a pediatric clinician’s unstructured surveillance, when compared with a validated screening tool or a diagnostic interview, has good specificity (the proportion of children correctly identified as not having a developmental-behavioral problem) ranging from 69% to 100%, but poor sensitivity (the proportion of children correctly identified as having a developmental-behavioral problem) ranging from 14% to 54%. (9)

When pediatric health-care providers detect a problem, they usually are correct; however, they struggle to identify the majority of delays, most likely because they over-rely on psychometrically unsound milestone checklists with vague referral criteria. (1)(2)(8) As a result, the large majority of children who have evolving developmental-behavioral problems can be deprived of the benefits of EI, ECSE, or other evidence-based community services before kindergarten entrance.

Periodic screening (ie, structured surveillance) has been proven repeatedly to enhance a clinician’s ability to detect, refer, and monitor children who have evolving developmental-behavioral problems. (1)(2)(3)(4)(5)(8)

Although standardized screening improves the quality of care at health supervision visits, increases parental satisfaction, leads to substantial cost savings for society, and positively affects the lives of children and families via the benefits of EI and ECSE, clinicians should be aware that screening has its limitations. No single instrument will suffice to identify every developmental-behavioral problem and match the needs of every population or practice setting. The population screened and the method of implementation affect the psychometric properties of a screening measure.
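As a concrete illustration of the sensitivity and specificity definitions given above, the following minimal Python sketch computes both from a 2×2 comparison with a reference standard. The class name, function names, and cohort counts are hypothetical; the counts were chosen only so that the results fall inside the ranges reported for unstructured surveillance and are not data from any cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Counts from comparing a detection method with a reference standard."""
    true_positives: int   # problem present, flagged
    false_negatives: int  # problem present, missed
    true_negatives: int   # no problem, not flagged
    false_positives: int  # no problem, flagged


def sensitivity(c: ScreeningCounts) -> float:
    """Proportion of children who have a problem and were correctly identified."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def specificity(c: ScreeningCounts) -> float:
    """Proportion of children without a problem who were correctly identified."""
    return c.true_negatives / (c.true_negatives + c.false_positives)


# Hypothetical cohort of 1,000 children (illustrative numbers only):
# 150 have a developmental-behavioral problem and 850 do not.
example = ScreeningCounts(true_positives=45, false_negatives=105,
                          true_negatives=765, false_positives=85)

print(f"sensitivity = {sensitivity(example):.0%}")  # 45 / 150
print(f"specificity = {specificity(example):.0%}")  # 765 / 850
```

In this made-up cohort, the method is right most of the time when it flags a child, yet it misses 105 of the 150 children who have a problem, which is the pattern described for unstructured surveillance.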
Screens can be falsely negative (ie, they can yield negative or borderline results, but the child could be promptly deemed eligible for EI/ECSE services) or falsely positive (ie, the screening test is positive or concerning, but the child is subsequently deemed EI/ECSE ineligible). However, a clinician’s longitudinal surveillance can be used to better interpret the typical or borderline results of a screening test.

For example, when a pediatrician confidently suspects a delay and, simultaneously, a screen such as the Ages and Stages Questionnaire (ASQ) is found to be typical or borderline, 92% of children referred by pediatricians (blind to ASQ results) promptly receive some form of EI services (40% are found to be EI-eligible; 52% are placed on an EI-monitoring list). (1) However, there is no evidence to support the concept that unstructured surveillance should be used to override the concerning results of a psychometrically sound screen and negate the need for a community-based referral. (8) Children who fail screening tests and subsequently are not promptly found to have a developmental-behavioral diagnosis (false-positives) nonetheless tend to have numerous psychosocial risk factors and exhibit performance well below average in the better predictors of school success: pre-academics, language, and intelligence. (7) Indeed, children found to have false-positive screening results are a high-risk group in need of closer monitoring, repeat screenings at subsequent visits, and other community resources (eg, evidence-based parenting programs, nurse home visit programs, high-quality child care, structured preschools such as Head Start).

Although experts agree that no single screening instrument will suffice to identify every developmental-behavioral problem in every practice setting, it is also true that not all screening tools are constructed equally.
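The false-negative and false-positive definitions above, which are tied to EI/ECSE eligibility and treat borderline results as not flagged, can be sketched as a small classifier. This is an illustrative Python sketch only; the result labels, function name, and sample records are hypothetical and do not come from any screening tool’s scoring rules.

```python
from collections import Counter

# Per the definitions in the text, negative and borderline results both count
# as "not flagged"; positive and concerning results both count as "flagged".
NOT_FLAGGED = {"negative", "borderline"}
FLAGGED = {"positive", "concerning"}


def classify(screen_result: str, ei_ecse_eligible: bool) -> str:
    """Label one screen against the child's later EI/ECSE eligibility decision."""
    if screen_result in FLAGGED:
        return "true positive" if ei_ecse_eligible else "false positive"
    if screen_result in NOT_FLAGGED:
        return "false negative" if ei_ecse_eligible else "true negative"
    raise ValueError(f"unknown screen result: {screen_result!r}")


# Hypothetical records: (screen result, found eligible for EI/ECSE?)
records = [
    ("positive", True),
    ("concerning", False),
    ("borderline", True),
    ("negative", False),
    ("negative", False),
]

tally = Counter(classify(result, eligible) for result, eligible in records)
for label, count in sorted(tally.items()):
    print(f"{label}: {count}")
```

A “false positive” label here marks a child who, according to the text, still belongs to a high-risk group needing closer monitoring and repeat screening, so it should not be read as a result to dismiss.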
In 2008, Drotar et al (10) provided guidelines to help pediatricians select which screening tools were best-suited to their practices based on their population’s risk level. For general developmental parent-report tools, Drotar et al recommended the ASQ and Parents’ Evaluation of Developmental Status (PEDS) for general primary care populations and the ASQ specifically for high-risk populations. For ASD-specific screening tools, the Modified Checklist for Autism in Toddlers (M-CHAT) was recommended for general primary care populations. For further details and recommendations, refer to http://www.cmwf.org/Content/Publications/Fund-Manuals/2008/Feb/Pediatric-Developmental-Screening--Understanding-and-Selecting-Screening-Instruments.aspx.

Since 2008, new tools have been developed (eg, Parents’ Evaluation of Developmental Status–Developmental Milestones [PEDS:DM], Quantitative Checklist for Autism in Toddlers), and others have undergone revisions (eg, Ages & Stages Questionnaire, Third Edition [ASQ-3]). In 2009, an AAP periodic survey, in concert with the screening tool selection trends in national and statewide implementation initiatives, collectively suggested that the ASQ, PEDS, and M-CHAT have emerged as the three most commonly selected screening instruments for children age 0 to 5 years in US primary care settings. (6)

For broad-band developmental screening tools, from 2002 to 2009, pediatricians’ use of the ASQ increased from 7.3% to 22.4% and use of the PEDS increased from 2.4% to 15.9%. (6) In 2011, a well-designed study indicated that the ASQ (second edition) was significantly more accurate than the PEDS, especially for children over age 30 months, in the identification of developmental delays in a primary care setting. (11) The strengths of the ASQ include its accuracy (both overall and across all age ranges) and its ability to be used as a developmental promotion tool.
When caregivers mark “not yet” to an item because they have never tried that particular developmental task with their child, this is considered a “teachable moment,” where clinicians can promote a new developmental activity.

Although multiple primary care studies suggest that the paper-based ASQ is feasible, it does require getting all items on the correct age-interval ASQ thoughtfully completed by caregivers (and then correctly scored) before the health supervision visit. In contrast, the strength of the PEDS is its feasibility as a previsit screen with its 2- to 5-minute administration time, fourth- to fifth-grade reading level, and single questionnaire format for all ages.

Interestingly, the PEDS has been designed to detect a broader array of “at-risk” (not just “delayed”) children and can be used not only as a screening and surveillance measure but also as a scaffold to enhance parent–provider communication. Compared with the ASQ, a higher percentage of children who have concerning or positive screens on the PEDS, which casts a “broad net,” will be found ineligible for EI/ECSE services and a higher percentage of children who have delays will be missed because of a negative screening result (although this finding varies by age).

Of note, the accuracy and feasibility of the ASQ and PEDS theoretically improve with online modalities that automatically correct for prematurity, foster a more thoughtful at-home/previsit implementation approach, and score results without interference from human error. Also, with its online modality, the PEDS can be used in combination with the PEDS:DM.

For autism-specific screening tools, research now indicates that the M-CHAT follow-up interview should be administered routinely for any positive or borderline M-CHAT result due to a high false-positive rate (∼90% are not subsequently diagnosed as having ASD) when using the M-CHAT at 18 and 24 months in a general, primary care population.
(8)

The strengths of the M-CHAT are its feasibility and that it is far less likely to miss a case of autism than is a clinician’s unstructured surveillance. (8) Any positive M-CHAT result should lead to an EI/ECSE agency referral and, ideally, an early return office visit for further assessment. However, a positive M-CHAT alone does not necessarily justify an expensive, comprehensive ASD-specific evaluation and EI plan in all cases. Children who screen positive on the M-CHAT plus Follow-up Interview should be referred for an ASD-specific EI plan and comprehensive evaluation. Although some of these children will not be diagnosed as having ASD, over-referrals ultimately are not a major concern because those not diagnosed as having ASD generally have other developmental disorders, predominantly language impairment or cognitive disabilities.

Of note, unstructured surveillance (eg, caregiver concerns about ASD; worrisome social-communication deficits or behavioral observations; or having a sibling who has ASD) still should be used in combination with the interpretation of a screening tool (eg, a number of “critical” M-CHAT items failed) to determine the need for an ASD-specific comprehensive evaluation and EI plan without need for the follow-up interview.

To develop a deeper understanding of the strengths and weaknesses of various screening tools, one must understand the basic precepts behind psychometric and feasibility standards. Core standards affect a screening tool’s application to a primary care setting (eg, performance on a wide variety of populations, screening tool completion rates, sustainability over time). The question arises: what is a core standard?
Throughout this article, a core standard is defined as an inflexible property that is critically important for the effective and feasible performance of a screening measure on a primary care population but is not necessarily dependent on the individual needs of a medical home.

Core standards provide a frame of reference to assure that relevant technical information is provided, with the hope that practitioners thereby will make wiser, more informed decisions when selecting, implementing, and interpreting screening tools. To learn more about what technical properties make a screening tool psychometrically sound, see “What Properties Make a Screening Tool Psychometrically Sound?”; to learn what properties make a screening tool feasible, see “What Properties Make a Screening Tool Feasible?”

The AAP strongly endorses the use of properly standardized, reliable (≥80%), well-validated, and accurate (sensitivity and specificity ≥70%) screening instruments for the early identification of developmental-behavioral problems. However, highly accurate practitioner-administered instruments often possess undesirable feasibility characteristics for busy primary care settings. Practitioners should look hard at how practical any given instrument they are considering will be in the office setting. Suboptimal feasibility lowers screening tool completion rates. Lower completion rates logically hinder early detection rates because early detection then is more apt to rely on subjective, unstructured surveillance instead of psychometrically sound screens.

For the early detection of a broad range of developmental delays before kindergarten entrance, parent-report tools such as the ASQ-3, PEDS, and the PEDS:DM are good examples of effective and feasible screening tools that are well-suited to a wide array of primary care settings; however, the use of yes/no checklists such as the Denver Prescreening Developmental Questionnaire II is not recommended.
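The AAP thresholds quoted above (reliability ≥80%; sensitivity and specificity ≥70%) lend themselves to a simple check when comparing candidate instruments. The Python sketch below is illustrative only: the class, the function, and the two tools with their values are hypothetical placeholders rather than published properties of any real instrument, and the check deliberately ignores standardization, validation quality, and feasibility, which the text treats as equally important.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the text.
MIN_RELIABILITY = 0.80
MIN_SENSITIVITY = 0.70
MIN_SPECIFICITY = 0.70


@dataclass(frozen=True)
class ToolProperties:
    """Reported psychometric properties of a screening tool (as proportions)."""
    name: str
    reliability: float
    sensitivity: float
    specificity: float


def unmet_standards(tool: ToolProperties) -> List[str]:
    """Return the names of the quoted thresholds that the tool does not meet."""
    checks = [
        ("reliability", tool.reliability, MIN_RELIABILITY),
        ("sensitivity", tool.sensitivity, MIN_SENSITIVITY),
        ("specificity", tool.specificity, MIN_SPECIFICITY),
    ]
    return [label for label, value, minimum in checks if value < minimum]


# Hypothetical tools with made-up values, for illustration only.
tool_a = ToolProperties("Tool A", reliability=0.90, sensitivity=0.82, specificity=0.78)
tool_b = ToolProperties("Tool B", reliability=0.85, sensitivity=0.55, specificity=0.88)

for tool in (tool_a, tool_b):
    gaps = unmet_standards(tool)
    status = "meets all three thresholds" if not gaps else "falls short on " + ", ".join(gaps)
    print(f"{tool.name}: {status}")
```

Listing the unmet thresholds, rather than returning a single pass/fail flag, keeps visible which property is weak; a tool that narrowly misses one threshold in one age range may still deserve a closer look at its validation studies.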
For the early detection of a broad range of social-emotional/mental health problems, the Ages & Stages Questionnaire: Social-Emotional for children age 3 months to 5 years and the Pediatric Symptom Checklist (PSC) for children age 6 to 18 years are good examples of effective and feasible tools.

For autism-specific screening, the M-CHAT is currently a commonly selected tool. However, whenever the M-CHAT is positive (failed), it is important to administer its follow-up interview to filter out which children are truly in need of an ASD-specific comprehensive evaluation and EI plan.

When selecting instruments, be aware that medical home differences often exist in patient sociodemographic characteristics, patient access, patient flow and volume, office staff resources, state EI/ECSE eligibility criteria, and the availability of early childhood community resources. For all these reasons, different practices may require a different combination of tools to optimize their early detection and referral rates. Table 3 provides a list of commonly used tools, along with their abbreviations and Web links.

Although all screens have their individual strengths and weaknesses, Figs 1–4 (which are by no means an exhaustive list of measures) allow practitioners to compare some of the more commonly selected measures objectively. Ask yourself: 1) Which measures meet the majority of technical standards (white boxes) and best fit the needs of my practice? 2) Which measures do not meet the majority of technical standards (dark gray boxes)? 3) Which measures variably or questionably meet many of these technical standards (light gray boxes)?
The authors of this article do not have any conflicts of interest and have extrapolated data directly from user manuals, official Web sites, and published, peer-reviewed articles.

Of note, reported psychometric properties typically were determined in a combination of at-home, day care, preschool, and, less frequently, primary care settings by using paper-based and, more recently, online screening modalities. Some forms of bias are present with varying degrees of severity within the validation studies for all of these measures, but noteworthy issues are highlighted for each instrument. Under feasibility, costs have been calculated uniformly (the same for each tool) by using a paper-and-pen, modifiable model from Dobrez et al. (13) A cost of $60 per hour was used to account for the combination of practitioner and clinic staff costs. Actual costs will vary based on implementation, staffing, the percentage of caregivers with problems, and the percentage of positive or concerning screens.