
MEMORANDUM

505 14th Street, Suite 800 Oakland, CA 94612-1475 Telephone (510) 830-3700 Fax (510) 830-3701 www.mathematica-mpr.com

TO: First 5 LA and LAUP Child Progress Team Members
FROM: Yange Xue, Sally Atkins-Burnett, and Emily Moiduddin
DATE: 04/12/2011
SUBJECT: Developing Child Progress Targets for the Performance-Based Contract – Revised Memo
UPCOS4-81

As part of Phase 4 of the Universal Preschool Child Outcomes Study (UPCOS-4), Mathematica is conducting direct child assessments in a representative sample of children in Los Angeles Universal Preschool (LAUP) programs to inform the performance-based contract between First 5 LA and LAUP. In this memo, we summarize work done in consultation with First 5 LA and LAUP to develop the programs’ child progress targets to be included in First 5 LA’s performance-based contract with LAUP. Targets are based on information regarding child progress collected during prior phases of UPCOS. We begin this memo with an overview of the developmental domains and associated child outcome measures on which the targets are focused. We then describe the decision-making process by which the targets were set and the issues considered in setting the targets. Finally, we present the agreed-upon program child progress targets and the rationale for each.

DOMAINS OF CHILD DEVELOPMENT TO BE MONITORED

To identify targets for the performance-based contract, we began by working with First 5 LA and LAUP to identify the domains of development and associated outcome measures on which to focus. In August 2010, Mathematica submitted a memo outlining our recommendations; these recommendations were accepted by the full First 5 LA and LAUP team. Although it is impossible for a brief assessment to encompass all the elements of school readiness, the assessment tools should collectively tap the important domains of development, including the five domains identified by the National Education Goals Panel (Kagan et al. 1995), which are similar to those in the California Preschool Learning Foundations (California Department of Education 2008): language and literacy, mathematics, social-emotional development, approaches to learning, and fine motor development. The selected measures were all used in prior phases of UPCOS and together address multiple school readiness dimensions.
Table 1 summarizes the child outcome measures agreed upon by First 5 LA and LAUP and notes the domains of development that each addresses [1].

[1] In addition to the measures shown in Table 1, children completed the English and Spanish Preschool Language Assessment Survey (Pre-LAS) as both a language screener and warm-up exercise. The team determined that these measures should not be used for setting targets.


Table 1. Recommended Child Outcomes Measures and Developmental Domains

Measure | Purpose
Expressive One-Word Picture Vocabulary Test – Spanish Bilingual Edition (EOWPVT-SBE) (Brownell 2000) | Language Development: Vocabulary – Conceptually Scored
Rapid Letter Naming (RLN) | Literacy Development – Conceptually Scored
Woodcock-Johnson III (WJ-III) and Woodcock-Muñoz Batería III (WM-III) Applied Problems (Mather and Woodcock 2001; Woodcock and Muñoz-Sandoval 2005) | Cognitive Development: Mathematics
Woodcock-Johnson III, Test 7 Spelling and Woodcock-Muñoz III, Test 7 Ortografía (Mather and Woodcock 2001; Woodcock and Muñoz-Sandoval 2005) | Fine Motor Skills and Literacy Development: Ability to copy shapes, write letters and words/Writing and spelling
Leiter-R Examiner Rating Scale (Leiter-R) (Roid and Miller 2006) | Social-Emotional Development and Approaches to Learning

PROCEDURE AND DATA USED IN DEVELOPING THE TARGETS

The targets are based on information drawn from prior phases of UPCOS on how children in LAUP progress across a preschool year. Targets were developed using data from Phase 3 of UPCOS for all measures except for Rapid Letter Naming (RLN). For RLN, we used UPCOS-2 data. A detailed description of UPCOS-3 and the results of the analyses are presented in Moiduddin et al. (2010) and Xue et al. (2010). For detail regarding UPCOS-2, see Love et al. (2009).

The specific targets for child progress in each of the developmental domains were set through a collaborative process with LAUP and First 5 LA that Mathematica staff facilitated. The team charged with setting the targets [2] met five times during December 2010 and January 2011. Together, the team examined the distribution of the scores in prior rounds of UPCOS for each outcome measure and used that information as a foundation for setting targets. In addition to examining the distribution for the overall samples from prior rounds, the team also examined scores by child language group to ensure that meeting the targets would indicate that children from different language groups all made progress. The team also considered the types of support that teachers have been receiving from LAUP (for example, specific topics at teacher institutes or practices to which coaches give emphasis during ongoing interactions with teachers).

[2] Members of the team were Sharon Murphy from First 5 LA; Kimberly Hall, Julia Love, Schellee Rocher, Delila Vasquez, and Marlene Alfaro from LAUP; and Sally Atkins-Burnett and Yange Xue from Mathematica.

Although LAUP is constantly devising new ways to support teachers and improve practice, there was no expectation that support received up to this point would result in child progress that differed markedly from that seen in prior phases of UPCOS. Thus, the group reached agreement that this year’s targets should specify a level of growth similar to that documented in prior phases of UPCOS. Future targets may aim for additional growth based on UPCOS-4 data.

ISSUES CONSIDERED IN SETTING THE CHILD PROGRESS TARGETS

Appropriate Metrics/Type of Scores for the Targets

Each of the assessments offers multiple types of scores: raw scores, standard scores, or IRT/W [3] scores. Children’s growth over the year could be monitored with any of the available scores. The group agreed that IRT/W scores would be most appropriate for reporting change. Raw scores simply summarize how many items a child answers correctly and incorrectly without accounting for factors such as the difficulty of the items. Standard scores adjust for age and compare local scores to a nationally representative sample of same-age peers. Two children might have the same raw score, but if one of them is a little older and falls into a different age grouping, the standard score for the slightly older child would be lower than for the younger child. However, standard scores are not equal-interval scores; change at one point in the scale is not necessarily equal to change at another point in the scale.

In contrast to a standard score, the IRT or W score is an equal-interval score (that is, a change at one point on the scale is equal to change at another point on the scale) that adjusts for item difficulty. With this type of score, children who correctly respond to more difficult items receive credit for knowing more challenging words or information.
Because the scale is equal interval, though children will be showing change at different points on the scale, we can compare how far they are moving along the scale. We can also look at change across age groups using IRT or W scores because the scores do not adjust for child age. However, it is important to note that the IRT/W scales differ across measures and are therefore not comparable. That is, a change of 3 IRT points on RLN is not equivalent to a 3-point change in the W score on the WJ-III tests or a 3-point change on the IRT score for the EOWPVT. Each scale is different, and the meaning of the change is specific to the assessment and the items included in that assessment.

After considering the advantages and disadvantages of each type of score, the group decided to use IRT scores to set the targets. However, standard scores will also be provided so that interested stakeholders can understand what LAUP children’s scores mean relative to the mean skill levels of children nationally. Note, however, that standard scores and IRT/W scores are not directly comparable.

[3] IRT scores are estimated using item response theory rather than classical test theory. In the case of the measures used in UPCOS, the IRT scores all used a one-parameter Rasch model that estimates item difficulty and child proficiency. An IRT score is also referred to as a W score or growth score, depending on the measure.

Specified Change Versus Mean Change

The group came to an agreement that it would be more meaningful to set the targets based on the percentage of children who make a specified change rather than to base them on mean change scores from fall to spring. A mean score can increase when a small group of children make a lot of change even when there are many children who make no change. Since we are interested in seeing LAUP benefit all children, the targets will be based on percentages.

Children who enter with strong skills may not make as much progress as children who enter with weaker skills, but we are interested in seeing that they are making some progress. For each of the measures, we will select targets of two levels of difficulty. At the first level, we will expect that programs will successfully support 70 percent of children in reaching the target. At the second, more difficult level, we will expect the program will successfully support 45 percent of children in reaching the target. LAUP expects to meet the targets at both levels of difficulty.

Among the measures, we make an exception to this approach for the Leiter-R Examiner Rating Scale, which is used to examine social-emotional development and approaches to learning. The Leiter-R scale scores are truncated at 10 (that is, children’s growth cannot be detected beyond a certain level). Therefore, the target will address whether children’s behaviors fall within the expected range [4] in the spring.

Overall Versus Subgroup Targets

In the process of setting the targets, the group examined the distributions (histograms, frequencies) of the change scores in all developmental domains in the overall sample and language subgroups.
For measures on which the distributions of change scores did not drastically differ across the language groups, the team agreed to set an overall target; these measures include the EOWPVT/EOWPVT-SBE, RLN, WJ-III Spelling, and WJ-III/WM-III Applied Problems. Note that for some of these measures, the distributions of the change scores differed in the other language group [5]. However, the sample size for the other language group is small (around 20) and its inclusion in an overall target is unlikely to bias the results.

[4] The expected range includes children who are behaving within a typically developing range (a scale score greater than 6) and thus are not identified as being at risk for social-emotional problems. LAUP raised a caution about the term that would be used for these targets, so we use “in the acceptable range” or “in the expected range” and avoid using any clinical terms.

[5] The other language group includes children whose primary language is a language other than English or Spanish.

The distributions of the Leiter-R scores are different across language groups. Thus, the team agreed to set separate targets for the English only and English primarily groups versus English language learner (ELL) groups. These ratings are made by assessors who are not familiar with the children, and the items may have different meanings for ELLs.

In future years, the team would like to consider presenting the results separately for the sample of children who do not have identified special needs. However, information on special needs was not collected this year. As an alternative, Mathematica can identify those children who score more than 2 standard deviations below the mean in the fall and exclude them from the analysis. This information would be for the purpose of informing LAUP rather than for determining whether the targets are met.

Measure in the UPCOS-4 Battery to Exclude from the Targets

UPCOS-3 data show that about 70 percent of children who completed the WM-III Ortografía (that is, they were assessed in Spanish) did not show any growth from fall to spring. Statistical tests indicated that the fall standard score (91) and spring standard score (89) did not differ. These children were likely learning the letters in English in the program but were assessed in Spanish. Thus, the prompt in the Spanish test is not consistent with the letters they learned. Because LAUP programs do not typically teach children Spanish letters, we would not expect to see growth in this area. Therefore, the group agreed not to set targets based on the WM-III Ortografía.

FINAL TARGETS FOR CHILD PROGRESS

At the end of the process, LAUP and First 5 LA agreed on targets in each of six domains of child development: language, literacy, cognitive/math, social-emotional development, approaches to learning, and fine motor.
In Box 1 we summarize the rationale behind decisions described above that affect all of the targets. Table 2 summarizes the final targets for monitoring child progress in each of the developmental areas and rationale related to that specific target. In the remainder of this memo, we discuss the targets shown in Table 2 in detail.

Language: Targets Based on the EOWPVT/EOWPVT-SBE

The EOWPVT-SBE is a conceptually scored measure of expressive vocabulary; that is, children are given credit for correct responses in either Spanish or English. We decided to set the following two targets for EOWPVT/EOWPVT-SBE based on the program’s outcomes for children in UPCOS-3:

• 70 percent of children gain 2 points or more on the EOWPVT IRT score from fall to spring.
• 45 percent of children gain 5 points or more on the EOWPVT IRT score from fall to spring.
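As an illustration only, the following Python sketch shows one way a two-level target of this kind could be checked against fall and spring IRT scores. The function names and the ten pairs of scores are hypothetical and are not UPCOS data; the sketch assumes each child has both a fall and a spring score.

```python
from typing import Sequence


def share_gaining_at_least(fall: Sequence[float], spring: Sequence[float],
                           min_gain: float) -> float:
    """Share of children whose fall-to-spring gain is min_gain or more."""
    gains = [s - f for f, s in zip(fall, spring)]
    return sum(g >= min_gain for g in gains) / len(gains)


def meets_target(fall: Sequence[float], spring: Sequence[float],
                 min_gain: float, required_share: float) -> bool:
    """True when the required share of children reaches the specified gain."""
    return share_gaining_at_least(fall, spring, min_gain) >= required_share


# Hypothetical IRT scores for ten children (illustration only).
fall = [50, 52, 55, 48, 60, 51, 53, 57, 49, 54]
spring = [56, 54, 61, 53, 60, 57, 55, 63, 50, 59]

# First level: 70 percent gain 2 points or more.
print(meets_target(fall, spring, min_gain=2, required_share=0.70))
# Second level: 45 percent gain 5 points or more.
print(meets_target(fall, spring, min_gain=5, required_share=0.45))
```

The same check would apply to the other gain-based targets by changing the two thresholds.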

Almost all LAUP programs include ELL children. Because ELL children are learning two languages, we might expect more limited growth in each of those languages relative to a child learning only one language. Thus, the first target specifies growth of 2 points or more for 70 percent of all children. The second target specifies a change of 5 points or more. A change of 5 points indicates a substantive change in vocabulary. For example, based on UPCOS-3 data, children who were at the mean score in the fall and gained 5 more points in the spring would move from being able to name a variety of objects encountered frequently at home and in the classroom (for example, wagon) to being able to correctly identify objects that children experience only infrequently or only through literature (for example, suitcase/luggage).

Programs should be able to change children’s vocabulary at both levels of difficulty regardless of the overall curriculum in the classroom. Teachers play an important role in how children learn and use language, and teacher training is valuable in terms of improving children’s language outcomes. While some curricula provide more support for language development than others, all teachers should be reading to children and supporting language development through strategies that include teaching and using more sophisticated vocabulary.

Although programs may not show large change on standard scores for high-performing children (those scoring more than 1 standard deviation above the mean in the fall) due to regression to the mean, regression to the mean is not as pronounced with the IRT scores and so it is less of a concern.
Because the IRT scores capture children’s absolute growth rather than their growth in comparison to the nationally representative sample of same-age peers, change scores on the EOWPVT reflect increases (or decreases) in the child’s ability to appropriately name more-challenging pictures. Small or negative change scores would suggest that teachers are not expanding children’s vocabulary beyond what they already know. When presenting the results from this year’s data, we could provide more information about the children who did not meet the targets to investigate whether the children who are not making change are those who already had strong skills, or are those who are ELLs and may be learning the words that they already know but in a different language.

Literacy: Targets Based on Rapid Letter Naming (RLN)

The RLN task was developed for UPCOS-2 and is a conceptually scored measure of letter knowledge. There are two forms with lower and upper case letters on each form. Children received one form in the fall and the other in the spring. When the IRT scores are calculated, the two different forms will be put on the same scale for the purpose of determining whether the targets have been met. On average, upper case letters are easier for children to name than lower case letters.

Unlike the other measures included in the targets, we used the data from UPCOS-2 (rather than UPCOS-3) to guide creation of the RLN targets for two reasons. First, in UPCOS-3 teachers administered the RLN, and there were some difficulties with administration. Second, UPCOS-2, like the current phase of UPCOS, included a representative sample of children in LAUP. In UPCOS-3, children whose parents were the first to opt for participation were included in the sample; thus, they were not representative of LAUP as a whole. It should be noted that UPCOS-2 did not include family child care centers (FCCs); however, the results from UPCOS-3 indicate that there is no difference in RLN raw scores between centers and FCCs. Therefore, targets should not be biased if set using UPCOS-2 RLN data. For the RLN task, the two targets are:

• 70 percent of children gain 7 points or more on the RLN IRT score from fall to spring.
• 45 percent of children gain 13 points or more on the RLN IRT score from fall to spring.

Box 1. Overarching Rationale for Child Progress Targets

Rationale for measures in assessment battery. Although it is impossible for a brief assessment to encompass all the elements of school readiness, the assessment tools should collectively tap the important domains of development, including the five domains identified by the National Education Goals Panel (Kagan et al. 1995), which are similar to those in the California Preschool Learning Foundations (California Department of Education 2008):

• Cognition and general knowledge (for example, literacy, mathematics, problem solving)
• Language development
• Approaches to learning
• Social and emotional development
• Physical well-being and motor development

Rationale for type of change specified. The final targets are based on the percentage of children who make a specified change rather than on mean change scores from fall to spring. A mean score can increase when a small group of children make a lot of change even when there are many children who make no change. Since we are interested in all children benefiting from LAUP, the targets are based on percentages of children who make a specified change.

Rationale for two levels of difficulty. Children who enter with strong skills may not make as much progress as children who enter with weaker skills, but we are interested in seeing that they are making some progress. For each of the measures, we selected targets at two levels of difficulty. At the first level, we will expect that 70 percent of children reach the target. At the second, more difficult level, we will expect that 45 percent of children reach the target. LAUP expects to meet the targets at both levels of difficulty.

Rationale for type of score. Each of the assessments used offers multiple types of scores. For most of the measures, the group agreed to use equal-interval scores (that is, a change at one point on the scale is equal to change at another point on the scale) that adjust for item difficulty but do not adjust for age. These scores are referred to as IRT scores or W scores, depending on the measure. With this type of score, children who correctly respond to more difficult items receive credit for knowing more challenging words or information. Because the scale is equal interval, though children will be making change at different points on the scale, we can still compare how far they are moving along the scale. We can also look at change across age groups using IRT scores because the scores do not adjust for child age.

Rationale for magnitude of target. Targets are based on the distribution of children’s scores from Phase 3 of the Universal Preschool Child Outcomes Study (UPCOS-3).

Table 2. Final Targets for Child Progress from Fall to Spring, by Domain and Measure

Domain: Language Development
Measure: EOWPVT/EOWPVT-SBE
Targets:
• 70% of children gain 2 points or more on the EOWPVT IRT score
• 45% of children gain 5 points or more on the EOWPVT IRT score
Measure-Specific Rationale: Almost all LAUP programs include ELL children. Because ELL children are learning two languages, we might expect more limited growth in each of those languages relative to a child learning only one language. Thus, the first target specifies growth of 2 points or more for 70 percent of all children. The second target specifies a change of 5 points or more. A change of 5 points indicates a substantive change in vocabulary.

Domain: Literacy Development
Measure: Rapid Letter Naming
Targets:
• 70% of children gain 7 points or more on the RLN IRT score
• 45% of children gain 13 points or more on the RLN IRT score
Measure-Specific Rationale: The group agreed these targets represent meaningful change over the preschool year.

Domain: Fine Motor/Literacy
Measure: WJ-III Spelling
Targets:
• 70% gain 7 points or more on the WJ-III Spelling W score
• 45% gain 17 points or more on the WJ-III Spelling W score
Measure-Specific Rationale: The group agreed these targets represent meaningful change over the preschool year.

Domain: Cognitive Development
Measure: WJ-III/WM-III Applied Problems
Targets:
• 70% gain 4 points or more on the WJ-III/WM-III Applied Problems W score
• 45% gain 13 points or more on the WJ-III/WM-III Applied Problems W score
Measure-Specific Rationale: In examining the distributions of the change scores for the WJ-III/WM-III Applied Problems in UPCOS-3 data, we found that if we use the predetermined 70/45 percent criteria, the cut-points would be different for tests in different languages; children who were assessed in Spanish (WM-III) made greater gains. Additional investigation indicated that the tests may not function the same way across language groups. Ultimately the group agreed that the targets on the WM-III Applied Problems should be set to parallel those on the WJ-III Applied Problems. Thus, the final targets for both the WJ and WM reflect change seen in UPCOS-3 on the WJ-III Applied Problems.

Domain: Social-Emotional Development and Approaches to Learning
Measure: Leiter-R Examiner Rating Scales (Attention, Activity, Sociability)
Targets:
• 85% score in the expected range in the spring for English only and English primarily groups
• 75% score in the expected range in the spring for Spanish only, Spanish primarily, and other language only and primarily groups
Measure-Specific Rationale: We used a different approach to set the targets for the Leiter-R than other measures because scores are truncated (that is, children cannot score above a certain level). Thus, targets refer to the percentage of children who score in the expected range in the spring, rather than on growth from fall to spring. Prior studies show that English-speaking children are more adapted to the classroom culture than children speaking other languages. Therefore, the group decided to set separate targets for different groups because of cultural and language differences as well as observed differences in the distributions of the scores.

Fine Motor/Literacy: Targets Based on the WJ-III Spelling

The Spelling subtest of the WJ-III addresses literacy development and fine motor skills (children are required to copy shapes and letters and to write letters and later words). The two targets for WJ-III Spelling are:

• 70 percent of children gain 7 points or more on the WJ-III Spelling W score from fall to spring.
• 45 percent of children gain 17 points or more on the WJ-III Spelling W score from fall to spring.

For the English primarily group, slightly less than 45 percent of children gained 17 points or more during UPCOS-3; this is likely due to the fact that they entered their programs with relatively high scores. However, children in UPCOS-4 are likely to exhibit a greater range of skills (with many entering with a lower skill level) because they are a representative sample; the nonrepresentative sample from UPCOS-3 likely entered with a higher average skill level.

Cognitive Development: Targets Based on the WJ-III/WM-III Applied Problems

The WJ/WM Applied Problems subtest addresses mathematics concepts; its standardized rules allow children to answer correctly in English or Spanish. In examining the distributions of the change scores for the WJ-III/WM-III Applied Problems in UPCOS-3 data, we found that if we use the predetermined 70/45 percent criteria, the cut-points would be different for tests in different languages. On the WJ-III Applied Problems, 70 percent of children gained 4 points or more and 45 percent gained 13 points or more. On the WM-III Applied Problems, 70 percent of children gained 5 points or more and 45 percent gained 18 points or more.

To further inform the decision, the group examined change scores for the group of children who took the WM-III Applied Problems in the fall but switched to the WJ-III Applied Problems in the spring. For these 93 children, their standard scores on Applied Problems increased on average from 86 in the fall to 97 in the spring, and 70 percent gained 12 points or more from fall to spring. A change of this magnitude across languages is surprising. This suggests that the tests do not function the same across the groups. Although the user’s manual indicates that the W scores on the WM-III and WJ-III are supposed to be equivalent (anchored on item difficulties for items similar on both forms), this does not appear to be the case in the UPCOS-3 sample. It is unclear if the jump in scores was so high because children did not know enough Spanish to perform well on the WM-III in the fall (despite being routed to that test) or due to the standardization sample. The WM-III equating study used a sample that includes children from Mexico City with higher SES than typical Spanish-speaking children in the United States.

Ultimately the group agreed that the targets on the WM-III Applied Problems should be set to parallel those on the WJ-III Applied Problems. With this approach, those taking the WM-III Applied Problems may be more likely to meet the targets than those taking the WJ-III; in a sense we may be underestimating the amount of change for children who take the WM-III. This approach also ensures that children who switched from the WM-III to the WJ-III would be likely to meet the targets. Thus, the final targets for both the WJ and WM reflect change seen in UPCOS-3 on the WJ-III Applied Problems:

• 70 percent of children gain 4 points or more on the WJ-III/WM-III Applied Problems W score.
• 45 percent of children gain 13 points or more on the WJ-III/WM-III Applied Problems W score.
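To make the notion of a cut-point concrete, the sketch below shows one possible way to read the 70 percent and 45 percent cut-points off a distribution of change scores: the cut-point is taken as the largest gain that at least the stated percentage of children reached or exceeded. This definition, the function name, and the twenty change scores are assumptions for illustration; the memo does not specify the exact computation used, and the values are not UPCOS data.

```python
def cut_point(gains, percent):
    """Largest gain g such that at least `percent` percent of children
    gained g or more (assumed definition, for illustration)."""
    if not gains or not 0 < percent <= 100:
        raise ValueError("need at least one gain and 0 < percent <= 100")
    ordered = sorted(gains, reverse=True)
    # Number of children needed to make up `percent` percent, rounded up.
    k = -(-percent * len(ordered) // 100)
    return ordered[k - 1]


# Hypothetical fall-to-spring change scores for twenty children.
gains = [0, 1, 2, 3, 4, 4, 5, 6, 8, 10,
         12, 13, 13, 14, 15, 16, 18, 20, 22, 25]

print(cut_point(gains, 70))  # gain reached by at least 70 percent of children
print(cut_point(gains, 45))  # gain reached by at least 45 percent of children
```

Running the same function separately on the change scores for each test language would show whether the two distributions yield different cut-points.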

Social-Emotional Development and Approaches to Learning: Targets Based on the Leiter-R Examiner Rating Scales

In UPCOS-3 and UPCOS-4 we used three subscales of the Leiter-R to assess children’s social-emotional skills and approaches to learning: Attention, Activity Level, and Sociability. With the Leiter-R, assessors rate, by responding to a series of questions, how well the child behaves during the one-to-one interaction required in the direct assessment situation. As noted earlier, we used a different approach to set the targets for the Leiter-R than other measures because scores are truncated; we examined the percentage of children who scored in the expected range in the spring of UPCOS-3 (that is, the targets are based solely on spring scores).

Although the targets for the Leiter-R are based solely on children’s scores in the spring, we do expect that children’s Leiter-R scores will improve over the course of the preschool year. Preschools socialize children to a school environment that, to a certain extent, parallels the assessment situation; children learn to attend to the adult and respond to questions. Prior studies show that English-speaking children are more adapted to the classroom culture than children speaking other languages. Therefore, the group decided to set separate targets for different groups because of cultural and language differences as well as observed differences in the distributions of the scores:

• 85 percent of children score in the expected range in the spring for English only and English primarily groups.
• 75 percent of children score in the expected range in the spring for Spanish only, Spanish primarily, and other language only and primarily groups.

For the representative sample in UPCOS-2, the percentages of children scoring in the acceptable range in the spring are similar to those from UPCOS-3, although the fall percentages were lower in UPCOS-2 than in UPCOS-3. This suggests that the targets based on UPCOS-3 are attainable.
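As a rough sketch of how the subgroup targets might be checked, the Python example below applies the expected-range rule (a scale score greater than 6) to spring scores for each language group. The group labels, function names, and scores are hypothetical, and the sketch treats a single subscale at a time; how the three subscales would be combined is not addressed here.

```python
# Expected range: a scale score greater than 6.
EXPECTED_RANGE_FLOOR = 6

# Target percent in the expected range, by (hypothetical) group label.
TARGET_PERCENT = {
    "english": 85,         # English only and English primarily
    "other_language": 75,  # Spanish or other language, only and primarily
}


def percent_in_expected_range(spring_scores):
    """Percent of children whose spring scale score exceeds the floor."""
    in_range = sum(score > EXPECTED_RANGE_FLOOR for score in spring_scores)
    return 100 * in_range / len(spring_scores)


def group_meets_target(spring_scores, group):
    """True when the group's percent in range reaches its target."""
    return percent_in_expected_range(spring_scores) >= TARGET_PERCENT[group]


# Hypothetical spring scale scores on one subscale (illustration only).
english_scores = [10, 9, 8, 7, 7, 10, 6, 9, 8, 7]
other_language_scores = [7, 5, 8, 9, 6, 10, 7, 8, 4, 9]

print(group_meets_target(english_scores, "english"))
print(group_meets_target(other_language_scores, "other_language"))
```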

REFERENCES

Brownell, R. “Expressive One-Word Picture Vocabulary Tests.” San Antonio, TX: Harcourt Assessment, Inc., 2000.

California Department of Education. “California Preschool Learning Foundations.” Sacramento, CA: Author, 2008.

Kagan, S. L., Moore, E., & Bredekamp, S. (Eds.). “Reconsidering children's early development and learning: Toward shared beliefs and vocabulary.” Washington, DC: National Education Goals Panel, 1995.

Love, John M., S. Atkins-Burnett, C. Vogel, N. Aikens, Y. Xue, M. Mabutas, B.L. Carlson, E.S. Martin, N. Paxton, M. Caspe, S. Sprachman, and K. Sonnenfeld. “Los Angeles Universal Preschool Programs, Children Served, and Children’s Progress in the Preschool Year: Final Report of the First 5 LA Universal Preschool Child Outcomes Study.” Report submitted to First 5 LA. Princeton, NJ: Mathematica Policy Research, June 2009.

Mather, Nancy, and Richard W. Woodcock. “Woodcock-Johnson III Tests of Achievement Examiner’s Manual.” Itasca, IL: Riverside Publishing, 2001.

Moiduddin, Emily, Sally Atkins-Burnett, Yange Xue, Pia Caronongan, Elisha Smith, and Marta Induni. “Results of Activities Informing the Performance-Based Contract Between First 5 LA and LAUP.” Final Report submitted to First 5 LA. Washington, DC: Mathematica Policy Research, June 30, 2010.

Roid, Gale H., and Lucy J. Miller. “Leiter-R Performance Scale-Revised.” Wood Dale, IL: Stoelting Co., 1997.

Woodcock, Richard W., and F. Muñoz-Sandoval. “Woodcock-Muñoz Language Survey Revised.” Itasca, IL: Riverside Publishing, 2005.

Xue, Yange, Sally Atkins-Burnett, Pia Caronongan, and Emily Moiduddin. “Informing the Performance-Based Contract Between First 5 LA and LAUP: Assessing Child Progress.” Report submitted to First 5 LA. Washington, DC: Mathematica Policy Research, December 10, 2010.