At the end of the last school year, I was chatting with two excellent teachers, and our conversation turned to the new state-mandated teacher evaluation system and its use of student “growth scores” (“Student Growth Percentiles” or “SGPs” in Massachusetts) to measure a teacher’s “impact on student learning.”
“Guess we didn’t have much of an impact this year,” said one teacher.
Throughout the school, comments were similar -- indicating that a major “impact” of the new evaluation system is demoralizing and discouraging teachers. (How do I know, by the way, that these two teachers are excellent? I know because I worked with them as their principal – being in their classrooms, observing and offering feedback, talking to parents and students, and reviewing products demonstrating their students’ learning – all valuable ways of assessing a teacher’s “impact”.)The other teacher added, “It makes you feel about this high,” showing a tiny space between her thumb and forefinger.
According to the Massachusetts Department of Elementary and Secondary Education (“DESE”), the new evaluation system’s goals include promoting the “growth and development of leaders and teachers,” and recognizing “excellence in teaching and leading.” The DESE website indicates that the DESE considers a teacher’s median SGP as an appropriate measure of that teacher’s “impact on student learning”:
“ESE has confidence that SGPs are a high quality measure of student growth. While the precision of a median SGP decreases with fewer students, median SGP based on 8-19 students still provides quality information that can be included in making a determination of an educator’s impact on students.”
Given the many concerns about the use of “value-added measurement” tools (such as SGPs) in teacher evaluation, this confidence is difficult to understand, particularly as applied to real teachers in real schools. Considerable research notes the imprecision and variability of these measures as applied to the evaluation of individual teachers. On the other side, experts argue that use of an “imperfect measure” is better than past evaluation methods. Theories aside, I believe that the actual impact of this “measure” on real people in real schools is important.
As a principal, when I first heard of SGPs I was curious. I wondered whether the data would actually filter out other factors affecting student performance, such as learning disabilities, English language proficiency, or behavioral challenges, and I wondered if the data would give me additional information useful in evaluating teachers.
Unfortunately, I found that SGPs did not provide useful information about student growth or learning, and median SGPs were inconsistent and not correlated with teaching skill, at least for the teachers with whom I was working. In two consecutive years of SGP data from our Massachusetts elementary school:
Ø One 4th grade teacher had median SGPs of 37 (ELA) and 36 (math) in one year, and 61.5 and 79 the next year. The first year’s class included students with disabilities and the next year’s did not.
Ø Two 4th grade teachers who co-teach their combined classes (teaching together, all students, all subjects) had widely differing median SGPs: one teacher had SGPs of 44 (ELA) and 42 (math) in the first year and 40 and 62.5 in the second, while the other teacher had SGPs of 61 and 50 in the first year and 41 and 45 in the second.
Ø A 5th grade teacher had median SGPs of 72.5 and 64 for two math classes in the first year, and 48.5, 26, and 57 for three math classes in the following year. The second year’s classes included students with disabilities and English language learners, but the first year’s did not.
Ø Another 5th grade teacher had median SGPs of 45 and 43 for two ELA classes in the first year, and 72 and 64 in the second year. The first year’s classes included students with disabilities and students with behavioral challenges while the second year’s classes did not.
As an experienced observer/evaluator, I found that median SGPs did not correlate with teachers’ teaching skills but varied with class composition. Stronger teachers had the same range of SGPs in their classes as teachers with weaker skills, and median SGPs for a new teacher with a less challenging class were higher than median SGPs for a highly skilled veteran teacher with a class that included English language learners.
Furthermore, SGP data did not provide useful information regarding student growth. In analyzing students’ SGPs, I noticed obvious general patterns: students with disabilities had lower SGPs than students without disabilities, English language learners had lower SGPs than students fluent in English, students who had some kind of trauma that year (e.g., parents’ divorce) had lower SGPs, and students with behavioral/social issues had lower SGPs. SGPs were correlated strongly with test performance: in one year, for example, the median ELA SGP for students in the “Advanced” category was 88, compared with 51.5 for “Proficient” students, 19.5 for “Needs Improvement,” and 5 for the “Warning” category.
There were also wide swings in student SGPs, not explainable except perhaps by differences in student performance on particular test days. One student with disabilities had an SGP of 1 in the first year and 71 in the next, while another student had SGPs of 4 in ELA and 94 in math in 4th grade and SGPs of 50 in ELA and 4 in math in 5th grade, both with consistent district test scores.
So how does this “information” impact real people in a real school? As a principal, I found that it added nothing to what I already knew about the teaching and learning in my school. Using these numbers for teacher evaluation does, however, negatively impact schools: it demoralizes and discourages teachers, and it has the potential to affect class and teacher assignments.
In real schools, student and teacher assignments are not random. Students are grouped for specific purposes, and teachers are assigned classes for particular reasons. Students with disabilities and English language learners are often grouped to allow specialists, such as the speech/language teacher or the ELL teacher, to work more effectively with them. Students with behavioral issues are sometimes placed in special classes, and are often assigned to teachers who work particularly well with them. Leveled classes (AP, honors, remedial), create different student combinations, and teachers are assigned particular classes based on the administrator’s judgment of which teachers will do the best with which classes. For example, I would assign new or struggling teachers less challenging classes so I could work successfully with them on improving their skills.
In the past, when I told a teacher that he/she had a particularly challenging class, because he/she could best work with these students, he/she generally cheerfully accepted the challenge, and felt complimented on his/her skills. Now, that teacher could be concerned about the effect of that class on his/her evaluation. Teachers may be reluctant to teach lower level courses, or to work with English language learners or students with behavioral issues, and administrators may hesitate to assign the most challenging classes to the most skilled teachers.
In short, in my experience, the use of this type of “value-added” measurement provides no useful information and has a negative impact on real teachers and real administrators in real schools. If “data” is not only not useful, but actively harmful, to those who are supposedly benefitting from using it, what is the point? Why is this continuing?