At the end
of the last school year, I was chatting with two excellent teachers, and our
conversation turned to the new state-mandated teacher evaluation system and its
use of student “growth scores” (“Student Growth Percentiles” or “SGPs” in
Massachusetts) to measure a
teacher’s “impact on student learning.”
“Guess we
didn’t have much of an impact this year,” said one teacher.
The other
teacher added, “It makes you feel about this high,” showing a tiny space
between her thumb and forefinger.
Throughout
the school, comments were similar, suggesting that one major “impact” of the
new evaluation system has been to demoralize and discourage teachers. (How do I
know, by the way, that these two teachers are excellent?
I know because I worked with them as
their principal – being in their classrooms, observing and offering feedback, talking
to parents and students, and reviewing products demonstrating their students’
learning – all valuable ways of assessing a teacher’s “impact”.)
According to
the Massachusetts Department of Elementary and Secondary Education (“DESE”),
the new evaluation system’s goals include promoting the “growth and development
of leaders and teachers,” and recognizing “excellence in teaching and leading.”
The DESE website indicates that the department considers a teacher’s median SGP an
appropriate measure of that teacher’s “impact on student learning”:
“ESE has confidence that SGPs are
a high quality measure of student growth. While the precision of a median SGP
decreases with fewer students, median SGP based on 8-19 students still provides
quality information that can be included in making a determination of an educator’s
impact on students.”
Given the many
concerns about the use of “value-added measurement” tools (such as SGPs) in
teacher evaluation, this confidence is difficult to understand, particularly as
applied to real teachers in real schools.
Considerable research notes the imprecision and variability of these
measures as applied to the evaluation of individual teachers. On the other side of the debate, experts argue that
using an “imperfect measure” is better than relying on past evaluation methods. Theories aside, I believe that the
actual impact of this “measure” on real people in real schools is important.
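
To illustrate why precision at these class sizes is a concern, here is a minimal sketch in Python, using entirely hypothetical SGP values rather than data from any real class, of how far a median over roughly ten students can move when just two students have an unusually bad test day:

```python
# Minimal sketch with hypothetical SGP values (not real student data):
# how much can a ten-student median SGP move if two students have a bad test day?
from statistics import median

# Ten students' SGPs in a hypothetical class.
class_sgps = [12, 25, 33, 41, 48, 55, 60, 68, 77, 85]

# The same hypothetical class, except the last two students score much lower --
# a bad test day, a difficult year at home, or similar.
same_class_two_off_days = [12, 25, 33, 41, 48, 55, 60, 68, 20, 30]

print(median(class_sgps))               # 51.5
print(median(same_class_two_off_days))  # 37.0
```

With only eight to nineteen scores behind each median, a couple of unusual test days can move a teacher’s number by double digits, which is the kind of swing I saw in my own school’s data, described below.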
As a
principal, when I first heard of SGPs I was curious. I wondered whether the data would actually filter out other
factors affecting student performance, such as learning disabilities, English
language proficiency, or behavioral challenges, and I wondered if the data
would give me additional information useful in evaluating teachers.
Unfortunately, I found that SGPs did not
provide useful information about student growth or learning, and median SGPs
were inconsistent and not correlated with teaching skill, at least for the
teachers with whom I was working. In two consecutive years of SGP data from
our Massachusetts elementary school:
- One 4th grade teacher had median SGPs of 37 (ELA) and 36 (math) in one year, and 61.5 and 79 the next year. The first year’s class included students with disabilities and the next year’s did not.

- Two 4th grade teachers who co-teach their combined classes (teaching together, all students, all subjects) had widely differing median SGPs: one teacher had median SGPs of 44 (ELA) and 42 (math) in the first year and 40 and 62.5 in the second, while the other teacher had median SGPs of 61 and 50 in the first year and 41 and 45 in the second.

- A 5th grade teacher had median SGPs of 72.5 and 64 for two math classes in the first year, and 48.5, 26, and 57 for three math classes in the following year. The second year’s classes included students with disabilities and English language learners, but the first year’s did not.

- Another 5th grade teacher had median SGPs of 45 and 43 for two ELA classes in the first year, and 72 and 64 in the second year. The first year’s classes included students with disabilities and students with behavioral challenges, while the second year’s classes did not.
As an experienced observer/evaluator, I
found that median SGPs did not correlate with teachers’ teaching skills but
varied with class composition.
Stronger teachers had the same range of SGPs in their classes as
teachers with weaker skills, and median SGPs for a new teacher with a less
challenging class were higher than median SGPs for a highly skilled veteran
teacher with a class that included English language learners.
Furthermore,
SGP data did not provide useful information regarding student growth. In
analyzing students’ SGPs, I noticed obvious general patterns: students with
disabilities had lower SGPs than students without disabilities, English
language learners had lower SGPs than students fluent in English, students who
had some kind of trauma that year (e.g., parents’ divorce) had lower SGPs, and
students with behavioral/social issues had lower SGPs. SGPs were strongly correlated with test
performance: in one year, for example, the median ELA SGP for students in the
“Advanced” category was 88, compared with 51.5 for “Proficient” students, 19.5
for “Needs Improvement,” and 5 for the “Warning” category.
There were also wide swings in individual students’ SGPs that were hard to explain except, perhaps, by differences in student performance on particular test days.
One student with disabilities had an SGP of 1 in the first year and 71
in the next, while another student had SGPs of 4 in ELA and 94 in math in 4th
grade and SGPs of 50 in ELA and 4 in math in 5th grade; both students’ district test scores were consistent across those years.
So how does
this “information” impact real people in a real school? As a principal, I found that it added
nothing to what I already knew about the teaching and learning in my
school. Using these numbers for
teacher evaluation does, however, negatively impact schools: it demoralizes and
discourages teachers, and it has the potential to affect class and teacher
assignments.
In real
schools, student and teacher assignments are not random. Students are grouped for specific
purposes, and teachers are assigned classes for particular reasons. Students
with disabilities and English language learners are often grouped to allow
specialists, such as the speech/language teacher or the ELL teacher, to work
more effectively with them.
Students with behavioral issues are sometimes placed in special classes,
and are often assigned to teachers who work particularly well with them. Leveled classes (AP, honors, remedial)
create different student combinations, and teachers are assigned particular
classes based on the administrator’s judgment of which teachers will do the
best with which classes. For example, I would assign new or struggling teachers
less challenging classes so I could work successfully with them on improving
their skills.
In the past,
when I told a teacher that he/she had a particularly challenging class because
he/she could work best with those students, the teacher generally accepted the
challenge cheerfully and felt complimented on his/her skills. Now, that teacher could be concerned about
the effect of that class on his/her evaluation. Teachers may be reluctant to teach lower-level courses, or
to work with English language learners or students with behavioral issues, and
administrators may hesitate to assign the most challenging classes to the most
skilled teachers.
In short, in
my experience, the use of this type of “value-added” measurement provides no
useful information and has a negative impact on real teachers and real
administrators in real schools. If “data” is not only not useful but actively
harmful to those who are supposedly benefiting from using it, what is the
point? Why is this continuing?