Realistic Expectations for New Teacher Evaluation Systems

I'm seeing some off-the-mark responses to the news, first reported by Education Week's Stephen Sawchuk and then picked up by the New York Times, that many of the new, high-stakes teacher evaluation systems are rating only 2-6 percent of teachers ineffective. This is being greeted by some supporters of numbers-driven teacher reform as a disappointment, while skeptics, like American Federation of Teachers president Randi Weingarten, are suggesting this proves the vast majority of teachers are great performers, after all.

I don't think we can jump to either conclusion. First of all, the goal of these systems is not necessarily to fire large numbers of teachers; it is to help them improve their practice, since previously, most American educators received little constructive feedback on their work. Most new evaluation plans include more classroom observations, which means teachers are not just receiving number ratings, but actual notes and suggestions on their instruction. Of course, whether that feedback is helpful or useless depends entirely on the quality of the administrator.

Contra Nicholas Beaudrot, it's not true that education reformers have a "hazy" idea of how many bad teachers they'd like to see lose their jobs after this overhaul. I've asked a number of prominent accountability hawks that question over the past six years, and the answer I've heard most frequently is "5 to 10 percent." As Matthew DiCarlo explains, that estimate is culled from the research of the ubiquitous Stanford economist Eric Hanushek. By that standard, these evaluation systems are already halfway to where they are intended to be, a reasonable outcome for something so new.

That said, I'm still quite skeptical that the new evaluation plans will transform the teaching profession, in part because of the lessons from history I'm learning as I research my book. For over a century, school reformers have been dissatisfied with how teachers are evaluated, yet overhauling rating systems has not, historically, been an effective way to improve educational outcomes for kids. This is like hoping to lose weight by buying a new, high-tech scale, without changing your diet or exercise routines.

During the late nineteenth century, the New York City schools used an "excellent-good-fair-bad" rating system for teachers. When reformer William Maxwell became superintendent in 1898, he complained that 99.5 percent of teachers were rated "good" and instituted a plan to grade teachers on an A-D scale instead. The city distributed intricate tables for judging teachers’ output. First, teachers would be measured by evidence of their students’ learning, which could be demonstrated through test scores or examples of children’s essays, penmanship, and drawings. Teachers would also be judged on their personal characteristics and given numeric ratings in largely subjective categories, such as “obedience,” “honesty of work,” “dress,” “voice,” and “force of character.” A teacher’s command of classroom discipline could be assessed by counting the number of students who were late or unruly, and even by timing the number of seconds and minutes it took for a teacher to distribute or collect worksheets. 

By the late 1910s, the vast majority of teachers were earning perennial ratings of B+, the exact sort of slightly-better-than-average rating that had predominated under the previous plan. In prominent education journals, dissident principals like Alexander Fichlander, a Brooklyn leftist, explained that the paperwork involved with implementing the system was so burdensome that administrators rushed through it; what's more, there was little incentive to spend a lot of time rating teachers if the district provided no extra funding or training to those who needed to improve. Additionally, when managers find it is difficult to replace low-performing teachers with workers who are more effective (another likely outcome), they may decide evaluation systems are not worth their time.

Because of these problems, by mid-century, detailed evaluation systems were being replaced by simpler "satisfactory-unsatisfactory" plans, which today are being replaced by value-added measurement and frequent observation notes. But if the new evaluation systems end up being more about paperwork than about improving practice, then they, too, will fail to improve instruction and will lose their political palatability.

7 thoughts on “Realistic Expectations for New Teacher Evaluation Systems”

  1. Tchmathculture

    Thanks, Dana. It always helps to have the historical perspective.

    For what it’s worth, I am dubious of the professional learning opportunities afforded by the increased observations that are part of the new evaluation systems. Principals are overworked, and they often manage to generate the paperwork but can’t necessarily follow up with the one-on-one discussions that would support teachers’ learning. Add to that the complexity of instructional leadership, particularly in the secondary school (how do a few administrators have teaching expertise in all subjects and with all types of children?), and I see in many places the increased observations as mostly surveillance and compliance — increased bureaucracy with limited professional learning.

    I would love to hear from administrators who feel that the increase in observation is making a difference in instructional improvement (which, for me, ≠ test score increases).

    Again, thanks for sharing your research.

  2. Kristin Blagg

    Fascinating post, particularly for the little-known historical perspective on teacher evaluations.

    I dug into some TNTP data and behavioral research to further explore the ideas of leniency bias and hiring risk for teachers: link to

  3. Kristina Rizga

    Fascinating historic perspective. Haven’t come across this fact before. Thank you, Dana! Good luck with your book. Can’t wait to read it.

