I'm seeing some off-the-mark responses to the news–first reported by Education Week's Stephen Sawchuk, and then picked up by the New York Times–that many of the new, high-stakes teacher evaluation systems are rating only 2-6 percent of teachers ineffective. This is being greeted by some supporters of numbers-driven teacher reform as a disappointment, while skeptics, like American Federation of Teachers president Randi Weingarten, are suggesting this proves the vast majority of teachers are great performers, after all.
I don't think we can jump to either conclusion. First of all, the goal of these systems is not necessarily to fire large numbers of teachers; it is to help them improve their practice, since previously, most American educators received little constructive feedback on their work. Most new evaluation plans include more classroom observations, which means teachers are not just receiving number ratings, but actual notes and suggestions on their instruction. Of course, whether that feedback is helpful or useless depends entirely on the quality of the administrator.
Contra Nicholas Beaudrot, it's not true that education reformers have a "hazy" idea of how many bad teachers they'd like to see lose their jobs after this overhaul. I've asked a number of prominent accountability hawks that question over the past six years and the answer I've heard most frequently is "5 to 10 percent." As Matthew DiCarlo explains, that estimate is culled from the research of the ubiquitous Stanford economist Eric Hanushek. By that standard, these evaluation systems are already halfway to where they are intended to be, a reasonable outcome for something so new.
That said, I'm still quite skeptical that the new evaluation plans will transform the teaching profession, in part because of the lessons from history I'm learning as I research my book. For over a century, school reformers have been dissatisfied with how teachers are evaluated, yet overhauling rating systems has not, historically, been an effective way to improve educational outcomes for kids. This is like hoping to lose weight by buying a new, high-tech scale, without changing your diet or exercise routines.
During the late nineteenth century, the New York City schools used an "excellent-good-fair-bad" rating system for teachers. When reformer William Maxwell became superintendent in 1898, he complained that 99.5 percent of teachers were rated "good" and instituted a plan to grade teachers on an A-D scale instead. The city distributed intricate tables for judging teachers’ output. First, teachers would be measured by evidence of their students’ learning, which could be demonstrated through test scores or examples of children’s essays, penmanship, and drawings. Teachers would also be judged on their personal characteristics and given numeric ratings in largely subjective categories, such as “obedience,” “honesty of work,” “dress,” “voice,” and “force of character.” A teacher’s command of classroom discipline could be assessed by counting the number of students who were late or unruly, and even by timing the number of seconds and minutes it took for a teacher to distribute or collect worksheets.
By the late 1910s, the vast majority of teachers were earning perennial ratings of B+, the exact sort of slightly-better-than-average rating that had predominated under the previous plan. In prominent education journals, dissident principals like Alexander Fichlander, a Brooklyn leftist, explained that the paperwork involved with implementing the system was so burdensome that administrators rushed through it; what's more, there was little incentive to spend a lot of time rating teachers if the district provided no extra funding or training to those who needed to improve. Additionally, when managers find it is difficult to replace low-performing teachers with workers who are more effective–another likely outcome–they may decide evaluation systems are not worth their time.
Because of these problems, by mid-century, detailed evaluation systems were being replaced by simpler "satisfactory-unsatisfactory" plans, which today are being replaced by value-added measurement and frequent observation notes. But if the new evaluation systems end up being more about paperwork than about improving practice, then they, too, will fail to improve instruction and will lose their political palatability.