John Thompson: Spinning "Success" at Gates-Funded Projects

John Thompson is a teacher and historian in Oklahoma who writes frequently here, at Huffington Post, and on other blogs.

Ironically, the Center for Education Policy Research (CEPR) revisionist studies, “Evaluating Newark’s Education Reforms” by Tom Kane et al., were released as Bill Gates announced his latest approach to school reform. This is important because think tank papers consistently perform two basic functions. First, they provide pro-reform spin for the mainstream media. Second, they reassure the “Billionaires Boys Club” by presenting the case that their critics are wrong. These studies typically imply that if educators and journalists had bought into the theories of Gates, Mark Zuckerberg, and other venture philanthropists, their policies (such as closing schools and expanding charters) would have worked.

Zuckerberg and the CEPR weren’t likely to be happy with the first headlines prompted by their new research on test score gains produced by the infamous $200 million Newark experiment. USA Today announced that the research found “a bit of progress.” Veteran journalist Greg Toppo also noted:

The study was funded by the Chan-Zuckerberg Initiative and conducted by a number of Harvard researchers, including Tom Kane, who said that the study’s results were independent of its funding source.

Toppo reported that the Zuckerberg-led grant “made a difference — in a limited way.” He summarized CEPR’s claims: “Newark students improved sharply in English. In math? Not so much.”

Toppo recalled Dale Russakoff’s “widely admired book,” which found that “the effort produced ‘at least as much rancor as reform.’” He also cited Kane on the disappointing math results, which the corporate-funded researcher said may look different when data from the spring 2017 tests become available, adding that scores “could” rise in the future.

I stress Kane’s use of the word “could” because he has a long history of using that sort of word when spinning the modest results he documents in research studies that put Gates Foundation experiments in the best light. For instance, Kane’s study of the Gates Foundation’s value-added teacher evaluations concluded that teachers’ effectiveness “can” be estimated, although he reported little or no evidence that it would be estimated accurately enough to make those evaluations valid and reliable. After pushing for changes to the laws of more than forty states, the Gates Foundation merely concluded, “It is possible to develop reliable measures” such as those that the laws required, while not offering a plausible scenario for doing so.

And that leads to the one quarrel I have with Toppo’s wording. What does he mean when he says that test score growth in English improved “sharply”? And what do Kane et al. mean when reporting that those test scores improved “significantly”?

English growth scores only improved “sharply” in one year, 2015. After five years, the $200 million investment’s one success resulted in a gain of less than .08 standard deviations in Newark’s test score growth in English relative to similar NJ students.

It’s beyond my expertise to explain how such a meager gain, measured by comparing such small numbers of test results from Newark on a new test, could be seen as significant according to the dictionary definition of the term, as opposed to just being statistically significant. But reading the CEPR evidence, it seems that asking questions that are relevant for real world policy decisions is beyond Kane et al. They acknowledge one major problem with the new scores: schools that began early in teaching to the new PARCC tests would be more likely to have higher scores. On the other hand, their discussion of an even more important point, missing student scores, completely misses the point.

Kane et al. present two charts that reveal patterns that are virtually identical. As with ELA results, before 2014 the Newark math value-added scores dropped in comparison to those of similar New Jersey students. In 2015, math scores soared by nearly .1 std. But during that year, the percentage of students with missing scores increased dramatically, by almost .2 std! The next year, as the percentage of students missing scores dropped just as dramatically, math scores declined so much that all of the five-year gains were wiped out.

Rather than print a similar graph for ELA, the authors merely said, “The plot for ELA was similar.”

Why didn’t Kane et al. see the need to address the most logical correlation? When the percentage of missing scores goes up, Newark test score growth goes up. When the percentage of students with missing test scores goes down, test score growth goes down.

And this leads to the implicit recommendations by Kane et al., as well as the questions they should have asked before making them. They attribute the gains to closing schools and expanding charters. They indicate that such an approach (which, of course, is dear to the hearts of “the Billionaires Boys Club”) could institutionalize better results for students who attend high-poverty neighborhood schools. The few relevant numbers they reported argue against such a theory.

In the first place, Newark charters had previously served higher-performing students, with the non-representative KIPP and North Star Academy being the charters that expanded the most. So the first question is whether those charters could change their model so that the higher-challenge students in neighborhood schools could be retained. Newark’s free and reduced lunch student population averaged 88% over the five-year study; charters were six points lower. However, KIPP and North Star tend to serve a relatively larger percentage of low-income students and a smaller percentage of poor students than traditional public schools in the inner city. For instance, Newark’s North Star has served far fewer poor children (14.6% fewer free lunch) while serving relatively more students whose higher family incomes qualify for reduced lunch.

Similarly, Newark’s charters served about 60% fewer students with disabilities, but that is just part of the story. For example, North Star has a record of “serving hardly any children with disabilities and few or none with more severe disabilities.”

These charters also have a long record of benefitting from greater rates of attrition when raising test scores. North Star has a history of suspending students “at an alarming rate,” and that is a reason why “only about half (of 5th graders) ever made it to senior year.” Similarly, as Richard Kahlenberg shows:

The big difference between KIPP and regular public schools…is that whereas struggling students come and go at regular schools, at KIPP, students leave but very few new students enter. Having few new entering students is an enormous advantage not only because low-scoring transfer students are kept out but also because in later grades, KIPP students are surrounded only by successful peers….

In other words, Kane et al. should have asked questions relevant to policy-making. For instance, how different were the tested charter students’ poverty and disability rates in comparison to their classmates who were enrolled in the first quarter? Why was it that the missing test score patterns seemed to have a far bigger effect on “within-school” outcomes? Why did they assume that Newark’s charters can be scaled up?

Did Gates-affiliated researchers overlook these obvious questions because they still are oblivious to realities within school systems? Or did they only ask the questions that they knew would produce answers that would please their bosses?

