Revised hookworm replication

After releasing and blogging a paper in December about the GiveWell replication of Hoyt Bleakley’s study of hookworm eradication in the American South, I submitted it to the Quarterly Journal of Economics, which published the original paper in 2007. Around the first of the year, QJE rejected the paper, enclosing comments from four reviewers, including Bleakley. The comments were very helpful in identifying errors in the replication, suggesting new things to do, and pushing me to sharpen my thinking and writing.

I just posted a new version. The story does not change. As a result, I am more sure now that the relative gains in historically hookworm-burdened parts of the South continued trends that began well before and, in the case of income, continued well after. I made two significant substantive changes, both of which strengthen my skepticism.

First, in controlling for polynomial trends in time when studying impacts on income, I now report a Bayesian information criterion (BIC) statistic to help judge whether the additional high-order polynomial terms are overfitting the data. The BIC turns out to favor modeling ambient time trends with cubic or quartic polynomials. This makes sense because the long-term trends look roughly S-shaped, and parabolic/quadratic controls (the highest order reported in the Bleakley (2007) tables) don’t seem adequate to sponging up such patterns. The BIC-favored fits do not ascribe much statistically detectable impact to the hookworm eradication campaign. (See Table 9.)
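To make the BIC comparison concrete, here is a minimal sketch in Python. It is not the replication code: the data are synthetic (an S-shaped curve plus noise), the function name `bic_for_polynomial` is made up for illustration, and the BIC is the usual Gaussian least-squares form, n·ln(RSS/n) + k·ln(n), which may differ by a constant from what a statistics package reports. Lower BIC is better.

```python
import numpy as np

def bic_for_polynomial(x, y, degree):
    """BIC (up to an additive constant) of an OLS polynomial fit of given degree."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    n = len(y)
    k = degree + 1  # fitted coefficients, including the intercept
    rss = float(np.sum(resid ** 2))
    return n * np.log(rss / n) + k * np.log(n)

# Synthetic, illustrative data: an S-shaped trend with a little noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = np.tanh(3.0 * x) + rng.normal(scale=0.05, size=x.size)

# Compare polynomial orders 1 through 5 and pick the one with the lowest BIC.
bic_by_degree = {d: bic_for_polynomial(x, y, d) for d in range(1, 6)}
best_degree = min(bic_by_degree, key=bic_by_degree.get)

for d, bic in bic_by_degree.items():
    print(f"degree {d}: BIC = {bic:.1f}")
print("BIC-preferred degree:", best_degree)
```

On an S-shaped series like this one, a quadratic adds almost nothing over a straight line, while a cubic cuts the residuals enough to more than pay the BIC penalty for the extra term.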

Second, I fit the same long-term income data with a new model. This one has a piecewise-linear form with two kinks, just like the “Exp” impact form postulated in Bleakley (2007). But whereas in Bleakley (2007) and in the first draft of the replication the two kinks were taken to occur in 1891 and 1910, which bracketed the period in which children were born who could increasingly benefit from the campaign, in the new model the kinks can vary. They are estimated from the data using the least-squares best-fit criterion. That produces graphs like these:

If the eradication campaign were a strong factor in long-term trends, then we would expect, as Bleakley argues, that progress would accelerate around 1891 and decelerate around 1910, the years marked with vertical grey lines. But that doesn’t seem to be what happened. As a robustness test, I did a run with up to four kinks allowed. I think it still doesn’t suggest much of a role for hookworm eradication. That doesn’t mean the influence is not there at all, but it strongly suggests that the influence is too small to pop out of the data in a way that would be compelling.
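For readers who want to see the mechanics of letting the data choose the kinks, here is a rough Python sketch. It is an illustration only, not the replication code: the series is synthetic, the kink years built into it (1880 and 1920) are arbitrary choices rather than estimates from the paper, the function name `fit_two_kinks` is hypothetical, and a brute-force grid search over candidate years stands in for whatever optimizer the real estimation uses.

```python
import itertools
import numpy as np

def fit_two_kinks(t, y, candidates):
    """Grid-search two kink points; return (rss, kink1, kink2, coefficients)."""
    best = None
    for k1, k2 in itertools.combinations(candidates, 2):  # k1 < k2
        # Continuous piecewise-linear basis: intercept, slope, two slope changes.
        X = np.column_stack([
            np.ones_like(t),
            t,
            np.maximum(t - k1, 0.0),
            np.maximum(t - k2, 0.0),
        ])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        if best is None or rss < best[0]:
            best = (rss, k1, k2, beta)
    return best

# Synthetic, illustrative series: flat, then rising, then flat again.
rng = np.random.default_rng(1)
years = np.arange(1850, 1951, dtype=float)
true_k1, true_k2 = 1880.0, 1920.0  # arbitrary assumed kinks, not from the paper
y = (0.05 * np.maximum(years - true_k1, 0.0)
     - 0.05 * np.maximum(years - true_k2, 0.0)
     + rng.normal(scale=0.02, size=years.size))

# Candidate kink years exclude the extreme ends of the sample.
rss, k1_hat, k2_hat, beta = fit_two_kinks(years, y, years[2:-2])
print("estimated kinks:", k1_hat, k2_hat)
```

The test of a hypothesis like Bleakley's is then whether the estimated kinks land near the years the hypothesis predicts, rather than being imposed there in advance.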

Revised data and code are here (640 MB). Unfortunately, some of the data used in the graphs above come from new 100% census samples, which I am not permitted to redistribute. I provide directions on how to obtain the data, but fully reproducing what I did requires knowing your way around Microsoft SQL Server and Microsoft SSIS, which come in free versions but are complicated.
