The Reference Desk: Did Wall Street Journal Find Fatal Flaw in Lancet Iraq Study?

Wednesday, October 18, 2006


Too few cluster points render Iraq casualty figure “bogus,” claims op-ed piece

For Stats.org at George Mason University, Rebecca Goldin, Ph.D., and Trevor Butterworth write:
In an opinion column for the October 18 edition of the Wall Street Journal, Steven E. Moore argues that the Johns Hopkins researchers who conducted a survey of excess deaths in Iraq since 2003 screwed up – not through the statistical tools they used, which are sound, but through parsimony:

“…the key to the validity of cluster sampling is to use enough cluster points. In their 2006 report, "Mortality after the 2003 invasion of Iraq: a cross-sectional sample survey," the Johns Hopkins team says it used 47 cluster points for their sample of 1,849 interviews. This is astonishing: I wouldn't survey a junior high school, no less an entire country, using only 47 cluster points…

…What happens when you don't use enough cluster points in a survey? You get crazy results when compared to a known quantity, or a survey with more cluster points….

…With so few cluster points, it is highly unlikely the Johns Hopkins survey is representative of the population in Iraq.”

On the face of it, this sounds like a fatal flaw. But unless the sample is actually biased, a smaller number of cluster points only has the effect of widening the confidence interval. Pollsters don't like wide confidence intervals, but for the purpose of estimating a large number of deaths, even the wide confidence interval of the Lancet study is informative.

The point is that the number of clusters relative to the size of the population is less relevant than whether the sample of clusters is representative of the population. So when Moore implicitly criticizes the Lancet study in relation to a similar study on Kosovo which used 50 cluster points, “for a population of just 1.6 million, compared to Iraq's 27 million,” the issue is not one of brute numbers, but whether the clusters chosen are representative of the overall population.
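The claim that fewer cluster points widen the interval without shifting the estimate can be checked with a small simulation. What follows is only a sketch under invented assumptions, not a reconstruction of the Lancet survey: the rate of 0.02, the way local rates vary between clusters, and the 40 households per cluster (roughly 1,849 divided by 47) are all hypothetical, as are the function names.

```python
import random
import statistics

TRUE_RATE = 0.02   # hypothetical population-wide rate, chosen for illustration
HOUSEHOLDS = 40    # assumed interviews per cluster (1,849 / 47 is about 39)

def survey_estimate(n_clusters, rng):
    """Run one simulated cluster survey and return its estimated rate."""
    cluster_rates = []
    for _ in range(n_clusters):
        # Each cluster has its own local rate; the average over all
        # clusters equals TRUE_RATE, so random selection is unbiased.
        local_rate = rng.uniform(0.0, 2 * TRUE_RATE)
        hits = sum(rng.random() < local_rate for _ in range(HOUSEHOLDS))
        cluster_rates.append(hits / HOUSEHOLDS)
    return statistics.mean(cluster_rates)

def repeat_surveys(n_clusters, trials=300, seed=1):
    """Return the mean and standard deviation of many repeated surveys."""
    rng = random.Random(seed)
    estimates = [survey_estimate(n_clusters, rng) for _ in range(trials)]
    return statistics.mean(estimates), statistics.stdev(estimates)

for k in (47, 188):
    centre, spread = repeat_surveys(k)
    print(f"{k} clusters: mean estimate {centre:.4f}, spread {spread:.4f}")
```

Under these assumptions both runs should centre near the same rate, while the run with 47 clusters scatters more widely than the run with 188. That is the sense in which too few clusters costs precision rather than introducing bias, provided the clusters are chosen at random.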

Research biostatistician Steve Simon (by way of Deltoid at Science Blogs, who is highly critical of Moore’s article) explains the principle:

“‘Every cook knows that it only takes a single sip from a well-stirred soup to determine the taste.’ It's a nice analogy because you can visualize what happens when the soup is poorly stirred.

With regards to why a sample size characterizes a population of 10 million and a population of 10 thousand equally well, use the soup analogy again. A single sip is sufficient both for a small pot and a large pot.”
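The soup analogy can be illustrated in the same hedged way. The sketch below assumes a "well-stirred" population, meaning a simple random sample rather than a cluster design, and uses a made-up trait held by 30 percent of people. It draws the same 1,000-person sample from a population of 10 thousand and from one of 10 million and compares how much the estimates scatter.

```python
import random
import statistics

SHARE = 0.3         # hypothetical share of the population with some trait
SAMPLE_SIZE = 1000  # the "sip": identical for both populations

def sample_share(population_size, rng):
    """Draw a simple random sample and return the observed share."""
    cutoff = int(SHARE * population_size)  # members 0..cutoff-1 have the trait
    picks = rng.sample(range(population_size), SAMPLE_SIZE)
    return sum(i < cutoff for i in picks) / SAMPLE_SIZE

def spread_of_estimates(population_size, trials=500, seed=2):
    """Standard deviation of the estimate over repeated samples."""
    rng = random.Random(seed)
    estimates = [sample_share(population_size, rng) for _ in range(trials)]
    return statistics.stdev(estimates)

for size in (10_000, 10_000_000):
    print(f"population {size:>10,}: spread {spread_of_estimates(size):.4f}")
```

The two spreads should come out nearly the same, and if anything slightly smaller for the small pot, because the precision of a random sample is governed almost entirely by the size of the sample and not by the size of the population it is drawn from.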

Moore also argues that the Lancet’s figures would have been more trustworthy if the researchers had taken demographic data such as gender, age, and education.

Unquestionably, it would have been better if the Lancet study had added demographic information, as it's possible that they didn't control for some demographic bias. But when Moore says this would have enabled them to compare results with “a known demographic instrument, such as a census,” he is quite possibly overestimating the accuracy and usefulness of the only other demographic instrument available to the researchers, the 1997 Iraq Census.

What the Johns Hopkins survey has in its favor is that it extrapolated its cluster points to the general population using the 2004 "UNDP/Iraqi Ministry of Planning population estimates".

In the end, Moore has opened up some interesting lines of inquiry, but he has ended up over-reaching in an effort to prove the Lancet figures “bogus.”