The Reference Desk: The Science of Counting the Dead

Wednesday, October 18, 2006


Rebecca Goldin of Stats.org at George Mason University writes:
A recent study published in the Lancet claims that over 650,000 “excess” deaths have occurred in Iraq since the invasion in March 2003. STATS looks at how scientists arrive at these numbers, how their methods compare to other counts, and whether criticism of the numbers is justified. A companion article examines the media coverage.

A separate analysis of Steven E. Moore's criticism of the number of cluster points used in the Lancet study is also available.

If you want to know the number of people who died in 2005 from heart disease in the United States, you need go no further than a website hosted by the Centers for Disease Control (CDC), which collects the information every year. Every death in the United States is recorded by the National Center for Health Statistics, as is the main cause of death.

There are, of course, imperfections. There can be more than one cause of death, or the cause can be unknown; a suicide might have been a murder; sometimes a body is never found; and at times the system itself has failed, as when AIDS first emerged.

War-torn countries do not have central registries to record deaths. People do not necessarily die in hospitals, and their bodies are not necessarily sent to morgues. While the press makes no claim to having actually seen all the deaths that occur, the website Iraq Body Count (IBC) keeps a database of “media-reported civilian deaths in Iraq that have resulted from the 2003 military intervention by the USA and its allies.” The IBC does not count excess deaths due to a deterioration of infrastructure, lack of hospitals or clean water. Nor does it count deaths not reported by the media. At least in theory, innumerable deaths occur quietly, under the radar screen of any accounting office.

The Iraqi health ministry also counts deaths. However, the BBC reported in 2005 that the recorded deaths were based on hospital records, which are unreliable when records and even hospitals are being destroyed. And in December 2003, the ministry ordered a halt to all attempts to count civilian deaths, according to the Associated Press. Currently, the official number of dead is about 50,000, based on hospital and morgue data.

Public health researchers have rejected this official tally of deaths in favor of an epidemiological approach. In a careful study published in the Lancet, a prestigious British medical journal, professors from Johns Hopkins University and the School of Medicine at Al Mustansiriya University in Baghdad used a random sample of Iraqi households to estimate that over 650,000 more deaths have occurred in Iraq since the 2003 invasion than would have occurred without the war.

While the Lancet numbers are shocking, the study’s methodology is not. The scientific community is in agreement over the statistical methods used to collect the data and the validity of the conclusions the researchers drew from them. When the same authors published an earlier study two years ago (reporting 100,000 excess deaths at that time), the Chronicle of Higher Education ran a long article explaining the support within the scientific community for the methods used.

President Bush, however, says he does “not consider it a credible report” and the media refer to the study as “controversial.” And even as the Associated Press reported mixed reviews, all the scientists quoted in its piece on the “controversy” were solidly behind the methods used. Indeed, the Washington Post points out that this and the earlier study are the “only ones to estimate mortality in Iraq using scientific methods.”

How can science be done by surveys and is cluster sampling nonsense?
Surveys are at the heart of epidemiological studies in which prevalence information (how often a disease or trait – or death – occurs) is not available through centralized sources. One of the most widely cited surveys in the US is the National Health and Nutrition Examination Survey, which estimates a variety of things, from how many Americans have diabetes to who uses pesticides. It is carried out under the auspices of the National Center for Health Statistics, which is in turn part of the CDC. While, in theory, some of this information is available through other sources (doctors, for example, could report how many of their patients are treated for diabetes), there is no way of centrally recording the information and making sure that everyone with diabetes is actually counted. Statistical survey methods were developed to solve exactly this problem.

Cluster sampling is a well-established technique in statistics, and is routinely used to estimate casualties in natural disasters and war zones. For the Iraq study, the researchers randomly chose starting points at which to interview people about deaths in their families, interviewed a cluster of neighboring households around each one, and then extrapolated the results to the whole population. There is nothing controversial about the method itself, though people can certainly question whether the sampling was done correctly.
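A toy simulation can make the procedure concrete. The sketch below is not the Lancet team's code. It assumes a synthetic population stored as a list of households in which neighbors sit next to each other, a crude stand-in for geography. It picks random starting households and treats each one, together with the households that follow it, as a cluster. It then scales the pooled death rate up to a population. The function names and the population are hypothetical.

```python
import random

def cluster_sample_death_rate(households, n_clusters, cluster_size, rng):
    """Estimate deaths per person from clusters of neighboring households.

    households: list of (people, deaths) tuples, ordered so that neighbors
    are adjacent in the list (a simplifying assumption for this sketch).
    """
    people = deaths = 0
    for _ in range(n_clusters):
        # Random starting household; the cluster is it plus its next neighbors.
        start = rng.randrange(len(households) - cluster_size + 1)
        for hh_people, hh_deaths in households[start:start + cluster_size]:
            people += hh_people
            deaths += hh_deaths
    return deaths / people

def extrapolate(rate_per_person, population):
    """Scale a per-person rate from the sample up to the whole population."""
    return rate_per_person * population

if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical population: 100,000 households of 7 people, each person
    # with a 4% chance of having died during the period.
    population = [(7, sum(rng.random() < 0.04 for _ in range(7)))
                  for _ in range(100_000)]
    true_rate = sum(d for _, d in population) / sum(p for p, _ in population)
    estimate = cluster_sample_death_rate(population, 50, 40, random.Random(2))
    print(f"true rate {true_rate:.4f}, cluster-sample estimate {estimate:.4f}")
    print(f"extrapolated to 27 million people: {extrapolate(estimate, 27_000_000):,.0f}")
```

The sketch uses 50 clusters of 40 households, so only 2 percent of households are visited. Even so, the estimate typically lands close to the true rate of the synthetic population.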

As with all surveying, the result is an estimate, not an exact count, simply because a sample of the population was interviewed instead of every person. The authors of the Lancet study didn’t find 650,000 dead people. They found some 547 deaths after talking to about 12,800 people, and extrapolated to how many they would have found had they talked to all 27 million. They compared this to how many would have died at the mortality rates that prevailed before March 2003. The estimate is only as good as the sample is representative of the whole population. But the more people you survey, the more precise the estimate.
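The arithmetic behind that extrapolation can be sketched in a few lines. This is a deliberately crude calculation, not the study's actual analysis. It uses the 547 deaths, 12,800 people, and 27 million population from the text, plus the prewar rate of 5.5 per 1,000 per year quoted later on. The post-invasion period of about 40 months is an assumption, not a figure given here. The function name is hypothetical.

```python
def crude_excess_deaths(sample_deaths, sample_people, prewar_rate_per_1000,
                        years, population):
    """Observed deaths per person minus the deaths per person expected at the
    prewar rate, scaled up to the whole population."""
    observed_per_person = sample_deaths / sample_people
    expected_per_person = prewar_rate_per_1000 / 1000 * years
    return (observed_per_person - expected_per_person) * population

# 547 deaths, 12,800 people, 5.5 per 1,000 per year, and 27 million come from
# the text; the 40-month period is an ASSUMPTION for illustration.
estimate = crude_excess_deaths(547, 12_800, 5.5, 40 / 12, 27_000_000)
print(f"{estimate:,.0f}")
```

This crude version comes out near 660,000, in the same neighborhood as the published 654,965. The study's own figure comes from a more careful calculation, so the two should not be expected to match exactly.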

Thus, 650,000 deaths is only an estimate. The 95 percent confidence interval runs from 392,979 to 942,636. Informally, this means we can be 95 percent confident that the number of excess deaths lies in this range, while the single best estimate is 654,965. You can think of this as a bell curve centered at 654,965, where the curve is highest. Other values in the range are less likely to be the “true value,” though not nearly as unlikely as values outside the range.
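The bell-curve picture can be sketched with Python's standard `statistics.NormalDist`. Purely for illustration, the sketch assumes a symmetric normal curve. Its standard error is backed out from the width of the interval, since a 95 percent interval spans about 1.96 standard errors on each side. The published interval is slightly lopsided around 654,965, so the fit is only approximate.

```python
from statistics import NormalDist

low, best, high = 392_979, 654_965, 942_636

# ASSUMPTION: a normal curve centered on the best estimate, with the standard
# error implied by the interval width (1.96 standard errors on each side).
se = (high - low) / (2 * 1.96)
curve = NormalDist(mu=best, sigma=se)

# Height of the curve at a few values, relative to its peak at `best`.
for value in (best, 500_000, low, 300_000):
    print(f"{value:>9,}: {curve.pdf(value) / curve.pdf(best):.2f}")

# Share of the curve falling inside the published range (close to 0.95).
print(f"{curve.cdf(high) - curve.cdf(low):.3f}")
```

The relative heights fall steadily away from the center. Values near the edge of the range are well below the peak but still far above values outside it.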

How good is the science in this particular study?
There has been a wealth of material on the web attacking the Lancet study. Most of it is devoid of science, and ranges from outrage at the numbers (it’s impossible to believe it could be so high), to accusations of bias based on the authors’ views of U.S. foreign policy. Interested parties such as the Iraqi government responded quickly by calling the numbers “inflated” and “far from the truth”, rather than putting forward any real reasons why these numbers are unlikely to have occurred. The Washington Post reported that the Defense Department’s response was that coalition forces “takes enormous precautions,” and suggested that the deaths are the “result of insurgent activity”.

In statistics, “error” does not mean “mistake”; it is, rather, a measure of how certain we can be of the results. In the Lancet study, and studies of a similar kind, there are two types of possible error: one coming from built-in bias and one coming from chance, inherent in the use of statistics itself. While bias can hardly ever be teased out if it is intrinsic to the study, there are many techniques to minimize the error due to chance. The Lancet authors took care to interview enough families (about 1,800 households) that the chance of having randomly chosen families more affected by violence than others would be too small to change their overall message. That message is essentially that other estimates of deaths due to the war are off by an order of magnitude.
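A short simulation shows why interviewing more households shrinks the error due to chance. It assumes, hypothetically, that 10 percent of households have suffered a violent death. It draws households independently at random, ignoring clustering, which comes up later. It then measures how much the estimated share varies from one simulated survey to the next.

```python
import random
import statistics

def spread_of_estimates(true_share, n_households, n_surveys, rng):
    """Standard deviation, across simulated surveys, of the estimated share
    of households reporting a violent death."""
    shares = []
    for _ in range(n_surveys):
        hits = sum(rng.random() < true_share for _ in range(n_households))
        shares.append(hits / n_households)
    return statistics.stdev(shares)

rng = random.Random(0)
for n in (18, 180, 1_800):
    print(f"{n:>5} households: spread {spread_of_estimates(0.10, n, 500, rng):.4f}")
```

Under simple random sampling, the spread falls roughly in proportion to one over the square root of the sample size. A hundredfold increase in households cuts it by about a factor of ten.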

The error intrinsic to statistics is often a target of criticism: if there’s error no matter what we do, how can we know anything? That line of reasoning makes about as much sense as saying “since I’m not going to get exactly half heads and half tails if I flip a coin, I can’t say anything at all about whether a coin is biased.” Of course we can: we can calculate the probability of any particular pattern of heads and tails. We can even calculate that the probability of getting all heads or all tails when flipping a fair coin ten times is less than one in 500. This leads to a conclusion: if someone flips a coin ten times and gets all heads, the coin is probably biased.
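The coin-flip arithmetic is easy to check. The sketch below assumes a fair coin, and the function names are illustrative.

```python
from math import comb

def prob_all_same(flips):
    """Chance that a fair coin lands all heads or all tails: 2 of the
    2**flips equally likely sequences qualify."""
    return 2 / 2**flips

def prob_exactly_heads(flips, k):
    """Chance that a fair coin shows exactly k heads in `flips` tosses."""
    return comb(flips, k) / 2**flips

print(prob_all_same(10))          # 0.001953125, i.e. 1 in 512
print(prob_exactly_heads(10, 5))  # 0.24609375: exactly half heads is uncommon
```

Even a fair coin gives exactly five heads in ten flips only about a quarter of the time. Yet ten identical results in a row are rare enough to be strong evidence of bias.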

Since a survey does not actually interview everyone, it is possible that, purely by chance, the sample does not represent the whole population. For example, in a poll between two candidates who are actually neck and neck, a pollster could inadvertently interview only Democrats. The survey would then find the public hugely in favor of one candidate, contrary to what the population actually feels. However, the chance of this happening is practically zero if enough people are surveyed. If only ten people are surveyed, it wouldn’t be so surprising if they were all Democrats. But if 1,000 randomly chosen people are interviewed, it is practically impossible to end up with all Democrats.
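The difference between ten and 1,000 respondents can be quantified. The sketch below assumes, hypothetically, that exactly half of voters are Democrats. It also treats each respondent as an independent draw, which is a reasonable simplification when the population is large. Exact fractions avoid rounding the tiny result to zero.

```python
from fractions import Fraction

def prob_all_from_one_group(share, sample_size):
    """Chance that every randomly chosen respondent belongs to a group making
    up `share` of the population. Assumes independent draws (sampling with
    replacement), a simplification for a large population."""
    return Fraction(share) ** sample_size

half = Fraction(1, 2)  # hypothetical: half of voters are Democrats
print(prob_all_from_one_group(half, 10))            # 1/1024
print(float(prob_all_from_one_group(half, 1_000)))  # about 9.3e-302
```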

In the same way, it is theoretically possible for the scientists in the Lancet study to have interviewed 1,800 households that just happened to be wracked by violence while the rest of the country was not. Or the specific regions randomly chosen by the scientists could have been more heavily affected by violence than the rest of the country. The main point is that these scenarios are extremely unlikely, even though no one can rule them out entirely.

The error coming from chance is captured by the confidence interval. In the case of the Iraqi deaths study, the 95 percent confidence interval for the number of excess deaths is 392,979 to 942,636 people. Strictly speaking, suppose the survey were repeated many times in the same way. About 95 percent of the intervals computed from those surveys would then contain the true number of excess deaths.
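What the 95 percent figure promises can be checked by simulation. The idea is to repeat a survey many times, build an interval from each one, and count how often the intervals capture the true value. This sketch assumes a hypothetical true death rate of 4 percent and simple random sampling of individuals rather than clusters. It uses the textbook normal-approximation interval, the estimate plus or minus 1.96 standard errors.

```python
import math
import random

def interval_covers_truth(true_rate, sample_size, rng):
    """Run one simulated survey and report whether its approximate 95%
    interval (estimate +/- 1.96 standard errors) contains the true rate."""
    deaths = sum(rng.random() < true_rate for _ in range(sample_size))
    estimate = deaths / sample_size
    se = math.sqrt(estimate * (1 - estimate) / sample_size)
    return estimate - 1.96 * se <= true_rate <= estimate + 1.96 * se

def coverage(true_rate, sample_size, n_surveys, seed=0):
    """Fraction of simulated surveys whose interval contains the true rate."""
    rng = random.Random(seed)
    hits = sum(interval_covers_truth(true_rate, sample_size, rng)
               for _ in range(n_surveys))
    return hits / n_surveys

print(coverage(0.04, 1_000, 800))  # typically close to 0.95
```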

The most likely number of excess deaths is 654,965. In terms of probabilities, re-doing the interviews would be much more likely to produce a number near this figure than one near 400,000 or near 900,000. The number of excess deaths is very unlikely to have been less than 392,000 (roughly a 2.5 percent chance). For those who question the very technique of sampling, Cervantes, a medical and health sociologist, explains how the methods are standard fare for those doing this kind of research, as does any basic text on how to conduct polls.

Does anyone disagree with the study based on scientific principles?
At The Questionable Authority, blogger Mike Dunford points out some possible biases that might have pushed the researchers’ numbers higher than they should be. First, he argues that the Lancet study used population estimates obtained by a joint Iraqi/UN population study, rather than those of the Iraqi Ministry of Health, which the same authors had used two years earlier. Dunford points out that if the total population (estimated at approximately 27 million people) is wrong, then so is the estimate of 650,000. This is certainly true, but there is no reason to suspect that these organizations would be biased towards reporting a larger population than there actually is. Dunford seems to imply that there are competing estimates, but he cites only information from 18 months earlier. If Dunford is correct that the population has been overestimated by as much as 11 percent, then the excess deaths would be estimated at about 580,000 instead of 650,000.
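The excess-death figure is a rate difference multiplied by the population, so the adjustment is a simple proportional rescaling. The sketch below follows the text's reading of an 11 percent overestimate as a cut of 11 percent. Reading it instead as the stated population being 1.11 times the true one gives about 590,000 (654,965 / 1.11). The two interpretations differ by only about 7,000 deaths. The function name is hypothetical.

```python
def rescale_for_population(estimate, overestimate_share):
    """Excess deaths are a rate difference times a population, so if the
    population figure is too high by some share, the estimate shrinks by the
    same share (assuming the death rates themselves are unchanged)."""
    return estimate * (1 - overestimate_share)

print(f"{rescale_for_population(654_965, 0.11):,.0f}")  # about 583,000
```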

Dunford also points out that the excess deaths attributed to nonviolent causes were not statistically significant, and argues that they therefore should not be included in the total. This is simply a question of standard statistical protocol. The main purpose of the study is to measure excess deaths, without regard to cause. For this purpose, the nonviolent causes are relevant, even if they are not statistically significant by themselves. The authors did find that the increase in violent deaths was (highly) statistically significant, which is why it is reported separately. Thus it would be difficult to argue from this study that Iraq’s infrastructure is falling apart and that people are dying from a lack of hospitals. But the authors have not made such claims in their paper.

Flares into Darkness argues that the sampling method would invariably favor densely populated areas, and that these areas would suffer disproportionate levels of bombing. It is certainly true that densely populated areas are more likely to be sampled, but only in proportion to their population. In other words, if ten times as many people live in Region A as live in rural Region B, then Region A is ten times as likely to be chosen as a sampling location. Overall, this will not oversample cities; it will sample cities in proportion to their population, and rural areas in proportion to theirs. Flares into Darkness also insists that the scientists’ latitude to change whom they interviewed, based on perceived threats, gave them just enough leeway to cheat and pick places with more deaths. But this is tantamount to accusing them of fixing the data; it simply doesn’t address the core findings of the study.
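Sampling with probability proportional to size is easy to simulate with the standard library's `random.choices`, which accepts per-item weights. The two regions and their populations below are hypothetical, matching the ten-to-one example.

```python
import random
from collections import Counter

def pick_regions_pps(populations, n_picks, rng):
    """Choose sampling locations with probability proportional to population."""
    names = list(populations)
    weights = [populations[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n_picks))

populations = {"Region A": 1_000_000, "Region B": 100_000}  # hypothetical
counts = pick_regions_pps(populations, 11_000, random.Random(7))
for name, size in populations.items():
    # Picks per resident come out about the same for both regions.
    print(name, counts[name], f"{counts[name] / size:.4f}")
```

Region A draws about ten times as many picks as Region B, but the picks per resident are about equal. That is what sampling "proportional to population" means: a city dweller is no more likely to be reached than a rural resident.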

Flares into Darkness also claims that overall rates of death could be affected by the fact that deaths with specific causes could be correlated: a car bomb, for example, could kill several people at once in neighboring houses. If the sample happened to take in a neighborhood that took a bad hit from car bombs, it could lead to an incorrect extrapolation to the whole population, when the researchers had simply happened to sample a badly hit area.

Yet again, it is standard statistical protocol in a cluster sampling survey to take this into account. The authors adjust for the fact that there is higher correlation within clusters than across clusters. As the authors point out in their analysis section, “The SE (standard error) for mortality rates were calculated with robust variance estimation that took into account the correlation between rates of death within the same cluster over time.”

While the authors did consider the issue of correlated deaths, it should also be noted that even if they had not correctly accounted for these correlations, the effect would be to widen the confidence interval, not lower the estimate. For just as correlated deaths could mean that what the observers saw was a fluke, they could also mean that the observers didn’t see the truly bad parts.
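Simulated data can illustrate why within-cluster correlation affects the uncertainty rather than the estimate itself. In this sketch each cluster gets its own hypothetical risk level, so deaths bunch together within clusters. The standard error is then computed two ways. The naive version pretends every household is independent. The other treats each cluster's average as a single observation. That second calculation is a simple between-cluster standard error for equal-sized clusters. It stands in for the robust variance estimation the authors describe but does not reproduce it.

```python
import math
import random
import statistics

def simulate_clusters(n_clusters, households_per_cluster, rng):
    """Return a list of clusters, each a list of 0/1 flags (1 = a death in
    that household). Each cluster draws its own hypothetical risk level, so
    deaths are correlated within a cluster, as with one car bomb hitting
    several neighboring houses."""
    clusters = []
    for _ in range(n_clusters):
        local_risk = rng.choice([0.01, 0.02, 0.20])  # hypothetical risk levels
        clusters.append([1 if rng.random() < local_risk else 0
                         for _ in range(households_per_cluster)])
    return clusters

def standard_errors(clusters):
    """Return (estimate, naive SE, between-cluster SE) for the share of
    households with a death. Assumes equal-sized clusters."""
    flat = [flag for cluster in clusters for flag in cluster]
    n = len(flat)
    estimate = sum(flat) / n  # the same point estimate under either SE
    # Naive: pretend every household is an independent draw.
    naive_se = math.sqrt(estimate * (1 - estimate) / n)
    # Cluster-aware: treat each cluster's average as one observation.
    cluster_means = [sum(c) / len(c) for c in clusters]
    cluster_se = statistics.stdev(cluster_means) / math.sqrt(len(clusters))
    return estimate, naive_se, cluster_se

if __name__ == "__main__":
    clusters = simulate_clusters(50, 40, random.Random(5))
    rate, naive_se, cluster_se = standard_errors(clusters)
    print(f"estimate {rate:.3f}, naive SE {naive_se:.4f}, cluster SE {cluster_se:.4f}")
```

Both standard errors belong to the same point estimate. Accounting for clustering makes the standard error, and hence the confidence interval, wider, but it does not move the estimate up or down.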

The last criticism that has spread widely in the blogosphere is that the prewar mortality rates were underestimated. This study used prewar death rates to estimate how many deaths would have occurred anyway, then subtracted these to obtain the “excess deaths.” A lower prewar death rate therefore makes for a higher estimate of excess deaths. There is little compelling reason to believe that the prewar death rates were underestimated, as they were corroborated by the study itself.
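The sensitivity to the prewar rate is just as easy to quantify. This sketch uses the population of 27 million and the 547 deaths among 12,800 people from the text. It also relies on an assumed post-invasion period of 40 months, which is not a figure given here. It asks how the crude excess-death figure moves as the baseline changes.

```python
POPULATION = 27_000_000  # from the text
YEARS = 40 / 12          # ASSUMED length of the post-invasion period

def expected_deaths(prewar_rate_per_1000):
    """Deaths that would have occurred anyway at the prewar rate."""
    return prewar_rate_per_1000 / 1000 * POPULATION * YEARS

# Total deaths implied by scaling the sample (547 of 12,800) to the population.
observed = 547 / 12_800 * POPULATION

for rate in (4.5, 5.5, 6.5):
    print(f"prewar rate {rate}: excess {observed - expected_deaths(rate):,.0f}")
```

Under these assumptions, each change of 1 per 1,000 in the baseline shifts the crude estimate by about 90,000 deaths. That is why the prewar rate deserves the scrutiny it gets.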

The study found a prewar death rate of 5.5 per 1,000 people per year. According to Gilbert Burnham, one of the study’s authors from Johns Hopkins University, this was roughly the same as the rate found by the CIA and the U.S. Census Bureau. In other words, the prewar death rate was not just “invented” or taken from an unreliable source; it was supported by data from the same interviews.

What should we not take from this study?
The Lancet study does an excellent job of counting the dead, but its purpose does not lie in pointing fingers. The study reports that 31 percent of excess deaths were caused by coalition forces, but those reporting the deaths might be biased by anti-coalition sentiment. Those families may be more likely to believe and report that a violent death was attributable to coalition forces. Of course, bias could go in the other direction as well – we simply do not know. We also cannot assess who died – civilians or those involved in the armed conflict. Again, it would be easy to see how bias would affect reports by family members.

The methods used by this study are the only scientific methods we have for discovering death rates in war-torn countries that lack the infrastructure to report all deaths through central means. Anthony Cordesman of the Center for Strategic & International Studies in Washington dismissed over half a million dead people as a political ploy. Instead, we ought to embrace science as opening our eyes to a tragedy whose scale has been vastly underestimated until now.