Every so often, I start pulling on a thread and find things I think others might find interesting. Often, it’s in reaction to something that makes me say, “that’s funny” or “that can’t be right.” That was the case this week. It starts with this cartoon from Steven Breen at Townhall.com, who has been in these pages before.
If that’s not clear to you, the story is that some group ran a computer simulation, which they called a study, and concluded that the annual motorcycle rally in Sturgis, South Dakota, was a Covid-19 superspreader event that led to about 275,000 “cases” of the virus, costing the country over $12 billion. If you look around you’ll find several numbers for their conclusion.
I didn’t cover anything about the Sturgis rally, but my blog brothers in Miami (and elsewhere) at Gun Free Zone did - along with others. From what I heard, the event was truly “largely peaceful” and while the riders there, given their strong independent streak, didn’t necessarily follow the lockdown mantra, the number of cases was more like 260 than 260,000. That's a big difference. So what gives?
I was able to find a link to the paper so that I could see what they did. To say its approach is unconventional is a vast understatement. Given the three-orders-of-magnitude difference, a factor of a thousand, between the cases reported by South Dakota Public Health (which counts people in other states) and the paper’s number, it seems the true count has to be closer to the lower bound. Even if it were 2,750 instead of 275,000, that would still be a lot of cases, but the approach and the results are wrong.
To begin with, the paper was written by four economists: Dhaval Dave, Andrew I. Friedson, Drew McNichols, and Joseph J. Sabia, all associated with various universities, and published by something called the IZA Institute of Labor Economics, so I’ll call it the IZA paper and pronounce that “eye zah.” None of these authors appears to have training in virology, epidemiology, or any medical field that I can tell. Of course, that applies to me, too.
So where does the data come from? How do they get this result? The abstract summarizes their method: use cell phone data to identify all the phones at the rally, track them back to their home counties around the country, then take the number of positive test results in those counties from the CDC and attribute every increase, everywhere in the country, to Sturgis.
First, using anonymized cell phone data from SafeGraph, Inc. we document that (i) smartphone pings from non-residents, and (ii) foot traffic at restaurants and bars, retail establishments, entertainment venues, hotels and campgrounds each rose substantially in the census block groups hosting Sturgis rally events. Stay-at-home behavior among local residents, as measured by median hours spent at home, fell. Second, using data from the Centers for Disease Control and Prevention (CDC) and a synthetic control approach, we show that by September 2, a month following the onset of the Rally, COVID-19 cases increased by approximately 6 to 7 cases per 1,000 population in its home county of Meade.
I’m going to concentrate on two words in their description: synthetic control. When I read that, I have to admit I had never seen the term before; apparently it’s a technique developed around 2010. A picture is probably worth a thousand words here, so here’s a plot from the IZA paper.
How do they know that the positive cases in those counties weren’t going to go up more than the extrapolated increase regardless of whether people attended the rally? Isn’t it reasonable to ask “what else could have caused this effect? What else was going on?”
Extrapolation off the end of hard data is not a control. A controlled trial is one in which there are two groups as closely matched as possible in every aspect; one group has an experimental treatment given to it and the other does not. Further, when using animals or especially people, the control and experimental subjects should not know which group they’re in – that’s called a blind study. In the best medical studies of drugs and other treatments, neither the experimental group nor the experimenter giving the drug should know which group is which. That’s called a double blind study (some treatments can’t be double-blinded because everyone can see which they're getting). The reason this is the gold standard way of experimenting is the design totally eliminates any known way for the difference between the two groups to be caused by anything other than the treatment.
This is not a controlled experiment; synthetic control apparently means no control. They think they monitored people who went to Sturgis. Let’s say they did; they still have no way of knowing whether those people are the ones who came down with the disease or the ones who spread it. They associate two variables: (1) cell phones that were in or around Sturgis and then went somewhere else, and (2) an increase in Covid-19 cases in those “somewhere else” places. If someone who never came in contact with a traveler to Sturgis, or any intermediate contact, came down with the virus, their experiment is incapable of knowing that. It’s an associational study, no better than the junk science I’ve railed about so many times before.
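To make the distinction concrete, here’s a toy sketch of the synthetic-control idea in Python. Everything here is made up for illustration (the counties, the case rates, and the crude grid search standing in for the constrained regression a real synthetic-control package would use); it is not the paper’s data or code. The point is that the “control” is just a weighted average of untreated places, extrapolated forward:

```python
import numpy as np

# Hypothetical pre-period weekly case rates (per 1,000) for a "treated"
# county and two untreated "donor" counties -- invented numbers.
pre_treated = np.array([1.0, 1.2, 1.5, 1.9])
pre_donors = np.array([[0.8, 1.0, 1.3, 1.6],   # donor A
                       [1.4, 1.5, 1.8, 2.3]])  # donor B

# Find convex weights (w, 1-w) on the donors that best match the treated
# county's pre-period trajectory (crude grid search for illustration).
ws = np.linspace(0, 1, 101)
fits = [np.sum((w * pre_donors[0] + (1 - w) * pre_donors[1] - pre_treated) ** 2)
        for w in ws]
w = ws[int(np.argmin(fits))]

# Post-period: the "synthetic control" is that same weighted average of
# the donors' actual post-period outcomes...
post_donors = np.array([[1.9, 2.1], [2.6, 2.9]])
synthetic_post = w * post_donors[0] + (1 - w) * post_donors[1]

# ...and the "treatment effect" is simply treated-minus-synthetic.
post_treated = np.array([2.6, 3.4])
effect = post_treated - synthetic_post
print(w, effect)
```

Notice that nothing in this procedure checks what else might have happened in the treated county: any post-period shock that hits it but not the donors gets booked as an effect of the “treatment,” which is exactly the complaint above.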
This is how we get the correlations that say things like eating table salt is correlated with a positive relationship with your Internet Service Provider, or that eating egg rolls is highly correlated with owning a dog (that isn’t in the egg roll).
Stephen Green, sometimes known as the Vodka Pundit, who writes for PJ Media, found a couple of remarkable data points. Where does the figure of almost $12 billion come from? Their CDC data said these counties had some number of new cases. Remember, when they say “case,” it means a positive test result; it does not mean the person was sick or even actually had the virus, given the reliability of some tests. They then simply multiply the number of cases by a number they pulled out of, well, their IZA.
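The arithmetic behind the headline figure is easy enough to check. This is my back-of-envelope reconstruction using the round numbers quoted here, not inputs taken from the paper itself:

```python
# Back-of-envelope check (my reconstruction, not the paper's calculation):
cases = 275_000          # the paper's attributed case count, roughly
cost_per_case = 46_000   # dollars per "case," their statistical figure
total = cases * cost_per_case
print(f"${total:,}")  # $12,650,000,000 -- i.e., "over $12 billion"
```

One multiplication, and the whole dollar figure inherits every weakness of both factors.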
As of Wednesday, the Public Health Officials have attributed one fatality to spread from Sturgis.
Note that the study supporting $46k per Covid case exists only on the IZA website and has not been replicated. That’s a big red warning flag. The warning that the “methodology is suspect at best” applies especially to the claim that it costs $11k to treat an asymptomatic case. Need I remind everyone that people with NO symptoms are not sick, don’t think they’re sick, and need NO treatment? At $11,000 per patient to treat asymptomatic people, I’ll treat as many as you can send.
My takeaway on the study is that it doesn’t show much of anything. Their tracking of cellphones using anonymized data is new, but the position resolution isn’t good. Phones were tracked to “Census Block Groups,” which appear to be too big to precisely locate a given bar or restaurant, but could certainly tell whether a phone went to Sturgis from Minnesota or North Dakota and back. CBGs are not uniformly sized. In the end, we know that phones were in Sturgis, were tracked around the event, and then went somewhere else. It’s the linking of those phones to the CDC case rates that’s the weak spot.
Another weak spot appears to be the synthetic control group itself. They tried to match the population density and “urbanicity” of the control curve to Meade County within some limits (which seem reasonable), and they conclude that Sturgis saw an increase in the rate of positive tests, but that’s not controversial. What’s controversial is the number the state claims versus the number claimed in the IZA paper. The outrageous claim isn’t about Sturgis and Meade County; the outrageous claim is whatever allows them to say 275,000 cases track back to Sturgis. I simply don’t see how they can say that.
I’m stuck where I was a couple of paragraphs ago. Tracking a phone from Sturgis to (pick a place) and then noting some number of positive tests in that place doesn’t mean the phone’s owner spread the virus or had anything to do with it. It simply means the phone was in both places. I don’t see how causality can be linked like that. One can ask, “do you think it’s a coincidence that all those places had higher numbers of cases after people came back from Sturgis?” I’d have to say “it could be.” Ruling out that “could be” is what real controlled experiments do for us; they let us know for sure, rather than building elaborate castles in the air.
Proving a result isn't a coincidence is what statistical testing is all about.
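In the simplest setting, that kind of testing is a permutation test: shuffle the labels and ask how often chance alone produces a gap as big as the one observed. Here’s a minimal sketch with made-up numbers; none of these values come from the paper or the CDC:

```python
import random

random.seed(0)

# Hypothetical case-rate increases (per 1,000) in counties that did and
# didn't "receive" rally travelers -- invented for illustration.
rally_counties = [6.5, 7.0, 5.8, 8.1]
other_counties = [5.9, 6.2, 7.4, 5.5, 6.8, 6.0]

observed = (sum(rally_counties) / len(rally_counties)
            - sum(other_counties) / len(other_counties))

# Shuffle the labels many times: how often does a random split of the same
# counties produce a gap at least as large? That frequency is the p-value.
pooled = rally_counties + other_counties
n = len(rally_counties)
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    gap = sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n)
    if gap >= observed:
        count += 1
p_value = count / trials
print(round(observed, 2), p_value)
```

A small p-value says “probably not a coincidence”; a large one says the gap is the kind of thing shuffled labels produce all the time. Nothing like this can rescue the causal claim, though: it can only tell you the association is unlikely to be chance, not that Sturgis caused it.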