TSA and scientific method — sworn enemies?

Over the weekend the Los Angeles Times featured a story with this headline: “TSA scanners pose negligible risk to passengers, new test shows.”

Aside from the propaganda aspects of the headline, consistent with other TSA “good news,” the story and its underpinnings are fundamentally flawed, as has been reported here and elsewhere in the media.

The story claims that Asst. Professor Taly Gilat-Schmidt at Marquette University reviewed the operational test data from a backscatter (x-ray) scanner tested several years ago and deemed the scanners safe.

Or did she?

A subsequent article published at CNN expanded on her comments:

. . . the study’s author, professor Taly Gilat-Schmidt, said the research does not answer the biggest question on travelers’ minds: Are scanners safe? She said more independent research is needed.

Gilat-Schmidt added that though she goes through the backscatter machines, “I don’t feel comfortable putting my kids through them.”

There are several curious aspects to these reports. First, neither relates any “new” tests. Gilat-Schmidt’s study was essentially a repeat of a study done by Rebecca Smith-Bindman at the University of California, San Francisco in March 2011 and relied on the same performance data — that is, data provided by the TSA.

Second, neither study involved scanners actually in use at airports, the scanners that people pass through every day.

While it should come as no surprise that the TSA and the corporation OSI, manufacturer of the Rapiscan scanners, would have an interest in having test data appear in the most favorable light, it’s curious that two separate researchers would attempt to validate scanner safety based solely on TSA-provided data.

There are two crucial flaws in both studies: the sample size of one data set is too small to be statistically significant, and the original tests fail to adhere to basic scientific method. This scenario would not earn a passing grade in a high school science class, much less publication in a reputable scientific journal.

As some people might recall from science class, scientific method requires that three conditions be met for a test to be valid:

1. The test must be replicable or repeatable.
2. The data must be recorded and available to peers for validation.
3. The test must pass peer review.

Since no scanners were actually tested in these studies, it is impossible to verify or replicate the data. The scanner safety results have been refuted by several experts, including Dr. David J. Brenner of Columbia University, Dr. Russell Blaylock, and Dr. John Sedat of the University of California in San Francisco. And while the performance test data from sample machines is available, it has not been validated by independent testing. Nor did the Johns Hopkins University and National Institute for Occupational Safety and Health tests evaluate medical implications; they measured only radiation emission rates, despite the TSA’s repeated claims to the contrary.

With respect to the statistical quality of these studies, a valid test would need to examine numerous scanners randomly selected from the pool in service. There are several ways to determine how large the sample must be for the results to be statistically meaningful, or in layman’s terms, reasonably reliable. One method relies on two figures: the “confidence interval” (also called the margin of error), which measures how far the test results may stray from the true value, and the “confidence level,” which measures how likely it is that the true value falls within that interval. A high confidence level, say 95%, requires a larger sample but is very reliable. A 90% level requires a smaller sample but isn’t as certain. An appropriate sample size can be computed via formula or an online Sample Size Calculator.

According to the TSA, there are about 700 scanners now in operation, both backscatter and millimeter wave scanners. A reasonable confidence interval of 10% at a 90% confidence level would require that at least 59 scanners be tested from the operating group to be considered statistically valid. As it stands there were fewer than 5 devices tested, and these were brand new, calibrated scanners, operated under the supervision of the manufacturer.
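
For readers who want to check the arithmetic, here is a rough sketch of the kind of calculation an online sample size calculator performs. It assumes the standard textbook approach (Cochran's formula with a finite population correction) and the most conservative guess of a 50% proportion. The function name `sample_size` and its parameters are made up for illustration and do not come from the TSA or either study.

```python
import math
from statistics import NormalDist


def sample_size(population, margin, confidence, proportion=0.5):
    """Estimate how many units to sample from a finite population.

    Illustrative sketch only: Cochran's formula plus a finite
    population correction. proportion=0.5 is the worst-case assumption.
    """
    # z-score for a two-sided interval at the given confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively unlimited population
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    # Shrink the estimate because the pool of scanners is finite
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# About 700 scanners, 10% interval, 90% confidence level
print(sample_size(700, 0.10, 0.90))
```

Under these assumptions the sketch returns 62, in the same neighborhood as the 59 cited above. Calculators differ in their rounding and in the assumptions they build in, so small differences are to be expected. Either way, the figure is far larger than the handful of machines actually tested.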

The scanners in operation every day at airports are calibrated . . . when? How often? By whom? Qualified technicians? Medical experts? What happens when the machines go out of calibration? How long before they’re properly calibrated again? Who’s tracking this? Where’s the data?

These are all questions that both Gilat-Schmidt and Smith-Bindman acknowledge are valid. Both say that independent tests should be done, on scanners actually in use at airports, but so far the TSA has refused to allow such testing.

Under normal circumstances, the TSA’s “testing” and the subsequent “studies” would have been summarily ridiculed by any reputable academic or scientific institution. (Perhaps that’s why the European Union has banned backscatter scanners.) Instead, these untested machines have been forced on the American public.

So the next time you see a parent send a child into a scanner and assume a position of surrender, you might ask why Professor Gilat-Schmidt won’t allow her children to use these devices and why the TSA won’t allow testing of them.

(Photo: Flickr Creative Commons/Mike Licht, Notions Capital)