The Real Problem with Data Mining

Crossposted on The New Republic.

In the past 48 hours, the American public has been rocked by revelations about the scope of domestic government surveillance. First, we learned that the National Security Agency receives information about the majority of domestic and international phone calls placed in this country. Then we discovered that the NSA and FBI collaborate to vacuum up real-time information from the servers of most major Internet companies as well, including the content of emails, video chats, documents, and more. Although this program is ostensibly directed at gathering data about foreigners, it is likely to sweep in significant amounts of information about Americans too.

There are, needless to say, significant privacy and civil-liberties concerns here. But there’s another major problem, too: This kind of dragnet-style data capture simply doesn’t keep us safe.

First, intelligence and law enforcement agencies are increasingly drowning in data; the more that comes in, the harder it is to stay afloat. Most recently, the failure of the intelligence community to intercept the 2009 “underwear bomber” was blamed in large part on a surfeit of information: according to an official White House review, a significant amount of critical information was “embedded in a large volume of other data.” Similarly, the independent investigation of the alleged shootings by U.S. Army Major Nidal Hasan at Fort Hood concluded that the “crushing volume” of information was one of the factors that hampered the FBI’s analysis before the attack.

Multiple security officials have echoed this assessment. As one veteran CIA agent told The Washington Post in 2010, “The problem is that the system is clogged with information. Most of it isn’t of interest, but people are afraid not to put it in.” A former Department of Homeland Security official told a Senate subcommittee that there was “a lot of data clogging the system with no value.” Even former Defense Secretary Robert Gates acknowledged that “we’ve built tremendous capability, but do we have more than we need?” And the NSA itself was brought to a grinding halt before 9/11 by the “torrent of data” pouring into the system, leaving the agency “brain-dead” for half a week and “[unable] to process information,” as its then-director Gen. Michael Hayden publicly acknowledged.

National security hawks say there’s a simple answer to this glut: data mining. The NSA has apparently described its computer systems as having the ability to “manipulate and analyze huge volumes of data at mind-boggling speeds.” Could those systems pore through this information trove to come up with unassailable patterns of terrorist activity? The Department of Defense and security experts have concluded that the answer is no: There is simply no known way to mine data to effectively anticipate terrorist threats.

Credit card companies are held up as the data-mining paradigm. But their success in detecting fraud depends on factors that don’t exist in the counterterrorism context: a massive volume of transactions, a high rate of fraud, identifiable patterns (for instance, a thief testing a stolen card at a gas station to check that it works and then immediately purchasing more expensive items), and a relatively low cost for a false positive, which amounts to a call to the card’s owner and, at worst, premature closure of a legitimate account.

By contrast, there have been a relatively small number of attempted or successful terrorist attacks, which means that there are no reliable “signatures” to use for pattern modeling. Even in the highly improbable and undesirable circumstance that the number of attacks rises significantly, they are unlikely to share enough characteristics to create reliable patterns.
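
The contrast between the two settings can be made concrete with a little arithmetic. The sketch below is purely illustrative: the function name, the prevalence figures, and the error rates are hypothetical assumptions chosen for the example, not numbers drawn from any real system or from the reviews cited above. It applies Bayes' rule to estimate what share of flagged cases are genuine when the same screening accuracy is applied to a relatively common event and to a very rare one.

```python
def share_of_flags_that_are_real(prevalence, detection_rate, false_positive_rate):
    """Bayes' rule: P(genuine case | flagged by the system).

    prevalence          -- fraction of all cases that are genuine (assumed)
    detection_rate      -- fraction of genuine cases the system flags (assumed)
    false_positive_rate -- fraction of innocent cases the system flags (assumed)
    """
    true_flags = prevalence * detection_rate
    false_flags = (1.0 - prevalence) * false_positive_rate
    total_flags = true_flags + false_flags
    if total_flags == 0:
        return 0.0  # nothing is ever flagged
    return true_flags / total_flags


# Hypothetical inputs: identical accuracy, very different base rates.
DETECTION_RATE = 0.99        # assumed
FALSE_POSITIVE_RATE = 0.01   # assumed

common_event = share_of_flags_that_are_real(1 / 1_000, DETECTION_RATE, FALSE_POSITIVE_RATE)
rare_event = share_of_flags_that_are_real(1 / 1_000_000, DETECTION_RATE, FALSE_POSITIVE_RATE)

print(f"Common event: {common_event:.4%} of flags are genuine")
print(f"Rare event:   {rare_event:.4%} of flags are genuine")
```

Under these assumed inputs, with the system's accuracy held fixed, the share of flags that point to a genuine case falls sharply as the target becomes rarer, so nearly every alert in the rare case lands on an innocent person.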

Moreover, the surveillance programs that have been disclosed pull in a huge range of data: phone records, emails, Web searches, credit card transactions, documents, live chats. And that’s just what we know so far. Not only does this information raise First Amendment concerns where it “accidentally” includes Americans’ communications, purchases, and more, but the variety greatly complicates the data-mining process. The Wall Street Journal has editorialized that this large sample size is simply a necessary byproduct of the mechanics of data mining, allowing the NSA to “sweep broadly to learn what is normal and refine the deviations.” But as the libertarian Cato Institute has argued, not only is such a system “offensive to traditional American freedom,” it can be evaded if the terrorists “act as normally as possible.”

And when the government gets it wrong—which it will—the consequences are far-reaching. A person falsely suspected of involvement in a terrorist scheme will become the target of long-term scrutiny by law enforcement and intelligence agencies. She may be placed on a watchlist or even a no-fly list, restricting her freedom to travel and ensuring that her movements will be monitored by the government. Her family and friends may become targets as well.

The FBI’s and NSA’s scheme is an affront to democratic values. Let’s also not pretend it’s an effective and efficient way of keeping us safe.
