Saturday, May 26, 2012

Data Mining on the Rocks or in the Whirlpool

Bruce Schneier over at Wired has written a long, involved article entitled “Why Data Mining Won’t Stop Terror.” (link) I found it hard to read but it contained an interesting point which I’ll try to summarize here.

Schneier argues that the government’s data mining program won’t work because of a dilemma in statistics. When you are devising a test to discover a condition in a large population, and you apply that test to people in the population, you run the risk of two possible errors:

Type I Error: Identifying a person as having the condition when they actually don’t
Type II Error: Identifying the person as not having the condition when they actually do

A Type I Error is a “false positive” result, and a Type II Error is a “false negative” result.
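For readers who think better in code, here is a minimal sketch of those two definitions. The function name `classify_outcome` and its two boolean parameters are my own invention for illustration; nothing here comes from Schneier's article.

```python
def classify_outcome(flagged: bool, has_condition: bool) -> str:
    """Name the outcome of one test result compared against the truth.

    flagged:       the test says the person has the condition
    has_condition: the person really does have the condition
    """
    if flagged and not has_condition:
        return "Type I error (false positive)"
    if not flagged and has_condition:
        return "Type II error (false negative)"
    # The test and the truth agree, either both yes or both no.
    return "correct decision"


if __name__ == "__main__":
    # Walk through all four combinations of test result and truth.
    for flagged in (True, False):
        for has_condition in (True, False):
            print(flagged, has_condition, "->", classify_outcome(flagged, has_condition))
```

Two of the four combinations are correct decisions; the other two are the errors named above.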

The data mining activities engaged in by the government involve shuffling through trillions and trillions of pieces of different sorts of information and trying to identify patterns in those pieces of information that suggest terrorist activity. One problem Schneier identifies is that there is no rock-solid, error-free model for predicting what sort of activity by a person (besides a terrorist attack itself) indicates terrorist activity. There will be times when a data miner’s model will prove wrong. Maybe Aunt Matilda is not in an Al Qaeda cell. Maybe she’s just contacting Middle Eastern countries and arranging a one-way trip because she wants to visit the artificial islands of Dubai and take a cruise ship back.

In other words, sometimes you’re bound to make a mistake in predicting terrorist activity. But out in the real world, you don’t know whether you have an error in a particular case (if you did, then by definition you’d have an error-free model). You just get iffy cases that straddle the border between clearly signifying terrorist activity and clearly signifying non-terrorist activity.

The question is what to do with those iffy cases. What to do with Aunt Matilda? One alternative is to let people go free whenever their case is iffy, in the interest of not harming the innocent. This risks a Type II error. Sometimes you’ll get it right, but the risk of this approach is that you’ll let actual terrorists go free a good amount of the time. The alternative is to arrest iffy suspects like Aunt Matilda on the strength of a predictive profile alone. This risks a Type I error. Sometimes you’ll get it right, but the risk of this second approach is that you’ll put innocent people behind bars.
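The trade-off can be sketched in a few lines of Python. Everything below is an assumption made up for illustration: the "suspicion scores," the guilty/innocent labels, and the idea that an arrest rule is a single cutoff on one score. Real profiling models are not this simple, and these numbers are not data from anywhere.

```python
# Hypothetical (suspicion score, is actually a terrorist) pairs.
# Invented for illustration only; not real data.
CASES = [
    (0.10, False), (0.35, False), (0.55, False), (0.60, True),
    (0.70, False), (0.80, True), (0.95, True),
]


def count_errors(cases, threshold):
    """Return (type_i, type_ii) if everyone scoring >= threshold is arrested."""
    # Type I: arrested (score at or above the cutoff) but innocent.
    type_i = sum(1 for score, guilty in cases if score >= threshold and not guilty)
    # Type II: let go (score below the cutoff) but guilty.
    type_ii = sum(1 for score, guilty in cases if score < threshold and guilty)
    return type_i, type_ii


if __name__ == "__main__":
    for threshold in (0.3, 0.5, 0.65, 0.9):
        type_i, type_ii = count_errors(CASES, threshold)
        print(f"threshold={threshold}: innocent jailed={type_i}, terrorists free={type_ii}")
```

With these made-up cases, a low cutoff jails three innocent people and misses no one, while a high cutoff jails no innocents and lets two terrorists walk. Because the innocent and guilty scores overlap, no cutoff gets both counts to zero, which is the dilemma in miniature.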

To express it graphically (borrowing from the tabular approach of Intuitor’s discussion of the justice system):

Data Mining Dilemma

| | Questionable Subject Is Not a Terrorist | Questionable Subject Is a Terrorist |
|---|---|---|
| Arrest Questionable Subject | Type I Error: Innocent Person Jailed | Justice: Correct Decision |
| Let Questionable Subject Go Free | Justice: Correct Decision | Type II Error: Terrorist Goes Free |

Keep in mind that there are no follow-up trials, since the people being watched haven’t committed any crime. No, the question is whether to detain people for (in the vocabulary of Minority Report) Pre-Crime. In a system where your predictions won’t be perfect, depending upon how authoritarian you are, you’ll either let terrorists slip through your fingers or send innocent people to jail. Schneier argues that if we choose the former, letting terrorists slip through our fingers when they are engaged in questionable activities, we will have a terrorist detection system that costs a whole lot of money yet doesn’t work. If we choose the latter, jailing people who not only have done nothing wrong but are not even planning to do wrong, our country will lose its moral bearings and become a police state.


By the way, hello to everyone. This looks like it might be fun. I hope it stays up longer than the bulletin board did.

– freenut

