Corante



About this author


Arnold Kling has a Ph.D. in economics from MIT; founded homefair.com, one of the very first commercial websites, in 1994; separated from Homefair in January 2000 after it was sold to Homestore; is author of Under the Radar: Starting Your Internet Business without Venture Capital, and is an essayist. Send comments to us at econ@corante.com

The Bottom Line


October 23, 2003

Data Mining Misrepresented


Posted by Arnold

Bruce Schneier confuses data mining with racial profiling.


Even those who say that terrorists are likely to be Arab males have it wrong. Richard Reid, the shoe bomber, was British. Jose Padilla, arrested in Chicago in 2002 as a "dirty bomb" suspect, was a Hispanic-American. The Unabomber had once taught mathematics at Berkeley. Terrorists can be male or female, European, Asian, African or Middle Eastern. Even grandmothers can be tricked into carrying bombs on board. One problem with profiling is that, by singling out one group, it ignores the other groups. Terrorists are a surprisingly diverse group of people.

The problem with this criticism is that it is backwards. You can do racial profiling without doing data mining. Data mining is a way of doing the opposite of racial profiling. Data mining is a way of finding groups of characteristics that in combination can separate potential terrorists from others. Racial profiling is based on factors that are not well correlated with terrorism.

If data mining were used, we could search fewer Arab Americans and stop more potential terrorists. It is the people who are opposed to data mining who are going to cause the security forces to resort to racial profiling.

If Schneier does not understand this--if he has no clue how Bayesian algorithms work--then his credentials as a security expert are way overblown. If he does understand statistical inference, then he is being a demagogue. Either way, my respect for Schneier has evaporated.

Comments (4) | Category: transparent society


COMMENTS

1. Foolish Jordan on October 24, 2003 02:08 PM writes...

I think Bruce is arguing that *any* sort of pattern matching will not work, and uses racial profiling as a well-understood example. It seems to me that you have set up a straw man. Okay, so 'race' is not highly correlated with 'being a terrorist'. What is? If the power of Bayesian algorithms can solve the problem, I'd like to see a demonstration of that, or at least a reasonable model, that shows that the numbers of false negatives and false positives are sufficiently small.
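The false-positive question raised here can be made concrete with Bayes' rule. The following is only a minimal sketch: the function names are hypothetical, and the rates in the demo are invented for illustration, not taken from the post or from any real screening system.

```python
def posterior_given_flag(base_rate, true_positive_rate, false_positive_rate):
    """Return P(target | flagged) using Bayes' rule.

    base_rate           -- assumed fraction of the population that are targets
    true_positive_rate  -- assumed P(flagged | target)
    false_positive_rate -- assumed P(flagged | not target)
    """
    flagged_and_target = true_positive_rate * base_rate
    flagged_and_innocent = false_positive_rate * (1.0 - base_rate)
    return flagged_and_target / (flagged_and_target + flagged_and_innocent)


def expected_counts(population, base_rate, true_positive_rate, false_positive_rate):
    """Return the expected number of hits, misses, and false alarms."""
    targets = population * base_rate
    innocents = population - targets
    return {
        "true_positives": targets * true_positive_rate,
        "false_negatives": targets * (1.0 - true_positive_rate),
        "false_positives": innocents * false_positive_rate,
    }


if __name__ == "__main__":
    # Hypothetical inputs: one target per million travelers, a screen that
    # catches 99% of targets and wrongly flags 1% of everyone else.
    print(posterior_given_flag(1e-6, 0.99, 0.01))
    print(expected_counts(1_000_000, 1e-6, 0.99, 0.01))
```

Under those made-up rates, only about one flagged traveler in ten thousand would be an actual target, because innocent travelers vastly outnumber targets. Whether that counts as "sufficiently small" depends on the base rate and error rates one is willing to assume, which is exactly what the model would have to justify.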


2. Arnold Kling on October 24, 2003 02:29 PM writes...

The question is, what is "sufficiently small"? Are you satisfied with current airport screening, where grandmothers traveling with their grandchildren are as likely as anyone else to be searched?

I am not sure how well Bayesian algorithms could work in airport passenger screening. But one example is that during the DC sniper spree, the suspects stole a credit card, and a charge was denied right away because of the statistical profiling used by the credit card companies.

Right now, the credit card companies are better at spotting a suspect than are the law enforcement agencies. That is because the credit card companies use data mining.


3. Zonker on October 24, 2003 05:36 PM writes...

"Right now, the credit card companies are better at spotting a suspect"

No, the credit card companies are better at spotting odd charging patterns which may or may not (likely, not) correspond with terrorist purchasing patterns. If I understand the system correctly, the credit card company looks for something like a shopping spree on a card that is usually only used for gas every week and so on. They're looking for stolen cards... they're not looking for purchases of specific items, as I understand it, nor do I really want them to.

Therefore a terrorist who charges an airline ticket and hotel room isn't likely to raise a red flag if they're using a legitimate card, whereas if you decide to go purchase a bunch of stuff for a new apartment, you might find yourself flagged using this type of system.
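The kind of check described above can be sketched as a simple outlier test against a card's own history. This is an illustrative assumption, not a description of how any card issuer actually works: the function name, the z-score rule, and the threshold of 3.0 are all hypothetical.

```python
from statistics import mean, stdev


def is_unusual_charge(history, new_amount, threshold=3.0):
    """Flag a charge that deviates strongly from this card's past amounts.

    history    -- past charge amounts on the card (hypothetical data)
    new_amount -- the charge being evaluated
    threshold  -- assumed cutoff, in standard deviations from the mean
    """
    if len(history) < 2:
        return False  # too little history to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past charge was identical, so any different amount stands out.
        return new_amount != mu
    return abs(new_amount - mu) / sigma > threshold


if __name__ == "__main__":
    weekly_gas = [30.0, 32.0, 28.0, 31.0, 29.0]  # made-up weekly gas purchases
    print(is_unusual_charge(weekly_gas, 31.0))   # an ordinary fill-up
    print(is_unusual_charge(weekly_gas, 900.0))  # a sudden large purchase
```

A test like this looks only at how far a charge departs from the cardholder's habits. It flags a legitimate shopping trip for a new apartment just as readily as a stolen card, and it passes an ordinary-looking ticket purchase on a legitimate card regardless of who is buying.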

I also think you've misconstrued Schneier's comments; he's not equating profiling with data mining -- he's arguing against the idea that "if we had enough data, we could pick terrorists out of crowds." Seems pretty obvious if you read the piece carefully.


4. Dave Sheridan on October 24, 2003 05:40 PM writes...

As an example of getting it backward, Schneier offers a passage that actually shows where data mining can improve on "profiling," when he says:

"Even those who say that terrorists are likely to be Arab males have it wrong. Richard Reid, the shoe bomber, was British. Jose Padilla, arrested in Chicago in 2002 as a "dirty bomb" suspect, was a Hispanic-American. The Unabomber had once taught mathematics at Berkeley."

That's the point of data mining -- to improve on crude intuition.

It also appears he has little appreciation for how complex systems are developed. It is impossible to know in advance which data, and which relationships among data, are going to produce good predictive performance.
