INDC Journal

May 15, 2006
On Modeling, Databases, Etc. (UPDATED with dorkafork v. Bill DANCE OFF!)

Posted by Bill

The Weekly Standard expands upon my rudimentary data mining explanation served up in the comments to dorkafork's post ...

Ever since allowing the Pentagon's Total Information Awareness project to go down the tubes in 2003, the administration has failed to explain the potential of data mining, even as it secretly continues to use this vital technology. Thus, at every revelation of a government data mining program, privacy extremists enjoy unchallenged supremacy in characterizing the technology as a massive threat to life as we know it.

Only a paranoid solipsist could feel threatened by the recently revealed calling analysis program. Since late 2001, Verizon, BellSouth, and ATT have connected nearly two trillion calls, according to the Washington Post. The companies gave NSA the incoming and outgoing numbers of those calls, stripped of all identifying information such as name or address. No conversational content was included. The NSA then put its supercharged computers to work analyzing patterns among the four trillion numbers involved in the two trillion calls, to look for clusters that might suggest terrorist connections. Though the details are unknown, they might search for calls to known terrorists, or, more speculatively, try to elicit templates of terror calling behavior from the data.

As a practical matter, no one's privacy is violated by such analysis. Memo to privacy nuts: The computer does not have a clue that you exist; it does not know what it is churning through; your phone number is meaningless to it. The press loves to stress the astounding volume of data that data mining can consume--the Washington Post's lead on May 12 warned that the administration had been "secretly . . . assembling gargantuan databases." But it is precisely the size of that data store that renders the image of individualized snooping so absurd.

True, the government can de-anonymize the data if connections to terror suspects emerge, and it is not known what threshold of proof the government uses to put a name to critical phone numbers. But until that point is reached, your privacy is at greater risk from the Goodyear blimp at a Stones concert than from the NSA's supercomputers churning through trillions of zeros and ones representing disembodied phone numbers.

All true enough; the potential problem lies in how the government acts on the information. But serious concern over the mere existence of such a database is demagogic, naive or both; this is exactly the type of basic use of information technology that one would hope our government officials use, cynically expect them not to use and/or excoriate them for not having in the event of a successful terrorist attack on US soil.

That said, there are several challenges to the program's utility, which has been further diminished by a public revelation of the project:

1. Assuming the analysts create a statistical model based on a reliably determined template of terrorist calling behavior, the ongoing effort will sift through trillions of calls and assign values to the various combinations and patterns that match this "terrorist model." It will then score and rank all of the numbers in the database, where the 0-10th percentile might represent the "10% of the population least likely to be terrorists" and the 90-100th percentile would be the "10% of the population most likely to be terrorists."

The fundamental challenge is that a "terrorist model" attempts to identify the behavior of such an infinitesimal portion of the population - the number of individuals able and willing to blow up buildings and kill masses of people - without a wealth of particularly identifying characteristics - i.e., terrorist sleeper cells probably call out for pizza too - that the application of even a fantastically designed model based strictly on calls may only triple the government's chance of identifying people likely to be terrorists. If the government appends data overlays of relevant information to the specific numbers in the model - say, "Arab ethnicity" - that chance of identifying those likely to be terrorists might, for the sake of argument, quintuple. To be extremely charitable, let's even assume that outsized weights applied to calls made to Palestinian aid organizations make the model 100x more predictive.

If, say, one in a million people in the United States is a terrorist, and the upper reaches of a successful model increase the likelihood of being a terrorist 100x, you're still left with 100 in a million in a given population. And that's still an awful big haystack. In this sense, assuming accurate modeling, a utility might be found in the effort's ability to rule out huge swaths of the population, or simply cross-tab the "terrorist score" with searches on a specific phone number discovered elsewhere. For example, if Waleed Smith is the target of a terrorism investigation based on human intelligence received from an informant in Afghanistan, it might be a relevant ancillary endeavor to check out his "terrorist model score" and find out if he's at the tippy-tippy-top. That said, the specificity limitations of even the best modeling are pretty clear for such a limited target population.
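
To make the haystack arithmetic above concrete, here is a minimal sketch. The one-in-a-million base rate and the 3x, 5x and 100x lifts are the for-the-sake-of-argument figures from this post, not real estimates, and the function names are invented for illustration.

```python
def flagged_precision(base_rate: float, lift: float) -> float:
    """Chance that a number in the model's top tier belongs to a real terrorist.

    base_rate: assumed share of terrorists in the population (must be > 0).
    lift: how many times more likely a top-tier number is to be a hit.
    """
    return min(1.0, base_rate * lift)


def innocents_per_hit(base_rate: float, lift: float) -> float:
    """Innocent numbers flagged for every true hit in the top tier."""
    p = flagged_precision(base_rate, lift)
    return (1.0 - p) / p


if __name__ == "__main__":
    base_rate = 1 / 1_000_000  # "one in a million", as assumed in the post
    for lift in (3, 5, 100):
        p = flagged_precision(base_rate, lift)
        print(
            f"lift {lift:>3}x: {p * 1_000_000:.0f} per million flagged, "
            f"about {innocents_per_hit(base_rate, lift):,.0f} innocents per true hit"
        )
```

Even at the charitable 100x lift, roughly 9,999 of every 10,000 top-tier numbers belong to people ordering pizza.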


2. Assuming the database is used to flag calls to specific numbers known to be affiliated with terrorism, the collection and background monitoring of calls in the United States strikes me as a surprising example of the government doing its job. This basic cross-referencing is the exact kind of useful signals intelligence that would catch a terrorist who makes the wrong phone call. Nothing fancy, nothing complex, just a supercomputer churning through trillions and trillions of records looking for BIG RED FLAGS, like a call to Osama's cave phone. Such an effort could be integrated with the profile modeling I discuss in the previous scenario - the calls to specific terrorist numbers would simply be assigned massive weights in the predictive statistical model, exponentially increasing the "likely to be a terrorist score" to "hell yes."

That said, the challenges to this program are still significant, because the telephone numbers of terrorists with an IQ above 50 rarely remain static, and with the advent of disposable phones, a number might only last for one call. The utility of the information that such a program is looking for is strictly dependent on accurate and timely human intelligence, and in most cases the government would have to act with lightning speed on any red flag. BUT - if a phone number is identified in Osama bin Laden's rolodex, and that number is cross-referenced to a relatively static phone number by a goofy terrorist - say, a NYC REIT or a Palestinian political organization - then the NSA will have possibly identified a viable fixed target for investigation and infiltration leading back to a terrorist network. In addition, some terrorists will inevitably be too stupid or lazy to consistently rotate phone numbers 100% of the time, and a supercomputer trolling the huge database could very well nail them. But the challenges remain, not the least of which is a computer powerful enough to sort such a massive amount of data with changing characteristics on a timely basis.
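
The red-flag cross-referencing in scenario 2 might look something like the sketch below. Nothing is publicly known about how the NSA actually does this; the call-record layout (caller, callee pairs), the made-up 555 numbers, the function names and the weight of 1000 are all assumptions for illustration.

```python
from collections import Counter


def flag_watchlist_contacts(calls, watchlist):
    """Count, per non-watchlisted number, its calls to or from a watchlisted number."""
    hits = Counter()
    for caller, callee in calls:
        if callee in watchlist and caller not in watchlist:
            hits[caller] += 1
        if caller in watchlist and callee not in watchlist:
            hits[callee] += 1
    return hits


def combined_score(pattern_score, watchlist_hits, weight=1000.0):
    """Fold watchlist contacts into a pattern-model score with a massive weight.

    The weight is a hypothetical value chosen so that a single red flag
    swamps whatever the calling-pattern model says.
    """
    return pattern_score + weight * watchlist_hits


# Made-up records: (caller, callee)
calls = [
    ("555-0101", "555-0199"),
    ("555-0102", "555-0103"),
    ("555-0199", "555-0104"),
    ("555-0101", "555-0199"),
]
watchlist = {"555-0199"}  # a number supplied by human intelligence

flags = flag_watchlist_contacts(calls, watchlist)
for number, hit_count in flags.most_common():
    print(number, hit_count, combined_score(0.2, hit_count))
```

The lookup is only as good as the watchlist: once a number is rotated or thrown away, the set above is stale, which is the timeliness problem described in the paragraph before this sketch.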

Utility aside, the mere existence of a program to analyze a database of domestic calls and flag events - contingent upon rational protocols that define actionable data and administrative oversight to prevent abuse - strikes me as a rather rudimentary, fundamental function of our national defense, in an age of exponentially heightening threat from bad actors with destructive weapons. This gets a big shrug from me.

(Bickering between dorkafork and Bill below the fold.)

dorkafork adds: Just some quick points:
For the record, I don't consider the program "a massive threat to life as we know it."* I've said before I'm more worried about the slippery slope. Arguments along the line of "you don't have any privacy anyway" make me worried. Also it seems to me that the government can already do many of the things Bill describes uncontroversially without a data-mining program (e.g. monitoring calls to Osama's cell phone, charity-terrorist fronts, etc.) I'd also add that though the program is fairly benign, it is not totally benign. "True, the government can de-anonymize the data if connections to terror suspects emerge..." or if they feel like it. Like Gene Healy says:

The '90s weren't that long ago. And I remember a lot of wailing and gnashing of teeth over misused FBI Files and suspicious IRS audits. Over the last four and a half years, many of the same wailers and gnashers have cheer-led the concentration of unreviewable power in the executive branch, as if George W. Bush would be the last president ever to wield that power. And now, lo and behold, there's the mistress of Travelgate warming up in the on-deck circle. Join me in a bitter chuckle.

(via Walter in Denver)

*There is a bit of hyperbole on the other side of the argument, with many calling the program "vital" to national security. The FBI would seem to disagree.

Bill Adds: dorkafork says:

I've said before I'm more worried about the slippery slope.

You know what else is a slippery slope? Giving cops guns. Also a slippery slope? Search warrants. Also a slippery slope? Fingerprinting. Also a slippery slope? Terrorist watch lists. Also a slippery slope? The hill that ran down the side of Old Mr. McVickers place back in Wabashaw, KY. Why, Jode and Jebediah and Randy and the dogs - little Ms. McFurrypants and Tootsie - and I, why we'd climb all onto trash lids and slide on down, laughin' like dagnum chuckleheads all the rootin' tootin' way. And then we'd go make cocoa. Until Randy done broke his neck like a matchstick. Damn those slippery slopes! To Hell!

dorkafork also says:

Also it seems to me that the government can already do many of the things Bill describes uncontroversially without a data-mining program (e.g. monitoring calls to Osama's cell phone, charity-terrorist fronts, etc.)

Those are simply two examples that I cited off the top of my head, and the government can only embark on that course of action with supplemental subjective intelligence. The contextual, pro-active evaluation of patterns of data by a supercomputer is different, and might identify terrorist fronts itself, in conjunction with or apart from any such evidently leading external data. There are challenges, but it's worth a shot. The fact is, private industry uses similar technology, with a striking level of rented or purchased demographic detail, all the time. Viewing our government as an outsized threat - without knowing the particulars on rational protocols and oversight - strays into alarmism, IMO.

dorkafork says "Oh yeah, well...": They're using it against reporters now. (via The Volokh Conspiracy) Weeee!

Bill says: dorkafork swallows a bucketful of spin:

ABC News does not know how the government determined who we are calling, or whether our phone records were provided to the government as part of the recently-disclosed NSA collection of domestic phone calls.

The Orwellian insinuation!

In reality, all the government needs in order to get a copy of a journalist's or suspected leaker's phone records is a simple subpoena. In the case of a spying or classified leak investigation, that's about the easiest request possible, with thin probable cause.

For that matter, all that I'd need to do in order to get a copy of phone records to aid discovery in a case that I'm involved in - as a private citizen - would be to ... obtain a subpoena to get the records.

The only controversy that I've seen regarding this practice surrounds whether the phone company has to notify the person whose records have been subpoenaed ... with one wrinkle:

AT&T Wireless now has announced that it will begin taking "all reasonable steps" to give written notice to its customers when their phone records are subpoenaed, except where government officials advise the company that informing the customer would compromise a criminal investigation.

A lot of journalists really are* imbeciles with an incredibly shallow depth of knowledge, aren't they?

* No offense intended to the journalists who are not imbecilic.

And Dorkafork? YOU JUST GOT SERVED!

Posted by Bill at May 15, 2006 08:06 AM

Comments

As someone who does some datamining work now and then, I think the most logical use is an extension of #2.

Step 1) Given the known/suspected terrorist phone #s, find any numbers called by 2 or more of these, or called many times. Monitor these to determine whether suspicion of terrorist activity is justified.

Step 2) With the new information gleaned, repeat Step 1.

Separately, chart the calls between suspected terrorist #s to lay out the organization's structure.

Of course, this program could be abused any of a million different ways. But so can a cop's gun.

Posted by: TallDave at May 15, 2006 01:40 PM
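
TallDave's Steps 1 and 2 above can be sketched as a simple loop. This is only one reading of the comment: the threshold of three calls for "called many times," the `confirm` callback standing in for the human monitoring step, and all names and sample numbers are assumptions.

```python
from collections import defaultdict


def expand_suspects(calls, suspects, min_callers=2, min_calls=3):
    """Step 1: numbers called by >= min_callers suspect numbers, or called
    >= min_calls times by suspects in total."""
    callers = defaultdict(set)
    counts = defaultdict(int)
    for caller, callee in calls:
        if caller in suspects and callee not in suspects:
            callers[callee].add(caller)
            counts[callee] += 1
    return {
        n for n in counts
        if len(callers[n]) >= min_callers or counts[n] >= min_calls
    }


def iterate_suspects(calls, suspects, confirm, max_rounds=10):
    """Step 2: repeat Step 1 with newly confirmed numbers until nothing new turns up.

    confirm(number) stands in for monitoring a candidate to decide whether
    suspicion is justified; it is a placeholder for human judgment.
    """
    suspects = set(suspects)
    for _ in range(max_rounds):
        confirmed = {n for n in expand_suspects(calls, suspects) if confirm(n)}
        if not confirmed:
            break
        suspects |= confirmed
    return suspects


# Made-up records: (caller, callee)
calls = [("A", "X"), ("B", "X"), ("A", "Y"), ("X", "Z"), ("X", "Z"), ("X", "Z")]
result = iterate_suspects(calls, {"A", "B"}, confirm=lambda n: True)
print(sorted(result))
```

With a `confirm` that approves everything, every candidate is swept in, so the real work (and the real potential for abuse) sits in that callback.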

I'm still waiting to find out who actually suffered harm in any of these activities and when the Justice Department is going to prosecute the leakers and their enablers.

Posted by: Bill Maron at May 15, 2006 02:30 PM

The ABC story highlights potential for abuse.

Posted by: dorkafork at May 15, 2006 03:36 PM

But the ABC story highlights how our CURRENT legal system can - theoretically - be abused.

Posted by: Bill from INDC at May 15, 2006 04:03 PM

There's a reason the government has to get a subpoena before they go traipsing through our personal records. And the government also has to have a reason to be presented before a judge. I'm well aware that our current system can - in practice not just theory - be abused.

And I got served? Bitch, please.

Posted by: dorkafork at May 15, 2006 04:31 PM

There's a reason the government has to get a subpoena before they go traipsing through our personal records.

And presumably there would be a reason, or protocol for a government official to use the overall call database to target an individual, and it probably isn't what George Bush feels like doing that day ...

Which is why the protocols are all-important.

Posted by: Bill from INDC at May 15, 2006 04:45 PM

Also a slippery slope? A standing army.

"I do not like... the omission of a bill of rights providing clearly and without the aid of sophisms for freedom of religion, freedom of the press, protection against standing armies, restriction against monopolies, the eternal and unremitting force of the habeas corpus laws, and trials by jury in all matters of fact triable by the laws of the land and not by the law of nations." --Thomas Jefferson to James Madison, 1787.

Posted by: Flea at May 15, 2006 04:58 PM

Protocols are no substitute for checks and balances. (And isn't this supposed to be a dance-off?)

Posted by: dorkafork at May 15, 2006 05:10 PM

And presumably there would be a reason, or protocol for a government official to use the overall call database to target an individual, and it probably isn't what George Bush feels like doing that day ...

Which is why the protocols are all-important.

We have a president and his legal corps that have persistently argued that presidential power allows the president to set aside laws such as those against torture if he/she sees the need. (Yoo’s famous crushed testicle argument). Given that the administration could set aside any protocol if they so desired, it makes the existence of such protocols moot. He/she could declare anyone a threat and start digging through the data. Just like they can declare anyone an enemy combatant and lock them up in jail without recourse. There is a fundamental lack of restraint and a giant candy jar.

Posted by: fish at May 15, 2006 05:37 PM

Bill, listen to your friend Billy Zane. He's a cool dude. He's trying to help you out.

Posted by: Hubris at May 15, 2006 06:17 PM

It's way easier as a private citizen; you don't even need a subpoena, Bill. Google "phone records" and it'll take about 3 seconds to find a company that will purchase phone records for you for a small fee. The phone companies can give our records to anyone they please as long as they aren't an agent of the government and currently there isn't a damn thing anyone can do about it.

Posted by: SeanH at May 15, 2006 06:53 PM

It's way easier as a private citizen; you don't even need a subpoena, Bill.

True enough, but legislation is underway to stop that.

Posted by: Bill from INDC at May 15, 2006 06:58 PM

The call patterns they are interested in are probably phone activity in and around various Islamic Mosques, institutions and groups. Probably with the hope of new leads.

Posted by: jpm100 at May 15, 2006 09:46 PM

Bill, you are totally ducking Dorkafork's dance-off question.

Posted by: Flea at May 16, 2006 01:09 AM

The right has always vociferously protested the government peeking into our private lives. Now, what? 9-11 cannot be used as an excuse for dragging the Constitution through the mud. If the laws are too restrictive to protect our country, then change the laws -- but don't break them.

Posted by: Vaughn D. Taylor at May 16, 2006 08:50 AM

Also a slippery slope? A standing army.

Heh.

DISBAND THE MILITARY NOW BEFORE THE LAST OF OUR CIVIL LIBERTIES ARE CRUSHED BENEATH ITS OPPRESSIVE WARBOOTS!!

Posted by: TallDave at May 16, 2006 09:26 AM

Vaughn -

First you've got to define the illegality w/o partisan inclinations, something that's currently being hashed out in the 'sphere and Congress. That said, I will admit my relative ignorance on that aspect, though anonymous data mining of calls isn't illegal; I can do it where I work in a marketing context. When the information is acted on there might arise complications, I have no idea.

fish -

While I agree that the enemy combatant lingo/permanent assignation is a likely overreach, I don't embrace the definition that methods used by our interrogators constitute "torture."

Memos written by Donald Rumsfeld on the definitions of acceptable interrogation seem relatively benign, authorizing sleep deprivation, climate control, etc., with the most questionable tactic being waterboarding. Which I'm not instinctively upset about, as it's primarily a mental trick. So ... shrug.

The administration does tend to reach for authority, however. Which is why, in the post and the preceding comments ...

(and this is addressed to dorkafork as well)

... I specified "protocols" AND "oversight."

This oversight can be judicial, it can be legislative, or it can be undertaken by independent (from the NSA) entities within the executive that have a mandate which traditionally transcends allegiance to a particular Presidency. This is an addendum to the existence of a data mining program that may or may not be in place in any current form. Problem is, you and I don't know the detail, that I'm aware of.

Posted by: Bill from INDC at May 16, 2006 09:33 AM

with the most questionable tactic being waterboarding. Which I'm not instinctively upset about, as it's primarily a mental trick

I think those "Magic Eye" posters should also be banned, if only to avoid unduly upsetting John Cole.

Posted by: TallDave at May 16, 2006 07:01 PM

I don't embrace the definition that methods used by our interrogators constitute "torture."

I don't think the US is sending prisoners to the likes of Uzbekistan to stand in the corner or for a ride in the dunk-tank.

Posted by: fish at May 18, 2006 02:02 PM
