Among surveillance legal policy specialists, it is common to cite a set of statistics from an October 2011 opinion by Judge John Bates, then of the FISA Court, about the volume of internet communications the National Security Agency was collecting under the FISA Amendments Act (“Section 702”) warrantless surveillance program. In his opinion, declassified in August 2013, Judge Bates wrote that the NSA was collecting more than 250 million internet communications a year, of which 91 percent came from its Prism system (which collects stored e-mails from providers like Gmail) and 9 percent came from its upstream system (which collects transmitted messages from network operators like AT&T).
These numbers are wrong. This blog post will address, first, the widespread nature of this misunderstanding; second, how I came to FOIA certain documents trying to figure out whether the numbers really added up; third, what those documents show; and fourth, what I further learned in talking to an intelligence official. This is far too dense and weedy for a New York Times article, but should hopefully be of some interest to specialists.
First: This misunderstanding is widespread, showing up in court filings, articles, and books. The Washington Post cited it in one of its 2013 articles about the documents Edward Snowden sent to Bart Gellman. The Privacy and Civil Liberties Oversight Board, on pages 33-34 of its big 2014 report on 702, repeated it. I cited this statistic in my 2015 book, Power Wars. Laura Donohue refers to it on page 55 of her 2016 book, The Future of Foreign Intelligence. Tim Edgar cited it in an endnote on page 249 of his new book, Beyond Snowden, though he also cited a blog post I wrote last year raising doubts about it.
Second: I raised those doubts after reading a Medium post by Beatrice Hanssen that focused on Judge Bates’ opinion and the implications of “Multi-Communication Transactions” (where one intercepted upstream internet transaction might contain a bundle of many discrete communications). Hanssen raised the question of whether the existence of MCTs meant FISA Amendments Act upstream collection was bulkier than has been understood. I don’t agree with all of Hanssen’s analysis/claims about what an MCT is and what is happening. But her claim that Bates’ ruling was mistakenly conflating “communications” and “transactions” seemed plausible to me, which would mean upstream could not account for just 9 percent of the total 702 internet “communications” collected. That prompted me to FOIA the NSA for the other documents from the docket that led to the Bates rulings. I recently got the second big tranche from that lawsuit, which makes a passing appearance in an article the New York Times published tonight.
Third: The documents show that the NSA/DOJ did tell Bates the numbers he recycled or paraphrased, but they strongly suggest that the government was misleadingly using the terms “communications” and “transactions” as interchangeable when they are not. Specifically, Bates was told (see Sept. 9, 2011 filing):
- The NSA took a mid-July 2011 snapshot of stuff added to its 702 repository during the first six months of 2011 and identified 140.97 million such Internet “communications.”
- So multiplying that by two to get a year’s worth, that means NSA was collecting more like 281 million internet … somethings (see below) a year in that time period. Bates simplified this to “more than 250 million.”
- (Complication: Actually there were more somethings collected, but about 18k had been purged in the first six months for compliance reasons like roamers/overcollection before the snapshot, with the further complication that not all 18k had been gathered in 2011.)
- Of these 140.97 million somethings in six months, 127.72 million (about 91 percent) came from Prism and 13.25 million (about 9 percent) came from upstream. So that is where that ratio came from.
- The NSA also conducted a manual review of a statistically representative sample of upstream 702 “communications” in its repository. This covered 50,440 “transactions.” Of those, 5,081 (about 10 percent) were multi-communication transactions or MCTs and 45,359 (about 90 percent) were discrete/single communications or SCTs.
- Since about 10 percent of upstream transactions were MCTs, each of which represents multiple communications, the upstream contribution to the total must be significantly more than 13.25 million in six months (or 26.5 million in a year). (Note: Hanssen’s Medium post got into some of this, too, citing a footnote in Bates’ opinion.)
- Therefore, the NSA must have been collecting more than the 250-280 million internet “communications” Bates’ opinion and the newly disclosed NSA/DOJ submission to Bates suggested, and upstream must make up a larger percentage of the total haul of 702 collection than 9 percent.
- Since we don’t know what the average number of discrete communications within an MCT is, we don’t know what the right multiplier is. We just know those numbers for total communications, and for the ratio of upstream communications collection versus Prism communications collection, must be wrong.
- UPDATE: For example, if the MCT multiplier is 10, then total annual 702 collection circa 2011 was about 305.79 million communications, of which upstream accounted for 50.35 million or 16.5% of that total. If the multiplier is 100, the total was 544.29 million, of which upstream contributed 288.85 million, representing 53.1 percent.
- Note that because of the end of “about” collection, these numbers are obsolete anyway and only of historical interest.
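The arithmetic in the bullets above can be checked with a short Python sketch. The six-month figures come from the Sept. 9, 2011 filing as described above; the MCT multiplier is purely hypothetical, since the real average number of communications per MCT is unknown. The sketch uses the rounded 10 percent MCT share, ignores the roughly 18k purged items, and assumes every non-MCT transaction is exactly one communication.

```python
# Figures from the Sept. 9, 2011 filing, in millions, for the first six months of 2011.
PRISM_6MO = 127.72
UPSTREAM_6MO = 13.25

# Rounded share of upstream transactions that were MCTs
# (5,081 of the 50,440 sampled transactions, i.e. about 10 percent).
MCT_SHARE = 0.10


def annual_estimate(mct_multiplier, mct_share=MCT_SHARE):
    """Return (total, upstream, upstream_share), in millions of communications per year.

    mct_multiplier is a HYPOTHETICAL average number of discrete
    communications per MCT; the real value is not known.
    """
    prism = PRISM_6MO * 2                  # double six months to get a year
    upstream_transactions = UPSTREAM_6MO * 2
    # Assumption: each non-MCT transaction is exactly one communication.
    single = upstream_transactions * (1 - mct_share)
    bundled = upstream_transactions * mct_share * mct_multiplier
    upstream = single + bundled
    total = prism + upstream
    return total, upstream, upstream / total


for m in (1, 10, 100):
    total, upstream, share = annual_estimate(m)
    print(f"multiplier {m:>3}: total {total:.2f}M, "
          f"upstream {upstream:.2f}M ({share:.1%})")
```

A multiplier of 1 reproduces the roughly 91-to-9 split as originally reported, while 10 and 100 reproduce the figures in the update above. Any other guess at the multiplier can be passed in the same way.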
Fourth: After receiving and thinking about these documents, I spoke to a U.S. intelligence official about them. The official confirmed that the numbers were misleading, explaining that in that time period the NSA was using the words “communications” and “transactions” interchangeably. After the 2011 attention to the existence of the MCT phenomenon made clear that this usage was imprecise, the agency stopped doing so. The official said the 91 percent Prism to 9 percent upstream ratio was correct as far as units of stuff collected, but since sometimes one unit of stuff was many discrete communications, it’s not clear how many “communications” are in the upstream pile. The official was unaware of any NSA study that determined what the average number of communications in an upstream MCT is.
But the official said there was another complexity that cut the other way. Sometimes, in upstream surveillance, a single discrete communication is chopped up and transmitted separately, and therefore intercepted as multiple transactions. In that case, rather than one transaction representing multiple communications, one communication would be represented by multiple transactions. So it’s twice murky.
In sum, we don’t know what we thought we knew. But we do know now that nobody should cite those numbers anymore, or at least not without a long and complicated caveat.