In late 2014, an anonymous person offered to send a German journalist 11.5 million encrypted documents detailing the structure of offshore business entities created and managed by a Panamanian law firm in the world’s most notorious tax havens. The massive data set was simply too big for one reporter to comprehend. But thanks to the power of big data analytics running in the cloud, a large team of journalists started piecing it together.
This is the story of how the International Consortium of Investigative Journalists (ICIJ) worked with the Süddeutsche Zeitung (SZ), Germany’s largest daily newspaper, to analyze the data known as the Panama Papers. The ICIJ investigation (https://panamapapers.icij.org) surfaced unsavory connections between powerful politicians, business owners, banks, and offshore businesses, which the ICIJ alleges are used to cover up tax evasion and other financial crimes.
The Panama Papers has already led to the resignation of Iceland’s prime minister and of the head of a Chile-based anticorruption group. It has also raised questions about the financial dealings of others, including Russian and Chinese leaders, Argentine soccer stars, and Saudi Arabian royalty. More stories based on the ICIJ’s investigation are slated for the weeks to come, and next month the public will have access to ICIJ’s entire Panama Papers database.
The Panama Papers is essentially a full data dump from Mossack Fonseca, the Panama-based law firm said to be the fourth-largest creator of offshore businesses in the world. The treasure trove consists of 2.6TB of data, including relational database files, emails, and various types of documents about the 215,000 offshore bank accounts and shell companies that the law firm and its predecessors created for thousands of individuals between 1977 and 2015.
Setting up the systems that would enable ICIJ journalists to pore through this massive data set was the responsibility of Mar Cabra, editor of the ICIJ’s Data and Research Unit. In an interview with Datanami, the Spanish journalist discussed the technical challenges the Panama Papers posed and the practical solutions her team implemented.