How was users’ personal data leaked?

Facebook has suffered one of the largest personal data leaks in its history. Many questions remain unanswered, and the company’s explanations have been scarce and thin given the magnitude of the problem. Email addresses, IDs, comments, likes and even cell phone numbers are circulating online.

In total, more than 500 million accounts are affected. According to the company, the data was not obtained by hacking its systems but by extracting it from the platform before September 2019. In a post on the company’s blog, Facebook explains that the personal data was extracted through scraping, a tactic in which automated software collects public information from the Internet.

The social network says it has already patched the vulnerability and that the hole that allowed this data to be extracted in 2019 no longer exists. As Mike Clark, Product Management Director at Facebook, explains, the personal data was extracted using the version of the contact importer that was active in 2019:

“When we realized how malicious attackers were using this feature in 2019, we made changes to the contact importer. In this case, we updated it to prevent malicious attackers from using software to mimic our application and upload a large number of phone numbers to see which ones matched Facebook users.”

Facebook blog post about the data leak

The provenance of the massive data leak is not entirely clear

However, despite the company’s explanations, it is not entirely clear that this was the (only) cause. According to Wired, several different sets of users’ personal data are circulating online. The outlet points out that the 533 million records belong to a completely different data set from the one the attackers created by abusing the aforementioned contact import tool.

While everyone agrees that Facebook fixed the vulnerability in August 2019, it is unclear how many times the bug was exploited before then. That is, it is unknown whether the roughly 530 million accounts leaked over the weekend come from a single scrape or are the result of several.

To further complicate matters, the Irish Data Protection Commission issued a statement pointing out that the leaked personal data set includes aggregated information from several large-scale scrapes:

“The above data sets were leaked in 2019 and 2018 and are related to a large-scale scraping that, as Facebook reported at the time, occurred between June 2017 and April 2018, when Facebook closed a vulnerability in its phone search functionality. Because the scraping took place before the application of the GDPR, Facebook decided not to report this as a personal data breach.

The recently released dataset appears to comprise the original 2018 dataset (pre-GDPR) combined with additional records, which may be from a later period.”

Irish DPC statement

A dataset from after the 2019 vulnerability?

Facebook, responding to a request from the Irish Data Protection Commission, confirmed that the dataset appears to have been collected by third parties and that it comes from various sources. That leaves more questions than answers, especially since the Irish DPC points out that some of the records may be from a later period.

Do the data sets come only from the 2018 and 2019 large-scale scrapes that used the contact tools? For now, Facebook suggests that answering that question requires further investigation, so the matter is not entirely clear:

(…) The data in question appears to have been collected by third parties and potentially comes from multiple sources. Therefore, it requires a thorough investigation to establish its provenance with a level of confidence sufficient to provide your Office and our users with additional information.

Facebook response to the Irish DPC

While these questions are being cleared up, you can check whether your personal data is among the leaked records.