A couple of days ago, we saw a fresh listing on a popular hacking forum that claimed to contain data from 500 million LinkedIn profiles. After an initial evaluation of the data, we concluded that the seller was simply attempting to scam people into buying publicly available data scraped from openly accessible online sources. Still, many websites reported on the listing, which came right after the Facebook data breach news, and things got out of hand quickly.
LinkedIn was compelled to issue an official announcement about it, calling the listing just a compilation of aggregated data. The platform clarified that no data breach had occurred and that no private member account data was included in the set. Of course, scraping data still constitutes a violation of LinkedIn's terms of service, but stopping it remains highly complicated. Similar scraping incidents have occurred many times in the past, with the most recent taking place only a couple of months ago.
To top it all off, the Italian data protection authority (GPDP) has announced that it is launching an investigation into LinkedIn to determine whether a data breach resulted in the dissemination of user data, including IDs, full names, email addresses, telephone numbers, links to other LinkedIn and social media profiles, professional titles, and other work information that users entered in their profiles.
If that tells us anything, it's that even scammers selling bogus data sets on hacking forums have the power to cause serious trouble for online entities like LinkedIn, not just to take people's money for no good reason. Of course, we're not saying that these compilations of data are worthless or useless.
Those who buy them can use the information for phishing, scamming, spamming, or even SIM swapping. Still, all of this data was already made publicly available by the users themselves, so none of it is new or secret.
LinkedIn and other platforms that hold troves of data have safeguards in place to prevent bot aggregators from scraping data, but these measures are clearly not as effective as they need to be. In practice, it is a constant cat-and-mouse game, with both sides continually evolving their methods.