In a stunning announcement that many business professionals should pay attention to, LinkedIn revealed in a court filing earlier this month that bad bots have been actively scraping profile data from its site for almost a year.
According to the official complaint, filed in the U.S. District Court for the Northern District of California, “during periods of time since December 2015, and to this day, unknown persons and/or entities employing various automated software programs (often referred to as ‘bots’) have extracted and copied data from many LinkedIn pages.”
LinkedIn suffered numerous attacks from content-scraping botnets over the course of a year. The amount of data taken has not been disclosed, but the implications are huge. As LinkedIn itself states in the court filing, if the bot activity were to continue, the company “will suffer ongoing and irreparable harm to its consumer goodwill and trust, which [it] has worked hard for years to earn and maintain.”
Why Scrape LinkedIn?
Because LinkedIn did not disclose in the court filing what the scrapers were after, we can only speculate. In the complaint, LinkedIn’s representatives allege fraud as well as violations of its terms of service (LinkedIn prohibits the use of automation on its site).
This Quora post makes it clear that web scrapers have been active on LinkedIn for a while. The practice is unsavory, but falls within a grey area that unscrupulous organizations are willing to exploit.
Scrapers generate value by stealing content (from websites or APIs) that would otherwise require resources, like time and money, to acquire. Third-party web scraping businesses can build platforms on top of LinkedIn data with far lower overhead costs, then offer products that compete with LinkedIn’s business suite at lower prices.
More threatening actors, like cyber thieves and nation-state operatives, can use LinkedIn data to profile potential targets for social engineering or spear phishing attacks.
Houston, We Have a Problem
While LinkedIn’s content scraping problem may come as news to some, the bad bot problem is internet-wide, affecting online publishers, e-commerce marketplaces, online retail, travel, real estate and, evidently, social networking.
In travel, bots can be used to scrape flight information from licensed partner sites such as Kayak.com and Travelocity in order to avoid affiliate contracts and fees. This lets competing sites launch with lower overhead costs for their owners.
The problem in real estate is similar. Third-party real estate listing sites must pay to pull data from centralized sources. Scrapers copy the content from these third-party sites and repost it, monetizing it with the same business model, either by selling leads or by running ads.
These businesses are increasingly turning to existing security technologies to combat this problem, but evidently even massive web properties, like LinkedIn, remain vulnerable.
Is Web Scraping Illegal?
LinkedIn contends that fraud was committed, that its terms of service were violated, and that it made reasonable efforts (with security measures to prevent automation) to protect itself. All that being said, the legality of web scraping is still up for debate.
If the scraped data was made public by the business itself, how can the legal system intervene? Where is the line between protection and preferential treatment? What legal claims does LinkedIn have to data that belongs to its members? Modern society is still grappling with these questions, and few answers, if any, are forthcoming.
While firms like LinkedIn are pursuing legal recourse, many businesses don’t have the time or inclination to do so. Even for those that do, the fight is uphill: LinkedIn faces the same challenges many law enforcement agencies encounter when pursuing cyber threat actors across jurisdictions and international borders. It will be interesting to see whether the perpetrators are ever brought to justice, or whether LinkedIn ends up investing significant money and resources without any resolution.
Bad Bots Are Getting Worse
Our own 2016 Bad Bot Landscape Report found that the frequency of bot attacks may be trending down; that’s the good news.
The bad news is that bots don’t need sleep, they don’t rest, and they are only increasing in sophistication, because there is simply too much money to be made.
The LinkedIn incident is a powerful reminder that content scraping has become a massive problem online. Businesses must recognize that if they are making money on what is publicly available via their web apps (and even behind secure login pages), someone else is using bots to steal that information and the corresponding revenue along with it.