There is some new data on the war to protect enterprise data. Neither of these data points is the type of "big data" that you are used to, but that does not make them any less important. The new data concerns an important set of user-behavior statistics that sits apart from the fashionable analytics driving the current data-scientist hype. In fact, most analytics departments try to remove this specific set of usage data from any and all reports they generate. This hated chunk of analytic data is the "non-human traffic" created by "scrapers" who come to web pages not for the features and functionality of a site, but for the gold within the site (i.e., the data and content).

Big Data Part 1

The Jury Is In. Many of us knew it all along. Science has caught up to the enlightened few and shown that reasonably regulated marketplaces are the best way to combat piracy of data and content. I've chosen not to say "the only way", because you can try the old-fashioned ways of protection like detecting automated scripts, content honeypots and digital watermarking. The funny thing about these other ways is this -- they don't actually work. That's right, folks! Newton's third law of motion wins again. "When one body exerts a force on a second body, the second body simultaneously exerts a force equal in magnitude and opposite in direction to that of the first body."

The corporate IT departments, security teams, lawyers and external security vendors have met their match: scrapers. Scrapers are people who make a living writing automated scripts that go to websites, scrape data and content out of the HTML and store it for use in another website. The scrapers are so brazen that they sell their services in the open marketplace -- because, after all, writing a script is not illegal; using it in violation of a website's terms of service is, and that's the problem of the guy who hired the scraper to harvest the data. Scrapers are so good at what they do, they are better than LeBron James. You not only can't stop them, you can't even hope to contain them, because of basic supply-and-demand behavior. Attempts to suppress the supply without meaningful attempts to satisfy or stem demand create black markets, because the profit incentive is too good to refuse.

  • A recent study commissioned by Spotify found that efforts to reduce piracy by controlling distribution (called 'artist holdout') had the reverse of the intended effect. In an admittedly small sample, Spotify notes that artists who engaged in 'artist holdouts' sold 1 song for every song illegally downloaded. The artists who released on Spotify at the same time as iTunes sold 4 tracks for every one illegally downloaded.
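To put those two ratios on the same scale, here is a minimal sketch that converts "tracks sold per illegal download" into the share of all acquired copies that were pirated. The function name `pirated_share` is made up for illustration, and the calculation assumes sales and illegal downloads are the only two ways a track is acquired, which the study summary above does not state.

```python
from fractions import Fraction


def pirated_share(sold_per_pirated: Fraction) -> Fraction:
    """Share of all acquired tracks that were pirated.

    Assumption (not from the study): every copy is either sold or
    illegally downloaded, so the total is sold + pirated.
    """
    # For each 1 pirated copy there are `sold_per_pirated` sold copies.
    return 1 / (sold_per_pirated + 1)


# Ratios quoted in the bullet above.
holdout = pirated_share(Fraction(1))   # 1 sold per 1 illegally downloaded
same_day = pirated_share(Fraction(4))  # 4 sold per 1 illegally downloaded

print(f"artist holdout:   {float(holdout):.0%} of copies pirated")
print(f"same-day release: {float(same_day):.0%} of copies pirated")
```

Under that assumption, the holdout artists lose half of all copies to piracy, while the same-day releases lose one in five.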

There is hope. The big data shows us that there is another way:

  • In Norway, piracy of multiple media types has plummeted. According to the global market research firm Ipsos, piracy of music has dropped by 83% in under 4 years and piracy of movies and TV shows has dropped by 50%. While the exact cause is still a subject of debate among those who want to wage an intellectual war on the value of copyrights in our society, there is a curious correlation (backed up by survey data from downloaders) between the drop in piracy and the presence of legitimate and reasonable offers for purchasing downloads.

Big Data Part 2

If you are in the web business and think this is not your problem, you would be wrong. If you are creating data, features or content, this is your problem. If you are marketing data, features or content, this is your problem. The data, features and content on your site are gold and people are lining up to make money off them. Some would have you believe that legal mechanisms and other deterrents are the "right" way to handle this because stealing the intellectual property of others is "wrong" and should be combatted.

Others claim that intellectual property and copyright law are "wrong" concepts that need to be abolished. A third group chooses to ask and then answer a different question -- "How much would you pay to stop 90% of the piracy of your data and content in 2 years or less?" Before you answer that, let me ask a different question. "How about you allow me to pay you for the privilege of stopping it?"

This is the business model of legitimate access to copyrighted material (e.g., Netflix, Amazon, Hulu, iTunes, Spotify, Rdio, Apigee, Mashery, etc.). It took the music industry so long to catch on to this that the value of the music file (i.e., the price that people are willing to pay) plummeted in the social consciousness. The film and television industry learned something from the fall of the music industry and responded before things got out of hand.

Protection Vs. Leverage

Some of the data and content providers on the internet have learned from this and have opened public APIs with paid options for partners who want to leverage their content. Others are trying to hold on and protect their rights. There is no telling yet when the tipping point will come, but the data suggests that one is coming.

This whole debate seems out of control when you remember that the point of creating and marketing the data and the content was to serve a purpose and to make money. Now the IT security and legal departments want everyone to believe that the "right" thing to do is not to profit from the controlled and governed licensing of data and content in a way that furthers the company's purpose, but to spend time, money and effort inhibiting the usage and proliferation of the very data that drives that purpose.

The choice is yours: Would you rather profit from your data or pay to protect it?