Few of us would doubt the value of big data when it is collected and used properly. Healthcare is perhaps the most important application, but better law enforcement and product safety improvement come in close seconds. Then there’s the intelligence value of collected data to find and neutralize growing threats to our nation.

So how can we take advantage of the value of big data without suffering the distrust and damage caused by its unauthorized collection and use?

Data, Data, Who’s Got Your Data?

In today’s connected world, you might say anyone and everyone. We’ve all heard that nothing is sure but death and taxes... You can add the collection and use of big data to that list.

Everywhere we look, corporations and government agencies are relying on big data stores to pursue their relationships with you. If you shop online (and who doesn’t, these days?) your inbox and smartphone are full of marketing messages from retailers that seem to know what you have been buying, from whom and how long ago. Even your health records, driving records and a host of other things about you are being digitized and kept for analysis. In a culture that amasses big data that way, the deluge we are seeing is unlikely to abate anytime soon.

It’s become virtually impossible to know for sure who has your data or what they are doing with it. This is especially true as the “data sharing” phenomenon has grown – organizations collecting data and then sharing it with other organizations in return for their data. Now, even organizations you may trust with your information often don’t retain it and usually don’t control how it is used once they share it.

And with the rise of a highly sophisticated global data theft industry, even big data stores held by organizations with no intent to abuse or share them are vulnerable to unauthorized use – often ending up back in the marketplace as the criminals “fence” them back into legal channels.

With so many ways your data is being collected and with so many groups collecting it, or stealing it from someone who has, everything we do with any connection to the Internet is probably on someone’s digital shelf being used for … who knows what?

Big data can seem — and is often touted as — an unmixed blessing, taking us toward a brave new world of information transparency. But, like the “man behind the curtain” in the Wizard of Oz, there is a side to the big data phenomenon that we are meant to “pay no attention to.”

So, our question: How can the collectors of big data guarantee that our information won’t be used in ways we don’t approve of?

The answer: They can’t.

Fighting an Old War

All this isn’t to say we aren’t doing anything, but what we’re doing is way behind the times.

Several years ago, as technology began its rapid rise and data collection grew with it, privacy concerns (at a much lower level of jeopardy) drove the industry to adopt several protective techniques in an attempt to assure people that their data would be safe and used only for purposes they were OK with:

  1. Purpose limitation, the careful restriction of collected data to only those elements critical to its intended use
  2. Data minimization, the collection of only that data absolutely needed to perform the intended functions
  3. Anonymization (a.k.a. “de-identification”), separating personal information from the statistical data needed for analysis
  4. Software and hardware barriers to prevent data repository compromise

These worked (sort of) for a while, often creating more the impression of safety than the substance of it. But, like the aphorism about generals always fighting the previous war, industry continues to depend on these techniques even as the growth of technology has rendered them less and less effective in today’s new world of big data.
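
Two of the techniques above – data minimization and anonymization – can be sketched in a few lines of code. This is a minimal illustration, not a robust de-identification scheme; the field names, the salt, and the helper functions are all assumptions made for the demo.

```python
# Sketch of data minimization (keep only the fields the analysis needs)
# and anonymization (replace the direct identifier with an opaque token).
# Field names and SALT are illustrative assumptions, not a real schema.
import hashlib

NEEDED_FIELDS = {"age", "zip3", "purchase_total"}  # data minimization
SALT = "replace-with-a-secret-salt"                # demo assumption

def pseudonymize(user_id: str) -> str:
    """Replace a direct identifier with a salted one-way token."""
    return hashlib.sha256((SALT + user_id).encode()).hexdigest()[:12]

def minimize_and_anonymize(record: dict) -> dict:
    """Keep only needed fields, plus a pseudonym instead of the identity."""
    out = {k: v for k, v in record.items() if k in NEEDED_FIELDS}
    out["token"] = pseudonymize(record["user_id"])
    return out

raw = {"user_id": "alice@example.com", "name": "Alice",
       "age": 34, "zip3": "981", "purchase_total": 59.90}
print(minimize_and_anonymize(raw))  # name and email are gone
```

Note that even this sketch hints at why the techniques have aged poorly: a salted hash only holds up as long as the salt stays secret, and coarse fields like age and a three-digit ZIP can still be combined with other data sets to re-identify people.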

The use of big data by legitimate firms for legal uses can be controlled to some extent by the use of “opt-out and opt-in” relationships between firms and their clients: Unless you give me permission to use or share your data, I agree not to do so. We see this on a growing number of websites.
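The opt-in relationship described above amounts to a simple rule: no sharing unless the user has explicitly granted that purpose. A minimal sketch, with a hypothetical `Consent` class and purpose strings invented for the example:

```python
# Sketch of an opt-in consent gate: data is shared with a partner only
# if the user explicitly granted that purpose. Opt-in means the default
# answer is "no". Class and purpose names are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Consent:
    granted: set = field(default_factory=set)  # purposes the user opted into

    def allow(self, purpose: str) -> None:
        self.granted.add(purpose)

    def permits(self, purpose: str) -> bool:
        return purpose in self.granted  # absent means denied

def share_with_partner(record: dict, consent: Consent) -> Optional[dict]:
    """Return the record for sharing only if the user opted in."""
    return record if consent.permits("third-party-sharing") else None

c = Consent()
print(share_with_partner({"id": 1}, c))   # None: user never opted in
c.allow("third-party-sharing")
print(share_with_partner({"id": 1}, c))   # {'id': 1}
```

An opt-out scheme would flip the default in `permits` – everything is allowed until the user objects – which is precisely why privacy advocates favor opt-in.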

But the use of data by thieves and their unscrupulous clients falls outside this safeguard and can be controlled only through prevention of the theft in the first place, a goal that has proved elusive so far. As long as industry and individuals are obsessed with connecting everything to the Internet, things are likely to stay that way; the thieves are often smarter than the defenders, and are usually located beyond our ability to prosecute them.

Ethics? It depends…

If we’re hoping for ethical guidelines to guarantee our privacy, we may be kidding ourselves. Putatively “ethical” organizations may not feel bound to protect us if the value they can gain from the data is sufficient. Consider this excerpt from the conclusion of a Stanford Law Review article about big data and privacy:

We call for the development of a model where the benefits of data for businesses and researchers are balanced against individual privacy rights. Such a model would help determine whether processing can be justified based on legitimate business interest or only subject to individual consent, and whether consent must be structured as opt-in or opt-out. [emphasis added]

Unless I’m missing something, the call for balancing benefits against individual privacy rights is a nice way of saying that if the value of the data is great enough, privacy concerns don’t matter. It seems that where big data is concerned, ethics have considerable flexibility.

In fact, the concept of “situation ethics” has come to dominate much academic discourse on the subject and, as we see from the citation above, has come to mean: “Do what you want; just be prepared to justify it on some basis that makes sense to you.”

Making matters even more complicated, ethics in the big data space has two components:

  • The ethics of commission, as we’ve discussed, deals with how organizations collect and use data. It’s here that we hope the collectors will be good stewards of their haul, refraining from uses they know or suspect the collectees wouldn’t agree to. Unfortunately, the rise of flexible ethics makes reliance on this component murky at best.
  • The ethics of omission doesn’t come up much in ethics discussions but is equally important. In today’s world, virtually any data store can be penetrated and compromised by criminals. Knowing this, the collectors have an ethical responsibility to do what is required to protect that data from theft. As we have seen recently, many of these organizations, commercial and government alike, have failed to do this, opening their data stores to compromise and unwittingly acting as middle-men for the data theft industry.

Any organization intending to collect and keep massive data stores, if it is to be truly ethical, must address both components of the ethics process: Use the data only in an ethical manner, and ensure that it doesn’t fall into the hands of those who won’t.

Recognition of this multi-part ethical process will increase the cost of big data, perhaps making some collection less than justified even under the flexible ethics described above. But that recognition must be part of the equation if we are to reap the benefits of big data.

Convenience… and Its Cost

Given that our penchant for convenience has become a major chink in the armor of big data protection, it may be time to begin getting the public accustomed to a bit less of it. We see the early stages of this in the credit card industry, where users are now often asked to enter their ZIP codes during transactions. And the rise of the smartcard, containing an integrated circuit that resists the skimming attacks that plague magnetic stripes, enables the point-of-sale software to ask several questions to verify the user’s authenticity.

All of this, and many more small steps, can make at least the initial transaction world more secure at a nominal cost in convenience, and this evolution can perhaps be translated into users’ willingness to accept more restricted access to interactive data across their retail and other transactions.

Playing for Keeps, Whether We Like It or Not

We are in a game we didn’t design, with rules made by people and groups we neither know nor trust, and with our futures on the line. The final answers are still unknowable, but there is one thing we can count on: The game will continue, and if we all don’t get to work solving our data collection, privacy and misuse problems, those futures won’t be nearly as bright as we might hope for.

Title image by dionhinchcliffe, under a Creative Commons Attribution-Share Alike 2.0 Generic License.