GDPR is an emotional acronym. It’s polarizing. First hailed as “the end to the free-for-all internet as we know it,” it was supposed to allow customers to take back their data from the clutches of big tech boogeymen. But a little over a year in, users browse the web as they always did. They have more pop-ups to deal with, but hey, it’s the price of doing business.

CBInsights graphic on increased focus on cybersecurity in corporations between 2014 and 2018

Data security is at the forefront of executives’ minds like never before. But for big business and regulators, GDPR has been a headache. Companies using consumer data face steep fines for non-compliance: up to 4% of annual global revenue, which could run into the billions for the largest tech companies. Yet many questions remain unanswered. The regulatory bodies charged with overseeing compliance likely don’t have the resources to do so properly. Regulators and large companies alike are frustrated by the lack of clarity around requirements, implementation and potential consequences.

New regulation is often unclear, and this case is no different. Data privacy feels like a moving target. GDPR is the biggest and strictest piece of legislation out there today, but there will be others. California’s CCPA (modeled after GDPR and going into effect Jan. 1, 2020) is likely the first of many in the US market, and with Silicon Valley at the heart of the Golden State, enterprises here in the United States must stop and take notice. The time to act is now, and an ounce of prevention is worth a pound of cure.

What Do I Have to Do to Be GDPR/CCPA Compliant?

In plain English, GDPR/CCPA compliance hinges on your ability to identify, locate, extract and use, move or delete personally identifiable information (PII). You won’t have a problem with compliance if you can find and manipulate PII wherever it’s hiding, but we can almost guarantee you can’t.

You may scoff at this. Social Security numbers, phone numbers and credit card numbers, for example, are easy to find: they’re located in a central database. If they’re not, their patterns are easily recognizable using your existing IT. No problem.
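To see why the patterned cases feel like no problem, here is a minimal sketch in Python. The pattern names and formats are our own assumptions for illustration (US-style numbers, no checksum validation), not a description of any particular IT product.

```python
import re

# Hypothetical patterns for illustration only; real formats vary by
# country and system, and production checks would validate more strictly.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
}


def find_patterned_pii(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


sample = "Call 555-867-5309 about SSN 123-45-6789 and card 4111 1111 1111 1111."
for label, value in find_patterned_pii(sample):
    print(label, value)
```

A handful of regular expressions covers identifiers that follow a fixed shape. Nothing in this sketch can tell whether a city name or a symptom in free text belongs to a customer.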

pie chart of all the different varieties of personally identifiable information

Image via Online Website Security

The problem lies in your company’s unstructured data. Memos, emails, PDFs and PowerPoint presentations: if you look through your own files, it’s likely you’ll find more information stored in these inaccessible formats than in neat tables. IDC estimates that by 2025, 80% of enterprise data will be unstructured. New data sources are being created all the time, and per GDPR, you’re on the hook for controlling PII no matter where it comes from.

You were right before. Finding social security numbers in a table IS easy. It’s even relatively easy in unstructured data sources like PDFs. But what about PII that doesn’t conform to a pattern? How will you know whether the city mentioned in an email to customer service is your client’s home (which should be treated as PII) or a passing reference to a public library (which is public domain)?

What about unquantified PII like medical symptoms or number of kids? It’s easy for humans to pick out this information, but training a computer to do it is much harder. Luckily, a special branch of AI can solve this problem without the rules-based, brute-force training that haunts your IT guy’s dreams.

Enter natural language search.

Natural Language Search Can Get You Out of the Weeds

Natural language search is a specialized application of AI uniquely designed to unlock insight from unstructured, free-flowing text. It works differently from other AI techniques like deep learning, which identify patterns after analyzing vast quantities of training data. NLS takes unstructured data and creates a “word mesh,” similar to how we create a mind map to connect concepts related to a big idea.

Because of this, NLS can understand context — which means it will return the same answer, regardless of user phrasing.

What NLS Means for GDPR/CCPA Compliance

NLS is a powerful enterprise tool anywhere natural language is stored, but it’s a match made in heaven for the unique combination of vague-yet-strict guidelines for GDPR and CCPA. Here’s why.

1. It’s context-aware enough to get around the patternless PII problem

An NLS solution would be able to correlate “tightness in chest,” “heartburn,” “short of breath” and “lightheaded” with heart disease. It would be able to connect Google searches for a tandem stroller or a twin bassinet and guess that the user has (or will soon have) kids. Although these things aren’t strictly considered PII, flagging such information will point you to PII that you might otherwise miss, and missing PII is unacceptable under GDPR/CCPA. If you let anything slip, you’re liable.
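As a rough illustration of the idea, the toy Python sketch below flags text when several related phrases show up together. Everything in it is an assumption made for the example: the `CONCEPT_PHRASES` table is hand-written, the concept names are invented, and plain substring matching is far cruder than the context-aware analysis an actual NLS engine would perform.

```python
from collections import Counter

# Hypothetical, hand-written associations for illustration only.
# A real system would learn or curate these and would weigh context.
CONCEPT_PHRASES = {
    "possible_health_condition": [
        "tightness in chest",
        "heartburn",
        "short of breath",
        "lightheaded",
    ],
    "possible_family_status": [
        "tandem stroller",
        "twin bassinet",
    ],
}


def flag_sensitive_concepts(text, min_hits=2):
    """Return {concept: hit_count} for concepts with at least min_hits phrases present."""
    lowered = text.lower()
    counts = Counter()
    for concept, phrases in CONCEPT_PHRASES.items():
        for phrase in phrases:
            if phrase in lowered:
                counts[concept] += 1
    return {concept: n for concept, n in counts.items() if n >= min_hits}


message = "I felt lightheaded and short of breath after lunch."
print(flag_sensitive_concepts(message))
```

Requiring more than one related phrase before flagging is a deliberate choice in this sketch: a single mention of “heartburn” says little, while several related signals together are worth a closer look.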

2. Its utility isn’t limited to specific formats

Data is growing exponentially, and new sources are created all the time. Enterprises have barely had the chance to wrangle social feeds en masse, and now they’re tasked with processing streaming data from IoT devices and wearables. Luckily, as long as human language is involved, NLS can help. Connect a million data sources or just one or two — the words on the page are all the same to the NLS engine, and every user interaction makes the AI better.

3. Customer data is safer than ever without the same potential for 'human error'

Hackers are no match for a hapless employee. A 2018 analysis of Radar Metadata from 2016 and 2017 found that 84% of data breaches were either unintentional or inadvertent. An email sent to the wrong recipient, a bill sent to the wrong customer: these everyday mistakes cost companies millions and put all of our data at risk.

nature of cybersecurity breaches

Image via The Privacy Advisor

It’s common sense that automating this process should remove opportunity for error and lead to better data security. But until now, processes like redaction couldn’t be effectively automated. Traditional IT tools rely on patterns, and struggle to adapt to new data streams or entities. Many big companies accepted manual redaction as the only answer, but this takes time, and adds cost and risk. Outsourcing redaction takes care of the cost, but comes with different risks.

NLS-based solutions adapt well to new text-based data sources, and they’re dependable. An incoming document is scanned for whatever counts as sensitive in your use case; anything found is identified, flagged and verified; and the document is returned with that information removed. Because NLS makes “decisions” based on context and user inputs, it’s completely auditable, ensuring full transparency as you plan for the future, GDPR and beyond.
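The scan, flag, verify and return flow can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: a single regular expression stands in for the detection engine (an NLS engine would rely on context rather than a fixed pattern), and the `Finding`, `detect`, `redact` and `approve` names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    label: str
    start: int
    end: int
    text: str


# Stand-in detector (assumption): one regex plays the role of the engine
# that locates sensitive spans in a document.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def detect(document):
    """Scan the document and flag every sensitive span found."""
    return [Finding("ssn", m.start(), m.end(), m.group()) for m in SSN.finditer(document)]


def redact(document, findings, approve=lambda finding: True):
    """Verify each finding, remove approved spans, and return an audit log."""
    verified = [f for f in findings if approve(f)]
    redacted = document
    # Replace from the end of the document so earlier offsets stay valid.
    for item in sorted(verified, key=lambda v: v.start, reverse=True):
        redacted = redacted[:item.start] + f"[REDACTED:{item.label}]" + redacted[item.end:]
    # The log records what was removed and where, without storing the value itself.
    audit_log = [{"label": v.label, "start": v.start, "end": v.end} for v in verified]
    return redacted, audit_log


doc = "SSN 123-45-6789 on file."
clean, log = redact(doc, detect(doc))
print(clean)
print(log)
```

The `approve` callback is where a human reviewer or a second check could confirm each finding before anything is removed, and the returned log is what makes the run auditable after the fact.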

Looking Ahead: GDPR Is Just the First Step

It’s time to put the pedal to the metal where data security is concerned. Tech like NLS can help you get a handle on your unstructured data and get GDPR compliant without bankrupting you or alienating your customers. But this is short-term thinking.

You should put precautions in place to prevent employee mistakes from causing breaches, but that only levels the playing field. Meanwhile, the bad guys are getting savvier: attack paths are getting shorter as hackers exploit incredible computing power, and advanced cybersecurity weapons, AI included, are proliferating.

All we can say is, it’s time to fight fire with fire.