This is a fully interactive long-form post on data mapping from a compliance perspective. Each week we will publish a new section. You can use the side navigation bar to the right of the page to quickly move between the sections.
Together, we will illustrate how data mapping can be a completely automated real-time process yielding almost endless insights and actions for the DPO and the compliance team.
The sections in the series are as follows:
- How to let automation drive your data mapping process.
- Why data sampling isn’t good enough.
- The benefits of data mapping bridging data silos.
- How real-time data mapping feedback improves compliance.
- How vendor changes impact your data mapping.
- Linking your internal data mapping to external communication.
- The people angle to data mapping – access and ownership.
- Data mapping from different angles – sources and segments.
- Visualize your data mapping.
- Tying it all together – best in class data mapping.
1. How to let automation drive your data mapping process.
One of the most common problems we encounter when speaking with DPOs about data mapping is the manual nature of the process.
A typical data mapping process is a yearly event where the data compliance team interviews departments, sends surveys and goes through data samples. The result is a collection of Word documents or Excel sheets containing an overview of the processed data.
This makes for a messy, highly manual set-up in which knowledge takes a year (or longer) to disseminate throughout the company.
Such a process is not optimal. First, the end result is neither easily searchable nor easy to query. Second, it is time-consuming to produce and quickly goes out of date. Worst of all, it is not guaranteed to be correct.
We want to change this. With the Wult platform, our promise is to reduce manual data mapping tasks by up to 90%. The knowledge gap is also too long, and we believe our platform can shorten it by more than six months.
How does our platform do this? With a bottom-up approach.
We build the data mapping process from the actual data
The Wult data mapping feature takes a bottom-up approach, starting with the actual data in the data silos. We connect to your data silos and use them to build out the data map.
A silo can be anything that holds data: unstructured data stores (think Google Drive, Dropbox, etc.), structured data reached through API integrations, and more infrastructure-heavy systems like databases. We know that the average company uses multiple silos, and mapping these can take a lot of time.
By using native integrations into each data source, our platform creates automated data mapping consisting of:
- Overview of all the data sources and datasets within.
- Data scanning to understand the data types, data source overlap and more.
- Segment identification to allow separation between customers, employees etc.
- Geo-spatial tagging of data.
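To make the scanning step more concrete, here is a minimal Python sketch of detecting data types by checking every record in a silo rather than a sample. The two regular expressions and the function name `scan_silo` are illustrative assumptions, not the Wult platform's implementation; real scanners need far more robust detection than a pair of regexes.

```python
import re
from collections import Counter

# Hypothetical, deliberately simple detectors for two common personal-data
# types. They will miss some values and misfire on others.
DETECTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def scan_silo(records):
    """Return field -> Counter of detected data types, over ALL records."""
    found = {}
    for record in records:
        for field, value in record.items():
            for data_type, pattern in DETECTORS.items():
                if pattern.search(str(value)):
                    found.setdefault(field, Counter())[data_type] += 1
    return found


if __name__ == "__main__":
    crm = [
        {"name": "Ada", "contact": "ada@example.com"},
        {"name": "Bob", "contact": "+45 12 34 56 78"},
    ]
    print(scan_silo(crm))
```

Note that the free-text `contact` field turns out to hold two different data types, something a schema alone would not reveal.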
This approach ensures the highest quality of data understanding. You can trust that all data in each data silo has been indexed and scanned, and the structured format of the data map allows you to build upon it.
This saves time and reduces the knowledge gap.
The following sections will cover some of the insights and applications you can build on top of this powerful data map.
2. Why data sampling isn’t good enough.
When mapping data, most companies focus mainly on defining data schemas.
As part of a more robust mapping strategy, more and more companies also run analytics on data samples to gain a better understanding of their data.
In this part of our data mapping series, we will argue that, whilst this is an excellent first step, data analysis needs high coverage to support full compliance.
Why do we analyze data sets?
One of the main goals of data mapping is to map out which kinds of sensitive data are kept by an organization.
In today’s world, companies must understand which data types they hold and what they are obliged to do with these kinds of data under the relevant data protection legislation.
It’s also essential to analyze datasets to understand data quantity and inform data retention. But more on that later.
The problem with data analysis today
Whilst data analysis is a great step towards a better understanding of data, and thus better data compliance, in its current form it has some issues.
The problem we see most often with data mapping analysis is the coverage of the samples. Companies rely on small data samples that they believe to be representative.
The great coverage question
For sampling to be an effective tool, you have to ensure enough coverage for the sample to represent what you are trying to measure.
The more variable parameters you investigate, the larger the sample size you typically need.
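The coverage problem can be put into numbers. If a sensitive field appears in a fraction p of records, a random sample of n records misses it entirely with probability (1 - p)^n. The sketch below assumes uniform random sampling and independent fields (both simplifications) and computes that miss probability, plus the sample size needed to catch k such fields with a chosen confidence.

```python
import math


def miss_probability(n, p):
    """Chance that a random sample of n records contains no record with a
    field present in a fraction p of all records (sampling with replacement,
    or a very large dataset)."""
    return (1 - p) ** n


def required_sample_size(p, confidence, k=1):
    """Smallest n such that k independent fields, each present in a fraction
    p of records, are ALL seen at least once with the given probability.

    Solves (1 - (1 - p) ** n) ** k >= confidence for n.
    """
    per_field = confidence ** (1 / k)  # each field must be caught this reliably
    return math.ceil(math.log(1 - per_field) / math.log(1 - p))


if __name__ == "__main__":
    print(required_sample_size(0.001, 0.95))        # one rare field
    print(required_sample_size(0.001, 0.95, k=10))  # ten rare fields
```

For a field present in 0.1% of records, `required_sample_size(0.001, 0.95)` returns 2995: roughly 3,000 sampled records just to be 95% sure of seeing it once. Asking about ten such fields pushes the figure above 5,000, which is the tradeoff described next.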
You therefore end up with a tradeoff between the insights you can get and the amount of analysis you put in.
At Wult, we’re trying to index data better to provide a complete insight into data structure, sensitive data and quantity.
The Wult data mapping platform creates a privacy index on top of all your data, so you can get insights into all your data for any parameter you might choose.
With this, you can answer questions like:
- How does my customer data overlap across data sources? Answered by understanding exactly which emails or other identifiers belonging to customers are present in any given data silo.
- How many identifiers am I holding within a given segment?
- Quantify segments across any parameter. This can help you understand whether you fall under new regulations such as the Virginia Consumer Data Protection Act (“VCDPA”), which can apply if you process personal data of at least 100,000 Virginia consumers in a calendar year.
- Which types of data are stored together and to which extent?
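A minimal sketch of the idea behind such an index, assuming each silo is a list of record dicts and that a normalized email address is the shared identifier (both hypothetical simplifications). `build_privacy_index` maps every identifier to the silos holding it, which answers the overlap question, and `distinct_matching` counts distinct people meeting a condition, such as living in Virginia.

```python
from collections import defaultdict

# One of the VCDPA applicability thresholds (consumers per calendar year).
VCDPA_CONSUMER_THRESHOLD = 100_000


def normalize(email):
    """Naive identifier normalization: trim whitespace and lower-case."""
    return email.strip().lower()


def build_privacy_index(silos):
    """Map each identifier (normalized email) to the set of silos holding it.

    `silos` is a hypothetical dict: silo name -> list of record dicts.
    """
    index = defaultdict(set)
    for silo_name, records in silos.items():
        for record in records:
            if record.get("email"):
                index[normalize(record["email"])].add(silo_name)
    return index


def overlap(index, silo_a, silo_b):
    """Identifiers present in both silos."""
    return {ident for ident, where in index.items() if {silo_a, silo_b} <= where}


def distinct_matching(silos, predicate):
    """Distinct identifiers with at least one record satisfying `predicate`."""
    return {
        normalize(record["email"])
        for records in silos.values()
        for record in records
        if record.get("email") and predicate(record)
    }


if __name__ == "__main__":
    silos = {
        "crm": [{"email": "Ada@example.com", "state": "VA"},
                {"email": "bob@example.com", "state": "NY"}],
        "marketing": [{"email": "ada@example.com"},
                      {"email": "cy@example.com", "state": "VA"}],
    }
    index = build_privacy_index(silos)
    print(overlap(index, "crm", "marketing"))
    virginia = distinct_matching(silos, lambda r: r.get("state") == "VA")
    print(len(virginia) >= VCDPA_CONSUMER_THRESHOLD)
```

Because the index covers every record, the same structure answers any of the questions above without drawing a new sample.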
On top of this, the mapping process is automated, so the DPO and data team spend far less time on manual work.
The system also reads data in real time, keeping your data mapping up to date. This reduces the chance of missed fields and gives you a complete understanding of your data silos.
Data analysis has significantly developed in recent times. But to truly build a compliance-first ecosystem, companies need automated systems that deliver representative insights into company data structure.
3. The benefits of data mapping bridging data silos.
When mapping data in an organization, it can be very easy to get stuck in a certain way of looking at things.
Once you have chosen your angle, it can be challenging to adapt and change when new data is added or data changes significantly.
You can be left with an incredible amount of work to remap the data with a different perspective or goal.
This is another reason that we built our data mapping platform.
What do we mean by a data silo?
A data silo is usually a single data source whose data shares the same characteristics. A company or organization will usually have multiple data silos, holding different kinds of data with different characteristics, legal implications and sensitive fields.
A data silo doesn’t contain all of a company’s data or the dimensions needed to do effective data mapping. That’s why we built our platform to work with segments.
What is a segment in data mapping?
A segment is a new way of looking at your company’s data. Rather than looking at your data through the lens of a single silo, segments let you look at data combined from several silos and grouped in a different way.
An example of a segment is customers. Your customer data may be spread across different silos, but our platform lets you combine these silos into a clear overview across multiple datasets.
Why does segmentation help to improve data mapping?
Clear overview + understanding your segment
By segmenting your data by user, for example, you can get a clear overview of the types of data you hold in different silos. In this example, you might find that you are collecting sensitive fields in one dataset that you weren’t aware of.
Often it isn’t easy to build a complete understanding of the different types of data being collected in a segment.
Nor is the number of entries in a segment easy to quantify; without a segmented view, it remains an unknown.
Let’s look at an example to see how this can affect a company.
Companies usually have a CRM that includes name, email and phone number. A finance system would likely add banking information and addresses, held in a separate silo.
Your marketing tool might also record a customer’s gender and age bracket. Again, this is siloed.
If you look at this data silo by silo, it is unclear how much data you are storing about each customer, which can be a considerable risk.
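The CRM, finance and marketing example can be sketched as a simple schema-level segment profile. The field names below are assumptions standing in for the systems described above; the point is that only the combined view shows everything held about a customer and which silo holds each piece.

```python
# Hypothetical field inventories for the three silos in the example above.
SILO_FIELDS = {
    "crm": {"name", "email", "phone"},
    "finance": {"bank_account", "address"},
    "marketing": {"gender", "age_bracket"},
}


def segment_profile(silo_fields):
    """Return field -> set of silos holding it: the segment-wide view."""
    profile = {}
    for silo, fields in silo_fields.items():
        for field in fields:
            profile.setdefault(field, set()).add(silo)
    return profile


profile = segment_profile(SILO_FIELDS)
for field in sorted(profile):
    print(f"{field:<13} held in: {', '.join(sorted(profile[field]))}")
```

No single silo holds more than three of these fields, yet the customer segment as a whole holds seven, including banking details.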
This same issue extends to data volume. Suppose you have 60,000 customers in one system, 25,000 in another, and 10,000 in the last. How many do you have? Somewhere between 60,000 and 95,000.
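The range follows from basic set arithmetic. Writing A, B and C for the sets of customers in the three systems, the combined total is at least the size of the largest set (complete overlap) and at most the sum of all three (no overlap):

$$\max(|A|, |B|, |C|) \le |A \cup B \cup C| \le |A| + |B| + |C|$$

$$60{,}000 \le |A \cup B \cup C| \le 60{,}000 + 25{,}000 + 10{,}000 = 95{,}000$$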
Only by creating an identity graph of all your siloed data will you know for sure and be able to act accordingly.
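One common way to build such an identity graph is to treat records and identifier values as nodes and link each record to its identifiers; each connected group then represents one person. The sketch below uses a union-find (disjoint-set) structure and assumes email and phone are the linking identifiers, with only naive normalization. It is an illustration, not the Wult platform's matching logic; real identity resolution needs careful matching rules.

```python
class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def count_distinct_people(silos, identifier_fields=("email", "phone")):
    """Link records that share any identifier value; count connected groups."""
    uf = UnionFind()
    for silo, records in silos.items():
        for i, record in enumerate(records):
            node = ("record", silo, i)
            uf.find(node)  # register every record, even one with no identifiers
            for field in identifier_fields:
                value = record.get(field)
                if value:
                    # Naive normalization: trim and lower-case.
                    uf.union(node, ("id", field, str(value).strip().lower()))
    return len({uf.find(n) for n in list(uf.parent) if n[0] == "record"})


if __name__ == "__main__":
    silos = {
        "crm": [{"email": "ada@example.com", "phone": "111"},
                {"email": "bob@example.com"}],
        "finance": [{"phone": "111", "iban": "X"},
                    {"email": "cy@example.com"}],
        "marketing": [{"email": "ADA@example.com"},
                      {"email": "bob@example.com"}],
    }
    print(count_distinct_people(silos))
```

Here six records across three silos collapse to three people: the finance record with no email is still linked to Ada through her phone number.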
The same thing applies to retention. A segmented view of your data allows you to understand where you might be collecting more data than you need.
Companies shy away from implementing retention policies because it can be daunting to map data across silos accurately. It becomes easier to keep as much data as possible.
But this can lead to several problems, especially as regulations shift and employees change.
Segments make retention easier. Implementing retention policies across siloed datasets is challenging: even where policies exist, they are usually specific to a single silo, which makes consistent, end-to-end retention impossible.
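As an illustration of a segment-level retention policy, the sketch below applies one retention period to the customer segment across every silo, using each person's most recent activity anywhere in the segment. The three-year period, the `email` join key and the `last_activity` field are hypothetical assumptions, not legal guidance.

```python
from datetime import date, timedelta

# Hypothetical policy: keep customer data for three years after last activity.
CUSTOMER_RETENTION = timedelta(days=3 * 365)


def overdue_records(silos, retention, today):
    """Return (silo, email) pairs for people inactive longer than `retention`.

    Activity is combined across silos, so a person who is active in any
    silo is not flagged in any of them.
    """
    last_seen = {}
    for records in silos.values():
        for record in records:
            key = record["email"]
            seen = record["last_activity"]
            last_seen[key] = max(seen, last_seen.get(key, seen))
    expired = {key for key, seen in last_seen.items() if today - seen > retention}
    return [
        (silo, record["email"])
        for silo, records in silos.items()
        for record in records
        if record["email"] in expired
    ]


if __name__ == "__main__":
    silos = {
        "crm": [
            {"email": "a@example.com", "last_activity": date(2019, 1, 1)},
            {"email": "b@example.com", "last_activity": date(2019, 1, 1)},
        ],
        "marketing": [{"email": "a@example.com", "last_activity": date(2024, 1, 1)}],
    }
    print(overdue_records(silos, CUSTOMER_RETENTION, today=date(2024, 6, 1)))
```

A silo-by-silo policy would treat the two CRM records identically; the segment view shows that only one of those customers has actually gone inactive.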
Unconnected, siloed data is not a good place to be in today’s world. As compliance requirements grow, businesses need a universal view of their data to comply with quickly changing regulations.
Alongside this, the world of data is now an international one. But the world currently lacks universal data regulation. This means different processing laws in different regions.
For example, you may have siloed data stored in both the EU and the US. In a customer segment, you may combine these, meaning that you are moving data from the EU to the US.
This means that data for EU citizens is hosted in the US, which can create regulatory problems. Our platform can help you identify these issues and alert you to the need for a transfer impact assessment, something that is much harder to spot across disjointed, siloed datasets.
Our platform allows you to understand which types of data, in which silos, are split or duplicated across different regions. This helps specifically with the problem of cross-border data regulation.
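A rough sketch of how such a check might work, assuming each silo's hosting region is known and that a person's residence is recorded in at least one silo (the `SILO_REGION` mapping and the `residence` field are hypothetical). It only surfaces candidates for review; whether a transfer is lawful, and whether a transfer impact assessment is needed, is a legal question the code cannot answer.

```python
# Hypothetical metadata: where each silo physically stores its data.
SILO_REGION = {"crm": "EU", "finance": "EU", "marketing": "US"}


def flag_cross_border(silos, silo_region, home="EU"):
    """Map each `home`-region resident to the silos outside `home` holding their data."""
    # Residence may be recorded in only one silo, so combine across the segment.
    residents = {
        record["email"]
        for records in silos.values()
        for record in records
        if record.get("residence") == home
    }
    flagged = {}
    for silo, records in silos.items():
        if silo_region[silo] == home:
            continue  # data stays in the home region
        for record in records:
            if record["email"] in residents:
                flagged.setdefault(record["email"], set()).add(silo)
    return flagged


if __name__ == "__main__":
    silos = {
        "crm": [{"email": "a@example.com", "residence": "EU"},
                {"email": "b@example.com", "residence": "US"}],
        "marketing": [{"email": "a@example.com"}, {"email": "b@example.com"}],
    }
    print(flag_cross_border(silos, SILO_REGION))
```

The US-hosted marketing silo holds no residence field at all, so the EU resident is only visible there once the silos are combined into a segment.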
Regular alerts mean proactive data mapping
Finally, segmented data mapping means that governance platforms such as the Wult platform can alert you to changes. Moreover, these alerts can be segment-specific.
For example, the platform can alert the DPO when a new type of PII is found, or give instant updates when data is stored in a new country.
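Alerts like these can be thought of as a diff between successive data-map snapshots. A minimal sketch, assuming each snapshot records the PII types and storage countries per segment (a hypothetical structure, not the Wult data model):

```python
EMPTY = {"pii_types": set(), "countries": set()}


def diff_data_map(previous, current):
    """Compare two data-map snapshots and return alert messages.

    Each snapshot is a hypothetical dict:
    segment -> {"pii_types": set of str, "countries": set of str}.
    """
    alerts = []
    for segment, now in current.items():
        before = previous.get(segment, EMPTY)  # a brand-new segment alerts on everything
        for pii in sorted(now["pii_types"] - before["pii_types"]):
            alerts.append(f"[{segment}] new PII type found: {pii}")
        for country in sorted(now["countries"] - before["countries"]):
            alerts.append(f"[{segment}] data now stored in: {country}")
    return alerts


if __name__ == "__main__":
    previous = {"customers": {"pii_types": {"email"}, "countries": {"DE"}}}
    current = {"customers": {"pii_types": {"email", "phone"}, "countries": {"DE", "US"}}}
    for alert in diff_data_map(previous, current):
        print(alert)
```

Running the comparison whenever the data map is refreshed turns these checks into the segment-specific alerts described above.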
Segments can open up siloed data. It’s an approach that will help improve the data mapping process, enabling automation, better retention, and improved compliance.
Get in touch to learn how Wult can help with Data Mapping