Dark data is an all-encompassing descriptor for the unorganized, untagged, and otherwise unused data within an organization’s data estate. It can include user records containing personally identifiable information (PII), sensitive business documents, and other forms of information generated during the normal course of business. In most cases, organizations have four times more dark data than known, structured data. This gap represents missed opportunities to leverage data for organizational goals, as well as increased risk of data breaches and noncompliance with data privacy regulations, along with the reputational and financial damage that follows.
Here’s what you need to know about dark data, not only to make the most of the information your organization collects, but also to ensure your data is properly identified, classified, and secured.
Understanding dark data
Data kept within an organization exists in one of three primary states: data-at-rest, data-in-motion, and data-in-use.
- Data-at-rest refers to data stored throughout an organization’s systems, including databases, files stored in the cloud, and materials on employee endpoint devices.
- Data-in-motion refers to data moving from one location to another, such as between systems or endpoints.
- Data-in-use refers to data currently being accessed by either an authorized user or application.
Dark data falls under the category of data-at-rest. It sits unused and, most often, without any kind of structure.
Dark data and data-at-rest
Stored dark data represents both unrealized opportunities and unaddressed risk for an organization. By definition, dark data cannot be used by an organization, because the organization doesn’t know what information it actually has. Data is valuable only when it can be acted on in pursuit of organizational goals, or when it is misused by bad actors. In order to be actionable, data must first be identified and understood.
Rather than attempting to classify their data, many organizations simply lock all of it down. Many others don’t take even this step, and leave themselves open to liability as a result. Cybercrime is lucrative, and stores of unprotected data-at-rest are an ideal, low-effort target for cybercriminals.
Why is dark data so prevalent?
Back in 1967, a one-megabyte hard drive cost $1 million. Half a century later, that same amount of storage costs two cents. When storage space was at a premium, it was vital for organizations to regularly review data, keep what was needed, and discard the rest. As time passed and Moore’s Law kicked into overdrive, storage went from expensive to affordable to dirt cheap, meaning it was no longer necessary to only keep data deemed important. It became possible to store data just in case.
As data storage became cheaper, the Internet of Things began to take off. Data was no longer just being created by computers. Instead, phones, cars, and even internet-connected appliances began creating data all day, every day, to the tune of 2.5 quintillion bytes per day. To put that in perspective, 90 percent of all data created in human history has been created in the last two years, and 90 percent of that data is unstructured and dark.
Dark data best practices
The amount of data being created every day makes the task of managing data seem insurmountable. At the same time, the risks of not properly securing data are too high to ignore. Given this, what’s an organization to do?
Follow best practices.
Find your data wherever it lives
Before you can do anything with the data in your organization, you must first understand what you have through proper data discovery. Organizations that have gone through the data discovery process gain a clearer view of what information exists under their control, including any sensitive data they may possess.
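To make data discovery more concrete, here is a minimal Python sketch of one small piece of it: scanning a block of text for values that look like PII. The pattern names, the two regular expressions, and the `find_pii` function are illustrative assumptions, not how any particular product works. Real discovery tools rely on much more robust detection and validation than simple pattern matching.

```python
import re

# Hypothetical patterns for illustration only. Production discovery
# needs far more robust detection and validation than these regexes.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return a dict mapping each pattern name to the matches found in text."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits


sample = "Contact jane.doe@example.com, SSN 123-45-6789."
print(find_pii(sample))
```

In practice, a function like this would be run across file shares, databases, cloud storage, and endpoints, so that the results describe where sensitive data lives and not just whether it exists.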
Accurately classify your data for better understanding
Once data has been located, it can be properly organized by context, sensitivity, or any other metric a company requires. This data classification process allows for rapid data retrieval not only in day-to-day operations, but also in scenarios such as a data subject access request (DSAR).
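As a rough illustration of classifying by sensitivity, the Python sketch below assigns a document the strictest label implied by the kinds of data found in it. The tier names, the mapping from data type to label, and the `classify` function are all hypothetical. Your organization’s own policy would define the real categories.

```python
# Hypothetical sensitivity tiers, ordered from least to most strict.
SENSITIVITY_RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}

# Hypothetical mapping from a detected data type to the label it requires.
TYPE_TO_LABEL = {"email": "confidential", "us_ssn": "restricted"}


def classify(detected_types, default="internal"):
    """Return the strictest label required by any detected data type.

    Unknown types, and documents with no detected types, get the default.
    """
    labels = [TYPE_TO_LABEL.get(t, default) for t in detected_types]
    if not labels:
        return default
    return max(labels, key=lambda label: SENSITIVITY_RANK[label])


print(classify(["email", "us_ssn"]))
print(classify([]))
```

Taking the strictest applicable label is a conservative choice: a document containing a single highly sensitive value is handled as highly sensitive as a whole.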
Practice proper data hygiene
By quickly and accurately tagging data, your organization can manage data more effectively throughout the data remediation process. This process brings data out of the dark, provides protection for sensitive data, and brings your organization into compliance with any applicable laws and regulations. Additionally, your staff can work more productively with better insight into the information they’re looking for.
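Remediation can take several forms, such as deleting, encrypting, quarantining, or redacting data. As one possible example, the Python sketch below masks values that look like U.S. Social Security numbers while keeping the last four digits. The pattern and the `redact_ssn` function are assumptions made for illustration, and a real remediation workflow would be driven by your classification results and retention policies.

```python
import re

# Hypothetical pattern: matches values shaped like 123-45-6789 and
# captures the last four digits so they can be preserved.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def redact_ssn(text):
    """Mask SSN-shaped values in text, keeping only the last four digits."""
    return SSN_PATTERN.sub(r"***-**-\1", text)


print(redact_ssn("SSN 123-45-6789 on file"))
```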
See how five CISOs manage dark data with Spirion technology
The best way to work through sensitive dark data best practices is with a customizable solution tailored to your organization’s needs. The Spirion Governance Suite can provide the flexibility you need while delivering 98.5% accuracy. The result is a protective and proactive approach to security that allows you to direct more resources to organizational goals.