Thanks to a growing number of data privacy regulations, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), data classification is receiving renewed interest from organizations around the world. A key driver is the requirement that organizations classify sensitive personal information belonging to European Union or California residents.
What used to be a simple process of assigning data to a few buckets to streamline data management is evolving into a much more sophisticated process to meet organizations’ intensifying data privacy and security demands.
A recent Gartner report on using classification to improve unstructured data security highlights the benefits of data classification within the broader context of security and compliance. The report also identifies the key criteria and limitations of today’s classification tools and provides expert analysis and guidance. In this blog, we’ll discuss what we consider to be the key takeaways from Gartner’s findings.
The role of classification in the data life cycle
Spirion has long advocated that data discovery and data classification go hand in hand: one cannot exist without the other. Data discovery is the process of collecting data from databases and silos and consolidating it into a single source that can be easily and instantly accessed.
Once you have located all of your data, applying labels, tags, and visual markers helps both humans and computers determine its sensitivity and treat it consistently. Such data classifications ensure that only appropriate parties can access sensitive data as it moves through the organization. They also support information sharing with other data protection controls.
The Gartner report stresses, “Understanding the sensitive data estate is a prerequisite for effective security and data privacy compliance. Data loss prevention (DLP) is not sufficient as privacy is more than a data loss problem. Security and risk management technical professionals should use data classification to support these requirements.” They further add, “Data classification capabilities are therefore not only useful; they are often necessary to achieve compliance and make data-centric security controls effective.”
The business cases for classification
Why classify? By itself, data classification may not be all that useful, but in concert with a holistic data life cycle, it becomes a key enabler and necessary component for effective data governance and compliance programs.
Gartner explains, “Classification is used to provide insight and either deliver or support control activities. In doing this, it supports a variety of business drivers for data-focused security. The most commonly seen of these drivers that are grounded in security within Gartner inquiry is privacy, which forms the bulk of data security risk for many clients.”
Some of the most common business cases for data classification include:
Classification Policy: The prerequisite first step
There is a critical need to classify data according to organization policy. While there are many schemas that organizations can use for classifying data, there are two key categories for security:
- Confidential – Sensitive data that could negatively impact operations if compromised, including harming the company, its customers, partners, or employees. Examples include vendor contracts, employee reviews and salaries, and customer information.
- Restricted – Highly sensitive corporate data that could put the organization at financial, legal, regulatory, and reputation risk if compromised. Examples include customers’ Personally Identifiable Information (PII), Protected Health Information (PHI), and credit card information.
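As a rough illustration, the two categories above could be expressed as a small policy table. This is a minimal sketch, not a recommended schema: the data type keys (`vendor_contract`, `pii`, and so on) and the `classify` helper are hypothetical names based on the examples in the list, and a real policy would cover many more data types.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """The two security categories from the text; a higher value is more sensitive."""
    CONFIDENTIAL = 1
    RESTRICTED = 2


# Hypothetical policy table built from the examples given above.
POLICY = {
    "vendor_contract": Sensitivity.CONFIDENTIAL,
    "employee_review": Sensitivity.CONFIDENTIAL,
    "salary": Sensitivity.CONFIDENTIAL,
    "customer_information": Sensitivity.CONFIDENTIAL,
    "pii": Sensitivity.RESTRICTED,
    "phi": Sensitivity.RESTRICTED,
    "credit_card": Sensitivity.RESTRICTED,
}


def classify(data_types):
    """Return the most restrictive label among the data types found,
    or None when no data type is covered by the policy."""
    labels = [POLICY[t] for t in data_types if t in POLICY]
    return max(labels) if labels else None


print(classify(["vendor_contract", "pii"]))
```

Ordering the labels means a file that contains several kinds of data inherits the strictest applicable label, which is one simple way to keep handling consistent.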
Gartner emphasizes: “Even if you do nothing else, put an information classification policy in place. If it demands that users treat and mark data in a certain way, then you have provided the foundation for user education, technical control and compliance.” They add, “Confidentiality labels will form part of an information classification policy, which should be accompanied by a data handling policy, which provides the logical basis for security standards and control requirements. Without such a policy, classification programs are likely to fail as there is no common understanding, classification labels can proliferate, and misclassification will be more frequent.”
Data classification solution landscape
Although there are few standalone data classification tools, classification commonly spans multiple product categories, including Data Access Governance (DAG), Data Loss Prevention (DLP), User-Driven Classification (UDC), and Software as a Service (SaaS) point solutions.
Let’s look at use cases and tools to consider:
Use Case | Tools commonly used |
---|---|
Immediate control action only, no recording of the classification | DLP |
Insight into data within SaaS and on-premises environments | DAG, SaaS, File Analysis |
Insight into data within user endpoints | UDC, DLP |
Tagging based on user input | UDC, SaaS |
Automatic tagging | UDC, SaaS, DAG |
Essential classification capabilities
Because data classification is an enabler for other aspects of the data lifecycle—whether triggering other data security controls or imparting strategic insights—it must meet a broad range of capabilities.
Gartner suggests, “In order for classification to work, the following conditions must be met:
They also recommend the following criteria when evaluating classification tools:
Data classification technology strengths
Today’s classification tools, which encompass Data Access Governance (DAG), Data Loss Prevention (DLP), User-Driven Classification (UDC), and Software as a Service (SaaS) point solutions, generally perform well when it comes to the following capabilities:
- Coverage of file types
- Metadata reporting
- Workflow and privacy – Gartner emphasizes, “As privacy requirements expand globally, vendors are introducing privacy workflow and data subject access request (DSAR) support. These are common themes in most privacy regulations.”
Data classification technology weaknesses
These same tools also carry some inherent limitations:
- Data tagging limitations – Gartner states that “tagging has a significant limitation in that it is not possible to tag all data objects. Some file types have room or even formal support for tagging in headers or document properties. However, the vast majority of file types have no such capability or are so limited as to be effectively useless for tagging or other forms of labeling.” They also offer the guidance: “If the requirement is to track data across multiple dimensions (for example, internal policy, privacy, health and department), then use file analysis tools for visibility. Use classification tagging tools for policy-dependent control.”
- Classification change – They caution, “Classification tools support changing classifications. But automated tools are not good at automatic reclassification based on external conditions rather than changes in content.” They advise, “focus on data that might leave the company rather than moving internally, except where internal sharing would be absolutely unacceptable. Keep your most restrictive classification label reserved for exceptional rather than common use cases.”
- Encrypted data – Encryption often interferes with the detection of data. Gartner recommends as a best practice: “The safest classification approach is to treat individual encrypted files that cannot be accessed by the classification tool as sensitive and use controls to prevent their movement or access.” They further add, “the best approach for any unreadable file is to use the discovery of these files to support a review of the business process and sensitive data handling.”
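The guidance on encrypted data can be sketched as a simple fallback rule: if the tool cannot read a file, label it sensitive and flag it for a business-process review. This is only an illustration under stated assumptions. `ScanResult`, `classify_file`, and the `read_text` and `detect` callables are hypothetical stand-ins for whatever a real classification tool provides.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScanResult:
    path: str
    label: str
    needs_review: bool  # True when the file could not be inspected


def classify_file(
    path: str,
    read_text: Callable[[str], Optional[str]],  # hypothetical: returns None if unreadable
    detect: Callable[[str], str],               # hypothetical: content-based classifier
) -> ScanResult:
    """Classify by content when possible; otherwise fall back to 'sensitive'."""
    text = read_text(path)
    if text is None:
        # Unreadable (for example, encrypted): treat as sensitive and
        # queue it for a review of how the file is being handled.
        return ScanResult(path, "sensitive", needs_review=True)
    return ScanResult(path, detect(text), needs_review=False)


# Usage with in-memory stand-ins for real files.
fake_files = {"notes.txt": "lunch menu", "archive.enc": None}
for name in fake_files:
    print(classify_file(name, fake_files.get, lambda text: "not_sensitive"))
```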
User-driven classification and Zero Trust security
While user-driven classification has its place in data security, complete reliance on user inputs can introduce friction into the classification process while also increasing error rates. Alternatively, using a combination of user-driven classification and AI data classification allows for consistent data protection without the need for manual user input every time sensitive data is involved. These automated processes can then support the user and create guardrails within the labeling process.
By reducing the impact of user-driven classification in the data security process, solutions can be data-focused rather than employee-focused. This, in turn, makes a Zero Trust framework more feasible to implement.
In Zero Trust environments, users must be continually verified for access to all resources. While this is ideal for data security, the process can create roadblocks for users accessing sensitive information on a day-to-day basis. AI-enabled data classification allows for greater efficiency by limiting end-user participation within the classification process and instead uses contextual information to categorize and recognize files more efficiently and rapidly.
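One way to picture automated classification acting as a guardrail for user input is a reconciliation step that keeps the stricter of the two labels and flags disagreements. This is a minimal sketch, not how any particular product works: the `LEVELS` ordering (including the `public` level, which does not appear in this article) and the `reconcile` function are assumptions made for illustration.

```python
# Hypothetical label order, least to most restrictive.
LEVELS = ["public", "confidential", "restricted"]


def reconcile(user_label, auto_label):
    """Combine a user-chosen label with an automated one.

    Returns (final_label, needs_review). When the user supplies no label,
    the automated label is used as the baseline, so no manual step is needed.
    """
    if user_label is None:
        return auto_label, False
    # Guardrail: never end up less restrictive than either source suggests.
    final = max(user_label, auto_label, key=LEVELS.index)
    return final, user_label != auto_label


print(reconcile(None, "restricted"))
print(reconcile("public", "restricted"))
```

Keeping the stricter label errs on the side of protection, while the review flag gives a human the chance to correct the automated result when it is wrong.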
Data classification trends
Several data classification technology trends are emerging right now. Among them are automated tools that use machine learning and AI to build privacy workflows and classify data.
Gartner provides the following recommendations for technical professionals responsible for data security:
- “Ensure that a data classification policy is in place as it is the root of data security governance. It provides clarity and authority, supports control standards and underpins user awareness efforts.
- Align your security classification with any broader data governance programs. It is easy to confuse users and create technical complexity and possibly conflict. Focus on high-risk and high-value data, especially regulated data, to support such alignment.
- Use automated data classification or AI-powered data classification to provide users with a baseline and you with insight. User classification is less expensive, but harder to introduce. Use both together for best results.
- Aim for “good enough” solutions, recognizing that the automated technology has limits in precision. Phase your implementation carefully to avoid diminishing returns.
- Use at least two labels. Sensitivity and either “owner” or “project/department.” Identify where the tag is to be used and provide only enough information to allow that control to work correctly.”
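The last recommendation, two labels with only as much detail as each control needs, might look like the following in practice. This is a hedged sketch: the `Tag` fields follow the quote (sensitivity plus owner), while the control names in `CONTROL_FIELDS` and the `tag_for_control` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    sensitivity: str  # e.g. "confidential" or "restricted"
    owner: str        # could equally be a project or department


# Hypothetical mapping: which tag fields each control actually needs.
CONTROL_FIELDS = {
    "dlp": ("sensitivity",),
    "access_review": ("sensitivity", "owner"),
}


def tag_for_control(tag, control):
    """Expose only the fields the named control needs to work correctly."""
    return {name: getattr(tag, name) for name in CONTROL_FIELDS[control]}


tag = Tag(sensitivity="restricted", owner="finance")
print(tag_for_control(tag, "dlp"))
print(tag_for_control(tag, "access_review"))
```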
Redefining automated data classification
Automating data classification overcomes many limitations by making the process reliable, accurate, and persistent. A sophisticated platform, such as Spirion Sensitive Data Platform, can spot personal information by looking for data patterns, such as names, dates of birth, addresses, phone numbers, financial information, health information, and Social Security numbers. From there, it can classify these data files with context-rich labeling, ensuring that the right security measures get applied and your organization remains compliant with the corresponding privacy regulations. More importantly, automated systems can also re-classify data as needed, such as for updates and changes within the business or from changes in compliance regulations.
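To make the idea of spotting personal information through data patterns concrete, here is a deliberately simplified sketch using regular expressions. It is not Spirion's method, and the two patterns (US Social Security numbers and US-style phone numbers in one specific format) are assumptions chosen for illustration. Production tools typically add validation and context to reduce false positives.

```python
import re

# Illustrative patterns only; real detectors are far more thorough.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def find_personal_data(text):
    """Return a dict of pattern name to match count, omitting patterns with no matches."""
    counts = {name: len(rx.findall(text)) for name, rx in PATTERNS.items()}
    return {name: n for name, n in counts.items() if n}


def label_for(text):
    """Label text 'Restricted' if any personal data pattern matches, else None."""
    return "Restricted" if find_personal_data(text) else None


sample = "SSN 123-45-6789, call 555-867-5309."
print(find_personal_data(sample))
print(label_for(sample))
```

Because the label is derived from content, rerunning the same check after a file changes is how a pattern-based approach would pick up reclassification.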
Every organization should assess the data classification automation options available in the marketplace and determine which solutions provide the capabilities needed to take its data privacy protection to the next level. Ideally, organizations should choose a platform that is purpose-built to deliver the key functions of a robust data privacy program. The right automation system can streamline data classification by automatically analyzing and categorizing data against predetermined parameters, continuously and in real time.