DLP: A Guide to Various Approaches, Their Strengths and Limitations
Data Loss Prevention (DLP) is a critical part of complying with multiple regulations and standards
that require organizations to protect sensitive information. These regulations include
HIPAA (Health Insurance Portability and Accountability Act) for Protected Health
Information (PHI), GDPR (General Data Protection Regulation) for the personal data of EU
residents, and ITAR (International Traffic in Arms Regulations) for defense and military-related
technologies.
Compliance Requirements
Organizations subject to these regulations must implement measures to identify,
classify, and tag sensitive data, as well as monitor activities and events surrounding that
data. DLP solutions can help achieve this by:
- Identifying and classifying sensitive data
- Monitoring data transfers and activities
- Reporting on data breaches and compliance incidents
- Providing audit trails and logs for compliance audits
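To make the monitoring and audit-trail items above concrete, here is a minimal sketch of a structured DLP event serialized as one JSON line for a compliance log. The `DlpEvent` schema and the `to_audit_line` helper are hypothetical illustrations, not the format of any particular product.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DlpEvent:
    """One monitored action on sensitive data (hypothetical schema)."""
    user: str
    action: str        # e.g. "read", "copy", "upload"
    resource: str      # file path, object key, or record identifier
    labels: list[str]  # classification tags assigned during discovery
    decision: str      # e.g. "allow", "alert", "block"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_audit_line(event: DlpEvent) -> str:
    """Serialize an event as a single JSON line, ready to append to an audit log."""
    return json.dumps(asdict(event), sort_keys=True)


event = DlpEvent("alice", "upload", "exports/patients.csv", ["PHI"], "block")
print(to_audit_line(event))
```

One JSON object per line keeps the log easy to append to and easy for auditors or log tooling to parse record by record.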
Examples of DLP in Compliance
- HIPAA: DLP can help block unauthorized attempts to transfer patient records outside of a hospital’s secure network.
- GDPR: DLP can identify and classify personal data, apply the required security controls, and set up monitoring and reporting to protect EU residents’ data.
- ITAR: DLP can help ensure compliance by monitoring and controlling the export of defense and military-related technologies.
How DLP Works
Any DLP solution has two key components:
- Sensitive data discovery: The first step in protecting sensitive data is knowing where it exists in the organization and who has access to it. Data primarily lives in three places: the cloud, endpoints, and SaaS applications. In the cloud, it can reside in object stores (like S3 buckets), databases, key-value stores, file systems, data lakes, or other forms such as images and videos. Here, the cloud can be either a public cloud or a datacenter with similar storage systems or filers. On endpoints, sensitive data is typically what users have copied from the cloud or from SaaS applications. Finally, SaaS applications are increasingly used for every function of a company, such as accounting, finance, CRM, marketing, and HR, and they store a lot of sensitive information. Users may export that data to their endpoints for custom analysis.
- Sensitive data movement: An organization also needs to know when sensitive data moves from one location to another. Movement between sanctioned locations is authorized; the risk lies in movement to an unsanctioned location. If such movement goes untracked, organizations can easily lose data to malicious code or insider threats. Data can leak over multiple channels: email, uploads or copies to a remote location over the Internet, and copies to external devices such as USB drives.
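Discovery usually depends on classification rules. Below is a minimal sketch, assuming simple regex rules plus a Luhn checksum to discard numbers that cannot be payment card numbers. The rule names, patterns, and the `classify` function are hypothetical; real products use far richer detectors. Note that a document with no fixed pattern, such as internal design notes, matches nothing.

```python
import re


def luhn_valid(number: str) -> bool:
    """Luhn checksum: True if the digits in `number` form a valid check sequence."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Hypothetical rule set: label -> (pattern, optional validator for each match)
RULES = {
    "US_SSN": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), None),
    "CREDIT_CARD": (re.compile(r"\b(?:\d[ -]?){15}\d\b"), luhn_valid),
    "EMAIL": (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), None),
}


def classify(text: str) -> set[str]:
    """Return the labels of every rule with at least one validated match."""
    labels = set()
    for label, (pattern, validator) in RULES.items():
        for match in pattern.finditer(text):
            if validator is None or validator(match.group()):
                labels.add(label)
                break
    return labels


print(sorted(classify("Contact alice@example.com, SSN 123-45-6789")))  # ['EMAIL', 'US_SSN']
print(classify("Order 1234567890123456"))  # set(): 16 digits, but fails Luhn
print(classify("Design notes for the next-generation rotor assembly."))  # set()
```

The validator step shows one way rule sets cut false positives: a 16-digit order ID matches the card pattern but fails the checksum.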
Types of DLP Solutions
To provide the coverage described above, Data Loss Prevention (DLP) solutions fall into
three broad types:
- Network DLP: Focuses on protecting data in transit within an organization’s network by monitoring and controlling data flows. This type of DLP solution puts a secure perimeter around data in motion on the network. Common examples include secure web gateways (SWG) and cloud access security brokers (CASB), which inspect data in transit for sensitive content.
- Endpoint DLP: Protects data on individual devices, such as laptops and desktops, by monitoring and controlling data access and transfer. This type of DLP solution provides visibility into how individual users within an organization interact with data.
- Cloud DLP: Designed to discover and protect critical data in the cloud across storage systems such as object stores, databases, key-value stores, third-party API calls, and data lakes. These solutions mainly focus on data discovery and cataloging; in most cases they are not in the data path, so they cannot track data moving into and out of the cloud.
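The movement component described earlier reduces to a policy decision on each observed transfer. A minimal sketch, assuming discovery has already produced an inventory of labeled locations and that sanctioned destinations are listed explicitly; `INVENTORY`, `SANCTIONED`, `Transfer`, and `decide` are hypothetical names. A network or endpoint DLP would evaluate something like this wherever it observes the transfer.

```python
from dataclasses import dataclass

# Hypothetical output of the discovery step: location -> classification labels.
INVENTORY: dict[str, set[str]] = {
    "s3://finance-bucket/payroll.csv": {"PII"},
    "crm-export/customers.xlsx": {"PII"},
    "wiki/lunch-menu.txt": set(),
}

# Hypothetical list of sanctioned destinations; anything else is unsanctioned.
SANCTIONED = {"s3://finance-bucket", "corp-sharepoint"}


@dataclass
class Transfer:
    source: str       # location the data is read from
    destination: str  # where the data is being sent


def decide(t: Transfer) -> str:
    """Return "allow" or "block" for a single observed transfer."""
    # Sources that discovery never saw are treated as non-sensitive here,
    # which is exactly the blind spot that discovery is meant to close.
    labels = INVENTORY.get(t.source, set())
    if not labels:
        return "allow"
    if t.destination in SANCTIONED:
        return "allow"  # authorized movement between sanctioned locations
    return "block"      # sensitive data heading to an unsanctioned location


print(decide(Transfer("crm-export/customers.xlsx", "corp-sharepoint")))  # allow
print(decide(Transfer("crm-export/customers.xlsx", "usb-drive")))        # block
```

Whether a violation is blocked outright or only alerted on is a policy choice; this sketch blocks to keep the decision logic short.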
Challenges with DLP solutions
The biggest challenge we see is that most DLP solutions operate in their own silos, so an organization must stitch several of them together to get true data protection. In addition, many of these solutions are complex to configure and expensive. Sensitive data has to be identified through configured rules, and in many cases those rules are hard to define. For example, files containing an organization’s intellectual property (IP) do not follow any specific pattern. Finally, many of these solutions are quite intrusive for users. Here are some specific examples:
Network DLP Limitations:
- Complexity: Network DLP solutions can be complex to implement and may require significant technical expertise to configure and maintain.
- False positives: Network DLP solutions may generate false positive alerts due to limited context, which can lead to unnecessary investigations and potential misclassification of legitimate data usage.
- Limited visibility into encrypted traffic: Network DLP solutions can decrypt only a portion of the traffic, and they are easy to bypass by encrypting data at the endpoint before transmitting it.
- No ability to track data lineage: A user can copy sensitive data to an endpoint, transform it to mask its contents, and then transfer it over the network. Used on its own, a network DLP solution can be completely bypassed this way.
- Insider threat detection: Network DLP solutions can struggle to detect insider threats, as they may not be able to identify authorized users who are attempting to exfiltrate data.
- Performance impact: Network DLP solutions can impact network performance, particularly if they are not optimized for high-traffic networks.
- Cost: Network DLP solutions can be expensive to purchase and maintain, particularly for large organizations.
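The encryption and lineage limitations above share a root cause: a network inspector sees only the bytes on the wire. A minimal sketch, assuming a naive inspector that matches an SSN regex against the decoded payload (`network_inspect` is a hypothetical name), shows that a trivial base64 transform applied on the endpoint hides the match. Real products may decode common encodings, but any transform they do not anticipate, or encryption, has the same effect.

```python
import base64
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def network_inspect(payload: bytes) -> bool:
    """Naive inspector: flag the payload if its decoded text contains an SSN."""
    text = payload.decode("utf-8", errors="ignore")
    return SSN_PATTERN.search(text) is not None


record = b"name=Jane Doe, ssn=123-45-6789"
masked = base64.b64encode(record)  # trivial transform applied on the endpoint

print(network_inspect(record))  # True
print(network_inspect(masked))  # False: same data, no match on the wire
```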
Endpoint DLP Limitations:
- Complexity: Endpoint DLP solutions can be complex to implement and manage, requiring significant IT resources and expertise.
- Performance impact: Endpoint DLP solutions can impact system performance, particularly if they decrypt and re-encrypt traffic at the endpoint.
- User acceptance: Endpoint DLP solutions may be perceived as intrusive or restrictive by end-users, potentially leading to resistance to adoption or use.
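Because an endpoint agent sees file reads and writes, it is in a position to track the data lineage that network DLP misses. A toy sketch, assuming the agent receives read and write events per process ID from OS hooks; `LineageTracker` and its methods are hypothetical. Labels flow from any file a process reads to every file it later writes, so a masked or encoded copy keeps its label even though its content no longer matches any rule.

```python
from collections import defaultdict


class LineageTracker:
    """Toy endpoint lineage model (hypothetical): labels propagate from the
    files a process reads to the files that process later writes."""

    def __init__(self) -> None:
        self.file_labels: defaultdict[str, set[str]] = defaultdict(set)
        self.process_labels: defaultdict[int, set[str]] = defaultdict(set)

    def label(self, path: str, *labels: str) -> None:
        """Record labels assigned to a file, e.g. by discovery."""
        self.file_labels[path].update(labels)

    def on_read(self, pid: int, path: str) -> None:
        # The reading process inherits the file's labels.
        self.process_labels[pid] |= self.file_labels.get(path, set())

    def on_write(self, pid: int, path: str) -> None:
        # Whatever the process writes inherits its labels, regardless of how
        # the content was transformed in between.
        if self.process_labels[pid]:
            self.file_labels[path] |= self.process_labels[pid]

    def labels_of(self, path: str) -> set[str]:
        return set(self.file_labels.get(path, set()))


tracker = LineageTracker()
tracker.label("exports/customers.csv", "PII")
tracker.on_read(4242, "exports/customers.csv")
tracker.on_write(4242, "tmp/scrambled.bin")  # e.g. after masking or encoding
print(tracker.labels_of("tmp/scrambled.bin"))  # {'PII'}
```

Tainting at the process level is deliberately coarse: it over-labels every file written by a process that touched sensitive data, trading some false positives for not missing derived copies. Hooking every file operation is also one source of the performance impact noted above.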
Cloud DLP Limitations:
- Limited coverage: Cloud DLP solutions (some are also called DSPM, for data security posture management) cover a limited set of data sources and focus only on data in the cloud. Once data leaves these monitored sources, they can no longer track it, so they need to be augmented with an endpoint DLP solution to cover endpoints.
- Focus on data at rest only: These solutions do a good job of classifying data at rest. However, they are not part of data transfer paths and can’t monitor data in transit. That needs to be handled by other solutions that monitor data moving into and out of the cloud environment.
- Cost: Many of these solutions scan large volumes of data to look for sensitive information. Doing so regularly can be quite expensive, and the cost grows as more data is produced over time.
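The cost point can be made concrete with simple arithmetic. Assuming, hypothetically, that the data estate starts at D0 TB and grows by G TB between scans, n periodic full scans read the sum of (D0 + k*G) for k = 0 to n-1, which equals n*D0 + G*n*(n-1)/2 TB in total. Cumulative scan volume therefore grows quadratically in the number of scans. The figures below (100 TB, 10 TB per month, 12 monthly scans) are illustrative only.

```python
def full_scan_volumes(initial_tb: float, growth_tb: float, scans: int) -> list[float]:
    """TB read by each periodic full scan when the data estate grows by
    growth_tb between scans (hypothetical linear-growth model)."""
    return [initial_tb + k * growth_tb for k in range(scans)]


def cumulative_tb(initial_tb: float, growth_tb: float, scans: int) -> float:
    """Closed form of sum(full_scan_volumes(...)): n*D0 + G*n*(n-1)/2."""
    n = scans
    return n * initial_tb + growth_tb * n * (n - 1) / 2


volumes = full_scan_volumes(100, 10, 12)
print(volumes[-1])                  # 210: the 12th scan alone reads 210 TB
print(sum(volumes))                 # 1860
print(cumulative_tb(100, 10, 12))   # 1860.0
```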
Conclusion
DLP is a crucial component of any compliance strategy, as it helps organizations protect
sensitive data and ensure regulatory compliance. By implementing DLP measures,
organizations can reduce the risk of data breaches, protect sensitive information, and
maintain compliance with relevant regulations. We believe organizations should look for a solution that covers both data at rest and data in transit.
Also, weigh how intrusive a solution is against the protection it provides. In many cases, adoption is limited because users are uncomfortable with the loss of privacy that many solutions entail. We believe that endpoint-based solutions, which monitor data at the endpoint as well as the traffic leaving it, strike a good balance between these concerns, since no data is ever decrypted outside the endpoint itself.