Data Categories

If you want to protect specific data in your company, you need to classify it first. This article outlines several approaches to doing that.

Data category types

In versions 8.3 and above, Safetica offers four alternative ways of setting up its DLP features. Which one to choose depends on your main use case:

  • A. you are able to specify what is considered as sensitive content in your company (e.g. personal information, credit card numbers, internal know-how expressions, etc.), and you want to protect files which contain it;
  • B. or you have already classified the company data into categories using a classification solution (e.g. labels for internal or sensitive data), and you want to protect these pre-classified files;
  • C. or your data cannot be categorized or specified by text content but can be defined by special contextual characteristics and expert identification rules;
  • D. or you want to protect files based on their file attributes (e.g. file extensions).

Option C is recommended only for knowledgeable and experienced users, as it takes considerably longer to deploy and troubleshoot and requires more maintenance in the long term.


A. Data categorized by sensitive content

This approach is particularly suitable for regulatory compliance use cases, e.g. to address GDPR, HIPAA, PCI-DSS and similar regulations; or to detect specific keywords or expressions which are considered sensitive in an organization. It allows you to specify built-in dictionaries, pre-defined algorithms, keywords and regular expressions which will be searched for among company files. If Safetica DLP detects sensitive content in a file, you can enforce security policies on it.
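
To illustrate what content-based detection with keywords and regular expressions involves, here is a minimal Python sketch. It is not Safetica's implementation: the pattern, the function names and the use of the Luhn checksum to filter out random digit runs are all assumptions made for this example.

```python
import re
from typing import List

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return candidate card numbers (separators removed) that pass the Luhn check."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def find_keywords(text: str, keywords: List[str]) -> List[str]:
    """Return the keywords that occur in the text, ignoring case."""
    lowered = text.lower()
    return [k for k in keywords if k.lower() in lowered]
```

A file whose text yields at least one hit from either function would be a candidate for the corresponding data category. Combining a pattern with a checksum, as done here, is one common way to reduce false positives.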

Data categories defined by sensitive content also allow Safetica administrators to run discovery tasks, which scan data on selected endpoints and report the files found to contain sensitive content. In past versions, content discovery tasks were run through the legacy File Tagging feature (as documented here). For Safetica versions 8.3 and above, we recommend using the new discovery tasks included under Data categories.


B. Data categorized by existing classification

This approach assumes you already use a data classification solution, which Safetica's DLP policies will complement. It is suitable for environments where data classification is enforced by employees, company processes or automated classification solutions. For each of your classification groups or labels, we recommend creating a separate Safetica data category and specifying the classification's proper format. You can follow these instructions to correctly configure your classification's metadata format.

As of version 8.3, DLP approaches A (sensitive content) and B (existing classification) have the following limitations:

  • supported DLP security policies: upload, e-mail, screenshots, clipboard, print, virtual print
  • supported operating systems: Windows 7 or higher
  • supported applications: Microsoft Word, Microsoft Excel, Microsoft PowerPoint, Microsoft Outlook, Adobe Reader DC, Foxit Reader, Notepad, Safetica-supported web browsers

The list of supported policies, applications and file types will be gradually updated in subsequent Safetica releases. If you find an important limitation, we encourage you to leave feedback so that it can be addressed in future releases.


C. Data categorized by context rules

This approach is suitable for special use cases with data which cannot be easily identified by content or existing data classification.

The expert context rules allow you to define data by:

  • the application where they originated,
  • the web source where they originated,
  • the path where they are stored.
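
The three criteria above can be pictured as fields of a rule that is checked against what is known about a file. The Python sketch below is only an illustration and does not reflect how Safetica stores or evaluates context rules. The class names, field names, and the choice that a rule matches only when every criterion it specifies is met are all assumptions.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath
from typing import Optional
from urllib.parse import urlparse


@dataclass
class FileContext:
    """What is known about a file (hypothetical structure)."""
    origin_app: Optional[str] = None   # e.g. "acad.exe"
    origin_url: Optional[str] = None   # e.g. "https://crm.example.com/export"
    path: Optional[str] = None         # e.g. r"C:\Projects\plan.dwg"


@dataclass
class ContextRule:
    """A data category defined by origin application, web source and/or path."""
    category: str
    application: Optional[str] = None
    web_domain: Optional[str] = None
    folder: Optional[str] = None

    def matches(self, ctx: FileContext) -> bool:
        # Assumption: a rule with no criteria matches nothing.
        if self.application is None and self.web_domain is None and self.folder is None:
            return False
        if self.application is not None:
            if ctx.origin_app is None or ctx.origin_app.lower() != self.application.lower():
                return False
        if self.web_domain is not None:
            host = urlparse(ctx.origin_url or "").hostname or ""
            domain = self.web_domain.lower()
            # Accept the domain itself or any of its subdomains.
            if host != domain and not host.endswith("." + domain):
                return False
        if self.folder is not None:
            if ctx.path is None:
                return False
            # PureWindowsPath compares case-insensitively, like Windows paths.
            if PureWindowsPath(self.folder) not in PureWindowsPath(ctx.path).parents:
                return False
        return True
```

For example, `ContextRule("Engineering drawings", application="acad.exe", folder=r"C:\Projects")` would match a file created by `ACAD.EXE` and stored anywhere under `C:\Projects`. Even in this toy form, every rule needs its own test cases, which hints at why the effort grows with the number of rules.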

Configuring this approach is resource intensive, and the effort required to test, deploy, troubleshoot and maintain context-based DLP increases significantly with the size of the environment and the complexity of the security policies. Therefore, we do not recommend using it as the primary approach to DLP. Use it only to cover atypical use cases or gaps left by the other approaches.


D. Data categorized by file properties

This approach allows you to protect files based on their properties (such as file extensions). Use this option to protect specific file types (e.g. drawings), either on its own or in combination with content and metadata classification. DLP rules for data categorized by file properties can be applied to:

  • Individual file types (.cad, .pdf, etc.) or file type categories (Presentation, Image Files, Spreadsheet Files, etc.)
  • File types incompatible with Safetica sensitive data detection (Safetica currently supports sensitive data detection in these formats: TXT, XML, HTML, RTF, DOC, DOCX, XLS, XLSX, PPT, PPTX, XLSM, ZIP, CSV, PDF)
  • File types incompatible with Safetica metadata technology (read more about file types that can be classified with metadata here)
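
As a rough illustration of the decision described in the list above, the following Python sketch checks whether a file's extension falls outside the formats listed for sensitive data detection, and maps extensions to file type categories. The list of supported formats is taken from this article. The function names and the extensions assigned to each category are assumptions and do not reflect Safetica's actual category definitions.

```python
from pathlib import PurePath
from typing import Optional

# Formats listed in this article as supported by sensitive data detection.
CONTENT_DETECTION_FORMATS = {
    "txt", "xml", "html", "rtf", "doc", "docx", "xls", "xlsx",
    "ppt", "pptx", "xlsm", "zip", "csv", "pdf",
}

# Hypothetical category membership, for illustration only.
FILE_TYPE_CATEGORIES = {
    "Presentation": {"ppt", "pptx"},
    "Spreadsheet Files": {"xls", "xlsx", "xlsm", "csv"},
    "Image Files": {"png", "jpg", "jpeg"},
}


def extension_of(filename: str) -> str:
    """Return the lower-case extension without the leading dot."""
    return PurePath(filename).suffix.lstrip(".").lower()


def needs_file_property_rule(filename: str) -> bool:
    """True if content detection cannot inspect this file type."""
    return extension_of(filename) not in CONTENT_DETECTION_FORMATS


def category_of(filename: str) -> Optional[str]:
    """Return the file type category for the file, or None if it has none."""
    ext = extension_of(filename)
    for category, extensions in FILE_TYPE_CATEGORIES.items():
        if ext in extensions:
            return category
    return None
```

With these definitions, a drawing such as `drawing.cad` is flagged as needing a file-property rule, while `report.pdf` can be left to content-based detection.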