Data classification: How to create a new data category

Learn how to optimize the detection of sensitive content and get better results during the content scanning process.

Safetica NXT searches the text of documents for sensitive content defined in data categories. Events where a match is found are highlighted in the Data security > Overview > Event overview table with a corresponding data category label.

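In this article, you will learn:

What are definitions and conditions

How to create a data category

Can I import a dictionary with a list of keywords?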


What are definitions and conditions

Definitions and conditions are two layers that define the scope of a data category. They help to increase accuracy and optimize the content scanning process.

     1. Definitions - by adding definitions, you broaden the scope of a data category. A definition may contain one or more conditions.

During content scanning, at least 1 definition must be matched for the data category to be applied to that event (OR relationship between definitions).

     2. Conditions - by adding conditions (such as built-in algorithms, keywords, or regular expressions), you refine each definition.

During content scanning, all conditions must be matched for the definition to be applied to that event (AND relationship between conditions).
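The OR/AND relationship described above can be sketched in a few lines of Python. This is only an illustrative model, not Safetica NXT code: the helper names (keyword, regex, definition_matches, category_applies) and the sample patterns are hypothetical, and the case-insensitive keyword comparison is an assumption.

```python
import re
from typing import Callable, List

# Hypothetical model: a condition is any function that inspects the text
# and reports whether it matched.
Condition = Callable[[str], bool]


def keyword(word: str) -> Condition:
    """Condition that matches when the keyword occurs in the text.
    Case-insensitive comparison is an assumption of this sketch."""
    return lambda text: word.lower() in text.lower()


def regex(pattern: str) -> Condition:
    """Condition that matches when the regular expression is found."""
    compiled = re.compile(pattern)
    return lambda text: compiled.search(text) is not None


def definition_matches(conditions: List[Condition], text: str) -> bool:
    """AND relationship: every condition in the definition must match.
    An empty definition is treated as not matching."""
    return bool(conditions) and all(cond(text) for cond in conditions)


def category_applies(definitions: List[List[Condition]], text: str) -> bool:
    """OR relationship: one matching definition is enough."""
    return any(definition_matches(conds, text) for conds in definitions)


# Example category with two definitions (illustrative values only).
definitions = [
    [keyword("invoice"), regex(r"\b\d{4}-\d{4}\b")],  # both must match
    [keyword("confidential")],                         # single condition
]
```

With this model, the text "Invoice number 1234-5678" matches the first definition, "Strictly CONFIDENTIAL" matches the second, and "Invoice without a number" matches neither, because the first definition needs both of its conditions.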

 

How to create a data category

1. Go to Data security > Sensitive data.

2. Click Add category.

3. Enter the name and description of the data category.

4. Click Add definition, enter a definition name, and specify the threshold.

The threshold determines how many times a definition must be found in a file.

5. Add individual conditions.

You can add 3 types of conditions to recognize specific types of sensitive data and optimize the scanning process: built-in algorithms, custom-defined keywords, and regular expressions.

6. You can also edit or delete already existing definitions.
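To make the threshold from step 4 more concrete, here is a minimal Python sketch. It assumes a definition with a single regular-expression condition and counts non-overlapping matches. How Safetica NXT counts occurrences internally is not described in this article, so the function name meets_threshold, the sample pattern, and the counting rule are all assumptions.

```python
import re


def meets_threshold(pattern: str, text: str, threshold: int) -> bool:
    """Return True when the pattern occurs at least `threshold` times.

    Assumption of this sketch: occurrences are counted as
    non-overlapping regular-expression matches in the file text.
    """
    hits = sum(1 for _ in re.finditer(pattern, text))
    return hits >= threshold


# Illustrative pattern and text only.
sample_pattern = r"\b\d{3}-\d{2}-\d{4}\b"
sample_text = "Records: 123-45-6789 and 987-65-4321."
```

In this example the pattern occurs twice, so a threshold of 2 is met and a threshold of 3 is not. A higher threshold can be useful when a single occurrence is too weak a signal on its own.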

Can I import a dictionary with a list of keywords?

Importing a dictionary is not currently supported.

However, you can create large custom dictionaries by pasting keywords from the clipboard, which lets you add multiple terms in one go. Individual keywords must be separated by a space.

The limitations on duplicates and on minimum keyword length (3 characters) still apply.
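If your keywords live in a spreadsheet or text file, you can prepare them for pasting with a small script. The Python sketch below is one possible approach, not a Safetica tool. The function name prepare_keywords is hypothetical, and treating duplicates as case-insensitive is an assumption. It drops keywords shorter than 3 characters, removes duplicates, and joins the rest with spaces.

```python
from typing import Iterable


def prepare_keywords(terms: Iterable[str], min_length: int = 3) -> str:
    """Build a space-separated keyword string ready to paste.

    Assumptions of this sketch: duplicates are compared
    case-insensitively, and because the separator is a space,
    every whitespace-separated token is treated as its own keyword.
    """
    seen = set()
    kept = []
    for raw in terms:
        for word in raw.split():
            if len(word) < min_length:
                continue  # too short to be accepted
            key = word.casefold()
            if key in seen:
                continue  # duplicate of an earlier keyword
            seen.add(key)
            kept.append(word)
    return " ".join(kept)


# Illustrative input only.
pasted = prepare_keywords(["salary", "Salary", "id", "payroll ", "tax return"])
```

Here pasted contains "salary payroll tax return": the repeated "Salary" and the two-character "id" are dropped, and "tax return" becomes two separate keywords.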

 

Want to learn more? Read next:

Data security - Sensitive data

How to investigate files with sensitive content

How to edit or delete a data category

What are regular expressions?

Templates for region-specific sensitive data detection