How to tag files based on their content with Safetica
Posted by Daniel Koštialik, Last modified by Michael Skoupý on 03 April 2020 03:35 PM

Please note that this article is not up to date. With Safetica 8.3 (or later), tagging files based on their content is no longer required for DLP configuration. See our Data categories article for more information. We also recommend running content discovery tasks from the new Data categories view rather than following this legacy guide.

To create a content tagging task, follow these steps:

  1. Navigate to DLP -> File tagging and click “Manage data categories”.

  2. Create your data category (Content - example).

  3. Create a file content rule assigned to your new data category: scroll down, expand FILE CONTENT RULES, and add a new rule.


Enter a name, select the object, and click Next. Then add the paths and file extensions.

Tip: When tagging multiple data types (SSN, NIN, HIPAA, …), it is best practice to create a separate tagging rule for each type. This gives you a much better overview of which rule tagged which files.

Content settings

You can choose from the following categories:

- Social security numbers (SSN - USA), example: 123 - 45 - 6789

- National identification numbers (CZE), example: 925327/9508

- National insurance numbers (UK), example: AA 12 24 56 C

- Credit card numbers, example: 4716-7750-2748-6285

- Social security number (USA) + HIPAA. The HIPAA part includes: HIPAA.companies, HIPAA.diseases, HIPAA.diseases-icd10, HIPAA.drugs
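To make the example formats above concrete, the following Python sketch checks values against rough format patterns. The patterns and the `classify` helper are assumptions written for illustration only; they are not Safetica's built-in detectors, which may apply additional validation.

```python
import re

# Rough format patterns derived from the article's example values.
# These are illustrative assumptions, not Safetica's actual detection rules.
FORMATS = {
    "SSN (USA)": r"\d{3}\s*-\s*\d{2}\s*-\s*\d{4}",
    "National identification number (CZE)": r"\d{6}/\d{3,4}",
    "National insurance number (UK)": r"[A-Z]{2} ?\d{2} ?\d{2} ?\d{2} ?[A-D]",
    "Credit card number": r"\d{4}-\d{4}-\d{4}-\d{4}",
}

def classify(value):
    """Return the names of all formats that the value matches in full."""
    return [name for name, pattern in FORMATS.items() if re.fullmatch(pattern, value)]

print(classify("925327/9508"))
print(classify("4716-7750-2748-6285"))
```

A format match alone says nothing about whether a number is valid, so a sketch like this is only useful for understanding what each category looks like.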


Another way to tag sensitive data is to use custom regular expressions. A regular expression is a sequence of characters that defines a search pattern, which is compared with individual words. Words are delimited by specific characters: space, hyphen (-), comma (,), slash (/), and brackets ({}, [], ()).
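The word-by-word comparison described above can be sketched in Python. This is only an illustration: the `split_words` and `matching_words` helpers are hypothetical, the delimiter set is taken from the list above, and Safetica's actual tokenizer may behave differently.

```python
import re

# Delimiters named in the article: space, -, comma, /, {}, [], ()
DELIMITERS = r"[ \-,/{}\[\]()]+"

def split_words(text):
    """Split text into words on the delimiter characters (hypothetical helper)."""
    return [word for word in re.split(DELIMITERS, text) if word]

def matching_words(pattern, text):
    """Return the words that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [word for word in split_words(text) if compiled.fullmatch(word)]

sample = "Payment (Invoice4343) due 2020/04/03"
words = split_words(sample)
hits = matching_words(r"Invoice4343", sample)
print(words)
print(hits)
```

Because the brackets and the slash act as delimiters, the pattern only ever sees `Invoice4343` as a standalone word, never the surrounding punctuation.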

The characters that can form a regular expression are:

- letters [a-z], [A-Z]

- numbers [0-9]

- characters £&_–@,

- metacharacters . * + [ ] ( ) { }


Each metacharacter has a special meaning:

.     -  A dot matches any single character.

*     -  An asterisk matches zero or more occurrences of the preceding character.

+     -  A plus sign matches one or more occurrences of the preceding character.

[ ]   -  A character class. For example, [abc] matches any single character that is either a, b, or c.

{ }   -  Repetition of the preceding character or group. For example, (a){3} matches “aaa”.
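The metacharacters above can be tried out with a short Python sketch. It uses Python's `re` module, which treats these metacharacters the same way; Safetica's own regex engine may differ in other details, so take this as an illustration rather than a reference.

```python
import re

# Each entry: (pattern, word that should match, word that should not match)
examples = [
    (r"a.c", "abc", "ac"),      # . matches exactly one character
    (r"ab*", "abbb", "b"),      # * allows zero or more 'b'
    (r"ab+", "ab", "a"),        # + requires at least one 'b'
    (r"[abc]", "b", "d"),       # [ ] matches one character from the class
    (r"(a){3}", "aaa", "aa"),   # { } repeats the group exactly three times
]

results = []
for pattern, good, bad in examples:
    # fullmatch requires the whole word to match, mirroring word-level comparison
    matched_good = bool(re.fullmatch(pattern, good))
    matched_bad = bool(re.fullmatch(pattern, bad))
    results.append((matched_good, matched_bad))
    print(pattern, good, matched_good, bad, matched_bad)
```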



If you want to tag all documents containing the word “Invoice4343”, you can simply use the regular expression Invoice4343. However, if you want to tag all invoices, not only number 4343, you can use the expression: Invoice[0-9]{4}

This will tag all invoices (Invoice0000 - Invoice9999).

You can also ignore upper and lower case.

Words such as “INVOICE4343” and “invoice4343” can be described in regex as: [Ii][Nn][Vv][Oo][Ii][Cc][Ee][0-9]{4}



Regular expression: Invoice[0-9]{4}
Matches: Invoice0000 - Invoice9999

Regular expression: [Ii][Nn][Vv][Oo][Ii][Cc][Ee][0-9]{4}
Matches: invoice0000 - INVOICE9999
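As a quick check of the invoice patterns, here is a minimal Python sketch. The case-insensitive pattern is taken from the text above; the case-sensitive pattern `Invoice[0-9]{4}` is an assumption inferred from the range Invoice0000 - Invoice9999, and the sample words are made up for illustration.

```python
import re

# Assumed pattern for Invoice0000 - Invoice9999 (inferred from the stated range)
exact_case = re.compile(r"Invoice[0-9]{4}")
# Pattern from the article that ignores upper and lower case
any_case = re.compile(r"[Ii][Nn][Vv][Oo][Ii][Cc][Ee][0-9]{4}")

samples = ["Invoice4343", "INVOICE4343", "invoice0001", "Invoice43", "Invoice43430"]

# fullmatch requires the entire word to fit the pattern
exact_hits = [s for s in samples if exact_case.fullmatch(s)]
any_hits = [s for s in samples if any_case.fullmatch(s)]

print(exact_hits)
print(any_hits)
```

Words with too few or too many digits are rejected by both patterns, because {4} requires exactly four digits when the whole word must match.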



Specific words

Regular expressions can also cover variants of specific words, for example:

- Payroll, payrolls, …

- invoice, Invoices, …

- PIN, pin, pIN




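Regular expressions can also be used to search for multiple words. Each word gets its own pattern, and the patterns are separated by spaces. In standard regex syntax, ^ anchors the start and $ anchors the end of the text being compared, which here is an individual word.

Regular expression: ^European$ ^Central$ ^Bank$
Matches:

- European Central Bank

- European - Central - Bank

- European  -     Central- Bank

Regular expression: ^[Dd]aniel$ ^[Bb]rown$
Matches:

- Daniel Brown

- <name> Daniel Brown </name>

- Daniel – Brown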
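The multi-word examples above can be approximated with the following Python sketch. It assumes that the space-separated patterns must match consecutive words in order, which is inferred from the examples rather than stated by the article; the `contains_phrase` helper and the delimiter set are hypothetical stand-ins for Safetica's internal matching.

```python
import re

# Delimiters named in the article: space, -, comma, /, {}, [], ()
DELIMITERS = r"[ \-,/{}\[\]()]+"

def contains_phrase(word_patterns, text):
    """Return True if consecutive words in text match the patterns in order."""
    words = [w for w in re.split(DELIMITERS, text) if w]
    compiled = [re.compile(p) for p in word_patterns.split()]
    n = len(compiled)
    # Slide a window of n words over the text and test each word against its pattern
    for start in range(len(words) - n + 1):
        window = words[start:start + n]
        if all(p.search(w) for p, w in zip(compiled, window)):
            return True
    return False

bank = "^European$ ^Central$ ^Bank$"
name = "^[Dd]aniel$ ^[Bb]rown$"

print(contains_phrase(bank, "European  -     Central- Bank"))
print(contains_phrase(name, "<name> Daniel Brown </name>"))
```

Because hyphens and spaces are treated as delimiters, the spacing between the words does not affect the result, while the ^ and $ anchors prevent longer words such as "Browning" from matching.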












