How to tag files based on their content with Safetica

Please note that this article is not up-to-date.

With Safetica 8.3 or later, you no longer need to tag files based on their content to configure Protect. See our Data categories article for more information. We also recommend running content discovery tasks from the new Data categories view rather than following this legacy guide.

To create a content tagging task, follow these steps:

  1. Navigate to Protection -> File tagging and click “Manage data categories”.

  2. Create your data category (Content - example).

  3. Create a file content rule assigned to your new data category: scroll down, expand FILE CONTENT RULES, and add a new rule.

 

Add a name and select the object (Next). Then add paths and extensions.

Tip: To tag multiple data types (SSN, NIN, HIPAA, …), it is best practice to create a separate tagging rule for each type. This gives you a much clearer overview of which rule tagged which files.

Content settings

You can choose from the following categories:

- Social security numbers (SSN - USA), example: 123 - 45 - 6789

- National identification numbers (CZE), example: 925327/9508

- National insurance numbers (UK), example: AA 12 24 56 C

- Credit card numbers, example: 4716-7750-2748-6285

- Social security numbers (USA) + HIPAA. HIPAA includes: HIPAA.companies, HIPAA.diseases, HIPAA.diseases-icd10, HIPAA.drugs

 

Another way to tag sensitive data is to use custom regular expressions. A regular expression is a sequence of characters that defines a search pattern, which is compared with individual words. Words are delimited by specific characters: space, hyphen (-), comma (,), slash (/), braces {}, brackets [], and parentheses ().

A regular expression can consist of:

- letters [a-z], [A-Z]

- numbers [0-9]

- characters £&_–@,

- meta characters .*+[](){}

 

Each meta character has a special meaning.

.     -   A dot matches any single character.

*     -   An asterisk matches zero or more occurrences of the preceding character.

+     -   A plus sign matches one or more occurrences of the preceding character.

[ ]   -   Classes of characters. For example, [abc] means any single character that is either a, b, or c.

{ }   -   Repetition. For example, (a){3} matches “aaa”.
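The meta characters above can be tried out in any regular expression engine. The following is a minimal sketch using Python's re module, which is an assumption made for illustration only: Safetica's own engine may differ in details. The helper matches_word is hypothetical and compares a pattern with one whole word.

```python
import re


def matches_word(pattern, word):
    """Return True if the pattern matches the whole word."""
    return re.fullmatch(pattern, word) is not None


# (pattern, word, expected result) for each meta character described above
examples = [
    (r"a.c", "abc", True),      # dot: exactly one arbitrary character
    (r"a.c", "ac", False),      # dot does not match "nothing"
    (r"ab*", "a", True),        # asterisk: zero or more "b"
    (r"ab*", "abbb", True),
    (r"ab+", "a", False),       # plus: at least one "b" is required
    (r"ab+", "abbb", True),
    (r"[abc]", "b", True),      # class: a single character a, b, or c
    (r"[abc]", "d", False),
    (r"(a){3}", "aaa", True),   # repetition: exactly three times
    (r"(a){3}", "aa", False),
]

for pattern, word, expected in examples:
    assert matches_word(pattern, word) == expected
```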

  

Examples

If you want to tag all documents containing the word “Invoice4343”, you can create a regular expression like this: (Invoice4343). However, if you want to tag all invoices, not only the one with number 4343, you can use the expression:

(Invoice[0-9]{4})
This will tag all invoices (Invoice0000 - Invoice9999).

You can also make the expression match regardless of upper and lower case.

The words “INVOICE4343” and “invoice4343” can both be described by the regex: [Ii][Nn][Vv][Oo][Ii][Cc][Ee][0-9]{4}

 

Word                          Regular expression
Invoice4343                   Invoice4343
Invoice0000 - Invoice9999     Invoice[0-9]{4}
invoice0000 - INVOICE9999     [Ii][Nn][Vv][Oo][Ii][Cc][Ee][0-9]{4}
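To check the invoice expressions before using them in a rule, they can be tested against sample words. This is only a sketch in Python's re module, under the assumption that a pattern is compared with one whole word at a time; the function is_invoice and the constant names are hypothetical and not part of Safetica.

```python
import re

# Expressions taken from the table above
INVOICE = re.compile(r"Invoice[0-9]{4}")
INVOICE_ANY_CASE = re.compile(r"[Ii][Nn][Vv][Oo][Ii][Cc][Ee][0-9]{4}")


def is_invoice(word, pattern=INVOICE):
    """Return True if the whole word matches the given invoice pattern."""
    return pattern.fullmatch(word) is not None


# Exactly four digits are required after the word "Invoice"
assert is_invoice("Invoice4343")
assert is_invoice("Invoice0000")
assert not is_invoice("Invoice434")

# The first pattern is case-sensitive, the second accepts any mix of cases
assert not is_invoice("INVOICE4343")
assert is_invoice("INVOICE4343", INVOICE_ANY_CASE)
assert is_invoice("invoice4343", INVOICE_ANY_CASE)
```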

 

Specific words

Word       Forms                   Regular expression
Payroll    Payroll, payrolls, …    .*[Pp]ayroll.*
Invoice    invoice, Invoices, …    .*[Ii]nvoice.*
pin        PIN, pin, pIN           [Pp][Ii][Nn]
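The difference between the expressions with and without the surrounding .* can be seen by running them over a list of candidate words. The sketch below uses Python's re module and assumes whole-word comparison; matching_forms and the candidate list are hypothetical and exist only for this illustration.

```python
import re


def matching_forms(pattern, words):
    """Return the words that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [w for w in words if compiled.fullmatch(w)]


candidates = ["Payroll", "payrolls", "PAYROLL", "Invoices", "pIN", "pins"]

# .* before and after the word allows prefixes and suffixes such as a plural "s"
payroll_hits = matching_forms(r".*[Pp]ayroll.*", candidates)
invoice_hits = matching_forms(r".*[Ii]nvoice.*", candidates)

# Without .* only the three letters themselves match, in any case combination
pin_hits = matching_forms(r"[Pp][Ii][Nn]", candidates)

print(payroll_hits)
print(invoice_hits)
print(pin_hits)
```

Note that .*[Pp]ayroll.* only varies the case of the first letter, so a fully upper-case “PAYROLL” is not matched in this sketch; a per-letter class such as [Pp][Aa][Yy]… would be needed for that.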

 

Phrases

Regular expressions can also be used to search for phrases made up of multiple words.

Examples:

Phrase                   Forms                             Regular expression
European Central Bank    European - Central - Bank         ^European$ ^Central$ ^Bank$
                         European_Central_Bank
                         European  -     Central- Bank
                         ...
Daniel Brown             daniel.brown                      ^[Dd]aniel$ ^[Bb]rown$
                         <name> Daniel Brown </name>
                         Daniel – Brown
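One way to picture how a phrase expression relates to the forms above is to split the text into words and compare each part of the expression with consecutive words. The following Python sketch is an illustration under assumptions, not a description of Safetica's actual implementation: split_words and contains_phrase are hypothetical helpers, and the delimiter set extends the list given earlier in this article with "_", "." and the en dash, because the Forms column implies that they also separate words.

```python
import re

# Delimiters named in the article: space, -, comma, /, {}, [], ().
# ASSUMPTION: "_", "." and the en dash are added because the forms above imply them.
DELIMITERS = re.compile(r"[\s\-,/{}\[\]()_.\u2013]+")


def split_words(text):
    """Split text into words on the assumed delimiter characters."""
    return [w for w in DELIMITERS.split(text) if w]


def contains_phrase(text, phrase_regex):
    """Check whether consecutive words match the space-separated word patterns."""
    word_patterns = [re.compile(p) for p in phrase_regex.split()]
    words = split_words(text)
    n = len(word_patterns)
    for start in range(len(words) - n + 1):
        window = words[start:start + n]
        if all(p.search(w) for p, w in zip(word_patterns, window)):
            return True
    return False


ECB = "^European$ ^Central$ ^Bank$"
NAME = "^[Dd]aniel$ ^[Bb]rown$"

assert contains_phrase("European  -     Central- Bank", ECB)
assert contains_phrase("European_Central_Bank", ECB)
assert contains_phrase("<name> Daniel Brown </name>", NAME)
assert contains_phrase("daniel.brown", NAME)
assert not contains_phrase("European Bank", ECB)
```

The ^ and $ anchors make each part match one whole word, so “Europeans Central Bank” would not be tagged in this sketch.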