Data classification: How to create a new data classification

Thanks to Safetica unified classification, you can combine various types of classification elements and technologies.

Data classifications in Safetica ONE 11 combine all our previously used classification technologies (i.e. sensitive content, context rules, and file property classifications). You no longer have to decide which technology to use. You simply create rules that combine various elements, and then use the data classification in policies to detect and protect sensitive data.

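In this article, you will learn what rules and elements are, how to create a new data classification, how the detection trigger works, and what happens when a match is found.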

What are rules and elements

Rules and elements are two layers that define the scope of a data classification.

Example: A GDPR data classification consists of several rules related to personal data from various EU countries. Each rule consists of several elements that specify that rule. Usually, these elements combine regexes and keywords.

 

Data classification: GDPR

Rule: Austrian passport numbers
  Element (regex): [A-Za-z] ?\d{7}
  Element (keywords): österreichisch reisepass, reisepass

Rule: French driver’s license numbers
  Element (regex): \d{12}
  Element (keywords): permis de conduire, drivers license

Rule: Czech birth numbers
  Element (regex): \b[0-9]{2}(?:[0257][1-9]|[1368][0-2])(?:0[1-9]|[12][0-9]|3[01])/?[0-9]{3,4}\b
  Element (keywords): Rodné číslo, identifikační číslo, osobní identifikační číslo, czech republic id

Rule: German social security numbers
  Element (regex): [0-9]{2} ?[0-3][0-9][0-1][0-9][0-9]{2} ?[A-Z][0-9]{3}
  Element (keywords): ausweis, identifizierungsnummer, personalausweis, sozialversicherungsausweis, sozialversicherungsnummer, versicherungsnummer

During policy evaluation, a classification is applied if at least one of its rules is matched (OR relationship between rules). A rule is matched only if all of its elements are matched (AND relationship between elements).

 

Example: A document contains Czech birth numbers and the term “rodné číslo”. Our previously defined GDPR data classification will apply to this document, because one of its rules (Czech birth numbers) is matched. The rule is valid because both elements contained in it were matched (the regex for Czech birth numbers and the keyword “rodné číslo”).
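The OR/AND logic described above can be sketched in a few lines of Python. This is only an illustration, not how Safetica evaluates classifications internally: the `Rule` and `Classification` classes are hypothetical, keyword matching is assumed to be a case-insensitive substring check, and a keyword element is assumed to match when any one of its keywords is present (as the example above suggests). The sample birth number is made up and only follows the pattern's format.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    regex: str            # regex element
    keywords: list[str]   # keyword element

    def matches(self, text: str) -> bool:
        # AND between elements: both the regex and the keyword element must match.
        regex_hit = re.search(self.regex, text) is not None
        # Assumption: one keyword from the list is enough, compared case-insensitively.
        lowered = text.lower()
        keyword_hit = any(kw.lower() in lowered for kw in self.keywords)
        return regex_hit and keyword_hit


@dataclass
class Classification:
    name: str
    rules: list[Rule]

    def applies_to(self, text: str) -> bool:
        # OR between rules: one matched rule is enough.
        return any(rule.matches(text) for rule in self.rules)


gdpr = Classification("GDPR", [
    Rule("Austrian passport numbers",
         r"[A-Za-z] ?\d{7}",
         ["österreichisch reisepass", "reisepass"]),
    Rule("Czech birth numbers",
         r"\b[0-9]{2}(?:[0257][1-9]|[1368][0-2])(?:0[1-9]|[12][0-9]|3[01])/?[0-9]{3,4}\b",
         ["rodné číslo", "identifikační číslo",
          "osobní identifikační číslo", "czech republic id"]),
])

# Made-up sample text: regex and keyword are both present, so the rule matches.
print(gdpr.applies_to("Rodné číslo: 850412/1234"))
# The number alone is not enough, because the keyword element is missing.
print(gdpr.applies_to("850412/1234"))
```

In this sketch the first call returns True and the second returns False, which mirrors the example: the classification applies only when every element of at least one rule is matched.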

 

How to create a new data classification

  1. Open the Safetica ONE 11 console.
  2. Go to the Data classification section and click Add classification.
  3. In the Rules section, click Add rule.
  4. Click Add element.

 

Elements are divided into 3 sections, and you can combine them as needed:

  1. Elements related to content analysis – you can set the detection trigger for the whole sensitive data section by clicking the icon. Click the icon to add more elements to the rule.
  2. Elements related to where the file was transferred from – you can specify the places the file passed through. For example, you can define that a file is sensitive if it comes from a CRM system OR accounting software AND is also stored in a particular location. You can also define that all files classified by third-party classification software are considered sensitive. The detection trigger does not apply to elements in this section.
  3. Elements related to file properties – you can specify the file types to which the rule should apply. The detection trigger does not apply to elements in this section.
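The way elements from the second and third sections combine can be pictured as a simple boolean check. The sketch below is an assumption-heavy illustration: `FileInfo`, the origin labels, the storage location, and the file types are all hypothetical, and the "CRM OR accounting AND location" condition is read as (CRM OR accounting) AND location.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    origin: str      # hypothetical label for where the file came from
    path: str        # where the file is stored
    extension: str   # file type


# Hypothetical values chosen only for this example.
SENSITIVE_ORIGINS = {"crm", "accounting"}
SENSITIVE_LOCATION = "/shares/finance/"
FILE_TYPES = {".xlsx", ".pdf"}


def rule_matches(f: FileInfo) -> bool:
    from_sensitive_app = f.origin in SENSITIVE_ORIGINS       # CRM OR accounting
    in_location = f.path.startswith(SENSITIVE_LOCATION)      # AND stored in the location
    allowed_type = f.extension.lower() in FILE_TYPES         # AND file type applies
    return from_sensitive_app and in_location and allowed_type


print(rule_matches(FileInfo("crm", "/shares/finance/q1.xlsx", ".xlsx")))
print(rule_matches(FileInfo("email", "/shares/finance/q1.xlsx", ".xlsx")))
```

No count is involved here, which is consistent with the detection trigger not applying to these two sections: each element either matches or it does not.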

 

What is the detection trigger, and how does it work?

The detection trigger defines the amount of sensitive data that must be found in a file for the data classification to apply. It only works for elements related to content analysis.

The detection trigger works on the rule level, so it applies to all combinations of keywords, regexes, predefined algorithms, and dictionaries that you choose for the rule.

 

 

Example: A company works with documents that contain a birth number on a daily basis. It is unusual, however, for one document to contain several birth numbers. The company sets the detection trigger to 5 on the “Czech birth numbers” rule. This way, only files with at least 5 combinations of the birth number and the “rodné číslo” keyword will be classified with the GDPR data classification. Files with fewer than 5 matches will not be classified.
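A rough Python sketch of this example follows. It is one plausible reading, not the product's actual algorithm: it assumes the keyword must be present and that the trigger is compared against the number of unique birth numbers found. The article does not spell out exactly how counts are combined across elements. The function name and all sample numbers are made up.

```python
import re

# Regex taken from the "Czech birth numbers" rule in this article.
BIRTH_NUMBER = re.compile(
    r"\b[0-9]{2}(?:[0257][1-9]|[1368][0-2])(?:0[1-9]|[12][0-9]|3[01])/?[0-9]{3,4}\b"
)
KEYWORD = "rodné číslo"
DETECTION_TRIGGER = 5


def is_classified(text: str, trigger: int = DETECTION_TRIGGER) -> bool:
    # A set keeps each distinct birth number once.
    unique_numbers = set(BIRTH_NUMBER.findall(text))
    # Assumption: the keyword only needs to be present (case-insensitive).
    keyword_present = KEYWORD in text.lower()
    return keyword_present and len(unique_numbers) >= trigger


# Made-up values that only follow the pattern's format.
numbers = ["850412/1234", "900101/1111", "755630/2222", "801231/3333", "956105/4444"]

everyday_doc = "Rodné číslo: 850412/1234"
bulk_export = "Rodné číslo: " + ", ".join(numbers)

print(is_classified(everyday_doc))   # one birth number, below the trigger
print(is_classified(bulk_export))    # five different birth numbers
```

With the trigger at 5, the everyday document stays unclassified while the bulk export is classified, which is the behavior the company in the example wants.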


Duplicate occurrences are counted as one match only.

This means that if one keyword (e.g. the word "confidential") appears in a document multiple times, it is counted as one match.

Example 1: You create a rule with the predefined algorithm for credit card numbers and set the detection trigger (threshold) to 5.

Files with 5 or more unique credit card numbers will hit the threshold and will be considered sensitive. Files with 5 or more identical credit card numbers will not be considered sensitive, because duplicate occurrences are counted as one match only.

Example 2: You create a rule with 10 keywords (e.g. “invoice”, “confidential”, “credit card”, …) and set the threshold to 5.

To hit the threshold and be considered sensitive, a document must contain 5 or more different keywords. Files that contain the same keyword 5 times will not be considered sensitive, because duplicate occurrences are counted as one match only.

Example 3: You create a rule combining the predefined algorithm for credit card numbers and a keyword (e.g. “confidential”) and set the threshold to 5.

Documents that contain the word “confidential” together with five different credit card numbers will hit the threshold and be considered sensitive.
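The "duplicates count once" behavior from Example 2 can be illustrated with a short Python sketch. The function names are hypothetical, keyword matching is simplified to a case-insensitive substring check, and only the first three keywords come from the example; the rest of the list is made up to reach 10 entries.

```python
# First three keywords are from Example 2; the others are made up for illustration.
KEYWORDS = [
    "invoice", "confidential", "credit card", "salary", "contract",
    "iban", "passport", "payroll", "tax id", "bank account",
]


def distinct_hits(text: str, keywords: list[str]) -> set[str]:
    # Each keyword is recorded at most once, however often it occurs in the text.
    lowered = text.lower()
    return {kw for kw in keywords if kw in lowered}


def hits_threshold(text: str, keywords: list[str], threshold: int = 5) -> bool:
    return len(distinct_hits(text, keywords)) >= threshold


repeated = "confidential " * 5
varied = "Confidential invoice: salary and payroll details, contract attached."

print(hits_threshold(repeated, KEYWORDS))   # same keyword five times: one match
print(hits_threshold(varied, KEYWORDS))     # five different keywords
```

The same idea covers Example 1: collecting credit card numbers into a set before comparing against the threshold means five identical numbers count as a single match.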

 

What happens when a match is found?

Operations where a match is found are highlighted in the Analyze > Data section in the Classified and Sensitive data columns.

 

Read next:

Data classification in Safetica ONE 11 

Data classification: What is Safetica unified classification

Policies: How they work in Safetica ONE 11