New sensitive content detection in Safetica ONE

Enjoy a more powerful content scanning technology.

In Safetica ONE, we made a great technological leap and replaced our existing sensitive content detection with a more accurate and powerful one.

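In this article, you will learn:

  1. How does the new sensitive content detection work?
  2. Where to configure the new sensitive content detection?
  3. For what file types can you launch sensitive content detection?
  4. What does the admin see in logs?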

How does the new sensitive content detection work?

Safetica now runs with a completely new content core and text parsers that support more file types than the previous technology.

You can now gain more granular control over which file types (such as .docx and .xlsx) are scanned for sensitive content in your company. This reduces the load on your environment, because files of all other types are not scanned. At the same time, results are more accurate, since they come only from the file types you specify.
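To illustrate the principle, the Python sketch below models extension-based filtering as a simple check that runs before any content is scanned. It is a simplified, hypothetical model and not Safetica's implementation: the function name `should_scan`, the lowercase extension set, and the case-insensitive comparison of the file suffix are our own assumptions.

```python
from pathlib import Path


def should_scan(file_path: str, allowed_extensions: set[str]) -> bool:
    """Return True if the file's extension is in the allowed set.

    Hypothetical model: allowed extensions are lowercase, include the
    leading dot (".docx"), and are compared case-insensitively.
    """
    return Path(file_path).suffix.lower() in allowed_extensions


# Example: only .docx and .xlsx files are scanned; everything else is skipped.
allowed = {".docx", ".xlsx"}
for name in ["report.DOCX", "budget.xlsx", "photo.png", "notes"]:
    print(f"{name}: {'scan' if should_scan(name, allowed) else 'skip'}")
# report.DOCX: scan
# budget.xlsx: scan
# photo.png: skip
# notes: skip
```

Files such as photo.png, or a file without any extension, never reach the content scanner in this model, which is where the performance saving comes from.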


Where to configure the new sensitive content detection?

As before, you can configure content detection in Safetica Management Console in Protection > Data categories. Just select a sensitive content data category in the list on the left or create a new one, and click the Configure data category button.

In the next step, you will see two new options in the detection rule configuration: Optical character recognition (OCR) and Extensions. The two sliders are independent of each other.


For what file types can you launch sensitive content detection?

With the Extensions slider, you can specify which file types are scanned for the sensitive information defined by the detection rule above.

The slider offers the following options:

Recommended

This option includes about 30 recommended “best practice” file types that we support. Recommended extensions are supported on all endpoints with Safetica Client installed.

Extensions: .txt, .xml, .html, .htm, .rtf, .zip, .csv, .pdf, .doc, .docx, .docm, .xls, .xlsx, .xlsm, .ppt, .pptx, .pptm, .pps, .ppsx, .ppsm, .msg, .eml, .one, .odt, .ods, .odp, .md, .epub

With OCR enabled: .png, .tiff, .jpg, .jpeg, .jpe, .bmp

All

This option launches sensitive content detection for all files for which it is technically possible. However, this does not mean that sensitive data can be extracted from every file; the text parsers simply support a more extensive set of file types than the Recommended option.

Using this option may impact endpoint performance.

Supported file types:

  - .doc, .dot, .docx, .docm, .dotx, .dotm, .txt, .odt, .ott, .rtf
  - .pdf
  - .xhtml, .mhtml, .md, .xml
  - .chm, .epub, .fb2
  - .xls, .xlt, .xlsx, .xlsm, .xlsb, .xltx, .xltm, .ods, .ots, .csv, .xla, .xlam
  - Excel Open XML Macro-Enabled Add-In
  - Apple iWork Numbers
  - .ppt, .pps, .pot, .pptx, .pptm, .potx, .potm, .ppsx, .ppsm, .odp, .otp
  - .pst, .ost, .eml, .emlx, .msg
  - .one
  - .zip, .rar, .tar, .gz, .bz2
  - ADO.NET

Custom

This option lets you specify exactly which file types are scanned for sensitive content. You can specify either individual extensions or extension categories. Files of all other types are skipped.

This option is best used when you want to optimize endpoint performance and scan only the files you need.

If you have several data categories set up with different extensions, Safetica inspects all of the file types specified across those categories.
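The way the three options combine can be modeled as set operations. The Python sketch below is a simplified illustration under our own assumptions, not Safetica's implementation: the function names `category_extensions` and `extensions_to_inspect`, the use of `None` to stand for All, the rule that OCR only affects the Recommended set, and treating several data categories as the union of their extension sets are all hypothetical. Extension categories in the Custom option are not modeled.

```python
from typing import Iterable, Optional

# Extensions of the Recommended option, as listed above.
RECOMMENDED = {
    ".txt", ".xml", ".html", ".htm", ".rtf", ".zip", ".csv", ".pdf",
    ".doc", ".docx", ".docm", ".xls", ".xlsx", ".xlsm", ".ppt", ".pptx",
    ".pptm", ".pps", ".ppsx", ".ppsm", ".msg", ".eml", ".one", ".odt",
    ".ods", ".odp", ".md", ".epub",
}
# Image types the Recommended option adds only when OCR is enabled.
RECOMMENDED_OCR = {".png", ".tiff", ".jpg", ".jpeg", ".jpe", ".bmp"}


def category_extensions(
    setting: str, ocr_enabled: bool, custom: Iterable[str] = ()
) -> Optional[set[str]]:
    """Extensions one data category scans; None stands for "All" (no filter)."""
    if setting == "all":
        return None
    if setting == "recommended":
        extensions = set(RECOMMENDED)
        if ocr_enabled:
            extensions |= RECOMMENDED_OCR
        return extensions
    if setting == "custom":
        # Assumption: in this model, OCR only changes the Recommended set.
        return {ext.lower() for ext in custom}
    raise ValueError(f"unknown extension setting: {setting!r}")


def extensions_to_inspect(
    categories: Iterable[Optional[set[str]]],
) -> Optional[set[str]]:
    """Union of all categories' extensions; None if any category uses "All"."""
    combined: set[str] = set()
    for extensions in categories:
        if extensions is None:
            return None
        combined |= extensions
    return combined


hr = category_extensions("custom", ocr_enabled=False, custom=[".docx", ".xlsx"])
finance = category_extensions("custom", ocr_enabled=False, custom=[".csv", ".XLSX"])
print(sorted(extensions_to_inspect([hr, finance])))  # ['.csv', '.docx', '.xlsx']
print(len(category_extensions("recommended", ocr_enabled=True)))  # 34
```

In this model, two Custom categories covering .docx/.xlsx and .csv/.xlsx combine into .csv, .docx and .xlsx, and as soon as any category uses All, no extension filter remains.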


What does the admin see in logs?

Results of the new sensitive content detection are visible in Safetica Management Console. Go to Protection > DLP logs and open the Records table.

In the Sensitive content column, you will see:

  1. whether any sensitive data was found in a particular file
  2. whether the content detection finished completely or only a part of the file was scanned

For now, if any of the time limits described below is exceeded, sensitive content detection stops. If no sensitive content is found in the part of the file that was scanned, the action is allowed.
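The behavior described above can be sketched as a scan loop with a deadline. This Python model is a hypothetical illustration, not Safetica's implementation: the two `ScanResult` fields mirror the two pieces of information shown in the Sensitive content column, while splitting a file into text chunks, the `is_sensitive` callback, the example keyword rule, and the single deadline are our own simplifying assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ScanResult:
    found_sensitive: bool  # was any sensitive data found in the scanned part?
    complete: bool         # was the whole file scanned, or only a part of it?


def scan_with_time_limit(
    chunks: Iterable[str],
    is_sensitive: Callable[[str], bool],
    time_limit_s: float,
) -> ScanResult:
    """Scan text chunks until they run out or the time limit is exceeded."""
    deadline = time.monotonic() + time_limit_s
    found = False
    for chunk in chunks:
        if time.monotonic() >= deadline:
            # Time limit hit: stop and report a partial result.
            return ScanResult(found_sensitive=found, complete=False)
        if is_sensitive(chunk):
            found = True
    return ScanResult(found_sensitive=found, complete=True)


def action_allowed(result: ScanResult) -> bool:
    """True when nothing was found in the scanned part, even after a partial
    scan. False means the data category's policy decides (not modeled here)."""
    return not result.found_sensitive


def contains_keyword(text: str) -> bool:
    return "CONFIDENTIAL" in text  # hypothetical detection rule


document = ["Q3 plan", "CONFIDENTIAL salaries"]
full = scan_with_time_limit(document, contains_keyword, time_limit_s=5.0)
partial = scan_with_time_limit(document, contains_keyword, time_limit_s=0.0)
print(full, action_allowed(full))
# ScanResult(found_sensitive=True, complete=True) False
print(partial, action_allowed(partial))
# ScanResult(found_sensitive=False, complete=False) True
```

The second call shows the trade-off of the current behavior: the keyword sits in a part of the file that was never scanned, so the partial result reports nothing and the action is allowed.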

To see why a scan finished only partially, click the Details link in the Details column:

Scan timed out

The maximum time reserved for searching a file for sensitive data was exceeded.

Scan for user action timed out

Sensitive content detection initiated by a user action has a very short time limit (a few seconds), so that users do not have to wait long for the result. If the scan does not finish in time, a partial result with this message is returned.

Text parsing timed out

The time limit was exceeded before any text could be parsed. This can happen, for example, when parsing unsupported formats, or on extremely busy endpoints where the system prioritizes other tasks and detection never starts.
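The three reasons can be told apart by which phase the time limit interrupted and what started the scan. The Python sketch below is a hypothetical model, not Safetica's implementation; the enum `PartialScanReason`, the function `partial_scan_reason`, its parameters, and the order in which the conditions are checked are our own assumptions.

```python
from enum import Enum


class PartialScanReason(Enum):
    SCAN_TIMED_OUT = "Scan timed out"
    USER_ACTION_SCAN_TIMED_OUT = "Scan for user action timed out"
    TEXT_PARSING_TIMED_OUT = "Text parsing timed out"


def partial_scan_reason(
    text_parsed: bool, started_by_user_action: bool
) -> PartialScanReason:
    """Pick the reason shown for a scan that exceeded its time limit."""
    if not text_parsed:
        # The limit ran out before any text was extracted from the file.
        return PartialScanReason.TEXT_PARSING_TIMED_OUT
    if started_by_user_action:
        # User-triggered scans have a short limit of a few seconds.
        return PartialScanReason.USER_ACTION_SCAN_TIMED_OUT
    # Otherwise the general maximum scan time was exceeded.
    return PartialScanReason.SCAN_TIMED_OUT


print(partial_scan_reason(text_parsed=True, started_by_user_action=True).value)
# Scan for user action timed out
```

Checking the parsing phase first reflects the description above: if no text was parsed at all, the detection never really started, regardless of what triggered it.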