How to select file types for content analysis

Decide which file types Safetica analyzes for sensitive content to optimize both system performance and the accuracy of results.

In Safetica, you have granular control over which specific file types (such as .docx and .xlsx) are analyzed for sensitive content. This reduces the load on your environment, because files of all other types are skipped. At the same time, results are more accurate, since they come only from the file types you specified.

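In this article, you will learn more about:

  • Where to configure file types for content analysis
  • For what file types can you perform content analysis
  • More granular file type control
  • Where to see the results of content analysis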

Where to configure file types for content analysis

  1. Open the Safetica console.
  2. Go to Data Classification and click Content analysis settings.

 

For what file types can you perform content analysis

With the Content analysis file types drop-down, you can specify which types of files are analyzed for sensitive information.

You have 3 options:

  • All - Safetica runs content analysis on all files for which it is technically possible. Using this option may impact device performance.
  • Recommended - Safetica analyzes the content of a selected set of best-practice file types: .txt, .xml, .html, .htm, .rtf, .zip, .csv, .pdf, .doc, .docx, .docm, .xls, .xlsx, .xlsm, .ppt, .pptx, .pptm, .pps, .ppsx, .ppsm, .msg, .eml, .one, .odt, .ods, .odp, .md, .epub
  • Custom - You specify the file type categories or individual file extensions that are analyzed for sensitive content. Files of all other types are skipped.
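
The three options can be pictured as a simple extension filter. The Python sketch below is purely illustrative and is not Safetica's implementation or API: the function name should_analyze, the mode strings, and the case-insensitive extension matching are all assumptions made for this example. Only the Recommended list is taken from this article.

```python
from pathlib import PurePath

# Recommended extensions as listed in this article.
RECOMMENDED = frozenset({
    ".txt", ".xml", ".html", ".htm", ".rtf", ".zip", ".csv", ".pdf",
    ".doc", ".docx", ".docm", ".xls", ".xlsx", ".xlsm",
    ".ppt", ".pptx", ".pptm", ".pps", ".ppsx", ".ppsm",
    ".msg", ".eml", ".one", ".odt", ".ods", ".odp", ".md", ".epub",
})


def should_analyze(filename, mode, custom_extensions=()):
    """Hypothetical model: would this file be picked up for content analysis?"""
    # Assumption: extensions are compared case-insensitively.
    ext = PurePath(filename).suffix.lower()
    if mode == "all":
        # "All": every file is a candidate (where analysis is technically possible).
        return True
    if mode == "recommended":
        return ext in RECOMMENDED
    if mode == "custom":
        # "Custom": only the listed extensions; everything else is skipped.
        return ext in {e.lower() for e in custom_extensions}
    raise ValueError(f"unknown mode: {mode!r}")
```

Under these assumptions, should_analyze("report.PDF", "recommended") returns True, while should_analyze("photo.jpeg", "custom", [".pdf"]) returns False because .jpeg is not in the custom list.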

If OCR is enabled, it will also be applied only to the selected file types (e.g. if you enter .jpeg, OCR will only run on .jpeg files).

 

More granular file type control

You can control file types even more granularly via data classifications. You can both limit and extend the custom set of file types.

Example: You have entered .pdf and .jpg as custom file types for content analysis. If you create a data classification that searches files for “credit card numbers” and add a rule that specifies .xlsx as the file type to search, then all three file types (.pdf, .jpg, and .xlsx) will be searched for sensitive content.
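
The example above amounts to a set union. As a rough illustration in Python (the variable names are hypothetical, and this is not Safetica code), the resulting set of file types could be modeled like this:

```python
# Custom file types entered in Content analysis settings (from the example).
custom_types = {".pdf", ".jpg"}

# File type named by a rule in the "credit card numbers" data classification.
rule_types = {".xlsx"}

# File types that end up being searched for sensitive content.
effective_types = custom_types | rule_types
print(sorted(effective_types))  # ['.jpg', '.pdf', '.xlsx']
```

This sketch covers only the extending case described in the example. Limiting the set through a data classification is not modeled here.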

 

 

Where to see the results of content analysis

Results from content analysis are visible in the Data section in the Data classification column. 

Hover over a classification label to see the rules that were matched, or click the classification label to display all classification details. 

Read next:

Data classification in Safetica

Data classification: What is Safetica unified classification

Data classification: How to create a new data classification

Data classification: OCR

Policies: How they work in Safetica