
How to select file types for content analysis

Choose which file types Safetica analyzes for sensitive content to balance system performance with detection accuracy.

 

Introduction: Why limit which file types are analyzed 

In Safetica, you control exactly which file types (such as .docx and .xlsx) are analyzed for sensitive content. This way:

  • You reduce the load on your environment, because all other files are skipped.
  • Results are more accurate, because they come only from the file types you specify.

Safetica identifies files by their actual content, not just their extension. If a user renames a .docx file to .stl, Safetica still recognizes it as a Word document and applies the matching rule.

 

 



Where to configure file types for content analysis

  1. Go to Data classification.

  2. Click Settings > Content analysis settings.

 

 



Content analysis file types

Use the Content analysis file types drop-down to choose which file types to analyze. You have three options:

  • All: Safetica analyzes the content of every file for which analysis is technically possible. This option may impact device performance.
  • Recommended: Safetica analyzes the content of selected, best-practice file types: .txt, .xml, .html, .htm, .rtf, .zip, .csv, .pdf, .doc, .docx, .docm, .xls, .xlsx, .xlsm, .ppt, .pptx, .pptm, .pps, .ppsx, .ppsm, .msg, .eml, .one, .odt, .ods, .odp, .md, .epub, .rar, .7z, .gz
  • Custom: Specify your own file type categories or individual extensions to analyze for sensitive content. All other types are skipped.

✍️ If OCR is enabled, it runs only on the file types selected here (e.g., if you enter .jpeg, OCR will only run on .jpeg files).

 

 


File type detection

Safetica identifies files by their actual content, not their extension. This means a matching rule still applies even when a user:

  • Renames the file (e.g., .docx → .stl)
  • Packs the file inside an archive (e.g., ZIP)

✍️ For file types in the Recommended set, Safetica evaluates file type rules against the detected file type by default. Detection is reliable for these types and requires no configuration.

Content analysis always works from a file's real type, not its extension. A renamed file is still analyzed, as long as file type detection recognizes its real type and that type falls under the chosen extension set. If detection cannot recognize the real type, the file is not analyzed.
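To illustrate the general idea of content-based detection, here is a minimal Python sketch. It is not Safetica's implementation: the signature table, the function names (detect_type, should_analyze), and the check for word/document.xml are assumptions chosen for illustration. It shows how a file can be recognized from its leading bytes regardless of its name, and how an unrecognized file ends up not being analyzed.

```python
import io
import zipfile
from typing import Optional

# Hypothetical signature table for illustration only;
# this is not Safetica's actual detection logic.
MAGIC_SIGNATURES = {
    b"%PDF-": "pdf",
    b"\xff\xd8\xff": "jpeg",
    b"PK\x03\x04": "zip",
}


def _refine_zip(data: bytes) -> str:
    """A .docx file is a ZIP container; look for its main document part."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return "zip"
    return "docx" if "word/document.xml" in names else "zip"


def detect_type(data: bytes) -> Optional[str]:
    """Guess a file type from its content, ignoring the file name."""
    for magic, kind in MAGIC_SIGNATURES.items():
        if data.startswith(magic):
            return _refine_zip(data) if kind == "zip" else kind
    return None  # real type not recognized


def should_analyze(data: bytes, selected_types: set) -> bool:
    """Analyze only when the real type is recognized and selected."""
    kind = detect_type(data)
    return kind is not None and kind in selected_types
```

In this sketch, a Word document saved under the name report.stl is still detected as docx, because the file name is never consulted.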

 

 



More granular file type control

You can control file types even more granularly via data classifications. You can both limit and extend the custom set of file types.

Example: You have entered .pdf and .jpeg as custom file types for content analysis. If you then create a data classification that searches files for “credit card numbers” and add a rule that specifies .xlsx as the file type to search, all three file types (.pdf, .jpeg, and .xlsx) are searched for sensitive content.
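The extending case in this example behaves like a set union. The following Python sketch models it under that assumption; the function names normalize and effective_file_types are hypothetical and do not correspond to any Safetica API.

```python
def normalize(extension: str) -> str:
    """Lower-case an extension and make sure it starts with a dot."""
    ext = extension.strip().lower()
    return ext if ext.startswith(".") else "." + ext


def effective_file_types(global_types, rule_types) -> set:
    """Union of the global custom set and a classification rule's types.

    Assumption for illustration: a rule that names extra file types
    extends the global set, as in the example above.
    """
    return {normalize(e) for e in global_types} | {normalize(e) for e in rule_types}


searched = effective_file_types([".pdf", ".jpeg"], [".xlsx"])
print(sorted(searched))
```

The sketch covers only the extending case; limiting the set through a classification is not modeled here.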

 

 

 



Where to see the results of content analysis

Results from content analysis appear in the Data operations > Data classification column. Hover over a classification label to see the rules that were matched, or click the classification label to display all classification details. 

 

 


FAQ

Q: Does file type detection work for files inside archives?

A: Yes. Safetica identifies file types based on content even when the file is packed inside an archive such as ZIP.
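As a rough illustration of what inspecting archive members by content can look like, the Python sketch below opens a ZIP archive in memory and reports members whose content starts with the PDF signature, whatever their names are. The function name pdf_members and the single-signature check are assumptions for illustration, not a description of how Safetica works internally.

```python
import io
import zipfile

PDF_MAGIC = b"%PDF-"  # leading bytes of a PDF file


def pdf_members(archive_bytes: bytes) -> list:
    """Return names of ZIP members whose content starts with the PDF signature."""
    found = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directories have no content to inspect
            with zf.open(info) as member:
                # Read only the first few bytes; the member name is ignored.
                if member.read(len(PDF_MAGIC)) == PDF_MAGIC:
                    found.append(info.filename)
    return found
```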

 

Q: Does OCR apply to all files or only the selected file types?

A: OCR runs only on the file types selected for content analysis. For example, if you enter .jpeg as a custom file type, OCR runs only on .jpeg files.

 

Q: Can I extend the set of analyzed file types for a single data classification without changing the global setting?

A: Yes. When you create a data classification, you can add a file type rule that targets extensions not included in the global Content analysis file types setting. Safetica then analyzes both the globally selected types and the types specified in the classification rule.

 

 

Read next:

Data classification in Safetica

Data classification: How to create a new data classification

Data classification: OCR

Policies: How they work in Safetica