Data classification: OCR

Enable and set up OCR to detect sensitive data in image files.

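In this article, you will learn:

  • How OCR works in Safetica ONE 11
  • OCR limitations
  • How to enable/disable OCR globally
  • How to activate/deactivate OCR for a specific device
  • How to select an OCR language
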
How OCR works in Safetica ONE 11

Safetica analyzes the content in selected file types. With OCR, you can expand the scope to images, such as photos and scanned documents.

Our OCR supports the following image types: .png, .tiff, .jpg, .jpeg, .jpe, .bmp.
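To check quickly which files in a test set fall under the image types listed above, a minimal sketch such as the following can help. It is not part of Safetica: `is_ocr_image` and `split_by_ocr_support` are hypothetical helper names, and the case-insensitive extension match is an assumption.

```python
from pathlib import Path

# Image extensions listed in this article as supported by OCR.
OCR_IMAGE_EXTENSIONS = {".png", ".tiff", ".jpg", ".jpeg", ".jpe", ".bmp"}


def is_ocr_image(filename: str) -> bool:
    """Return True if the extension is in the article's OCR image list.

    Hypothetical helper; matching is case-insensitive by assumption.
    """
    return Path(filename).suffix.lower() in OCR_IMAGE_EXTENSIONS


def split_by_ocr_support(filenames):
    """Split file names into (listed image types, everything else)."""
    supported = [f for f in filenames if is_ocr_image(f)]
    other = [f for f in filenames if not is_ocr_image(f)]
    return supported, other


if __name__ == "__main__":
    images, others = split_by_ocr_support(["scan.PNG", "invoice.jpeg", "report.pdf"])
    print("Listed image types:", images)
    print("Other file types:", others)
```

Files in the second group are not necessarily ignored: as noted below, images can also be extracted from other supported formats.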

Safetica can also extract images from other file types, such as .pdf documents, presentation files, and ebooks. For the complete list, see the list of supported formats.

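OCR is a global setting that applies to all devices. You can, however, activate or deactivate it for specific devices in the Safetica Maintenance Console.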

OCR limitations

Safetica's OCR technology is primarily designed to extract sensitive content from scanned text documents. The scan quality is optimized to balance reasonable performance demands and the highest possible accuracy. Therefore, the technology may have difficulty processing certain types of text, such as:

  • text blended with background
  • low-quality images
  • scattered words without a clear paragraph structure
  • handwritten text

As mentioned above, OCR is not bulletproof. We recommend testing its accuracy on sample documents and using it as a complement to other security measures. If OCR does not perform as you expect, please contact Safetica Support.
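One way to make the testing on sample documents repeatable is to record the outcome for each sample and summarize it. The sketch below assumes you note the results manually (for example, from what you observe in the console); `SampleResult` and `summarize` are hypothetical names and do not call any Safetica interface.

```python
from dataclasses import dataclass


@dataclass
class SampleResult:
    name: str            # sample file name
    has_sensitive: bool  # the sample really contains sensitive data
    detected: bool       # sensitive data was detected in your test (recorded manually)


def summarize(results):
    """Summarize manually recorded OCR test outcomes."""
    sensitive = [r for r in results if r.has_sensitive]
    clean = [r for r in results if not r.has_sensitive]
    hits = sum(r.detected for r in sensitive)
    false_alarms = sum(r.detected for r in clean)
    return {
        # Share of sensitive samples that were detected.
        "detection_rate": hits / len(sensitive) if sensitive else None,
        # Share of clean samples that were flagged anyway.
        "false_alarm_rate": false_alarms / len(clean) if clean else None,
        # Sensitive samples that were missed and deserve a closer look.
        "missed": [r.name for r in sensitive if not r.detected],
    }


if __name__ == "__main__":
    recorded = [
        SampleResult("scanned_contract.png", True, True),
        SampleResult("handwritten_note.jpg", True, False),
        SampleResult("holiday_photo.bmp", False, False),
    ]
    print(summarize(recorded))
```

Including samples from the difficult categories listed above (low-quality images, handwritten text, and so on) gives a more realistic picture than testing clean scans only.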


How to enable/disable OCR globally

  1. In the Safetica ONE 11 console, go to Data classification and click Content analysis settings.
  2. Select or clear the OCR checkbox to enable or disable OCR for all devices.
  3. Save your settings.


How to activate/deactivate OCR for a specific device

  1. Open the Safetica Maintenance Console and go to Maintenance > Endpoint deactivation.
  2. In the user tree, select the endpoint or group for which you want to activate/deactivate OCR.
  3. In the Protection features section, use the Optical character recognition slider to select the desired option.

How to select an OCR language

You can choose two different languages to be scanned by OCR: a primary and a secondary one. Setting a secondary language can improve detection accuracy in multilingual environments, but it will impact device performance.

A secondary language is useful mainly when your documents use different character sets, such as Cyrillic, Latin, and Chinese scripts.

The character set of a language without special characters is a subset of the character set of a language with special characters. This means that if you set, for example, Czech or German as your primary language, setting English as a secondary language is not necessary, because all English characters are already contained in the Czech and German character sets.
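The subset reasoning can be illustrated with plain set comparisons. This is only a sketch of the character-set argument, using simplified lowercase alphabets; it does not describe how the OCR engine handles languages internally, and `secondary_adds_characters` is a hypothetical helper name.

```python
import string

# Simplified lowercase alphabets, for illustration only.
ENGLISH = set(string.ascii_lowercase)
GERMAN = ENGLISH | set("äöüß")
CZECH = ENGLISH | set("áčďéěíňóřšťúůýž")
RUSSIAN = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя")


def secondary_adds_characters(primary, secondary):
    """Return True if the secondary alphabet has characters the primary lacks."""
    return not secondary <= primary


if __name__ == "__main__":
    # English is fully covered by Czech and German, so it adds nothing.
    print(secondary_adds_characters(CZECH, ENGLISH))
    print(secondary_adds_characters(GERMAN, ENGLISH))
    # Cyrillic is a different character set, so a secondary language helps.
    print(secondary_adds_characters(CZECH, RUSSIAN))
```

Under this reasoning, a secondary language is worth its performance cost only when the comparison reports characters the primary language lacks.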


Read next:

Data classification in Safetica ONE 11 

Data classification: What is Safetica unified classification

Data classification: How to create a new data classification

Data classification: How to select file types for content analysis

Policies: How they work in Safetica ONE 11