1. What is OCR?
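OCR stands for Optical Character Recognition. It is the process that converts the text in scanned or image-based documents into machine-readable text, and it determines whether or not a document is text searchable.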
All documents containing text should ideally be OCR'd so that they are text searchable on Opus 2. This lets users accurately search all uploaded content for specific words or phrases (Boolean operators apply). OCR results will be affected if a document is of poor quality or contains handwritten text. Users can OCR documents as part of the upload process to Opus 2 by selecting "If Needed" or "Force" at the time of upload.
It's also possible to OCR documents in Adobe prior to upload. This will speed up the upload process; simply ensure the OCR option is set to "Ignore" at the time of upload.
2. How can I tell in Opus 2 if a document is OCR'd?
In the Documents tab, on the right side of the screen, click on the stacked icon, and check the box for Show Selectable text. All text should be highlighted if the document is fully OCR'd.
If you're already inside the document and would like to check whether it is OCR'd, drag your cursor across the text, taking care to highlight the words and not blank space. It should highlight on a per-word level, as shown below:
Below is the same document when NOT OCR'd; the highlighting appears over blocks of space rather than actual text:
3. I am running OCR on a set of documents and it is taking a long time in the Upload Manager, or keeps timing out. What could be causing this?
A prolonged or repeatedly failing OCR can be attributed to one or more of the following:
- The document is very large and has a high number of pages.
- The document is a "bad scan" in which text and/or images do not appear clearly and distinctly, e.g., a grainy photocopied page.
- The document contains signature(s) and/or handwriting.
- The document contains a very large number of foreign-language characters, some of which may not be recognized by Opus 2.
If any of the above applies, it is recommended that the document be OCR'd in Adobe and then uploaded to Opus 2 with OCR set to "Ignore."
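For teams that triage documents before upload, the guidance above can be sketched as a small decision helper. This is only an illustration, not part of Opus 2: the `DocumentInfo` fields, the `recommend_ocr_setting` function, and the `LARGE_PAGE_COUNT` threshold are all hypothetical, and the only values taken from this FAQ are the "If Needed" and "Ignore" settings.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption: the FAQ only says "very large", so this page threshold is
# a made-up placeholder to be tuned to your own experience.
LARGE_PAGE_COUNT = 500


@dataclass
class DocumentInfo:
    """Hypothetical description of a document, filled in by the user."""
    page_count: int
    already_searchable: bool = False       # e.g. OCR'd in Adobe beforehand
    poor_scan: bool = False                # grainy or unclear scan
    has_handwriting: bool = False          # signatures or handwritten text
    many_foreign_characters: bool = False


def recommend_ocr_setting(doc: DocumentInfo) -> Tuple[str, bool]:
    """Return (OCR setting to pick at upload, whether to OCR in Adobe first)."""
    if doc.already_searchable:
        # Text layer already present: skip OCR during upload.
        return ("Ignore", False)

    slow_ocr_risk = (
        doc.page_count >= LARGE_PAGE_COUNT
        or doc.poor_scan
        or doc.has_handwriting
        or doc.many_foreign_characters
    )
    if slow_ocr_risk:
        # Matches the recommendation above: OCR in Adobe, then upload with "Ignore".
        return ("Ignore", True)

    # No risk factors: let the upload process OCR the document if it needs it.
    return ("If Needed", False)
```

For example, `recommend_ocr_setting(DocumentInfo(page_count=12, has_handwriting=True))` suggests OCR'ing in Adobe first and uploading with "Ignore".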