Document redaction is an operator-controlled feature, so it may not be available in all templates.


Creating a redaction

A redaction is created on an individual document from within the document viewer. There is currently no option to redact text from multiple documents simultaneously. Redactions can be created in three different ways:

  1. Highlighting a specific piece of text or area: As when creating a note, a selection can be made as a line of text, as a selected area of text or as a plain selected area within the document. Once a selection has been made, 'Redact' is offered as a choice in the highlight option dialog, provided redaction is enabled on the project. A redaction is created on the selection whether it contains text or something other than text.
  2. Choosing a word from the word index: The left-hand side panel of the document viewer contains a word index of the occurrences of each word within the document. A chosen word can be bulk-redacted by clicking on the three-dots menu next to the word and choosing 'redact'. All occurrences of that word will be highlighted for redaction.
  3. Term search: In the 'Find' text search box at the top of the right hand side panel word index, users may input a specific term (a string of more than one word). If the term is found in the text, it can be temporarily saved to the index and made available for redaction in the same way as a single word from the index. The term is only saved for the individual user session, and will not be available to other users, or to the same user after refreshing the page or logging out.
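
As an illustration of options 2 and 3, the sketch below shows one way a word index and a term search could locate every occurrence to be redacted. It is not the product's implementation: the function names build_word_index and find_term, the case-insensitive matching and the use of character offsets are all assumptions made for the example.

```python
import re
from collections import defaultdict


def build_word_index(text):
    """Map each lowercased word to the (start, end) character spans where it occurs."""
    index = defaultdict(list)
    for match in re.finditer(r"\w+", text):
        index[match.group().lower()].append((match.start(), match.end()))
    return dict(index)


def find_term(text, term):
    """Return the spans of every case-insensitive occurrence of a multi-word term."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [(m.start(), m.end()) for m in pattern.finditer(text)]


text = "The buyer paid the seller. The Seller thanked the buyer."

# Option 2: every occurrence of one indexed word is selected at once.
index = build_word_index(text)
seller_spans = index["seller"]

# Option 3: a term is searched on demand; nothing is stored beyond this call,
# mirroring (by assumption) the session-only lifetime of a saved term.
term_spans = find_term(text, "the buyer")
```

In this sketch both lookups return character spans, which a viewer would still need to map to page regions before a redaction area could be drawn.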

Once redactions have been created on the document, they can be adjusted in the right-hand side panel of the document viewer. Depending on the type of document, and in particular on variables such as text and font, or on whether the target for redaction is an image or diagram, it may be necessary to adjust the redaction area. Adjustments can be made by moving the redaction area in four directions (up, down, left, right) or by increasing or decreasing the coverage of the redacted area.
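
The two kinds of adjustment can be pictured as simple operations on a rectangle. The following is a minimal sketch only: the RedactionArea class, its coordinate convention (y growing downwards) and the choice to resize evenly around the centre are assumptions for illustration, not a description of how the viewer stores redactions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedactionArea:
    """A rectangular redaction region in page coordinates (y grows downwards)."""
    x: float
    y: float
    width: float
    height: float

    def move(self, dx=0.0, dy=0.0):
        """Shift the area left/right (dx) or up/down (dy) without changing its size."""
        return RedactionArea(self.x + dx, self.y + dy, self.width, self.height)

    def resize(self, delta):
        """Grow (delta > 0) or shrink (delta < 0) on every side, keeping the centre fixed."""
        width = max(0.0, self.width + 2 * delta)
        height = max(0.0, self.height + 2 * delta)
        centre_x = self.x + self.width / 2
        centre_y = self.y + self.height / 2
        return RedactionArea(centre_x - width / 2, centre_y - height / 2, width, height)


area = RedactionArea(x=100, y=200, width=50, height=12)
moved = area.move(dx=-2, dy=1)   # nudge left by 2 and down by 1
grown = moved.resize(1)          # increase coverage by 1 unit on each side
```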

Generating a redacted document

Once redaction areas have been placed on the original document, two mechanisms are supported for generating new redacted documents: direct text manipulation, and rasterisation followed by OCR. The two options have different performance characteristics, so clients can choose according to their specific circumstances.

Two options are available because direct text manipulation relies on a third-party library that generates a new PDF from the document. This PDF is essentially the same as the original, except that it has black rectangles where redactions were created and the redacted text is removed entirely. In principle, this is the preferred solution, but it occasionally runs into issues, in particular:

  • if the redaction regions are a little bigger than the text and overlap the whitespace around an adjoining word, that word can be removed too.

  • if the document contains text that is rendered as a vector graphic rather than via a font, that text is not removed. The black region is simply placed on top of the graphic. This is rare, but it can happen for text-based logos (things that were originally SVG images).
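
The first issue can be illustrated with a simple overlap test. The sketch below is not the third-party library's algorithm, which is not described here; it assumes, purely for illustration, that each word occupies a bounding box that includes its surrounding whitespace and that any word whose box intersects a redaction region is removed. The names overlaps and words_removed are hypothetical.

```python
def overlaps(a, b):
    """True if two (x0, y0, x1, y1) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def words_removed(words, redaction):
    """Return the words whose boxes intersect the redaction region."""
    return [word for word, box in words if overlaps(box, redaction)]


# Assumed layout: each box includes the whitespace around the word.
words = [
    ("Pay", (10, 10, 33, 20)),
    ("John", (33, 10, 63, 20)),
    ("Smith", (63, 10, 95, 20)),
]

tight = (35, 10, 61, 20)   # fits inside the box for "John"
loose = (31, 10, 65, 20)   # spills into the whitespace of both neighbours

removed_tight = words_removed(words, tight)
removed_loose = words_removed(words, loose)
```

Under these assumptions the tight region removes only "John", while the slightly larger region also catches "Pay" and "Smith", which is why trimming the redaction area before generating the document can matter.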

If either of those problems is encountered, the rasterisation followed by OCR option provides an alternative route for redaction. This converts each redacted page into an image, similar to a scanned document. In the process the specified regions are blacked out, and OCR is then run on the image to recover an invisible text layer for search and annotation. This option is reliable in terms of redaction, but if the original was a text file, the new file will be much bigger. The text will also be less crisp, since it is no longer rendered via vector fonts, and the usual issues with running OCR on an image can occur. In particular, the text layer produced by OCR can occasionally be unreliable, so some words may come out with wrong letters. This can make search less effective.
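
The trade-off between the two mechanisms can be summarised as a small decision rule. This is only a sketch of the reasoning above: the Method enum, the choose_method function and its two flags are hypothetical names, and in practice the choice is made by the client rather than automatically.

```python
from enum import Enum


class Method(Enum):
    DIRECT_TEXT = "direct text manipulation"
    RASTERISE_OCR = "rasterisation followed by OCR"


def choose_method(adjoining_words_at_risk, has_vector_text):
    """Prefer direct text manipulation unless one of its known issues applies."""
    if adjoining_words_at_risk or has_vector_text:
        # Accept a larger file, softer text and possible OCR errors
        # in exchange for reliable redaction.
        return Method.RASTERISE_OCR
    return Method.DIRECT_TEXT


default_choice = choose_method(adjoining_words_at_risk=False, has_vector_text=False)
logo_choice = choose_method(adjoining_words_at_risk=False, has_vector_text=True)
```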

Redacted document files

When a redacted document is created, two separate files may be generated from the original document. One contains the redacted view document, the other a redlined document. The redline view allows a user to see where a redaction has been created, while still being able to see the underlying text. The redaction is demarcated with a red line under each word or phrase that has been redacted. Upon creation, the two files will be added to the folder in which the original document exists by default, although the user may choose a different folder to save the document(s) into. Any non-system metadata fields may also be edited for both the redacted and the redlined document. When created, a 'redacted versions' relationship is also created between the original document and the two derived documents. The original document is the parent document.
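
The relationship between the original and its derived files can be pictured as a simple parent and child structure. The sketch below is illustrative only: the Document class, the add_redacted_versions function, the name suffixes and the copying of metadata from the original are all assumptions, not the product's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    folder: str
    metadata: dict = field(default_factory=dict)
    # Back-reference to the original; excluded from repr and comparison.
    parent: "Document | None" = field(default=None, repr=False, compare=False)
    redacted_versions: list = field(default_factory=list)


def add_redacted_versions(original, folder=None, metadata=None):
    """Create a redacted and a redlined document linked to the original as parent."""
    target_folder = folder or original.folder   # default: the original's folder
    derived = []
    for suffix in ("redacted", "redlined"):
        doc = Document(
            name=f"{original.name} ({suffix})",           # assumed naming scheme
            folder=target_folder,
            metadata={**original.metadata, **(metadata or {})},
            parent=original,
        )
        original.redacted_versions.append(doc)
        derived.append(doc)
    return derived


original = Document("Contract.pdf", folder="Disclosure", metadata={"Author": "J. Smith"})
redacted, redlined = add_redacted_versions(original, metadata={"Status": "Redacted"})
```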