Files can be added to a project by selecting the Upload button on either the project's Documents page or its Files page. The Opus 2 platform accepts files in batches. The maximum number of files in a batch depends on the browser but is normally at least 5000. The maximum upload size is 2GB.
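When preparing a large upload, it can help to split the files into batches in advance so that each stays within these limits. The sketch below is a local planning aid, not part of the Opus 2 platform. It assumes 5000 files as the per-batch cap and treats the 2GB limit as applying to a whole batch; both assumptions should be confirmed for your browser and project. `plan_batches` and `plan_folder` are hypothetical helpers.

```python
from pathlib import Path

# Assumed limits, taken from the text above; confirm them before relying on them.
MAX_FILES_PER_BATCH = 5000
MAX_BATCH_BYTES = 2 * 1024**3  # 2 GB, assumed here to apply to a whole batch


def plan_batches(sizes, max_files=MAX_FILES_PER_BATCH, max_bytes=MAX_BATCH_BYTES):
    """Group (name, size_in_bytes) pairs into batches that respect both limits.

    Returns (batches, oversized). A file larger than max_bytes on its own
    cannot fit in any batch, so it is reported separately.
    """
    batches, oversized = [], []
    current, current_bytes = [], 0
    for name, size in sizes:
        if size > max_bytes:
            oversized.append(name)
            continue
        # Close the current batch if adding this file would break either limit.
        if current and (len(current) >= max_files or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches, oversized


def plan_folder(folder):
    """Hypothetical helper: plan batches for every file under a local folder."""
    files = sorted(p for p in Path(folder).rglob("*") if p.is_file())
    return plan_batches((str(p), p.stat().st_size) for p in files)


if __name__ == "__main__":
    demo = [("a.pdf", 10), ("b.pdf", 20), ("c.pdf", 5), ("huge.pdf", 100)]
    # Small limits so the grouping is easy to follow.
    print(plan_batches(demo, max_files=2, max_bytes=40))
```

With the small demo limits, `a.pdf` and `b.pdf` form the first batch, `c.pdf` starts a second one, and `huge.pdf` is reported as too large for any batch.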


Depending on the project's configuration, there may be options to upload either documents or transcripts. Either option opens a drag-and-drop window into which source files can be placed for ingestion into the project.

Users may also see other upload parameters, which vary with the project configuration. The table below describes these.


Table 1. Upload parameters

Batch Name

A text or numeric identifier for the current upload batch, used to help locate processing issues when tracking errors.

Publish

A toggle that publishes the uploaded files immediately in the Documents view.

Folder

If a document folder is selected before starting the upload process, the selected folder becomes the default location for the documents. The destination can be changed by selecting the Edit icon and choosing a different destination folder. If no folder is specified, a folder is created under the "Uploads" folder, titled with the date and time of the upload, and the documents are published there.

OCR

The dropdown list presents three options:

  1. None (default): use if the source files are known to already have OCR applied (fastest method).

  2. If Required: use if it is not known whether all source files have OCR applied. (Note: OCR is performed on a page only if no text is detected within the body of that page.)

  3. Always: the system applies OCR to every document, regardless of the OCR status of the source files (processor-intensive and slowest method).

Unpack zip archives

If this option is checked and the source file is a zip file, the system extracts the zipped content automatically and ingests each file separately. If it is not checked, the file is loaded as a zip file and a slipsheet is provided.

Note: Password-protected zip files cannot be extracted automatically; a slipsheet is generated instead.

Further information

A free-text tab for recording any notes about the upload batch.


Uploading files without publishing

A file may be kept unpublished so that it can be checked, or otherwise processed, before it is published for Project users to access. Unpublished files are accessible only to administrators, within the Project/Files area, from where they can be published at the appropriate time.

Optical Character Recognition (OCR)

OCR is a process that adds an invisible text layer on top of the PDF document created for publishing within the Opus 2 platform. This makes the text searchable and allows text-based annotations to be created.
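The three OCR options in the upload parameters differ only in which pages receive this text layer. The following sketch models that choice. It is a conceptual illustration, not the platform's implementation, and it assumes each page's existing text is available as a string (an empty or whitespace-only string meaning no text was detected). The `OcrMode` enum and `pages_to_ocr` function are hypothetical names.

```python
from enum import Enum


class OcrMode(Enum):
    """Hypothetical model of the OCR dropdown options."""
    NONE = "None"
    IF_REQUIRED = "If Required"
    ALWAYS = "Always"


def pages_to_ocr(page_texts, mode):
    """Return the indices of the pages that would receive OCR under the given mode.

    page_texts holds the text already extractable from each page; an empty
    or whitespace-only string means no text was detected on that page.
    """
    if mode is OcrMode.NONE:
        return []  # trust that the source files already have OCR applied
    if mode is OcrMode.ALWAYS:
        return list(range(len(page_texts)))  # every page, regardless of existing text
    # If Required: only pages where no text was detected.
    return [i for i, text in enumerate(page_texts) if not text.strip()]


if __name__ == "__main__":
    pages = ["Contract terms", "", "   ", "Signature page"]
    for mode in OcrMode:
        print(mode.value, pages_to_ocr(pages, mode))
```

The model shows why "If Required" usually sits between the other two options in cost: only the text-free pages (here, pages 1 and 2) are processed.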

In addition to individual files, the Opus 2 platform accepts two types of compound files:

  • If zip files are uploaded with the Unpack zip archives option checked, their contents are unpacked and each file is ingested separately.

  • Email files (.eml) are unpacked to extract any attachments. These are then associated with the body of the email as part of a document family.
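Because password-protected zip files cannot be unpacked and are replaced by a slipsheet, it can be worth checking archives locally before uploading them. This sketch uses Python's standard `zipfile` module and reads bit 0 of each member's general-purpose flag, which the ZIP format uses to mark encrypted entries. `encrypted_members` is a hypothetical helper, not an Opus 2 tool.

```python
import sys
import zipfile


def encrypted_members(zip_source):
    """Return the names of password-protected members in a zip archive.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as zf:
        # Bit 0 of flag_bits is set when a member is encrypted.
        return [info.filename for info in zf.infolist() if info.flag_bits & 0x1]


if __name__ == "__main__":
    # Usage: python check_zips.py archive1.zip archive2.zip ...
    for path in sys.argv[1:]:
        protected = encrypted_members(path)
        status = "password protected: " + ", ".join(protected) if protected else "OK"
        print(f"{path}: {status}")
```

Any archive reported as password protected can be extracted locally (with the password) and its contents uploaded directly.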

For large volumes, a local desktop uploader can be installed. The maximum number of files that can be uploaded at any one time, and the rate at which they are ingested, depend on the hardware configuration. For cloud clients, this forms part of the contractual arrangement with Opus 2 Ltd.


All uploaded files are scanned for viruses. If a virus is detected then the file is deleted from the system and the presence of a virus is reported to the user.


For display and annotation, the Opus 2 platform uses PDF as its standard file type. The following file types are converted to PDF as part of the ingestion process:

"eml", "msg", "docx", "doc", "dot", "wbk", "docm", "dotx", "dotm", "docb", "pptx", "ppt", "ppts", "ppa", "ppam", "pps", "ppsm", "ppsx", "pptm", "png", "jpeg", "jpg", "bmp", "tiff", "tif", "gif", "txt", "rtf", "odt", "ods", "odp", "odg", "html", "htm", "xlsm", "xls", "xltm", "xlsb", "xlsx", "xlt", "xla", "xlam", "mpp", "one", "vsd", "dxf"

Media Files

The Opus 2 platform can play or stream audio and other media files of certain types. The following file types are supported: .mp3, .ogg, .wav, .mp4, .mpg, .mpeg, .webm
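Before a large upload, a folder's contents can be sorted against these lists to spot files that may not be recognised. The sketch below copies the extension lists from this section. It assumes that `.pdf` files need no conversion (PDF being the platform's standard type) and treats any other extension as "other", which may lead to a slipsheet; `classify` is a hypothetical helper, not an Opus 2 API.

```python
from pathlib import PurePath

# Extension lists copied from the text above.
CONVERTED_TO_PDF = {
    "eml", "msg", "docx", "doc", "dot", "wbk", "docm", "dotx", "dotm", "docb",
    "pptx", "ppt", "ppts", "ppa", "ppam", "pps", "ppsm", "ppsx", "pptm",
    "png", "jpeg", "jpg", "bmp", "tiff", "tif", "gif", "txt", "rtf",
    "odt", "ods", "odp", "odg", "html", "htm", "xlsm", "xls", "xltm", "xlsb",
    "xlsx", "xlt", "xla", "xlam", "mpp", "one", "vsd", "dxf",
}
MEDIA = {"mp3", "ogg", "wav", "mp4", "mpg", "mpeg", "webm"}


def classify(filename):
    """Return a rough ingestion category for a file based on its extension."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext == "pdf":
        return "pdf"  # assumed: already in the platform's standard format
    if ext == "zip":
        return "zip archive"  # unpacked only if 'Unpack zip archives' is checked
    if ext in CONVERTED_TO_PDF:
        return "converted to PDF"
    if ext in MEDIA:
        return "media"
    return "other"  # not in the lists above; may not be recognised


if __name__ == "__main__":
    for name in ["Report.DOCX", "hearing.mp4", "bundle.zip", "notes.xyz"]:
        print(name, "->", classify(name))
```

The comparison is case-insensitive, so `Report.DOCX` and `report.docx` fall into the same category.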


Table 2. Upload-Related Performance Information

DOCUMENTS

Documents in workspace: upper limit 20 000. Recommended limit for optimal performance; load times will begin to increase beyond this number.

Pages per document: upper limit 15 000.

Documents per folder: upper limit 3000.

TRANSCRIPTS

Transcripts in workspace: upper limit 20 000. Supported file types:

  • .txt, .ptf and .mdb files for transcripts.

  • .mp3 and .mp4 files for transcript media.

Original (native) files

The original (native) version of a file is the source file that was initially uploaded into the project. This native file is then processed, and the platform attempts to create a PDF version of it for publishing within the project.

Downloading documents in pdf and original formats

Either version of a document, PDF or original, can be downloaded from the platform. The PDF version can include any annotations made on the document. The original version is a copy of the file that was uploaded. Downloading a version of a document does not remove or delete that version from the project.

File processing status and processing result

Uploading files and ingesting them to create published documents can take some time, depending on the number and size of the files being uploaded. Processing time is also affected by the OCR (Optical Character Recognition) option selected when the upload process is started.

The document upload and ingestion process uses a 'round-robin' approach, in which queued files are allocated resources on a 'fair' basis rather than strictly one after the other. For example, if one user is uploading a very large batch of files and another user uploads just one, the single file is slotted into the processing queue so that the second user does not have to wait a long time for their document to be ready.
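The fairness idea can be illustrated with a simplified round-robin scheduler that keeps one queue per user and takes one file from each user in turn. This is a sketch of the general technique, not Opus 2's actual scheduler. It assumes every file is already queued and that each file takes the same effort to process; `round_robin_order` is a hypothetical name.

```python
from collections import deque


def round_robin_order(submissions):
    """Interleave queued files across users, one file per user per turn.

    submissions: list of (user, filename) pairs in arrival order.
    Returns the order in which the files would be picked for processing.
    """
    queues = {}  # dicts keep insertion order, so users are served in arrival order
    for user, filename in submissions:
        queues.setdefault(user, deque()).append(filename)

    order = []
    while queues:
        # One pass gives every user with pending files a single turn.
        for user in list(queues):
            order.append(queues[user].popleft())
            if not queues[user]:
                del queues[user]  # this user has nothing left to process
    return order


if __name__ == "__main__":
    queued = [("alice", "a1"), ("alice", "a2"), ("alice", "a3"), ("bob", "b1")]
    print(round_robin_order(queued))
```

Here Bob's single file is picked second instead of fourth, even though Alice's three files were queued before it.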

The status of the file during the ingestion process can be seen in the system metadata field called 'Status'. 

Note

The Status field can be located by scrolling through the document metadata fields. If it cannot be found, check that the column is set to visible: open the column selector in the table display and ensure that the Status field is checked.


The status of the file is indicated by the following icons (hovering the mouse pointer over an icon reveals its meaning):


Table 3. Upload Status

Icon

Meaning

Explanation

Available.png

Available

Processing completed and document is ready.

Slipsheet__password_or_unsupported_format_.png

Slipsheet (unsupported format)

Processing complete but the file type received was not recognised.

Slipsheet__password_or_unsupported_format_.png

Slipsheet (password protected)

Processing complete - the source file could not be opened.

Queued.png

Queued

File received and awaiting processing.

In_progress.png

In progress

File processing started.

Not_received.png

Not received

The file was listed in the upload process, but the specified file was not located.

Upload_error.png

Upload error

The file upload process was interrupted.

Processing_error.png

Processing error

Document not processed due to an internal error

Unreadable_file___corrupted.png

Unreadable file / corrupted

File could not be read

Processing_error.png

Virus

Document not processed because a virus was detected


Further information on the upload status can be found under the Files and Upload history menu options, for users with the appropriate Project access levels.

Files

Uploaded files can be located by their batch name by selecting the Files option (found under the Project Button) and selecting the batch name folder on the left hand panel of the files view. (If no batch name was specified at the time of upload then it will default to the date and time that the upload process was started.)

Four metadata fields on the Files page show processing-related data:

Tip

Use '+ More' to filter for particular metadata fields and values. The column selector can also be used to remove unnecessary columns.

Table 4. Files Processing Data

Field Name

Status

Explanation

Processing Status

Not Received

File not located

Ready

Processing completed

Queued

Awaiting start of processing

In progress

Processing started

Processing error

Processing was interrupted

Processing result

Available

A PDF was successfully generated from the source file

Slipsheet (unsupported format)

Slipsheet generated due to unrecognised file format

Slipsheet (password protected)

Slipsheet generated due to password preventing access to file content

Virus scan failed

The virus scanner failed to run; this causes a processing error

Potential virus detected

A virus was detected; this causes a processing error

Unreadable file / Corrupted

The specified file was recognised but could not be opened

Processing finished at

Filterable by 'Since last login' or by a selected date

Processing of the file finished at the indicated date and time

Published

True

PDF generated in a documents folder

False

File processing is complete, but the PDF output is not published


If files fail to upload or to be published, the 'Reprocess' option restarts processing. To use it, select the affected files and then select Actions/Reprocess files.

Warning

Files that have failed with the status 'Not received' cannot be reprocessed. The upload process must be restarted from the beginning for any affected files.
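When many files have failed, it can help to sort them into those worth reprocessing and those that must be uploaded again. This sketch applies the rule above to a list of file rows, such as values noted from the Files view. The field names `name` and `processing_status` are hypothetical. It also assumes that only the 'Processing error' status is a reprocessing candidate; results such as a detected virus may need separate handling.

```python
def triage_failed_files(rows):
    """Split file rows into reprocess candidates and files that need re-uploading.

    rows: iterable of dicts with hypothetical 'name' and 'processing_status'
    keys, using the Processing Status values listed in Table 4.
    """
    reprocess, reupload = [], []
    for row in rows:
        status = row["processing_status"].strip().lower()
        if status == "not received":
            reupload.append(row["name"])  # cannot be reprocessed; upload again
        elif status == "processing error":
            reprocess.append(row["name"])  # candidate for Actions/Reprocess files
    return reprocess, reupload


if __name__ == "__main__":
    rows = [
        {"name": "a.pdf", "processing_status": "Ready"},
        {"name": "b.docx", "processing_status": "Processing error"},
        {"name": "c.msg", "processing_status": "Not Received"},
    ]
    print(triage_failed_files(rows))
```

Files with any other status (Ready, Queued, In progress) are left out of both lists.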

File upload history

Previous upload batches can be viewed via the Upload history option under the Project Button.

A filter option allows the history to be filtered by the following fields:

Name

Batch name specified at time of upload

Description

Text inserted into 'Further information' tab at time of upload

Date Uploaded

Upload initiation date

Uploaded by

User that performed the upload

Files

Number of files uploaded in a batch

Processed

Number of files processed in a batch

Errors

Number of files with errors in a batch

Finished

True/False representing completion of the batch upload

Time Finished

Date and time at which the batch finished processing

An overview of a batch's upload status can be seen in the right-hand panel by selecting the row for that batch. The 'File details' tab gives a further breakdown of the batch, listing its contents by file name together with the destination folder specified at the time of upload.