What is Data Capture Lite?

The Data Capture Lite addon for Alfresco is a module that enables the system to “read” parts of your company documents. The module uses the extracted text to populate the file index. This index information lays the groundwork for other organization-related modules such as sorting, file naming, and search.

Data Capture Lite also allows the user to create intelligent document templates that show Alfresco where the important information on a document is located. This manner of Automated Forms Processing means that Data Capture Lite can index multiple documents automatically. When future documents with the same layout go into the Document Library, Alfresco uses this module to extract the index information of the file.

Why is Data Capture Lite so important?

  • Save time – intelligent data capture replaces manual data entry (typing), saving many hours of work
  • Flexibility – intelligent character recognition allows Data Capture Lite to recognize printed fonts and handwriting in documents that enter the Alfresco system
  • Automated Forms Processing – companies can make custom templates to cover every imaginable type of incoming document from partners and vendors

The Benefits of Data Capture Lite

Skytizens has developed the Data Capture Lite Alfresco addon to eliminate the need for indexing files manually. This module removes a large amount of data entry work from a company’s document intake process. By empowering the Alfresco system to auto-index incoming documents, your company’s Document Library stays clean and organized. Secondary actions can also be programmed based on the index information extracted by Data Capture Lite, which takes the guesswork out of later organizational steps.

How Does It Work?

Data Capture Lite can be used in two ways. First, the Alfresco addon can be used to populate the index information of a single document via intelligent character recognition (ICR). Alfresco users can select small areas of text in the document with a highlighter tool. ICR will capture those sections of text or handwriting and send them to the Optical Character Recognition (OCR) module, which converts them into letters and numbers.

In practice, the user highlights each part of a document and the OCR reads it. Each piece of information is then pasted into the file index. This eliminates the need to type information manually and reduces typing mistakes during data entry.
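The single-document flow can be sketched in a few lines of Python. This is only an illustration of the idea, not the addon's code: the `Document` class, the `capture_field` and `clear_properties` helpers, and the `fake_ocr` stand-in are all hypothetical names, and the "page" is a plain dictionary so the example runs without a real OCR engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# A highlighted area on the preview: (left, top, width, height). Assumed layout.
Region = Tuple[int, int, int, int]
OcrFn = Callable[[dict, Region], str]


@dataclass
class Document:
    name: str
    page: dict  # stand-in for the rendered page: maps a region to its text
    properties: Dict[str, str] = field(default_factory=dict)


def capture_field(doc: Document, field_name: str, region: Region, ocr: OcrFn) -> str:
    """Read one highlighted region and paste the text into the chosen index field."""
    text = ocr(doc.page, region).strip()
    doc.properties[field_name] = text
    return text


def clear_properties(doc: Document) -> None:
    """Empty every index field so the user can adjust the regions and retry."""
    for name in doc.properties:
        doc.properties[name] = ""


def fake_ocr(page: dict, region: Region) -> str:
    """Hypothetical OCR stand-in: looks the region up instead of reading pixels."""
    return page.get(region, "")


invoice = Document(
    name="invoice-001.pdf",
    page={(40, 60, 200, 24): " ACME Co. \n", (40, 100, 120, 24): "2024-01-31"},
    properties={"company": "", "invoice_date": ""},
)
capture_field(invoice, "company", (40, 60, 200, 24), fake_ocr)
capture_field(invoice, "invoice_date", (40, 100, 120, 24), fake_ocr)
print(invoice.properties)
```

Each call handles one field at a time, which mirrors the way the user picks a field and then highlights the matching area.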

Data Capture Lite, developed by Skytizens, currently reads four languages in a total of five scripts:

  • English
  • Thai
  • Japanese
  • Chinese (Simplified) and Chinese (Traditional)

The second and more efficient way for the module to be used is to auto-extract information from full documents, which allows Alfresco to do data capture on multiple text areas for all future documents with the same layout. After the initial setup, the user does not need to tell the system where the index information is located on each page since the system can follow a template. This is known as Automated Forms Processing.

The initial setup of Data Capture Lite consists of designing a template for how to read each type of document. A specific folder is created that applies this template to all incoming documents and extracts the index information automatically. For the Lite version of the module, each document with a new layout requires its own template and its own landing folder in order for Alfresco to read it.
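As a rough sketch of template-based capture, a template can be thought of as a mapping from index fields to page regions, and a landing folder as the place where one template is applied to every incoming file. The `Template` and `LandingFolder` classes and the `fake_ocr` function below are hypothetical and exist only for this example; the real module stores its templates inside Alfresco.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Region = Tuple[int, int, int, int]  # (left, top, width, height), assumed layout


@dataclass
class Template:
    name: str
    regions: Dict[str, Region]  # index field -> highlighted area


@dataclass
class LandingFolder:
    """One folder, one template: every incoming page is read the same way."""
    name: str
    template: Template
    ocr: Callable[[dict, Region], str]
    indexed: List[Dict[str, str]] = field(default_factory=list)

    def receive(self, page: dict) -> Dict[str, str]:
        # Read every region named by the template and build the file index.
        props = {
            name: self.ocr(page, region).strip()
            for name, region in self.template.regions.items()
        }
        self.indexed.append(props)
        return props


def fake_ocr(page: dict, region: Region) -> str:
    """Hypothetical OCR stand-in: the page is a dict of region -> text."""
    return page.get(region, "")


invoice_template = Template(
    name="vendor-invoice",
    regions={"company": (40, 60, 200, 24), "invoice_date": (40, 100, 120, 24)},
)
intake = LandingFolder("Invoices In", invoice_template, fake_ocr)

first = intake.receive({(40, 60, 200, 24): "ACME Co.", (40, 100, 120, 24): "2024-01-31"})
second = intake.receive({(40, 60, 200, 24): "Globex", (40, 100, 120, 24): "2024-02-15"})
print(first, second)
```

Because the folder holds exactly one template, a document with a different layout would need its own template and its own folder, as described above for the Lite version.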

With the index fields of new files getting populated automatically, documents are prepared to enter secondary organizational processes soon after they enter the Document Library. Second steps might include actions that enhance search capabilities, sorting, file naming, and more. These secondary processes can also be automated. Thus, Data Capture Lite is an important first step for incoming documents in the Alfresco system.

Main Features 

Read a Single Document – Users can extract index information from a document using OCR technology.

  1. Preview document – From Preview Mode, the user can select Data Capture Lite from the action menu.
  2. Metadata – The properties fields are displayed next to the document Preview. Users must indicate, one at a time, which field to populate.
  3. Highlight Area to OCR – The user can highlight the information on the document Preview and the OCR will read the characters and paste the information into the corresponding index field.
  4. Save Properties – If satisfied, the user can save the properties of the file that have been read by the OCR module.
  5. Clear Properties – If the information that the OCR produced is incorrect, the user can clear the text from all index fields with a single click, then start over by moving or resizing the highlighted areas to extract more accurate information.
  6. Save Document As – The user can save the file as a new file with the properties fields filled in. The old file will remain with empty property fields.
  7. Save Template As – The user can save the file as a template version with the SkyArea OCR information intact. This will keep track of the highlighted areas. This is useful in case the document will be used in the future as a template for reading multiple similar documents (see below).

Setup to Read Multiple Documents (by Template) – Users can create a template for extracting index information from future documents with the same layout using OCR technology.

    1. Preview document – Begin with a document that has a common layout.
    2. Metadata – The properties fields are displayed next to the document Preview.
    3. Highlight Area to OCR – The user can highlight the information on the document Preview and the OCR will read the characters and paste the information into the corresponding index field.
    4. Save Template As – The user must save the file as a template so that this version with the SkyArea OCR information intact remains in the system. This document can now be used as a template for reading multiple similar documents.

Auto Index by SkyArea OCR (by Folder Rule) – Once a template has been created, it must be assigned to an intake folder using the Folder Rule. This means the index information of every document that enters this folder is extracted automatically, because the OCR module reads the document content.

    1. Create Folder – With Data Capture Lite, each template requires its own folder. Every file that enters this folder will be processed from a single template.
    2. Manage Folder Rule – The user must create a rule for this folder which specifies the template to use. In the Alfresco Document Library, the user must navigate to the action menu of the folder and select Manage Rules.
    3. Create New Rule – The user can name the rule something related to the chosen template. Next, the user must designate the Action to Perform.
      • In the drop-down menu, the user must select SkyArea Data Extraction.
      • Assign template – Manually
      • Template – Here is where the user selects the document that has been saved as a template with the OCR locations highlighted and preserved.
      • Save – Save this rule so that all items created or entering this folder will be processed automatically. 
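The choices made in the rule form above can be summarized as a small record. The sketch below is not an Alfresco API call: `FolderRule`, `make_extraction_rule`, and `problems` are hypothetical names, and the field values simply repeat the labels from the steps above so the settings can be checked before saving.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FolderRule:
    """Hypothetical record of the rule form; not an Alfresco data structure."""
    name: str
    when: str
    action: str
    assign_template: str
    template: str


def make_extraction_rule(template_name: str) -> FolderRule:
    # Values mirror the labels used in the steps above.
    return FolderRule(
        name=f"Extract with {template_name}",
        when="items are created or enter this folder",
        action="SkyArea Data Extraction",
        assign_template="Manually",
        template=template_name,
    )


def problems(rule: FolderRule) -> List[str]:
    """Return a list of reasons the rule would not do what the steps describe."""
    found = []
    if not rule.name.strip():
        found.append("rule needs a name")
    if rule.action != "SkyArea Data Extraction":
        found.append("action must be SkyArea Data Extraction")
    if not rule.template.strip():
        found.append("no template selected")
    return found


rule = make_extraction_rule("vendor-invoice")
print(rule)
print(problems(rule))
```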

Custom Documents – The templates created for use with the Data Capture Lite module can be extended by creating document models with custom properties fields. This is done by using the Custom Model Manager (CMM) found in the Administration Tools area of Alfresco. In order to convert incoming documents to the custom model, a second rule must be added to the intake folder which instructs the folder to convert files to the custom document type. This ability is restricted by permissions controls related to the CMM module.
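How the two rules might cooperate can be sketched as follows. The type name `acme:invoice`, its fields, and both functions are made up for illustration, and the order shown (convert first, then extract) is an assumption rather than something the module documents.

```python
from typing import Dict, List

# Hypothetical custom type and its property fields, as they might be
# defined in the Custom Model Manager.
CUSTOM_TYPES: Dict[str, List[str]] = {
    "acme:invoice": ["vendor", "invoice_date", "total"],
}


def convert_type(doc: dict, type_name: str) -> dict:
    """Conversion rule (assumed to run first): switch the file to the custom
    type and give it that type's empty property fields."""
    doc["type"] = type_name
    for name in CUSTOM_TYPES[type_name]:
        doc["properties"].setdefault(name, "")
    return doc


def store_captured(doc: dict, captured: Dict[str, str]) -> dict:
    """Extraction rule: keep captured values only for fields the type defines."""
    for name, value in captured.items():
        if name in doc["properties"]:
            doc["properties"][name] = value
    return doc


incoming = {"name": "scan-0042.pdf", "type": "cm:content", "properties": {}}
convert_type(incoming, "acme:invoice")
store_captured(incoming, {"vendor": "ACME Co.", "total": "1,250.00", "po_number": "77"})
print(incoming)
```

In this sketch a captured value with no matching custom field (`po_number`) is dropped, which illustrates why the template's fields and the custom model's fields need to line up.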

Manage OCR Templates – Users who have been given permission can manage and edit multiple OCR templates.

    1. Toggle Status – Users can Enable or Disable a certain template in the Alfresco System.
    2. Edit – Users can edit existing templates.
    3. Create – Users can create new templates for use with Data Capture Lite.

Permissions Control – Access to the Data Capture Lite feature and access to the Data Capture Lite template management area are both managed by Group.

  1. Group Access – Permission to use the feature is given by the client’s administrator by designating members of a group.
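Group-based access can be pictured as a simple membership check. The group names, user names, and functions below are hypothetical, and the sketch assumes one group for using the feature and a separate group for managing templates; an actual installation may organize its groups differently.

```python
from typing import Dict, Set

# Hypothetical group names chosen for this example.
GROUP_FEATURE = "GROUP_DATA_CAPTURE_USERS"
GROUP_TEMPLATES = "GROUP_DATA_CAPTURE_TEMPLATE_MANAGERS"

# Hypothetical memberships assigned by the client's administrator.
memberships: Dict[str, Set[str]] = {
    "alice": {GROUP_FEATURE, GROUP_TEMPLATES},
    "bob": {GROUP_FEATURE},
    "carol": set(),
}


def can_use_feature(user: str) -> bool:
    """True if the user belongs to the group that may use Data Capture Lite."""
    return GROUP_FEATURE in memberships.get(user, set())


def can_manage_templates(user: str) -> bool:
    """True if the user belongs to the group that may manage OCR templates."""
    return GROUP_TEMPLATES in memberships.get(user, set())


for user in ("alice", "bob", "carol"):
    print(user, can_use_feature(user), can_manage_templates(user))
```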

Conclusions

No more manual data entry during the document intake process. Alfresco can read your documents, extract index information, and edit properties all on its own.

The Alfresco Data Capture Lite addon was developed by Skytizens to automatically index documents. Using Optical Character Recognition technology, Alfresco can extract common information such as company name, invoice date, information embedded in a QR code, and more.

Within moments of a document entering the system, Alfresco has all the correct information saved in the file properties where it belongs. Each document going into the Document Library is immediately searchable and ready for any next steps. This module eliminates the busy work of administrative data entry.
