To make images and scanned files searchable, you can enable automatic OCR through the out-of-the-box support for the AWS Textract API in your Curiosity application.
You will need:
- An active Amazon Web Services account
- The AWS region you would like to consume the API from
- A valid AWS Access Key ID and Secret Access Key with access to the AWS Textract API
Set up your AWS Textract account
Follow the official AWS guide to set up your account for AWS Textract, create the required IAM user with access to the API, and generate the Secret Access Key. The IAM user must have the AmazonTextractFullAccess managed policy attached to it.
To retrieve the Access Key ID and Secret Access Key values, navigate to the AWS IAM Users page, select the user you created, and open the Security credentials tab. Create an access key and secret pair. Store both values in a safe place, because you will not be able to retrieve the secret key again.
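Before entering the key pair in Curiosity, it can help to confirm that it actually reaches Textract. The following is a minimal sketch, not part of Curiosity: it assumes the third-party boto3 package is installed and that the three values are exported in the standard AWS environment variables. The helper names load_aws_settings and check_textract_access are hypothetical.

```python
def load_aws_settings(env):
    """Collect the key pair and region from a mapping such as os.environ.

    The variable names are the standard ones read by AWS tooling; the
    function name itself is hypothetical.
    """
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "region_name": env["AWS_DEFAULT_REGION"],
    }


def check_textract_access(settings, image_path):
    """Send one small image to Textract and return the text lines it detects.

    An authorization error raised here suggests the IAM user lacks
    Textract permissions or the key pair is wrong.
    """
    import boto3  # third-party dependency, imported only when the check runs

    client = boto3.client("textract", **settings)
    with open(image_path, "rb") as image_file:
        response = client.detect_document_text(
            Document={"Bytes": image_file.read()}
        )
    return [
        block["Text"]
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"
    ]
```

A call such as check_textract_access(load_aws_settings(os.environ), "sample.png"), with os imported and sample.png standing in for any small PNG or JPEG of your own, should return the lines of text found in the image. Reading the values from the environment keeps the secret key out of the script itself.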
Configuring AWS Textract on Curiosity
In your Curiosity application, navigate to Settings > Data > OCR Settings, and enter the values for AWS Region, AWS Access Key ID and AWS Secret Access Key. The value you enter for AWS Region must be a valid region identifier such as us-west-1 or eu-central-1. Click Save to store all changes.
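A common mistake is to paste the region's display name instead of its identifier. The sketch below shows one way to check the shape of the value before saving it. The pattern is an assumption based on the usual form of AWS region identifiers (lowercase words joined by hyphens and ending in a number), and the function name is hypothetical. It does not confirm that the region exists or that Textract is available there.

```python
import re

# Assumed shape of a region identifier: a two-letter prefix, one or more
# lowercase words, and a trailing number, all joined by hyphens.
REGION_PATTERN = re.compile(r"[a-z]{2}(-[a-z]+)+-\d+")


def looks_like_region_id(value: str) -> bool:
    """Return True when the whole string has the shape of a region identifier.

    No trimming is done, so surrounding whitespace makes the check fail.
    """
    return REGION_PATTERN.fullmatch(value) is not None


for candidate in ("us-west-1", "eu-central-1", "US West (N. California)"):
    print(candidate, looks_like_region_id(candidate))
```

The two identifiers from the text pass the check, while the display name does not.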
If you have already added files to your system, you may want to reprocess them so that OCR is applied where required. To do so, run this query in the Shell interface: