To make images and scanned files searchable, you can enable automatic OCR in your Curiosity application using its out-of-the-box support for the Azure Computer Vision Containers.

Containers enable you to run the Computer Vision APIs in your own environment, which is useful when you have specific security or data governance requirements.

You will need:

  • An active Azure subscription (you can test this feature with the free subscription)

  • Your Computer Vision endpoint

  • Your Computer Vision subscription key

  • Approval to run the Computer Vision Containers (this has to be requested)

  • An environment capable of running Docker Containers (such as a Linux server, or Kubernetes or OpenShift deployments)

  • Internet access, so that the Container can reach the Billing APIs

For more information on the Microsoft Containers requirements, please check the official documentation.

Note: The Azure Computer Vision Containers are a Microsoft offering. You can check the pricing for the API on their website.

Creating an Azure Computer Vision endpoint

If you already have a Computer Vision endpoint, you can skip to the next section.

Otherwise, you can easily create a new one from within your Azure portal page.

Click on Create a Resource:

Select the option AI + Machine Learning, and then click on Computer Vision

Configure your new Computer Vision resource as required:

Then click on Review + Create to check the configuration, and finally on Create to deploy the new endpoint.

Once your endpoint is created, navigate to the section Keys and Endpoint and copy the value of the fields Endpoint and Key 1 - you'll need these values for the next step below.

Deploying the Docker Container

The basic command to run the Azure Cognitive Services Vision Container is:

docker run --rm -it -p 5000:5000 --memory 18g --cpus 8 mcr.microsoft.com/azure-cognitive-services/vision/read:3.1-preview Eula=accept Billing={ENDPOINT_URI} ApiKey={API_KEY}

You need to replace the {ENDPOINT_URI} and {API_KEY} values with the ones from the endpoint you created above. These values are used by the container to submit billing information to the cloud. No other data or documents are sent from the container to the Azure Cloud.
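If you prefer to script the substitution described above, the following is a minimal Python sketch. It is not part of Curiosity or of the Azure tooling: the function name `build_docker_command` and the environment variable names `CV_ENDPOINT_URI` and `CV_API_KEY` are assumptions made for illustration, while the image name and arguments are copied from the command above.

```python
import os
import shlex
from typing import List

# Image name and tag taken from the command shown in this article.
IMAGE = "mcr.microsoft.com/azure-cognitive-services/vision/read:3.1-preview"


def build_docker_command(endpoint_uri: str, api_key: str, port: int = 5000) -> List[str]:
    """Return the docker run command as an argument list (hypothetical helper)."""
    if not endpoint_uri or not api_key:
        raise ValueError("Both the endpoint URI and the API key are required")
    return [
        "docker", "run", "--rm", "-it",
        "-p", f"{port}:5000",            # host port mapped to container port 5000
        "--memory", "18g", "--cpus", "8",
        IMAGE,
        "Eula=accept",
        f"Billing={endpoint_uri}",       # the Endpoint value from Keys and Endpoint
        f"ApiKey={api_key}",             # the Key 1 value from Keys and Endpoint
    ]


if __name__ == "__main__":
    # Assumed variable names; the placeholders are used when they are not set.
    endpoint = os.environ.get("CV_ENDPOINT_URI", "{ENDPOINT_URI}")
    key = os.environ.get("CV_API_KEY", "{API_KEY}")
    # Print a shell-quoted command line that can be reviewed before running it.
    print(shlex.join(build_docker_command(endpoint, key)))
```

Reading the key from an environment variable keeps it out of scripts you might share. The sketch only prints the command, so you can check it before running it yourself.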

Once the container is online, you should see the following messages on your console:

Hosting environment: Production
Content root path: /app
Now listening on: http://[::]:5000
Application started. Press Ctrl+C to shut down.

... more similar messages here ...

Hosting environment: Production
Content root path: /app
Now listening on: http://[::]:5008
Application started. Press Ctrl+C to shut down.

To verify that your container is running, open http://localhost:5000 in your browser. You should see the following page:

If you see this page, you're ready to proceed to the next step. Otherwise, check the troubleshooting section of the Microsoft documentation.
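On a server without a browser, the same check can be done from a script. The sketch below is an illustration only, using the Python standard library: the helper names `is_container_up` and `wait_for_container` are hypothetical, and it assumes the container's root page answers with HTTP 200 once it is ready, which is what the browser check above relies on.

```python
import time
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy, since the container runs locally.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_container_up(url: str = "http://localhost:5000", timeout: float = 5.0) -> bool:
    """Return True if the given URL answers with HTTP 200 (hypothetical helper)."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for_container(url: str = "http://localhost:5000",
                       attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll the URL until it responds or the attempts run out."""
    for attempt in range(attempts):
        if is_container_up(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    print("container reachable:", is_container_up())
```

Polling is useful right after starting the container, since it can take a while before it starts listening.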

Configuring Azure Computer Vision on Curiosity

For the Container setup, you will need the address of the Read endpoint on your local container. To get the correct address, you can access the Swagger API interface hosted by the container under http://localhost:5000/swagger/index.html:

The value you need depends on the version of the container you're using.

Curiosity has been tested with the container version v3.1-preview.2.

On your Curiosity application, navigate to Settings > Data > OCR Settings, and use the following values (adapted to the actual version of the container you're using as can be seen in the Swagger UI above):

  • Azure Ocr Endpoint: http://localhost:5000/vision/v3.1-preview.2/

  • Azure Subscription Key: 00000000000000000000000000000000
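As a small illustration of how the endpoint value above is put together, here is a Python sketch. The helper name `build_ocr_endpoint` is hypothetical, and the `/vision/{version}/` pattern is inferred only from the example value above, so always confirm the actual path in the Swagger UI of your container.

```python
def build_ocr_endpoint(base_url: str, api_version: str) -> str:
    """Join the container address and API version into the OCR endpoint value.

    Assumes the path layout <base>/vision/<version>/ seen in the example above.
    """
    return f"{base_url.rstrip('/')}/vision/{api_version.strip('/')}/"


if __name__ == "__main__":
    # Version string as tested with Curiosity according to this article.
    print(build_ocr_endpoint("http://localhost:5000", "v3.1-preview.2"))
```

If you run a different container version, only the version segment should need to change.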

Click on Save to store all changes.

If you have already added files to your system, you might want to reprocess them so that OCR is applied. To do that, you can run this simple query in the Shell interface:

Q().StartAt("_FileEntry").Tx().Set("Indexed", false).Commit();
