The text contents of PDFs and images can be extracted using Optical Character Recognition (OCR). The extracted contents can then be included in searches.

This page describes how to enable search in asset contents. The setup consists of two steps:

  1. Content extraction with Microsoft Azure Cognitive Services.

  2. Including asset contents in searches.

1. Content extraction with Microsoft Azure Cognitive Services

Content extraction from PDFs and images relies on Microsoft Azure Cognitive Services, so a Computer Vision resource must be available in Azure.

A new Computer Vision resource can be created with the following steps:

  1. Log in to the Azure portal (https://portal.azure.com/).

  2. Search for “Cognitive Services”.

  3. Click “Add” to add a new Cognitive Service.

  4. Search for “Computer Vision” and create a new Computer Vision resource.
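Once the resource exists, it can be useful to check that it actually returns text before connecting it to the product. The following is a minimal sketch of such a check, not the product's own integration. It assumes the Computer Vision Read API in version 3.2 (the path in `READ_PATH` is an assumption and may need adjusting to the API version your resource supports), and that `endpoint` and `key` are copied from the resource's “Keys and Endpoint” page in the Azure portal. The function names are illustrative only.

```python
import json
import time
import urllib.request

# Assumption: Read API v3.2; adjust if your resource uses another version.
READ_PATH = "/vision/v3.2/read/analyze"


def build_read_request(endpoint: str, key: str, file_url: str) -> urllib.request.Request:
    """Build the POST request that submits a publicly reachable image or PDF URL for OCR."""
    body = json.dumps({"url": file_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint.rstrip("/") + READ_PATH,
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_text(result: dict) -> str:
    """Join the recognized lines of a finished Read result, page by page."""
    pages = result.get("analyzeResult", {}).get("readResults", [])
    return "\n".join(
        line["text"] for page in pages for line in page.get("lines", [])
    )


def read_text(endpoint: str, key: str, file_url: str,
              poll_seconds: float = 1.0, max_polls: int = 30) -> str:
    """Submit a file for OCR and poll until the asynchronous operation finishes."""
    with urllib.request.urlopen(build_read_request(endpoint, key, file_url)) as resp:
        # The service answers with the URL of the pending operation.
        operation_url = resp.headers["Operation-Location"]

    poll = urllib.request.Request(
        operation_url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    for _ in range(max_polls):
        with urllib.request.urlopen(poll) as resp:
            result = json.load(resp)
        status = result.get("status")
        if status == "succeeded":
            return extract_text(result)
        if status == "failed":
            raise RuntimeError("Read operation failed")
        time.sleep(poll_seconds)
    raise TimeoutError("Read operation did not finish in time")
```

Calling `read_text(endpoint, key, "https://.../sample.pdf")` with a real endpoint, key, and file URL should return the recognized text as one string. An HTTP 401 error at this point usually indicates a wrong key or endpoint.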

2. Including asset contents in searches

When the contents of an asset have been extracted with Microsoft Azure Cognitive Services, they are automatically written to the metafield “Asset content”.

The metafields “Asset content” and “Asset content concurrency token” are predefined. Do not modify them manually if asset contents are to be searchable.

To include the extracted contents in freetext searches, add the “Asset content” metafield as a freetext input parameter to the search “DigiZuite_System_Framework_Search”:

  1. In the ConfigManager, find “DigiZuite_System_Framework_Search” for the product version in which the feature should be enabled, e.g. the product version of the MM5.

  2. Add a new input parameter.

    1. Locate and choose the metafield group “Content”.

    2. Choose the metafield “Asset content” and the comparison type “FreeText”, then create the input parameter.

  3. Save the modified search and populate the search cache.
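The effect of the steps above can be pictured with a small toy model. This is not how the product implements search; the `Asset` class, the `freetext_search` function, the “Title” field, and the simple substring matching are all assumptions made for illustration. The sketch only shows why an asset whose text exists solely in “Asset content” is found once that metafield is among the freetext input parameters.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical stand-in for an asset and its metafield values."""
    name: str
    metafields: dict = field(default_factory=dict)


def freetext_search(assets, query, freetext_fields):
    """Return assets where any of the given metafields contains the query (case-insensitive)."""
    needle = query.casefold()
    return [
        asset
        for asset in assets
        if any(needle in asset.metafields.get(f, "").casefold() for f in freetext_fields)
    ]


assets = [
    Asset("scan_001.pdf", {"Title": "Scan 001", "Asset content": "Invoice no. 4711"}),
    Asset("logo.png", {"Title": "Company logo", "Asset content": ""}),
]

# Before: only the title is searched, so the scanned invoice is not found.
before = freetext_search(assets, "invoice", ["Title"])

# After: "Asset content" is also a freetext parameter, so the OCR text matches.
after = freetext_search(assets, "invoice", ["Title", "Asset content"])
```

In this model, `before` is empty while `after` contains `scan_001.pdf`.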
