Below is an architecture diagram of a POC I was working on.

  • Users upload image or video content to Azure Blob Storage
  • This triggers an Azure Function that reads the file details and inserts a stub JSON document into Cosmos DB (a sketch of this function follows the list)
  • The creation of a new document in Cosmos DB triggers a second Azure Function that calls out to the Computer Vision API
  • The JSON returned from Computer Vision contains all of the output, including face detection, face recognition and a scene description
  • This JSON is then appended to the existing document in Cosmos DB
  • A third Azure Function is triggered that makes a second call to the Computer Vision API, this time using its OCR capability to scan the image for any text (both Computer Vision calls are sketched after the list)
  • If any text is found and recognised, the additional JSON is added to the Cosmos DB document
  • The web user can then view all of the uploaded content and its analysis, as shown below
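
The first of these functions is only a handful of lines. Here is a minimal sketch in Python (the POC itself may well be written in another language); the binding names, the document fields and the "status" flag are illustrative choices of mine, with the blob trigger and Cosmos DB output binding assumed to be configured in function.json.

```python
import uuid

import azure.functions as func


def main(myblob: func.InputStream, stubDocument: func.Out[func.Document]):
    """Blob trigger: create a stub JSON document in Cosmos DB for the uploaded file."""
    stub = {
        "id": str(uuid.uuid4()),       # document id for Cosmos DB
        "blobName": myblob.name,       # e.g. "uploads/holiday.jpg"
        "sizeInBytes": myblob.length,  # basic file details read from the trigger
        "status": "uploaded",          # analysis has not run yet
    }
    # The Cosmos DB output binding performs the actual insert.
    stubDocument.set(func.Document.from_dict(stub))
```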

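The two calls to Computer Vision are plain REST requests. Below is a rough sketch of what the analysis and OCR calls look like, assuming the v2.0 endpoints and an endpoint/key held in app settings; `VISION_ENDPOINT` and `VISION_KEY` are illustrative setting names, not necessarily the ones used in the POC.

```python
import os

import requests

VISION_ENDPOINT = os.environ["VISION_ENDPOINT"]  # e.g. https://<region>.api.cognitive.microsoft.com
HEADERS = {
    "Ocp-Apim-Subscription-Key": os.environ["VISION_KEY"],
    "Content-Type": "application/json",
}


def analyse_image(image_url: str) -> dict:
    """Second function: ask Computer Vision for description, faces, tags, colour and adult flags."""
    params = {
        "visualFeatures": "Description,Faces,Tags,Adult,Color",
        "details": "Celebrities",  # enables recognition of well-known faces
    }
    resp = requests.post(f"{VISION_ENDPOINT}/vision/v2.0/analyze",
                         headers=HEADERS, params=params, json={"url": image_url})
    resp.raise_for_status()
    return resp.json()  # appended to the existing Cosmos DB document


def ocr_image(image_url: str) -> dict:
    """Third function: a second call, this time to the OCR endpoint, to pull out any text."""
    resp = requests.post(f"{VISION_ENDPOINT}/vision/v2.0/ocr",
                         headers=HEADERS, params={"detectOrientation": "true"},
                         json={"url": image_url})
    resp.raise_for_status()
    return resp.json()  # only stored if any text is found and recognised
```

Note that the image URL passed to the service has to be readable by Computer Vision, so a private container would need something like a SAS token appended to the blob URL.
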
Here you can see the type of insight returned by the API, including face age, gender and a scene description, as well as classifier tags and their associated confidences. There is also face recognition of some well-known people, such as the head of the Met Police, Cressida Dick, and Microsoft's own Scott Guthrie.
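
To make that a little more concrete, the analysis JSON stored against each document is roughly the shape below; the structure follows the Computer Vision analyze response, but the values here are invented purely for illustration.

```python
# Trimmed, illustrative example of the analysis section of a Cosmos DB document.
example_analysis = {
    "description": {"captions": [{"text": "a man wearing a suit and tie", "confidence": 0.87}]},
    "tags": [{"name": "person", "confidence": 0.99}, {"name": "indoor", "confidence": 0.92}],
    "faces": [{"age": 45, "gender": "Male",
               "faceRectangle": {"left": 64, "top": 32, "width": 110, "height": 110}}],
    "categories": [{"name": "people_", "score": 0.9,
                    "detail": {"celebrities": [{"name": "Scott Guthrie", "confidence": 0.98}]}}],
    "color": {"dominantColors": ["Grey", "Blue"], "isBWImg": False},
    "adult": {"isAdultContent": False, "adultScore": 0.01},
}
```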

I have also implemented some basic search and filtering based on the extracted information, e.g. by dominant colour, whether the image contains faces, or whether it is flagged as adult content.
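
Because the raw analysis JSON sits in Cosmos DB, these filters come down to simple queries over the document. Here is a sketch of the kind of query behind the "has faces", "is adult" and colour filters, assuming the analysis is stored under an `analysis` property and using illustrative database and container names rather than the real ones:

```python
import os

from azure.cosmos import CosmosClient

client = CosmosClient(os.environ["COSMOS_URI"], credential=os.environ["COSMOS_KEY"])
container = client.get_database_client("media").get_container_client("uploads")

# Images that contain at least one face, are not flagged as adult content
# and have blue among their dominant colours.
query = """
SELECT c.id, c.blobName
FROM c
WHERE ARRAY_LENGTH(c.analysis.faces) > 0
  AND c.analysis.adult.isAdultContent = false
  AND ARRAY_CONTAINS(c.analysis.color.dominantColors, "Blue")
"""

for item in container.query_items(query=query, enable_cross_partition_query=True):
    print(item["id"], item["blobName"])
```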

In the second part of this series I will delve more deeply into the code and discuss future enhancements.

If you have any questions, please leave a comment, and watch out for the follow-up!