Sam Nasr

Microsoft’s Azure offers a great many features and capabilities. One of them is Cognitive Services, where users can access a variety of APIs that help mimic a human response. Some features include converting text to spoken speech, converting speech to text, and even approximating a human’s understanding of a spoken phrase. These services are divided into 4 major categories, as seen below. Please note that “Computer Vision” and “Custom Vision” sound very similar, but their capabilities are different, as outlined in the “Vision” section below.

Decision

Anomaly Detector: Identify potential problems in time series data.
Content Moderator: Detect potentially offensive or unwanted content.
Personalizer: Create rich, personalized experiences for every user.

Language

LUIS (Language Understanding Intelligent Service)
QnA Maker: Create a conversational question and answer layer over your data.
Text Analytics: Detect sentiment, key phrases, and named entities.
Translator: Detect and translate more than 90 supported languages.

Speech

Speech to Text: Transcribe audible speech into readable, searchable text.
Text to Speech: Convert text to lifelike speech for more natural interfaces.
Speech Translation: Integrate real-time speech translation into your apps.
Speaker Recognition: Identify and verify the people speaking based on audio.

Vision

Computer Vision: Analyze content in images.

OCR: Optical Character Recognition
Image Analysis: Extracts visual features from images (objects, faces, adult content).
Spatial Analysis: Analyzes the presence and movement of people on a video feed and produces events that other systems can respond to.

Custom Vision: Customize image recognition to fit your business needs.

Image Classification: Applies label(s) to an image.
Object Detection: Returns the coordinates in the image where the applied label(s) can be found.

Note: A trained model can be exported for use outside the service: https://docs.microsoft.com/en-us/azure/cognitive-services/custom-vision-service/export-your-model

Face: Detect and identify people and emotions in images.
Video Indexer: Analyze the visual and audio channels of a video, and index its content.
Form Recognizer: Extract text, key-value pairs and tables from documents.
Ink Recognizer: Recognize digital ink and handwriting, and pinpoint common shapes.

Saturday, December 4, 2021

How (and Why) You Should Use Speech-to-text

Overview

Speech-to-text is one of the many cognitive services offered by Azure. All Azure cognitive services can be classified into one of four primary categories:

Decision
Language
Speech
Vision

Each category contains a variety of services, with Speech-to-text falling under the “Speech” category. It converts spoken language to text using either an SDK or a web API call. Both methods require a subscription key, obtained through a brief resource setup in the Azure portal. Speech-to-text can be configured to recognize a variety of languages, as listed at https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support.

In addition, the source of the speech can be either live spoken words or a recording.
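As a rough illustration of the web API route for a recording, the sketch below builds a request for short audio using only the Python standard library. The endpoint path, header names, and the "DisplayText" response field reflect the REST API for short audio as I understand it and should be checked against the current documentation. The function names, the region value, and the 16 kHz PCM WAV format are assumptions made for this example.

```python
import json
import urllib.parse
import urllib.request


def build_stt_request(audio_bytes, subscription_key, region, language="en-US"):
    """Build (but do not send) a speech-to-text request for a short WAV recording."""
    query = urllib.parse.urlencode({"language": language})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        # The subscription key comes from the Speech resource in the Azure portal.
        "Ocp-Apim-Subscription-Key": subscription_key,
        # Assumption: the recording is 16 kHz PCM audio in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=audio_bytes, headers=headers, method="POST")


def transcribe(audio_bytes, subscription_key, region, language="en-US"):
    """Send the request and return the recognized text (empty string if none)."""
    request = build_stt_request(audio_bytes, subscription_key, region, language)
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result.get("DisplayText", "")
```

Calling transcribe() with the bytes of a WAV file, your own key, and the region of your Speech resource should return the recognized sentence. Building the request separately from sending it makes it possible to inspect the URL and headers without spending quota.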

How to Use Speech-to-text

Project Setup (using SDK)

Set up the Speech-to-text resource in Azure. Simply enter “speech services” in the search bar, and all speech resources in the Azure Marketplace will be displayed. For this demo, we will use Microsoft’s Speech Azure Service.

After clicking create and providing the fundamental parameters for the setup, subscription keys will be provided.

Obtain the subscription key for the above resource.

Set up a project with the “Microsoft.CognitiveServices.Speech” NuGet package.

Listen and convert

Why Should You Use Speech-to-text?

Accessibility! Most applications assume their users are not vision impaired, which prevents a significant number of people from using them. Screen readers have certainly been available in the Windows OS for nearly 2 decades, allowing any user to understand what is displayed on the screen. However, a vision-impaired user will still have difficulty interacting with the user interface (e.g., submitting info or filling in forms). Thanks to Speech-to-text, users can now speak to the application and have their words dynamically transcribed to text in the application. This makes the application accessible to more users and helps it comply with ADA and WCAG requirements.

This post was featured as part of the C# Advent 2021, an annual blogging event hosted by Matthew D. Groves.

Tuesday, November 30, 2021

AI Document Processing Workflows Q&A

Recently Vinod Kurpad delivered a presentation on using AI to build document processing workflows. The presentation slides can be found at https://on24static.akamaized.net/event/34/60/86/0/rt/1/documents/resourceList1636579433122/useaitobuilddocumentprocessingworkflows1636579401517.pdf

By using AI, several boxed fields (e.g., SSN) on a form can be treated as 1 field. AI also allows dynamic entities, like tables, to be handled, recognizing multiple rows in a single table with various column types.

Listed below are questions and answers discussed during the presentation:

Could the Form Recognizer be used in Power Apps for receipt processing, instead of using the AI Builder SKU that comes with Power Apps at an additional licensing cost?

Form Recognizer is available in Power Apps, which has a receipt processing app. The licensing model in this scenario is via Power Apps. https://powerapps.microsoft.com/en-us/blog/process-receipts-with-ai-builder/

Is it possible to detect if a line of text on a document is a 'header' of the document section? For example, a form where customer data is entered might have a header above it that says "Customer Information".

Not yet. Layout currently supports text, checkboxes, selection marks and reading order. Paragraphs, headers and more are on the roadmap.

Can the output from the model be XML?

Currently the only supported format is JSON, but a number of JSON-to-XML converters should enable you to convert the output to XML.
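As a minimal sketch of such a conversion, the code below maps JSON to XML with the Python standard library. The element naming (a "result" root and "item" for list entries) is a choice made for this example, not a Form Recognizer convention, and the sample JSON in the usage note is invented.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(tag, value):
    """Recursively convert a parsed JSON value into an XML element."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        # Each key becomes a child element. Keys must be valid XML names;
        # this sketch does not sanitize them.
        for key, child in value.items():
            element.append(to_xml(key, child))
    elif isinstance(value, list):
        # List entries have no name in JSON, so a generic tag is used.
        for item in value:
            element.append(to_xml("item", item))
    elif value is not None:
        element.text = str(value)
    return element


def json_to_xml(json_text, root_tag="result"):
    """Convert a JSON document (as text) into an XML string."""
    return ET.tostring(to_xml(root_tag, json.loads(json_text)), encoding="unicode")
```

For example, json_to_xml('{"Total": 12.5}') yields an XML string with a Total element inside the root. A production converter would also need to handle keys that are not valid XML names.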

Is there any plan to support "reading order" when there are multiple columns of text (or an inset section within a larger body of text)?

Reading order already supports multiple columns. Try it out :)

For the receipts, would you be able to extract the information in such a way, that you could later write a query to calculate total amount for a given date?

Form Recognizer receipts returns all line items, date, total, etc. You can write validations and additional logic on top of the output as post-processing. See here for all fields extracted from receipts: https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-receipt
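The kind of post-processing described in this answer might look like the sketch below, which sums receipt totals per date. It assumes each receipt has already been flattened into a plain dictionary with "TransactionDate" and "Total" keys. That flattened shape is hypothetical; the real response nests each field with its type and confidence.

```python
from collections import defaultdict


def totals_by_date(receipts):
    """Sum receipt totals per transaction date, skipping incomplete receipts."""
    totals = defaultdict(float)
    for receipt in receipts:
        date = receipt.get("TransactionDate")
        total = receipt.get("Total")
        if date is None or total is None:
            # A receipt without a date or total cannot be attributed to a day.
            continue
        totals[date] += float(total)
    return dict(totals)


# Invented sample data, standing in for already-extracted receipts.
sample = [
    {"TransactionDate": "2021-11-30", "Total": 12.5},
    {"TransactionDate": "2021-11-30", "Total": 7.5},
    {"TransactionDate": "2021-12-01", "Total": 3.0},
    {"Total": 99.0},  # no date extracted, so it is skipped
]
daily_totals = totals_by_date(sample)
```

Looking up daily_totals["2021-11-30"] then answers the "total amount for a given date" question for that day.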

Is there any plan to recognize paragraphs?

It's on the roadmap for a preview in the first half of next year.

If you are interested in seeing Azure Form Recognizer applied to Financial Services, a great example is linked below:

https://customers.microsoft.com/en-us/story/financial-fabric-banking-capital-markets-azure

What is the lowest DPI that this can read?

When using Form Recognizer, no pre-processing of the images is needed. It supports low DPI and various image qualities.

Can a key-value pair be a field text label and a field value?

Yes. A key-value pair can also be a value with no explicitly labeled key, where a span of text describes the value.

Can it recognize handwritten forms?

Yes, Form Recognizer supports handwritten text in the following languages: English, Simplified Chinese, French, German, Italian, Portuguese, Spanish. See all supported languages, printed and handwritten, here: https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/language-support

Which languages does it currently support?

https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/language-support

What are the supported file formats?

Form Recognizer supports the following file formats: PDF (digital, scanned, and multi-page), TIFF, BMP, JPG, PNG.

Does the solution skew, rotate and adjust poorly scanned forms?

Yes it does. No pre-processing needed. Please send the source document to Form Recognizer as is and Form Recognizer will do all the work for you.

Are you considering combining this with voice recognition to enable the creation of printed transcripts from recorded interviews?

Cognitive Services also includes speech to text and transcription APIs that you can combine for interviews.

My first time with Form recognizer but I am getting this error while configuring a resource - Operation returned an invalid status code 'Conflict' Form Recognizer Studio

Try creating a resource with a different name and see if that helps; the Studio is continually improving after the recent preview. You can also send an email to formrecog_contact@microsoft.com

Can Form Recognizer handle PDF's with FDF's included? For example, a fill-in form that includes signatures (which are not fill-in).

Form Recognizer includes signature detection capability. You can train a model to detect whether a signature is present.

Do documents need to be OCR-ed? I am guessing it can't read pdf's with text in them, but embedded as images?

The Form Recognizer features (APIs) build on top of OCR and handle digital, image, and hybrid documents transparently.

For knowledge-mining of documents, in the example, it showed some results from a search with a synopsis of what I perceive are different documents. How does one then get to the specific document from the text provided (e.g., hyperlinked)?

The demo showed the specific document rendered in the browser. You can add a URL field to the index that points to the location of the original document.

Is it possible to label a table that spans across multiple pages in the custom model?

Not yet. It is on our roadmap but not yet available. Currently, in such cases you should split the document into pages, send the individual pages to Form Recognizer, and then unite the results in post-processing. For invoices, line-item tables that span multiple pages are supported.
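The "unite in post-processing" step could be sketched as below. It assumes each page's table has already been extracted into a list of rows with the header row first. That structure and the function name are hypothetical, chosen only to show the merging logic.

```python
def merge_page_tables(page_tables):
    """Unite per-page pieces of one table, keeping the header only once.

    page_tables: tables in page order; each table is a list of rows and
    its first row is treated as the header.
    """
    merged = []
    header = None
    for table in page_tables:
        if not table:
            continue  # nothing was extracted from this page
        if header is None:
            # The first non-empty page supplies the header.
            header = table[0]
            merged.append(header)
            merged.extend(table[1:])
        elif table[0] == header:
            # The header is repeated on a later page, so drop the repeat.
            merged.extend(table[1:])
        else:
            # A continuation page with no header: every row is data.
            merged.extend(table)
    return merged


page_1 = [["Item", "Qty"], ["Paper", "2"], ["Ink", "1"]]
page_2 = [["Item", "Qty"], ["Pens", "10"]]
page_3 = [["Clips", "50"]]
full_table = merge_page_tables([page_1, page_2, page_3])
```

Here full_table holds one header row followed by the four data rows in page order.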

Is it possible to open a FOTT project in Form Recognizer Studio?

Yes, Form Recognizer enables opening a FOTT project. Form Recognizer also supports backward compatibility, so you can use 2.1 release models in the new release as well.

Can you talk about what I would do if I get different form types and don't know which one I'm receiving. (I'm thinking of data from a doctor's office, and they send all sorts of different forms)

If you know the different form types you expect to receive, you should be able to train a model for each form and compose all the individual models into a single logical model. Sending a document to this logical model will result in the document being classified to determine the right model to extract the fields. The selected model is also returned as part of the response

Do all the documents need to be on Azure for Azure knowledge-mining to work, or can they be left on the existing Microsoft server and be auto-identified/updated?

No. Data can be anywhere, and you can stream the document to Form Recognizer for analysis or send a URL. Only the training data for a custom model needs to be in a blob, and only 5 documents are needed to get started training your own custom model.

Is there a max file size?

Max file size is 50MB currently.
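A simple client-side check before uploading could combine this size limit with the file formats listed in an earlier answer. This is a sketch only: the extension list is my mapping of the named formats to common file extensions, and treating 50MB as 50 × 1024 × 1024 bytes is an assumption.

```python
import os

# Extensions for the formats named in the Q&A: PDF, TIFF, BMP, JPG, PNG.
SUPPORTED_EXTENSIONS = {".pdf", ".tif", ".tiff", ".bmp", ".jpg", ".jpeg", ".png"}
# Assumption: "50MB" is interpreted as 50 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024


def check_document(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file format: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file is {size_bytes} bytes; the limit is {MAX_FILE_SIZE_BYTES}")
    return problems
```

In practice the size would come from os.path.getsize(path), and any reported problems would be shown to the user before the document is sent.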

How would you train 2nd & 3rd pages of documents?

The Studio and the underlying files handle multi-page labels and navigating across pages to label the data you want to extract.

Can you train the model with only 1 sample of a variation?

To train a model, all documents within a project and a model need to be of the same type and format. You can then compose several models into a composed model, so you do not need to classify the document before sending it to Form Recognizer. You need 5 forms of the same type to train a model.

Wow, the form has handwritten text. Will it be challenging to train the model, since handwriting can differ so much between individuals?

Form Recognizer text extraction supports printed and handwritten text, and we just expanded handwritten support from English to Simplified Chinese, French, Italian, Spanish, German, and Portuguese.

Can it handle files with hundreds of multipage documents?

Yes, Form Recognizer supports multipage documents.

Is there a Resource provider that is necessary to be registered in Azure?

Form Recognizer is the resource you need to create in the portal

What is a FOTT project?

The previous version of the labeling tool, now superseded by the new Form Recognizer Studio: https://fott-2-1.azurewebsites.net/

What if tables have sometimes 4 or 5 columns? And the headings of those tables are the same, but sometimes they appear as A,B,C, D and sometimes A, D, C, B

Layout extracts tables as they appear on the document and will try to extract all rows and columns, including merged cells with their row and column spans. When labeling tables, if the table changes and has a varying number of rows or columns, you can label it as a dynamic table.
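Because tables are extracted as they appear, the column order in the output follows the document. One way to make downstream code indifferent to that order is to key each row by its column heading, as sketched below. This is a post-processing idea of mine rather than a Form Recognizer feature, and the list-of-rows input shape is hypothetical.

```python
def rows_by_header(table):
    """Turn a table (header row first) into a list of dicts keyed by column heading."""
    if not table:
        return []
    header, *rows = table
    # zip pairs each cell with its heading; a row that is shorter than
    # the header simply yields fewer keys.
    return [dict(zip(header, row)) for row in rows]


# The same data with columns in a different order on two forms.
form_1 = [["A", "B", "C", "D"], ["1", "2", "3", "4"]]
form_2 = [["A", "D", "C", "B"], ["1", "4", "3", "2"]]
same_content = rows_by_header(form_1) == rows_by_header(form_2)
```

After this step, code can read row["B"] without caring whether B was the second or the fourth column on a given form.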

Can Form Recognizer handle handwriting?

Yes, Form Recognizer supports handwritten text in the following languages: English, Simplified Chinese, French, German, Italian, Portuguese, Spanish. See all supported languages, printed and handwritten, here: https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/language-support

How do you open an existing FOTT project in Form Recognizer Studio?

Connect to the same blob storage and the project will open with all the data and labeling completed.

Is it safe to assume that this will be added to the power automate stack?

AI Builder, which is a part of the Power Automate stack, uses Form Recognizer, and you should be able to use the same set of capabilities in AI Builder as well.

If yes, how accurately can Form Recognizer convert the handwriting into text? Did you benchmark it against medical notes and similar types of handwriting?

We suggest benchmarking it on your own content, but we believe, and customers have validated, that the handwritten OCR is world class for the supported languages. And yes, Form Recognizer and OCR tech are heavily used in healthcare and financial verticals that involve processing handwritten text.

If there are variations within a document over some years, how does the Form Recognizer handle such variations?

When training a custom model, if the variations are slight (a key name changed, moved, etc.), then Form Recognizer should be able to extract the data. If you see that it misses, you can add a few documents with the variation to the model to improve it. If the variation is big, then you should train a model per year and then compose them into a composed model.

Are there any additional examples, documentation, videos, or a sandbox available?

You can start with the documentation https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/ . Also note that Form Recognizer supports a free tier which should allow you to test all the different features without paying for it

Can I create custom code to integrate custom functionality in the workflow using C#?

Yes! Form Recognizer supports a few different languages with an SDK, but the REST API is available for languages not supported by an SDK

How well does it generalize tables that don't have a strict row column format. For example, maybe some table cells are split cells that contain multiple fields?

Form Recognizer supports split cells, merged cells, tables with no borders, and complex tables. See more details on table extraction in the following blog: https://techcommunity.microsoft.com/t5/azure-ai/enhanced-table-extraction-from-documents-with-form-recognizer/ba-p/2058011

Can form recognizer identify tables when the tables are not delimited, but the columns are easily seen and organized?

If you mean concatenated or complex tables, then yes, we are continuously improving on extracting those. On the other hand, table extraction supports tables with or without physical lines.

Is there any way to use this with power automate with clients that are on another 365 domain?

Form Recognizer is integrated within Power Apps; see Form Processing apps. Regarding another 365 domain, please reach out to Form Recognizer Contact Us and we can try to assist.

Tuesday, November 2, 2021

Summary of What's new in .Net 6

On October 28, 2021, Jeff Fritz (Microsoft) presented “What’s new in .Net 6” to the Cleveland C#/VB.Net User Group.

A recording of that presentation is now available here on my YouTube Channel.

Listed below is a brief summary of the key points from that presentation.

C#10

Program.cs and Startup.cs have been the starting point files for ASP.NET Core applications
Program.cs no longer needs a namespace or ‘Program’ class declaration. All of that is now implied
Read-only record structs (readonly record struct) can now be created in C# 10.
“global using whatsNewinCSharp10;” can be specified in 1 file and it will apply to the entire project
<ImplicitUsings>enable</ImplicitUsings> in the .csproj file will bring in the namespaces needed for the project
Generated global using statements will be inserted in *.g.cs
‘namespace whatsNewinCSharp10;’ without {} can be used at the top of the file
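String interpolation is now available in constants, as long as every placeholder is itself a constant string (public const string Greeting = $"Hello, {Name}"; where Name is a const string). A non-constant value such as System.DateTime.Now cannot be used.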

Blazor

40-60% performance improvement in Blazor
Blazor apps to run as native apps
<PageTitle> and <HeadContent> can now be specified for each page
Ports in the URL are now randomly generated, no longer using Port 5000

.Net Release and LTS (Long Term Support) Schedule

Resources

Upgrade Assistant: https://dotnet.microsoft.com/platform/upgrade-assistant
Themes of .Net: https://themesof.net/
ASP.NET Community Standup - Blazor Native Interop with SkiaSharp: https://www.youtube.com/watch?v=lVWQkpcVEWQ
Brotli: https://www.brotli.org/ - Brotli is a lossless compression algorithm used to compress DLLs in Blazor, helping to increase performance.

Thursday, October 28, 2021

Nov '21 Technical Events

Virtual User Group Meetings

Nov 17: Hudson CodeCraft (https://www.meetup.com/hudson-codecraft/)
Nov 18: GLUG.Net (https://www.meetup.com/GLUGnet/events/)
Nov 23 (Tuesday): Cleveland C#/VB.Net User Group (https://www.meetup.com/Cleveland-C-VB-Net-User-Group/)

Virtual Conferences

Oct 29-31: Azure Community Conference (https://azconf.dev/)
Nov 8-12: PASS Data Community Summit (https://passdatacommunitysummit.com/)
Nov 9-11: .Net Conf (https://www.dotnetconf.net/)

Tuesday, October 19, 2021

Closing Announcements for “Global AI Back Together” Event

Thanks to all who attended the "Global AI Back Together" Event on October 19, 2021.

Keynote: https://www.youtube.com/watch?v=_pJ771eAPvA

Check-In (Swag & Azure Passes): https://checkin.globalai.community/

Workshops

The Azure Function that can see: https://workshops.globalai.community/the-azure-function-that-can-see/introduction

Train and deploy a PyTorch model using the Azure Machine Learning platform: https://workshops.globalai.community/azure-machine-learning/introduction

YouTube: https://www.youtube.com/channel/UCU4ffaIzhsvMr_cCt9kjQMw

All presentations from the event will be posted. Please subscribe to the channel to get notifications when new videos are posted.

Evaluation: https://forms.office.com/r/ctYxEL4fDE

Saturday, September 25, 2021

AI Articles Weekly Series

This week’s series of AI articles is from Dinesh Asanka, an 8x Microsoft MVP in SQL Server. He has been working with SQL Server for more than 15 years, has written articles, and has co-authored books. Dinesh is a presenter at various user groups and universities.

Introduction to Azure Machine Learning using Azure ML Studio

Data Cleansing in Azure Machine Learning

Prediction in Azure Machine Learning

Feature Selection in Azure Machine Learning

Data Reduction Technique: Principal Component Analysis in Azure Machine Learning

Prediction with Regression in Azure Machine Learning

Prediction with Classification in Azure Machine Learning

Comparing models in Azure Machine Learning

Cross Validation in Azure Machine Learning

Clustering in Azure Machine Learning

Tune Model Hyperparameters for Azure Machine Learning models

Time Series Anomaly Detection in Azure Machine Learning