Saturday, December 4, 2021

How (and Why) You Should Use Speech-to-text

Overview

Speech-to-text is one of the many cognitive services offered by Azure.  All Azure cognitive services can be classified into one of four primary categories:

  1. Decision
  2. Language
  3. Speech
  4. Vision

Each category contains a variety of services, with Speech-to-text falling under “Speech”.  It converts spoken language to text using either an SDK or a web API call.  Both methods require a subscription key, obtained through a brief resource setup in the Azure portal.  Speech-to-text can be configured to recognize a variety of languages, as listed at https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support.

In addition, the source of the speech can be either live spoken words or a recording.

 

How to Use Speech-to-text

Project Setup (using SDK)

  1. Set up the Speech-to-text resource in Azure.  Simply enter “speech services” in the search bar, and all speech resources in the Azure Marketplace will be displayed.  For this demo, we will use Microsoft’s Speech Azure Service.

  2. After clicking Create and providing the basic parameters for the setup, subscription keys will be provided.

  3. Obtain the subscription key for the above resource.

  4. Set up a project with the “Microsoft.CognitiveServices.Speech” NuGet package.

  5. Listen and convert.
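
A minimal sketch of what the "Listen and convert" step might look like with the Speech SDK is shown here. It is an illustration rather than the code from the original demo: it assumes the subscription key and region are stored in environment variables named SPEECH_KEY and SPEECH_REGION (hypothetical names), that the language is en-US, and that the default microphone is the audio source.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Assumption: key and region come from environment variables with these hypothetical names.
string key = Environment.GetEnvironmentVariable("SPEECH_KEY")
    ?? throw new InvalidOperationException("Set SPEECH_KEY first.");
string region = Environment.GetEnvironmentVariable("SPEECH_REGION")
    ?? throw new InvalidOperationException("Set SPEECH_REGION first.");

var speechConfig = SpeechConfig.FromSubscription(key, region);
speechConfig.SpeechRecognitionLanguage = "en-US"; // assumed language

// Live speech from the default microphone.
// For a recording, AudioConfig.FromWavFileInput("recording.wav") could be used instead.
using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

Console.WriteLine("Speak into the microphone...");

// Listens for a single utterance and returns once it has been processed.
SpeechRecognitionResult result = await recognizer.RecognizeOnceAsync();

switch (result.Reason)
{
    case ResultReason.RecognizedSpeech:
        Console.WriteLine($"Recognized: {result.Text}");
        break;
    case ResultReason.NoMatch:
        Console.WriteLine("Speech could not be recognized.");
        break;
    case ResultReason.Canceled:
        var details = CancellationDetails.FromResult(result);
        Console.WriteLine($"Canceled: {details.Reason} {details.ErrorDetails}");
        break;
}
```

RecognizeOnceAsync suits short commands or form fields; for longer dictation the SDK also offers continuous recognition.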

 

Why Should You Use Speech-to-text?

Accessibility! Most applications assume that their users are not vision impaired, which prevents a significant number of people from using them.  Screen readers have been available in the Windows OS for nearly two decades, allowing any user to understand what is displayed on the screen.  However, a vision-impaired user will still have difficulty interacting with the user interface (e.g. submitting info, filling in forms, etc.).  Thanks to Speech-to-text, users can now speak to the application and have their words transcribed to text in the application.  This makes the application accessible to more users and helps it meet ADA and WCAG requirements.

 

This post was featured as part of the C# Advent 2021, an annual blogging event hosted by Matthew D. Groves

Tuesday, November 30, 2021

AI Document Processing Workflows Q&A

Recently Vinod Kurpad delivered a presentation on using AI to build document processing workflows.  The presentation slides can be found at https://on24static.akamaized.net/event/34/60/86/0/rt/1/documents/resourceList1636579433122/useaitobuilddocumentprocessingworkflows1636579401517.pdf

 

By using AI, several boxed fields (e.g. SSN) on a form can be treated as 1 field.  AI also handles dynamic entities, like tables, recognizing multiple rows in a single table with various column types.

 

Listed below are questions and answers discussed during the presentation:

 

  1. Could Form Recognizer be used in Power Apps for receipt processing, instead of the AI Builder SKU that comes with Power Apps at an additional licensing cost?

Form Recognizer is available in Power Apps, which includes a receipt processing app. The licensing model in this scenario is via Power Apps. https://powerapps.microsoft.com/en-us/blog/process-receipts-with-ai-builder/

  1. Is it possible to detect if a line of text on a document is a 'header' of the document section? For example, a form where customer data is entered might have a header above it that says "Customer Information".

Not yet. Layout currently supports text, checkboxes, selection marks and reading order. Paragraphs, headers and more are on the roadmap.

  1. Can the output from the model be XML?

Currently the only supported format is JSON, but there are a number of JSON-to-XML converters that should enable you to convert the output to XML.

  1. Is there any plan to support "reading order" when there are multiple columns of text (or an inset section within a larger body of text)?

Reading order already supports multiple columns. Try it out :)

  1. For the receipts, would you be able to extract the information in such a way, that you could later write a query to calculate total amount for a given date?

Form Recognizer receipts returns all line items, date, total, etc. You can write validations and additional logic on top of the output as post-processing. See here for all fields extracted from receipts: https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-receipt

  1. Is there any plan to recognize paragraphs?

It's on the roadmap for a preview in the first half of next year.

  1. If you are interested in seeing Azure Form Recognizer applied to Financial Services, a great example is linked below:

https://customers.microsoft.com/en-us/story/financial-fabric-banking-capital-markets-azure

  1. What is the lowest DPI that this can read?

When using Form Recognizer, no pre-processing of the images is needed. It supports low DPI and various image qualities.

  1. A key value pair can be a field text label and a field value?

Yes, a key value pair can also be a value with no explicitly labeled key, but a span of text that describes the value.

  1. Can it recognize handwritten forms?

Yes, Form Recognizer supports handwritten text in the following languages: English, Simplified Chinese, French, German, Italian, Portuguese, Spanish. See all supported printed and handwritten languages here: https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/language-support

  1. Which languages does it currently support?

https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/language-support

  1. What are the supported file formats?

Form Recognizer supports the following file formats: PDF (digital, scanned and multi-page), TIFF, BMP, JPG, PNG.

  1. Does the solution skew, rotate and adjust poorly scanned forms?

Yes it does. No pre-processing needed. Please send the source document to Form Recognizer as is and Form Recognizer will do all the work for you.

  1. Are you considering combining this with voice recognition to enable the creation of printed transcripts from recorded interviews?

Cognitive Services also includes speech to text and transcription APIs that you can combine for interviews.

  1. My first time with Form recognizer but I am getting this error while configuring a resource - Operation returned an invalid status code 'Conflict' Form Recognizer Studio

Try creating a resource with a different name and see if that helps- the Studio is continually improving after the recent preview. Also send an email to formrecog_contact@microsoft.com

  1. Can Form Recognizer handle PDFs with FDFs included? For example, a fill-in form that includes signatures (which are not fill-in).

Form Recognizer includes signature detection capability; you can train a model to detect whether or not a signature is present.

  1. Do documents need to be OCR-ed? I am guessing it can't read PDFs with text in them, but embedded as images?

The Form Recognizer features (APIs) build on top of OCR and handle digital, image and hybrid documents transparently.

  1. For knowledge-mining of documents, in the example, it showed some results from a search with a synopsis of what I perceive are different documents. How does one then get to the specific document from the text provided (e.g., hyperlinked)?

The demo showed the specific document rendered in the browser, you can add a URL field to the index which is a pointer to the location of the original document

  1. Is it possible to label a table that spans across multiple pages in the custom model?

Not yet; it is on our roadmap. Currently you should split the document into pages in such a case, send the individual pages to Form Recognizer, and then unite the results in post-processing. For invoices, line-item tables that span multiple pages are supported.

  1. Is it possible to open a FOTT project in Form Recognizer Studio?

Yes, Form Recognizer Studio can open a FOTT project. Form Recognizer is also backward compatible, so you can use 2.1 release models in the new release.

  1. Can you talk about what I would do if I get different form types and don't know which one I'm receiving. (I'm thinking of data from a doctor's office, and they send all sorts of different forms)

If you know the different form types you expect to receive, you should be able to train a model for each form and compose all the individual models into a single logical model. Sending a document to this logical model will result in the document being classified to determine the right model to extract the fields. The selected model is also returned as part of the response

  1. Do all the documents need to be on Azure for Azure knowledge-mining to work, or can they be left on the existing Microsoft server and be auto-identified/updated?

No. Data can be anywhere, and you can stream the document to Form Recognizer for analysis or send a URL. Only the training data for a custom model needs to be in blob storage, and only 5 documents are needed to get started with training your own custom model.

  1. Is there a max file size?

Max file size is 50MB currently.

  1. How would you train 2nd & 3rd pages of documents?

The Studio and the underlying files handle multi-page labels and navigating across pages to label the data you want to extract.

  1. Can you train the model with only 1 sample of a variation?

To train a model, all documents within a project and a model need to be of the same type and format. You can then compose several models into a composed model, so you do not need to classify the document before sending it to Form Recognizer. You need 5 forms of the same type to train a model.

  1. Wow, the form has handwritten text. Will it be challenging to train the model, since handwriting can be very different between individuals?

Form Recognizer text extraction supports print and handwritten and we just expanded handwritten from English to Simplified Chinese, French, Italian, Spanish, German, and Portuguese

  1. Can it handle files with hundreds of multipage documents?

Yes, Form Recognizer supports multipage documents.

  1. Is there a Resource provider that is necessary to be registered in Azure?

Form Recognizer is the resource you need to create in the portal

  1. What is a FOTT project?

The previous version of the labeling tool that's now superseded by the new Form Recognizer Studio -https://fott-2-1.azurewebsites.net/

  1. What if tables have sometimes 4 or 5 columns? And the headings of those tables are the same, but sometimes they appear as A,B,C, D and sometimes A, D, C, B

Layout extracts tables as they appear on the document and will try to extract all rows and columns, including indicating merged cells and row and column spans. When labeling tables, if the table changes and has a varying number of rows or columns, you can label it as a dynamic table.

  1. Can Form Recognizer handle handwriting?

Yes, Form Recognizer supports handwritten text in the following languages: English, Simplified Chinese, French, German, Italian, Portuguese, Spanish. See all supported printed and handwritten languages here: https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/language-support

  1. How do you open an existing FOTT project in Form Recognizer Studio?

Connect to the same blob storage and the project will open with all the data and labeling completed.

  1. Is it safe to assume that this will be added to the power automate stack?

AI Builder which is a part of the Power Automate stack uses Form Recognizer and you should be able to use the same set of capabilities in AI Builder as well.

  1. If yes, how accurately can Form Recognizer convert the handwriting into text? Did you benchmark it against medical notes and similar types of handwriting?

We suggest benchmarking it on your content but we believe and customers have validated that the handwritten OCR is world class for the supported languages, and yes Form Recognizer and OCR tech are heavily used in healthcare and financial verticals that involve processing handwritten text

  1. If there are variations within a document over some years, how does the Form Recognizer handle such variations?

When training a custom model, if the variations are slight (a key name changed, moved, etc.), then Form Recognizer should be able to extract the fields. If you see that it misses, you can add a few documents with the variation to the model to improve it. If the variation is big, then you should train a model per year and compose them into a composed model.

  1. Are there any additional examples documentation or videos available or sandbox?

You can start with the documentation https://docs.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/ . Also note that Form Recognizer supports a free tier which should allow you to test all the different features without paying for it

  1. Can I create custom code to integrate custom functionality in the workflow using C#?

Yes! Form Recognizer supports a few different languages with an SDK, but the REST API is available for languages not supported by an SDK

  1. How well does it generalize tables that don't have a strict row column format. For example, maybe some table cells are split cells that contain multiple fields?

Form Recognizer supports split cells, merged cells, tables with no borders, and complex tables. See more details on table extraction in the following blog: https://techcommunity.microsoft.com/t5/azure-ai/enhanced-table-extraction-from-documents-with-form-recognizer/ba-p/2058011

  1. Can form recognizer identify tables when the tables are not delimited, but the columns are easily seen and organized?

If you mean concatenated or complex tables, then yes, we are continuously improving on extracting those. On the other hand, table extraction supports tables with or without physical lines.

  1. Is there any way to use this with power automate with clients that are on another 365 domain?

Form Recognizer is integrated within Power Apps; see Form Processing apps. Regarding another 365 domain, please reach out to Form Recognizer Contact Us and we can try to assist.
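
As a small illustration of the post-processing mentioned in the receipts answer above (calculating the total amount for a given date), here is a hedged sketch. It assumes the merchant, transaction date and total have already been read from the Form Recognizer JSON output into a simple record; ReceiptSummary, ReceiptQueries and the sample values are hypothetical and only serve the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up sample data standing in for fields extracted from three receipts.
var receipts = new List<ReceiptSummary>
{
    new ReceiptSummary("Contoso", new DateTime(2021, 11, 1), 12.50m),
    new ReceiptSummary("Fabrikam", new DateTime(2021, 11, 1), 7.25m),
    new ReceiptSummary("Contoso", new DateTime(2021, 11, 2), 30.00m),
};

var day = new DateTime(2021, 11, 1);
decimal total = ReceiptQueries.TotalForDate(receipts, day);
Console.WriteLine($"{day:yyyy-MM-dd}: {total}");

// Hypothetical shape for the values pulled out of each analyzed receipt.
public record ReceiptSummary(string Merchant, DateTime TransactionDate, decimal Total);

public static class ReceiptQueries
{
    // Sums the totals of all receipts whose transaction date falls on the given day.
    public static decimal TotalForDate(IEnumerable<ReceiptSummary> receipts, DateTime date) =>
        receipts.Where(r => r.TransactionDate.Date == date.Date).Sum(r => r.Total);
}
```

Using decimal rather than double for money avoids rounding surprises when many receipts are summed.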

 

 

Tuesday, November 2, 2021

Summary of What's new in .Net 6

On October 28, 2021, Jeff Fritz (Microsoft) presented “What’s new in .Net 6” to the Cleveland C#/VB.Net User Group.

A recording of that presentation is now available here on my YouTube Channel.

Listed below is a brief summary of the key points from that presentation.

 

C#10

  • Program.cs (together with Startup.cs in ASP.NET Core projects) is the starting point of a .Net application
  • Program.cs no longer needs a namespace or ‘Program’ class declaration.  Both are now implied
  • Read-only record structs (readonly record struct) can now be created in C#
  • “global using whatsNewinCSharp10;” can be specified in 1 file and it will apply to the entire project
  • <ImplicitUsings>enable</ImplicitUsings> in the .csproj will bring in the namespaces commonly needed for the project
  • Generated global using statements will be inserted in a *.g.cs file
  • ‘namespace whatsNewinCSharp10;’ without {} can be used at the top of the file
  • String interpolation is now available in constants, as long as every placeholder is itself a constant string (public const string Greeting = $"Hello, {Name}"; where Name is a const string).  A run-time value such as System.DateTime.Now cannot be used
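
Several of the features above can be tried in a single Program.cs. The following is a minimal sketch written for illustration (it is not taken from the presentation); Point, Product, Release and Banner are made-up names. A file-scoped namespace is left out because it cannot share a file with top-level statements.

```csharp
// Program.cs: no namespace, Program class, or Main method is declared.
global using System; // applies to every file in the project

// Constant interpolated strings: every placeholder must itself be a constant string.
const string Product = ".NET";
const string Release = "6";
const string Banner = $"What's new in {Product} {Release}";

var origin = new Point(0, 0);
var moved = origin with { X = 5 }; // non-destructive copy with one property changed

Console.WriteLine(Banner);
Console.WriteLine(moved);
Console.WriteLine(moved == new Point(5, 0)); // records compare by value

// A read-only record struct: an immutable value type with generated equality and ToString.
public readonly record struct Point(int X, int Y);
```

If ImplicitUsings is enabled in the project file, the explicit global using line becomes redundant, since System is among the namespaces brought in automatically.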

 

Blazor

  • 40-60% performance improvement in Blazor
  • Blazor apps can run as native apps
  • <PageTitle> and <HeadContent> can now be specified for each page
  • Ports in the URL are now randomly generated for new projects, rather than always using port 5000

 

 

.Net Release and LTS (Long Term Support) Schedule

 

Resources

 

 

Thursday, October 28, 2021

Nov '21 Technical Events

Virtual User Group Meetings

 

Virtual Conferences

 

Tuesday, October 19, 2021

Closing Announcements for “Global AI Back Together” Event

Thanks to all who attended the "Global AI Back Together" Event on October 19, 2021.

 

Keynote: https://www.youtube.com/watch?v=_pJ771eAPvA

 

Check-In (Swag & Azure Passes): https://checkin.globalai.community/  

 

Workshops

The Azure Function that can see: https://workshops.globalai.community/the-azure-function-that-can-see/introduction

Train and deploy a PyTorch model using the Azure Machine Learning platform: https://workshops.globalai.community/azure-machine-learning/introduction

 

YouTube: https://www.youtube.com/channel/UCU4ffaIzhsvMr_cCt9kjQMw

All presentations from the event will be posted. Please subscribe to the channel to get notifications when new videos are posted.

 

Evaluation: https://forms.office.com/r/ctYxEL4fDE

 

Saturday, September 25, 2021

Thursday, September 23, 2021

Oct '21 Technical Events

Virtual User Group Meetings

 

Virtual Conferences

 

Friday, September 17, 2021

Azure Custom Vision Object Detection

Computer Vision and Custom Vision are 2 subsets of services provided by Azure Cognitive Services.

 

Computer Vision: Analyze content in images.

  1. OCR: Optical Character Recognition
  2. Image Analysis: extracts visual features from images (objects, faces)
  3. Spatial Analysis: Analyzes the presence and movement of people on a video feed and produces events that other systems can respond to.

 

Custom Vision: Customize image recognition to fit your business needs.

  1. Image Classification: applies label(s) to an image
  2. Object Detection: returns the coordinates in the image where the applied label(s) can be found.

 

When using the Object Detection Prediction API, the response returned from Azure is JSON that maps to the following classes.

 

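    // Top-level object returned by the Object Detection Prediction API.
    public class PredictionResponse
    {
        public string id { get; set; }
        public string project { get; set; }
        public string iteration { get; set; }
        public string created { get; set; }
        public Prediction[] predictions { get; set; }
    }

    // One detected object: its tag, confidence, and location.
    public class Prediction
    {
        public string tagId { get; set; }
        public string tagName { get; set; }
        // Returned as a JSON number between 0 and 1.
        public double probability { get; set; }
        public BoundingBox boundingBox { get; set; }
    }

    // Location of the detected object; values are JSON numbers
    // expressed as fractions (0 to 1) of the image width and height.
    public class BoundingBox
    {
        public double left { get; set; }
        public double top { get; set; }
        public double width { get; set; }
        public double height { get; set; }
    }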

 

Each BoundingBox object in the response is represented graphically by the red boxes, as shown in the sample image below
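
Custom Vision reports bounding-box values as fractions of the image width and height, so drawing a box means scaling them by the pixel size of the image. The sketch below shows one way to do that; NormalizedBox, PixelRect and BoxMath are hypothetical helper names, and the 640x480 image size is made up for the example.

```csharp
using System;

// Example: a box covering the centre of a 640x480 image.
var box = new NormalizedBox(Left: 0.25, Top: 0.25, Width: 0.5, Height: 0.5);
var rect = BoxMath.ToPixels(box, imageWidth: 640, imageHeight: 480);
Console.WriteLine(rect);

// Hypothetical stand-in for the bounding box in the response (values from 0 to 1).
public record NormalizedBox(double Left, double Top, double Width, double Height);

// Pixel rectangle suitable for drawing on the original image.
public record PixelRect(int X, int Y, int Width, int Height);

public static class BoxMath
{
    // Scales the fractional coordinates by the image dimensions and rounds to whole pixels.
    public static PixelRect ToPixels(NormalizedBox box, int imageWidth, int imageHeight) =>
        new PixelRect(
            (int)Math.Round(box.Left * imageWidth),
            (int)Math.Round(box.Top * imageHeight),
            (int)Math.Round(box.Width * imageWidth),
            (int)Math.Round(box.Height * imageHeight));
}
```

In practice it is common to skip predictions whose probability falls below a chosen threshold before drawing anything.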

 

 

 

In addition, listed below are some gotchas to watch out for when working with Object Detection:

 

  1. Be sure to use the same login for https://customvision.ai as the one used for the Azure portal.

  2. Use the same "Directory" in CustomVision.ai and the Azure portal.  This setting can be found in the top right corner of both the Azure portal and CustomVision.ai.

  3. When training the model, you must use a minimum of 15 images for every tag.  More images with different lighting, angles, and backgrounds will produce better results.

  4. The image types used for training must be .JPG, .PNG, or .BMP, and each image must be less than 4MB.
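
The training-data limits in the list above can be checked before uploading anything. The following is a rough sketch that uses the limits exactly as quoted in this post (15 images per tag, .JPG/.PNG/.BMP, under 4MB); TrainingImage and TrainingSetCheck are hypothetical names, and the file names and sizes are made up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Made-up training set: 15 valid "cat" images and one problematic "dog" image.
var images = Enumerable.Range(1, 15)
    .Select(i => new TrainingImage($"cat{i}.jpg", 200_000, "cat"))
    .Append(new TrainingImage("dog1.tiff", 5_000_000, "dog"))
    .ToList();

var problems = TrainingSetCheck.FindProblems(images);
foreach (var problem in problems)
{
    Console.WriteLine(problem);
}

// Hypothetical description of one training image and the tag it will receive.
public record TrainingImage(string FileName, long SizeBytes, string Tag);

public static class TrainingSetCheck
{
    // Limits as quoted in the list above (assumptions taken from this post).
    private static readonly string[] AllowedExtensions = { ".jpg", ".png", ".bmp" };
    private const long MaxBytes = 4L * 1024 * 1024;
    private const int MinImagesPerTag = 15;

    public static List<string> FindProblems(IEnumerable<TrainingImage> images)
    {
        var all = images.ToList();
        var problems = new List<string>();

        foreach (var image in all)
        {
            var extension = Path.GetExtension(image.FileName).ToLowerInvariant();
            if (!AllowedExtensions.Contains(extension))
                problems.Add($"{image.FileName}: unsupported file type '{extension}'");
            if (image.SizeBytes >= MaxBytes)
                problems.Add($"{image.FileName}: file is 4MB or larger");
        }

        // Every tag needs enough examples to train on.
        foreach (var group in all.GroupBy(i => i.Tag))
        {
            if (group.Count() < MinImagesPerTag)
                problems.Add($"Tag '{group.Key}': only {group.Count()} image(s), need {MinImagesPerTag}");
        }

        return problems;
    }
}
```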

 

 

 

Thursday, September 16, 2021

.Net Release Schedule

There has been some confusion about the .Net and .Net Core releases and how their support will be impacted in the near future.  Listed below is a summary from the Microsoft Support Policy:

.NET and .NET Core refer to several technologies including the runtime, ASP.NET Core, Entity Framework Core, and more.
.NET Core and future versions of .NET provide Long Term Support (LTS) releases that get three years of patches and free support.
Patches to releases are compatible, which eliminates risk adversely affecting applications.

For details, see the .NET and .NET Core support policy.

 

| Version | Original Release Date | Latest Patch Version | Patch Release Date | Support Level | End of Support |
|---|---|---|---|---|---|
| .NET 5 | November 10, 2020 | 5.0.10 | September 14, 2021 | Current | 6 months after .NET 6 release (around May 2022) |
| .NET Core 3.1 | December 3, 2019 | 3.1.19 | September 14, 2021 | LTS | December 3, 2022 |

 

 

For better visualization of the .Net roadmap, see the timeline below for release dates of past, present, and future releases.

 

Wednesday, September 1, 2021

A day of data and AI

O’Reilly Radar presents a day of data and AI that will showcase what’s new, what’s important, and what’s coming in the field. It’s free and open to the public.

The event takes place on October 14 and again on October 21.  It's free, but registration is required.

RSVP here for October 14, 7:00am–10:30am PT / 10:00am–1:30pm ET

RSVP here for October 21, 8:30am–12:00pm IST / 2:00pm–5:30pm AET

 

Thursday, August 26, 2021

Sep '21 Technical Events

Virtual User Group Meetings

 

Virtual Conferences

 

Monday, August 2, 2021

AI Articles

SQLServerCentral.com is a well-known web site for knowledge sharing on all things SQL Server related, as well as other DB technologies.  It’s a community-driven web site that publishes a daily newsletter with articles from its contributors, as well as contributors on other web sites.  Since data is the basis for machine learning and AI, the newsletter has also started to cover various AI-related topics.  Listed below are a few articles from the July 31st newsletter.

 

AI/Machine Learning/Cognitive Services

 

ML{.NET} Introduction

Machine Learning (ML) has come from a buzzword that is nice to have in your application to a must-have feature that works and adds value. Data scientists develop ML models in various ML Frameworks like TensorFlow, Scikit-learn, PyTorch, Azure ML, etc. Before ML.NET became available to all developers, adding the ML functionality to .NET applications required knowledge in some ML frameworks to build and train ML models.

 

AI in real-life: A Q&A with Gavin Day

From AllAnalytics

In this Q&A with MIT/SMR Connections, Gavin Day, Senior Vice President of Technology at SAS, shares real-life examples of artificial intelligence (AI) at work, discusses picking the right problems...

 

Detecting Financial Fraud with Machine Learning

From BlueGranite Blog

According to the Federal Trade Commission, consumers reported losing more than $3.3 billion to fraud in 2020, an increase of $1.5 billion since 2019. Contributing to this uptick in...

 

Rise of the Cyborgs: Using AI/ML to Enhance Human Intelligence (Part 1)

From Dataversity

Click to learn more about author Assaf Egozi. Modern organizations house a growing number of “citizen data analysts.” These individuals hold a wide range of positions in the enterprise, from...

 

An overview of Azure Cognitive Services

From SQLShack

Microsoft Azure has been a leading cloud service provider over the past few years. In this article, we are going to look into an overview of various cognitive services...

 

Sean Gallagher and an AI expert talk about our crazy machine-learning adventure

From Ars Technica

Join our headline experiment post-mortem today, July 28, at 1 pm Eastern time!

 

 

Saturday, July 31, 2021

Custom Lingual Adaptation (Preview)

“Custom Neural Voices” is a part of Microsoft’s Cognitive Services suite of technologies.  It allows users to develop highly realistic, humanlike voices by using Microsoft's groundbreaking neural text-to-speech technology. A user will simply record audio samples and then upload training data. Once the voice is created, it can be used in applications for Text-to-Speech or Voice Assistant.

 

In addition, once the voice is created, it can be used to speak fluently in 13 other languages:

  1. German (Germany)
  2. English (Australia)
  3. English (Canada)
  4. English (Canada)
  5. Spanish (Spain)
  6. Spanish (Mexico)
  7. French (Canada)
  8. French (France)
  9. Italian (Italy)
  10. Japanese (Japan)
  11. Korean (Korea)
  12. Portuguese (Brazil)
  13. Chinese (Mandarin, Simplified)

 

This gives developers the ability to give their application a voice and persona that can be applied to 13 different languages/dialects.  Global companies such as BBC, Swisscom, AT&T and Duolingo have built realistic voices that resonate with their brands.

 

To learn more about custom voices, as well as getting started tutorials, visit https://speech.microsoft.com/customvoice.