Overview
Speech-to-text is one of the many cognitive services offered by Azure. Each Azure cognitive service falls into one of four primary categories:
- Decision
- Language
- Speech
- Vision
Each category contains a variety of services, and Speech-to-text belongs to the “Speech” category. It converts spoken language to text through either an SDK or a web API call. Both methods require a subscription key, which you obtain through a brief resource setup in the Azure portal. Speech-to-text can be configured to recognize a variety of languages, listed at https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support.
In addition, the source of the speech can be either live spoken words or a recording.
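To make the web API route concrete, here is a minimal Python sketch of a call to the REST endpoint for short audio. The helper names (`build_stt_request`, `transcribe`), the `westus` region in the usage note, and the assumption that the recording is a 16 kHz PCM WAV file are illustrative choices, not part of the original walkthrough.

```python
import json
import urllib.parse
import urllib.request


def build_stt_request(subscription_key, region, wav_bytes, language="en-US"):
    """Build (but do not send) a POST request for the short-audio REST endpoint.

    Assumption: wav_bytes holds 16 kHz, 16-bit mono PCM audio in a WAV container.
    """
    query = urllib.parse.urlencode({"language": language})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        # The subscription key from the Azure portal authenticates the call.
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")


def transcribe(request):
    """Send the request and return the recognized text, or None if nothing was recognized."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    if payload.get("RecognitionStatus") == "Success":
        return payload.get("DisplayText")
    return None
```

With a real key and a recording on disk, usage would look roughly like `transcribe(build_stt_request(key, "westus", open("hello.wav", "rb").read()))`. Keeping request construction separate from the network call makes it easy to inspect the URL and headers before sending anything.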
How to Use Speech-to-text
Project Setup (using SDK)
- Set up the Speech-to-text resource in Azure. Type “speech services” in the search bar, and all speech resources in the Azure marketplace will be displayed. For this demo, we will use Microsoft’s Speech Azure Service.
- After you click Create and provide the basic parameters for the setup, subscription keys will be provided.
- Obtain the subscription key for the above resource.
- Set up a project with the “Microsoft.CognitiveServices.Speech” NuGet package.
- Listen for speech and convert it to text.
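The steps above target the .NET NuGet package. As a rough analogue, the sketch below shows the same "listen and convert" flow using Microsoft's Python Speech SDK (the `azure-cognitiveservices-speech` pip package), which is an assumption rather than the package named above. The function name `transcribe_once` and its parameters are hypothetical; the key and region come from the resource created in the earlier steps.

```python
def transcribe_once(subscription_key, region, language="en-US", wav_path=None):
    """Recognize a single utterance and return its text, or None if no speech matched.

    Assumption: the azure-cognitiveservices-speech package is installed.
    It is imported lazily so this module still loads without it.
    """
    import azure.cognitiveservices.speech as speechsdk

    # Authenticate with the subscription key and region of the Speech resource.
    speech_config = speechsdk.SpeechConfig(subscription=subscription_key, region=region)
    speech_config.speech_recognition_language = language

    # Live speech comes from the default microphone; a recording comes from a WAV file.
    if wav_path is None:
        audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    else:
        audio_config = speechsdk.audio.AudioConfig(filename=wav_path)

    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # Listen until the first pause in speech, then wait for the result.
    result = recognizer.recognize_once_async().get()

    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    if result.reason == speechsdk.ResultReason.Canceled:
        # Typical causes are a wrong key or region, or a network problem.
        raise RuntimeError(
            f"Recognition canceled: {result.cancellation_details.reason}"
        )
    return None  # ResultReason.NoMatch: audio was heard but no speech was recognized
```

Calling `transcribe_once(key, region)` listens on the microphone, while passing `wav_path="recording.wav"` transcribes a recording instead. The .NET package exposes similarly named types, so the overall shape of the code should carry over, though the exact C# syntax is not shown here.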
Why should you use Speech-to-text?
Accessibility! Most applications assume that their users are not visually impaired, which prevents a significant number of people from using them. Screen readers have been available in the Windows OS for nearly two decades, and they allow users to understand what is displayed on the screen. However, a visually impaired user will still have difficulty interacting with the user interface (e.g. submitting info, filling in forms, etc.). Thanks to Speech-to-text, users can now speak to the application and have their words transcribed to text in the application as they talk. This makes the application accessible to more users and helps it meet ADA and WCAG requirements.