
Microsoft Build 2021 has come to a close. This year’s event, like last year’s, was held virtually, giving developers everywhere an opportunity to participate. There are lots of new goodies for the AI Edge developer and engineer. One of the more interesting is the Speech service.
No-code studios such as Speech Studio have become a common way of working with many of Microsoft’s products. Thankfully, we continue to get SDKs. The no-code studios are definitely a welcome addition, but the SDKs seem to be the place where you can do the really cool stuff, like putting things on devices.
And the Speech service has a Speech Devices SDK.
What is the Speech Devices SDK?
The Speech Devices SDK builds on the Speech SDK, the development kit that exposes many of the Speech service’s capabilities. There are currently seven versions of the SDK, covering the major programming languages. I won’t go into detail on everything you can do with it, but it generally covers scenarios like these (a minimal sketch follows the list):
- Text-to-speech
- Voice assistants
- Keyword recognition
- Transcription
- Call center transcription
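To give a sense of how little code a basic scenario takes, here’s a minimal speech-to-text sketch using the Python Speech SDK (the `azure-cognitiveservices-speech` package); the key and region values are placeholders you’d replace with your own Speech resource details.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders: use your own Speech resource key and region.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="YOUR_REGION")

# Listen on the default microphone and transcribe a single utterance.
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized:", result.text)
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized.")
elif result.reason == speechsdk.ResultReason.Canceled:
    print("Canceled:", result.cancellation_details.reason)
```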
The Speech Devices SDK lets developers run these scenarios on devices that have a microphone array. The example device is the ROOBO Dev Kit, but there are other options, and I’m sure it won’t be long before someone has a DIY solution running on the more common maker platforms like the Raspberry Pi or Arduino.
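Keyword recognition is what makes these always-listening devices practical, since the wake word is spotted on the device itself. Here’s a hedged sketch with the Python SDK; the `.table` model filename is a hypothetical placeholder for a custom keyword trained in Speech Studio.

```python
import azure.cognitiveservices.speech as speechsdk

# Hypothetical placeholder: a custom keyword model (.table file)
# trained in and downloaded from Speech Studio.
model = speechsdk.KeywordRecognitionModel("my_keyword.table")

# Keyword spotting runs locally against the default microphone,
# so no subscription key is needed for this step.
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
keyword_recognizer = speechsdk.KeywordRecognizer(audio_config=audio_config)

# Block until the keyword is heard once.
result = keyword_recognizer.recognize_once_async(model).get()
if result.reason == speechsdk.ResultReason.RecognizedKeyword:
    print("Keyword detected:", result.text)
```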
What are the business applications?
Wow, there are many. I can see this being the voice and ears of an environment: smart buildings that respond to commands, intelligent conference room assistants that pick up on keywords and queue up related data for display on room monitors, transcription services, drive-thru food ordering.
That doesn’t even touch on the various integration options. About this time last year I worked on a personal project: I wanted to create a smart bot to help answer questions related to property rental. I used the Bot Framework Composer and the QnA Maker service to create a great solution. It worked as expected, but when I tried to tie it into a phone system it was less than optimal. My hope is that this allows us to give speech and hearing to bots. Speech-enabled apps bring a new level of accessibility to solutions.
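That hope isn’t far-fetched: the Speech SDK already has a Direct Line Speech path for talking to Bot Framework bots. Here’s a sketch, assuming a bot registered with the Direct Line Speech channel and placeholder key/region values.

```python
import azure.cognitiveservices.speech as speechsdk
from azure.cognitiveservices.speech import dialog

# Placeholders: your Speech resource key and region. The target bot
# is assumed to be registered with the Direct Line Speech channel.
config = dialog.BotFrameworkConfig.from_subscription("YOUR_SPEECH_KEY", "YOUR_REGION")
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

connector = dialog.DialogServiceConnector(config, audio_config)

# Fires whenever the bot sends an activity back (its reply).
connector.activity_received.connect(lambda evt: print("Bot activity:", evt.activity))

# Capture one spoken utterance and forward it to the bot.
result = connector.listen_once_async().get()
print("You said:", result.text)
```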
How do I get started?
There are seven quickstarts to help get you rolling (a sketch of one follows the list):
- Recognize speech from a microphone
- Recognize speech from a file
- Recognize speech from an Azure Blob
- Translate speech to text
- Synthesize text to an audio device
- Synthesize text to a file
- Recognize Intent
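As a taste of the second quickstart, here’s a minimal sketch of recognizing speech from an audio file with the Python SDK; the key, region, and WAV filename are made-up placeholders.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders: use your own Speech resource key and region.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="YOUR_REGION")

# Point the recognizer at a WAV file instead of the microphone;
# the filename here is a hypothetical placeholder.
audio_config = speechsdk.audio.AudioConfig(filename="sample_utterance.wav")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Transcript:", result.text)
```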
And, of course, Microsoft has made Learn modules available.
Closing thoughts
As always, I think it’s important to remember that these are powerful tools. This falls under the Cognitive Services family and needs to be applied with the responsible use of AI in mind. The ability to add speech to apps isn’t new technology, but the Speech service seems to abstract away some of the harder work of putting it into a production scenario. I can see many applications for this service.