John Liu .NET

View Original

Serverless connect-the-dots: MP3 to WAV via ffmpeg.exe in AzureFunctions, for PowerApps and Flow

There's no good title to this post, there's just too many pieces we are connecting.  

So, a problem was in my todo list for a while - I'll try to describe the problem quickly, and get into the solution.

  • PowerApps Microphone control records MP3 files
  • Cognitive Speech to Text wants to turn WAV files into JSON
  • Even with Flow, we can't convert the audio file formats. 
  • We need an Azure Function to gluethis one step
  • After a bit of research, it looks like FFMPEG, a popular free utility can be used to do the conversion

Azure Functions and FFMPEG

So my initial thought is that well, I'll just run this utility exe file through PowerShell.  But then I remembered that PowerShell don't handle binary input and output that well.  A quick search nets me several implementations in C# one of them catches my eye: 

Jordan Knight is one of our Australian DX Microsoftie - so of course I start with his code

It actually was really quick to get going, but because Jordan's code is triggered from blob storage - the Azure Functions binding for blob storage has waiting time that I want to shrink, so I rewrite the input and output bindings to turn the whole conversion function into an input/output HTTP request.

https://github.com/johnnliu/function-ffmpeg-mp3-to-wav/blob/master/run.csx

See this content in the original post
Trick: You can upload ffmpeg.exe and run them inside an Azure Function

https://github.com/johnnliu/function-ffmpeg-mp3-to-wav/blob/master/function.json

See this content in the original post

Azure Functions Custom Binding

Ling Toh (of Azure Functions) reached out and tells me I can try the new Azure Functions custom bindings for Cognitive Services directly.  But I wanted to try this with Flow.  I need to come back to custom bindings in the future.

https://twitter.com/ling_toh/status/919891283400724482

Set up Cognitive Services - Speech

In Azure Portal, create Cognitive Services for Speech

Need to copy one of the Keys for later

Flow

Take the binary Multipart Body send to this Flow and send that to the Azure Function

See this content in the original post

Take the binary returned from the Function and send that to Bing Speech API

https://docs.microsoft.com/en-us/azure/cognitive-services/speech/getstarted/getstartedrest?tabs=Powershell#build-recognition-request-and-send-it-to-the-speech-service

You'll need the key from the Cognitive Azure App earlier for Ocp-Apim-Subscription-Key

Flow returns the result from Speech to text which I force into a JSON

See this content in the original post

Try it:

Swagger

Need this for PowerApps to call Flow
I despise Swagger so much I don't even want to talk about it (the Swagger file takes 4 hours the most problematic part of the whole exercise)

See this content in the original post

Power Apps

Result

Summary

I expect a few outcomes from this blog post.

  1. ffmpeg.exe is very powerful and can convert multiple audio and video datatypes.  I'm pretty certain we will be using it a lot more for many purposes.
  2. Cognitive Speech API doesn't have a Flow action yet.  I'm sure we will see it soon.
  3. PowerApps or Flow may need a native way of converting audio file formats.  Until such an action is available, we will need to rely on ffmpeg within an Azure Function
  4. The problem of converting MP3 to WAV was raised by Paul Culmsee - the rest of the blog post is just to connect the dots and make sure it works.  I was also blocked on an error on the output of my original swagger file, which I fixed only after Paul sent me a working Swagger file he used for another service - thank you!