Serverless Parallelism in Microsoft Flow and SharePoint

This is a short post about running things in parallel.  There are two angles to this:

"Sometimes you fan-out to 50 parallels and want to run them as quickly as possible, sometimes, you want them in a single file and no one skips a queue".
 

Plan

  • Parallelism Settings in For Each (AKA: when to fan-out to 50)
  • Parallelism Settings in SharePoint Trigger (AKA: and when to queue up in single file)

For Each 

In 2017, we saw Microsoft demo parallelism settings in Logic Apps - support for this dropped into Flow's UI recently.

This lets everything in the loop happen at the same time, in batches of up to 50.  A great example is copying a lot of files from one SharePoint library to another.

By default, this flow runs one element at a time.  It takes 45 seconds for 23 files.

Running For Each in parallel, the copying takes 6 seconds.
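To make the difference concrete, here is a minimal Python sketch of the same idea outside of Flow.  It is only an illustration: `copy_file` is a made-up stand-in for the copy action (it just sleeps), and `degree_of_parallelism` is a hypothetical name for the slider in the For Each settings.  The timings it produces are not the ones quoted above.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def copy_file(name: str) -> str:
    """Stand-in for one 'copy file' action; sleeps to mimic a slow API call."""
    time.sleep(0.05)
    return f"copied:{name}"

def for_each(items, action, degree_of_parallelism=1):
    """Run `action` for every item, at most `degree_of_parallelism` at a time.

    Results come back in input order, whatever order the work finished in.
    """
    with ThreadPoolExecutor(max_workers=degree_of_parallelism) as pool:
        return list(pool.map(action, items))

files = [f"file{i}.docx" for i in range(23)]

# One element at a time (the default behaviour described above).
start = time.perf_counter()
sequential = for_each(files, copy_file, degree_of_parallelism=1)
sequential_seconds = time.perf_counter() - start

# Fan-out: up to 50 items in flight at once.
start = time.perf_counter()
parallel = for_each(files, copy_file, degree_of_parallelism=50)
parallel_seconds = time.perf_counter() - start
```

Both runs produce the same results; only the wall-clock time changes, because the waiting overlaps instead of adding up.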

Advanced use cases of Parallel For Each:

In SharePoint Site Provisioning - split a large PnP Template into many small ones, and run them in parallel.  Since PnP Provisioning is additive - most of the actions can finish on their own.

The HTTP action to an AzureFunction has an automatic retry policy by default.  So if an AzureFunction fails, it will retry (default is 4 times with a 20 second delay).

See also:

http://johnliu.net/blog/2016/11/build-your-pnp-site-provisioning-with-powershell-in-azure-functions-and-run-it-from-flow

http://www.vrdmn.com/2018/01/site-designs-flow-azure-functions-and.html

 

Sometimes, instead of running so many items at once in parallel (fan-out), we want to make sure only one item runs at a time.  This brings us to the second part of this post.

Parallelism Setting in SharePoint Trigger

This one is trickier, and it's not always clear _why_ you need to do this.  But sometimes, you need to stop parallelism, and handle things one at a time.

Setting parallelism to 1

There's a problem: Split On and Concurrency Control are mutually exclusive.

So we need to turn off Split On.  This means the trigger will now return an array of SPLists (because the trigger works like a delta query)

I quickly enter about 10 list items in a SharePoint list - triggering off Flow runs.

The result of the concurrency/parallelism controls here is that only one Flow runs at a time.  Each run gets a batch of items that we'll need to handle individually, but the runs do not overlap.

Advanced use case of the parallelism setting on the SharePoint Trigger:

If you are generating sequential incremental numbers on your SharePoint list - this is very useful to prevent two Flows from running at the same time.
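The numbering case can be sketched in a few lines of Python.  This is an analogy, not how Flow is implemented: the `SingleFileTrigger` class, its lock and the item names are all invented for illustration.  The lock plays the role of concurrency set to 1, and each `run` receives a batch (an array) because Split On is off.

```python
import threading

class SingleFileTrigger:
    """Mimics a trigger with Split On off and concurrency set to 1."""

    def __init__(self):
        self._one_run_at_a_time = threading.Lock()
        self.last_number = 0   # stand-in for the highest number already used
        self.assigned = {}     # item id -> sequential number

    def run(self, batch):
        # Only one run can hold the lock, so runs never overlap.
        with self._one_run_at_a_time:
            for item_id in batch:  # the batch is an array; handle items one by one
                self.last_number += 1
                self.assigned[item_id] = self.last_number

trigger = SingleFileTrigger()

# Ten batches of three items arrive at roughly the same time.
batches = [[f"item{b}-{i}" for i in range(3)] for b in range(10)]
threads = [threading.Thread(target=trigger.run, args=(batch,)) for batch in batches]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the runs queue up behind each other, every item ends up with its own number and there are no duplicates or gaps.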

Summary

  • Parallel Settings on For Each 
  • Parallel Settings on SharePoint Trigger
  • And a small note about Split On and handling an array of items on a Created/Updated event

 

 

Setting up MSGraph Webhook with HTTP Action in MicrosoftFlow

I've tweeted out several small tidbits of using Microsoft Flow's HTTP action to call the Microsoft Graph.

Hundreds of Graph APIs, dozens of Graph webhooks, one HTTP Action.

This little action continues to amaze me, so I'm putting several examples into this one blog post.

Four Techniques, One Action

  • Connect to Microsoft Graph with ONE HTTP Action 
  • Setting up Microsoft Graph Webhook Subscription
  • Paging
  • Retry Policy

One HTTP Action 

When we set Authentication to "Active Directory OAuth" - we can specify the Client ID / Client Secret in one HTTP action - so we don't need to make two separate calls: first to authenticate and get an access token, then to call the API we want with the bearer token header.

This one action does it and asks no questions.

  • Tenant, Client ID and Client Secret are 3 strings. 
  • Authority should be set to https://login.microsoftonline.com/
  • Audience (resource) should be set to https://graph.microsoft.com
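For comparison, this is roughly what the two separate calls look like when you do them by hand.  It is a hedged Python sketch that only builds the two requests and does not send them; the tenant name, client id and secret are placeholders, and the `/oauth2/token` path with a `resource` parameter is the Azure AD client-credentials shape I'm assuming here, not something the HTTP action exposes.

```python
from urllib.parse import urlencode
from urllib.request import Request

AUTHORITY = "https://login.microsoftonline.com/"
AUDIENCE = "https://graph.microsoft.com"

def build_token_request(tenant: str, client_id: str, client_secret: str) -> Request:
    """Call 1: client-credentials request for an app-only access token."""
    form = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": AUDIENCE,
    }).encode("utf-8")
    return Request(f"{AUTHORITY}{tenant}/oauth2/token", data=form, method="POST")

def build_graph_request(access_token: str, path: str) -> Request:
    """Call 2: the Graph call itself, carrying the token as a bearer header."""
    return Request(f"{AUDIENCE}/v1.0/{path}",
                   headers={"Authorization": f"Bearer {access_token}"})

# Placeholder values only; nothing is sent over the network.
token_request = build_token_request("contoso.onmicrosoft.com", "my-client-id", "my-secret")
graph_request = build_graph_request("<token from call 1>", "groups")
```

The HTTP action collapses both of these into one step: you supply the same five values and it handles the token for you.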

So yep, it's now easier to call any Microsoft Graph API from Flow than C#, PowerShell or JavaScript.

I'm in love with this, because this is way too amazing.

 

Note - this calls the Graph via an App-Only Client ID.  If you are looking for delegated calls, you'll need to set up a Custom Connector with a swagger file.  Follow @skillriver https://gotoguy.blog/2017/12/17/access-microsoft-graph-api-using-custom-connector-in-powerapps-and-flows/

 

Microsoft Graph Webhook - a dozen new triggers

To set up a webhook, we need two Flows.  The first one is the subscriber.

The subscriber should be set up with a recurring 3 day schedule.

The notificationUrl is the HTTP Trigger URL of the second Flow.   

The expiry is 4229 minutes into the future from right now.  The maximum value is 4230 minutes.  If you go over, the subscription call will fail.

On success, the subscription is set up and we are now listening to changes in our tenant's groups.
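The body the subscriber posts can be sketched like this.  The field names (`changeType`, `notificationUrl`, `resource`, `expirationDateTime`) follow the Graph subscription resource; the listener URL is a placeholder, and `changeType="updated"` is my assumption for watching groups, so check it against the resource you subscribe to.  `build_subscription` itself is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

MAX_MINUTES = 4230  # the maximum lifetime quoted above

def build_subscription(notification_url, now, resource="groups",
                       change_type="updated", minutes=4229):
    """Build the JSON body for a subscription that expires `minutes` from `now`."""
    if minutes > MAX_MINUTES:
        raise ValueError("expiry is past the maximum; the subscription call would fail")
    expires = now + timedelta(minutes=minutes)
    return {
        "changeType": change_type,
        "notificationUrl": notification_url,  # HTTP trigger URL of the listener Flow
        "resource": resource,
        "expirationDateTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }

# A fixed 'now' keeps the example reproducible.
now = datetime(2018, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
subscription = build_subscription("https://example.invalid/listener", now)
```

In Flow the same expiry comes from a date expression that adds 4229 minutes to the current UTC time.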

The second one is the listener.

The listener needs to handle the validationToken that Microsoft Graph sends to test whether your web service follows the spec.

Read the trigger query string to pull out the validationToken.  If the value exists - then this is a set up call.  Respond immediately with a 200 status and the token as text/plain.

Otherwise, we have a real call.  This is an event where our resource (in my case, I'm listening to Unified Groups being created and modified in my tenant) tells us something is happening. 

I call another Flow to start dealing with the change.  The other Flow does not return a response, so the HTTP action completes quickly, and I can return a 202 Accepted response to Microsoft Graph quickly.
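The listener's branching is small enough to write down.  This Python sketch assumes the query string has already been parsed into a dict; `handle_graph_call` and the `start_processing` callback (standing in for the call to the other Flow) are hypothetical names.

```python
def handle_graph_call(query: dict, body, start_processing) -> tuple:
    """Return (status, content_type, response_body) for one incoming request."""
    token = query.get("validationToken")
    if token is not None:
        # Set up call: echo the token straight back as plain text.
        return 200, "text/plain", token
    # Real notification: hand the work off, then acknowledge right away.
    start_processing(body)
    return 202, "text/plain", ""
```

Keeping the slow work out of the listener is the point: the listener only decides which of the two replies to send.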

 

This is the message sent from Microsoft Graph to Flow to tell me I've got a new Group created in my tenant.

Episode III in my blog series on Group Management with Flow will cover the webhook in more detail, combining it with a delta query and figuring out what changed.

 

Pagination

It turns out HTTP Pagination is baked in too.

To set this up - first set a $top in the Microsoft Graph call to artificially limit the number of rows returned.

This will also return an @odata.nextLink

https://developer.microsoft.com/en-us/graph/docs/concepts/paging

Flip to the Settings for the HTTP action by clicking on ...
This lets you turn on Pagination and it will accept up to 5000 items.

The result is that the HTTP action will follow next page links automatically, and return you the entire array of the paged data concatenated together.
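Written out by hand, the loop that the Pagination setting saves you looks something like the sketch below.  Graph marks the next page with an `@odata.nextLink` property and puts the rows in `value`; everything else is an assumption for illustration: `get_all_pages` is a hypothetical helper, `fetch_page` is whatever performs the GET, and treating the threshold as "stop once we have at least this many" is my reading of the setting.

```python
def get_all_pages(first_url, fetch_page, threshold=5000):
    """Follow next links and concatenate every page's `value` array."""
    items = []
    url = first_url
    while url and len(items) < threshold:
        page = fetch_page(url)
        items.extend(page.get("value", []))
        url = page.get("@odata.nextLink")  # missing on the last page
    return items

# Fake pages keyed by URL, so the sketch runs without a network call.
PAGES = {
    "page1": {"value": [1, 2], "@odata.nextLink": "page2"},
    "page2": {"value": [3, 4], "@odata.nextLink": "page3"},
    "page3": {"value": [5]},
}
all_items = get_all_pages("page1", PAGES.__getitem__)
```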

Magic.  Still one action.

Retry Policy

When calling HTTP (or other ApiConnections, like SharePoint), the default policy of retrying 4 times at exponential intervals means that if your action is going to fail, it will fail four times.

When you are building a Flow and want it to fail fast - set the Retry Policy to None.

This is useful for calling AzureFunctions as well.  AzureFunction through host.json lets us control de-queuing speed and concurrency, but that requires you to use a Queue trigger.  When we use the HTTP action from Flow - we control the retry / ease off policy through this setting.  In this use case, Flow is the orchestrator.
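As a rough model of what the retry setting does, here is a Python sketch.  `call_with_retry` is a hypothetical helper, and the doubling wait with a 20 second starting interval is an assumption for illustration; the real intervals are chosen by Flow.  Setting `retry_count=0` corresponds to a Retry Policy of None.

```python
import time

def call_with_retry(action, retry_count=4, interval_seconds=20, sleep=time.sleep):
    """Call `action`; on failure retry up to `retry_count` times, doubling the wait."""
    failures = 0
    while True:
        try:
            return action()
        except Exception:
            failures += 1
            if failures > retry_count:
                raise  # out of retries: surface the last error
            sleep(interval_seconds * 2 ** (failures - 1))

calls = []
waits = []

def always_fails():
    calls.append(1)
    raise RuntimeError("function is down")

# Record the waits instead of really sleeping.
try:
    call_with_retry(always_fails, sleep=waits.append)
except RuntimeError:
    pass
```

With the defaults, an action that is always going to fail still makes every retry before giving up, which is why None is the better choice while you are still building and testing.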

 

Summary

Calling Microsoft Graph directly with as little as one HTTP action means that the slightly more complex task of authenticating, getting an access token, then calling MSGraph with a bearer header becomes a whole lot easier.

And as steps get simpler, we can do a whole lot more.

 

 

Betting on 2018 - level up our Serverless in Azure

A recent conversation got me thinking about making some predictions for 2018.  This isn't so much a "ha look I'm right in 2019" post.  This is more about internalizing and verbalizing my choices, and I think there's value in sharing all this thinking.

So here it is, all of it: notes, wishlist, observations, what other people are doing, what we should be doing.  All in one overview blog post.  Happy 2018.


 

 

Bet on Serverless

You can't look sideways without seeing "Serverless".  It's a silly term, but I need to start with a definition of "Serverless" by Serverless experts:

  • Use a compute service to execute code on demand
  • Write single-purpose stateless functions
  • Design push-based event-driven pipelines
  • Create thicker front-ends
  • Embrace third-party services

I started on this path in 2016 and I can't look back.  Being able to run your code anytime in the cloud is a life-changing experience for many of us - it abstracts away the operations part of hosting code in the cloud, and lets us get back quickly into code.

Applications for this technique are far and wide.  From simple services to augment the endless front-end applications we were building in 2016, to finally having a great way to handle remote events or permission escalation.  And look beyond to the bot-framework.  A little blog post I wrote in 2016 about Serverless site provisioning is now officially best practice in SharePoint's Site Design - I'm a little glad it was useful :-)  At times it feels like I just hack and cobble things together and behold, wow people do like this.

https://docs.microsoft.com/en-us/sharepoint/dev/declarative-customization/site-design-overview#pnp-provisioning-and-customization-using-microsoft-flow

So, what's next?

 

Serverless Orchestration

Invest into Serverless orchestration.  Azure Functions are not the right place to do our orchestration.  Yes, Durable Functions will help this a lot.  But the product we should be looking at is Azure Logic Apps / Microsoft Flow.

As far as I'm concerned - these are the same product, and the differences boil down to:

Logic Apps

  • UX, with JSON editor, is targeted at developers
  • Consumption based pricing - per actions used, perfect for multiple small requests 
    • So we end up compressing multiple actions into an unreadable mess to save costs
  • Integration Services (BizTalk scenarios)
  • Better for multi-tenant solutions.

Microsoft Flow

  • UX tries really hard to remain Power User friendly and hide JSON complexity
  • Per Flow execution pricing, with free buckets per tier
    • So we end up putting way too many steps inside a single Flow to save costs
  • Premium connectors as part of higher tier plans
  • Free licenses as part of Office 365 / Dynamics 365 plans, making this cheaper for single-tenant solutions.

What can you do with Logic Apps/Flow?

  • Leverage connectors - (remember Embrace third-party services is a Serverless principle), these are hundreds of connectors implemented by the various product teams themselves directly.  So they know what they are doing* (most of the time)
  • You can do delay and wait easily in Flow
  • You can do loops easily in Flow (in Functions it's tricky without potentially hitting timeout).
  • You can do for-each loops in Flow and easily turn it into parallel execution (fan-out) with fan-in just part of the package
  • You can define repeat/retry policies with gradual fall back in Flow
  • You can define follow next token in Flow HTTP Request for REST paging
  • You can handle fallback behaviour as a scoped set, so if any actions fail you can orchestrate that
  • You can include human workflows with human approvals and send nice templated emails with attachments from Flow
  • A Function shouldn't do more than one thing.  Use Flow to chain them.

 

Serverless API end points

As we build out a constellation (I stole this word from https://www.slideshare.net/HeitorLessa1/serverless-best-practices-plus-design-principles-20m-version) of functions, we need to clean up all the microservice APIs with a unified API front.  There are two products for this:

Azure Functions Proxy

  • Simpler - can transform query/post messages

Azure API Management Service

  • More extensive - can transform REST to XML
  • Better Open API definitions

 

Serverless Websites / thicker Front-Ends

A serverless website is basically a CDN plus FaaS.  You don't scale Azure VMs or even Azure WebJobs.  Build your entire website with your favourite JavaScript library (I like and recommend Angular - but you should use what your team uses), then bundle with Webpack into a couple of minified JS files for the CDN.

Do your compute in the client.  And do your server compute with Azure Functions.

I'll even add here that a low-code solution such as PowerApps is extremely good at getting a proof of concept up and running quickly.  PowerApps supports offline capabilities and will happily call your Serverless APIs via a Swagger/OpenAPI file and treats them all as first class functions.

Wish

As part of the Azure services upgrade email (what a peculiar way to announce new features), the upgrade to the latest Windows Server means that Azure Functions, as part of Azure App Services, will gain the ability to work with HTTP/2.

It means - we can get our entire HTTP website in one HTTP Get request, with our Function (or possibly our LogicApp/Flow) sending multiple resources in one response.

 

Serverless Database

Let me first define what a Serverless Database is.  Essentially, you have a database in the cloud.

  • You want to pay for storage.
  • You want to pay for compute, on a consumption based plan.
  • Pay nothing if it's not doing anything, and automatically scale as necessary.

The problem we are trying to fix is simple.  We want to start an application, pick a database, and have it scale with us.  We don't want to put SQL Azure on the cheapest free VM and have it run like crap.

This is an area where Azure is somewhat lacking.  My choices are:

  • Azure Storage Table
  • SharePoint Lists (only because I'm an Office 365 person and I've got Office 365 tenants everywhere I look)

Wish

I predict boldly that Azure will bring out a Serverless CosmosDB solution in 2018 and it will be what everyone in the Microsoft ecosystem uses from there onwards.

Otherwise, look towards the competition:

  • Google Cloud Platform has Firebase - event driven, consumption based database, linked to Google Cloud Functions
  • Amazon Web Services has Aurora Serverless - in late 2017, AWS announced they've split Aurora's cost model into Compute and Storage.

 

Serverless Event Aggregator

The Azure Event Grid is a very interesting service.  I see the possibility that we'll see a unified way to manage all events in a system.

This is best explained with a parallel analogy.  In browser applications, we catch and handle events in the DOM all the time.

In the beginning, we do:

$(element).click(func)

This has all sorts of problems - how do you route?  How do you de-allocate?  How do you attach new events as new resources come online?  A few years later, we end up with this:

$(global).on("click", ".filter", func)

We attach events via one top level resource - ALL our event handlers are attached there.  Then we let events bubble to the root, apply the filter, and call the handler.

The Azure Event Grid has the potential to be this solution.  In 2019, if we are attaching event handling directly to a resource or a container, then we have stuffed up.  We should attach all our events to Event Grid, then filter within the event grid, and only then invoke the functions that fit the filter.
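The pattern itself (one top level place to attach handlers, filter there, then invoke) is easy to sketch.  The Python below is only a toy model of the idea: `EventHub`, `on` and `publish` are invented names, and it says nothing about the actual Event Grid API.

```python
class EventHub:
    """Toy central event aggregator: register once at the top, filter, then invoke."""

    def __init__(self):
        self._subscriptions = []

    def on(self, event_type, matches, handler):
        """Attach `handler` for `event_type`, limited to events where `matches(event)` is true."""
        self._subscriptions.append((event_type, matches, handler))

    def publish(self, event_type, event):
        """Route one event to every handler whose type and filter fit it."""
        for wanted_type, matches, handler in self._subscriptions:
            if wanted_type == event_type and matches(event):
                handler(event)

hub = EventHub()
handled = []

# Every handler is attached to the hub, never to an individual resource.
hub.on("blob.created", lambda e: e["container"] == "invoices", handled.append)

hub.publish("blob.created", {"container": "invoices", "name": "a.pdf"})
hub.publish("blob.created", {"container": "photos", "name": "b.jpg"})
hub.publish("blob.deleted", {"container": "invoices", "name": "a.pdf"})
```

Only the first event reaches the handler; new resources that come online later need no new wiring, as long as their events go through the hub.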

Wish

I'm hopeful that if we can map Microsoft Graph events into the Azure Event Grid, we'd have something super magical.

 

Serverless Visualization

I want to end on this one because I don't have a great solution, but I think we need a great solution.

Wish

As we build out our constellation of functions and orchestration, there's a need to visualize that design so we can both review it and see specifically where the bottlenecks are.

If a set of microservices is buggy, this would be the place to pinpoint it and switch the Functions back to the previous deployment slots.

With Azure Insights - we can get detailed logging for Functions and Flow/LogicApps, so perhaps this is something that needs to be layered on top of the logging.

 

References