On using Microsoft Flow as a pre-ETL step for Power BI

Photo by WeRoad on Unsplash

Photo by WeRoad on Unsplash

A topic I’ve presented a few times to an Power BI crew is the concept that Microsoft Flow makes a great pre-ETL step for Power BI.

I talked to many attendees at Difinity Power BI Conference Auckland about this. It’s probably a time to write this up to summarize my thoughts.

There’s just something nice about having a product called Flow gathering data into your lakes.

Power BI

Power BI - Power Query and DAX is extremely good at crunching numbers.

Flow has variables, branching and loop logic, but they are slow for number crunching. Loop limitations like 5k rows or total number of action steps are also very limiting.

Don’t use Flow to crunch numbers.

Flow

Power BI connects to about 70 sources - Flow connects to about 250 connections.

Flow’s HTTP action and Custom Connector framework is better at gathering data. Automatic paging, more authentication options, retry policy, async HTTP request.

Flow can gather data on Trigger, copy them all to a simple location (e.g. SharePoint library or Azure Blog Storage) for Power BI to start refresh. So data processing can be done in a secure way.

Flow can call Power BI API to start a refresh. Turning the ETL into a trigger-push system. We get the best of both worlds - Flow to gather data, and Power BI to crunch them.

Power BI Scheduled Refresh with Flow

  1. Schedule Refresh requires Flow to call Power BI API with a HTTP Request. But until very recently – this requires a delegate permission (so Flow has to call Power BI as a user) to perform delegate permission call requires a Custom Connector – which is what this blog post from Konstantinos goes into detail about creating.

    https://medium.com/@Konstantinos_Ioannou/refresh-powerbi-dataset-with-microsoft-flow-73836c727c33

  2. In February, Power BI has started releasing application permission to call Power BI API – this work is still in preview and requires quite a lot of set up, it also requires the administrator to approve the permission. App-only read/write everything is a pretty high level permission, non-admin can't grant this.

    https://powerbi.microsoft.com/en-us/blog/use-power-bi-api-with-service-principal-preview/

    pros: easier to call with Service Identity, and HTTP request
    cons: may be impossible to get your admin to approve a new AAD APP permission

  3. The April roadmap says Refresh is an action coming to Flow’s Power BI Connector – this will use delegate permissions and would be much simpler to use. So my thinking is that wait for this to drop after April

    https://docs.microsoft.com/en-us/business-applications-release-notes/April19/microsoft-flow/improved-power-bi-connector

Gaps between PowerBI streaming tiles and SharePoint

So I spend an evening playing (I actually have a lot of fun exploring these things) and figuring out how the pieces of SharePoint, PowerBI and Flow are supposed to work together.

In my head - they already connect.  But I have never seen anyone blog them.  So I decided to give it a stab.

Turns out there are some gaps.

The Idea

The Idea is simple.  We can create a PowerBI that uses SharePoint List as a datasource.  But instead of configuring scheduled refresh, we want to use the PowerBI Rest Dataset to push data in a streaming way.  And since Microsoft Flow has an action to do this, as well as the triggers to listen to SharePoint List.  We can get SP List-push-to PowerBI without needing schedule refresh.  That is a crazy fun idea.

The Reality

The reality is that there are several gaps.  These are probably solvable, but I just want to list them first, and we'll tackle them in the future.

Gap 1.  SharePoint List dataset != Push-enabled REST dataset

PowerBI makes a distinction between what's a REST/Pushable Dataset vs normal datasets like external lists.  In fact, Flow can not connect to a non-REST dataset.

So we need to create a REST dataset in PowerBI Service (it is not a feature of PowerBI Desktop), and then use the REST dataset as a live connection in a PowerBI Report.

Gap 2.  The only way to create a PowerBI REST dataset is via the REST API. 

There is no UI.  Ouch.  That pretty much makes this a developer task.  OK that's fine, we create a REST dataset via REST endpoint and a JSON schema (double ouch).  

Now we can build our PowerBI report, connect the REST dataset from PowerBI Service.  We save and publish this report to PowerBI Service and then insert the PowerBI Report in an SPFx webpart (PRO license needed for embed) into a SharePoint modern page.

This part is actually really seamless.  Don't worry, we have more gaps.

Gap 3. PowerBI report does not livestream REST dataset results.  

So I'm staring at my PowerBI visual in a SharePoint modern page.  In a separate window, I update the source SharePoint List.  In yet another separate window, I can see the Flow ran and push the new list item into the streaming dataset.

Excellent.  Except, the SPFx PowerBI Report Visual isn't updating.  It doesn't update.  I waited 15mins for it to do nothing!

If I F5, then I immediately see the new value.  But it doesn't do live streaming refresh :-(

It turns out, to see live stream results we need a PowerBI Dashboard or PowerBI Tile (streaming tile).

PowerBI Dashboard can only be created in the PowerBI Service.  We take an existing report and pin the visual.  This asks us to add the visual as a tile in a PowerBI Dashboard.

Gap 4. SPFx PowerBI report webpart does not show PowerBI dashboard embed.

So I create a PowerBI Dashboard and I go back to the SPFx PowerBI Preview webpart.  Only to find it doesn't do dashboard embed.

It only does Report embed.  So we will need to build our own SPFx that let us do dashboard embed.  This requires a embed token from MSGraph - but we should be able to piggyback the graph-util helper in SPFx to do our token exchange.

There's potentially one more issue.

Gap 5.  Does embed PowerBI Dashboard or Tile actually connect to streaming datasets?

I don't know the answer to this yet.

Gap 6. PowerBI REST Dataset endpoint can only add rows, not update them

The REST API lets us add rows to a REST dataset easily, or clear the table.  But there's no way to update an existing row.

The use case for the streaming REST dataset is like a ongoing stock ticker or temperature meter.  You don't update a record that's streamed past.  You only care about new records.

Flow only has an action to add row to PowerBI Dataset.

Gap 7. Flow does not have an action to Clear the dataset

The REST API lets us clear the dataset table, so technically, I could clear the table each time and repopulate it with the entire list again.

But unfortunately, Flow only has one PowerBI Action - add row to a REST dataset.  It does not have a clear rows action.

More work to do, more exploration to be had

Parts of the puzzle works really well.  Flow pushes data freely from SharePoint list changes into the streaming dataset.  If the dashboard or tile is shown on a webpage by itself, we immediately see it update like magic.

But if we want the dashboard/tile embedded within a SharePoint modern page.  There's still work to be done.

Ultimately, if we want live streaming data capability, it might be easier to use PowerBI LiveQuery against Azure SQL, and have Flow push data into that, instead of PowerBI REST streaming Dataset.