Implementing a fast sort with Microsoft Flow using Parallel Compute
This is #FlowNinja hack 112: Parallel Compute Sort.
I wrote about how to sort with a variable (an insertion sort) back in 2018: How to implement sort with Microsoft Flow in 3 actions within a loop
But that earlier method relies on a variable, which means we can't run the apply to each in parallel, so it was always slow when the array is large. Today, while chatting with Hiro, I had a sudden idea to revisit the pattern and see if I could make it quicker.
The problem
To create a sort that uses a parallel apply to each and gives us a sorted array at the end.
In 2018, I didn't have many of the patterns I needed to make this new 2020 sort method. First, to get results out of a parallel apply to each, we need Pieter's Method (Compose the apply to each inner output) to fan-in after the parallel fan-out.
Second, we need to sort the actual array, and I came up with a pretty interesting method.
How this works
Consider the array [ "d", "e", "c", "b", "a" ]
If, for each character, we filter the array for items that are less than the current item and count what is left, we'd get:
3:d, 4:e, 2:c, 1:b, 0:a
Each count is the position that item should take in the sorted array. Then, since we have 5 items, we take
[0, 1, 2, 3, 4] => map through this dictionary, and we'd get [ "a", "b", "c", "d", "e" ]
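To make the idea concrete outside of Flow, here is a minimal Python sketch of the same counting trick. It is not the Flow itself: the names `rank_of` and `parallel_rank_sort` are made up for illustration, and a thread pool stands in for the parallel apply to each. The point it shows is that each count depends only on the original array, so the counts can be worked out in any order and collected afterwards.

```python
from concurrent.futures import ThreadPoolExecutor


def rank_of(item, items):
    # Count how many items are strictly less than the current item.
    return sum(1 for other in items if other < item)


def parallel_rank_sort(items):
    # Fan-out: every count is independent, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        ranks = list(pool.map(lambda item: rank_of(item, items), items))
    # Fan-in: build the count -> item dictionary (3:d, 4:e, 2:c, ...).
    by_rank = dict(zip(ranks, items))
    # Map [0, 1, ..., n-1] through the dictionary to get the sorted array.
    return [by_rank.get(position) for position in range(len(items))]


print(parallel_rank_sort(["d", "e", "c", "b", "a"]))
# prints ['a', 'b', 'c', 'd', 'e']
```

No shared variable is updated inside the loop, which is what allows the parallel run in the first place.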
Side Story
I was actually thinking about this pattern while driving home. Once it clicked, I had to pull over to the side, park the car, take out my laptop and write this Flow. After I saw it work, I drove home.
Steps
Some additional considerations
If the original array has duplicates
[ "a", "b", "b", "c" ]
0:a, 1:b, 3:c
We'll see that 2 is missing, because both "b" items get the same count of 1. This is not the end of the world, but when we do the final map, we get a null in that position:
[0,1,2,3] => [ "a", "b", null, "c" ]
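A small Python sketch of the duplicate case, again only as an illustration and not the Flow itself. The helper names are hypothetical. It reproduces the gap and then shows two possible ways to deal with it, neither of which is claimed to be what the Flow does: drop the nulls (which gives a sorted list of distinct values), or fill each gap with the previous value (which keeps the duplicates, assuming the original items are never null themselves).

```python
def rank_sort_with_gaps(items):
    # Same counting trick; duplicate items collide on the same count.
    by_rank = {}
    for item in items:
        rank = sum(1 for other in items if other < item)
        by_rank[rank] = item
    # Positions with no entry come back as None (null).
    return [by_rank.get(position) for position in range(len(items))]


def drop_gaps(values):
    # Option 1 (assumption): remove nulls, leaving sorted distinct values.
    return [value for value in values if value is not None]


def fill_gaps(values):
    # Option 2 (assumption): a gap always follows the duplicated item,
    # so repeating the previous value restores the duplicates.
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


with_gaps = rank_sort_with_gaps(["a", "b", "b", "c"])
print(with_gaps)             # prints ['a', 'b', None, 'c']
print(drop_gaps(with_gaps))  # prints ['a', 'b', 'c']
print(fill_gaps(with_gaps))  # prints ['a', 'b', 'b', 'c']
```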
Observation
I was worried that JSON('{ "0": "a", "0":"a" }') would give an error, but it seems the duplicate key is simply ignored. Building a dictionary like this could be an interesting way to detect duplicates in the future.
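For comparison only, Python's standard `json` module behaves in a similar way: a repeated key does not raise an error, and the last value wins. This says nothing about how Flow parses JSON internally. The sketch below also shows one way to detect duplicates while building the dictionary, using the `object_pairs_hook` parameter of `json.loads`; the function name `find_duplicate_keys` is made up for illustration.

```python
import json

raw = '{ "0": "a", "0": "a", "1": "b" }'

# No error: the repeated key collapses into a single entry.
print(json.loads(raw))  # prints {'0': 'a', '1': 'b'}


def find_duplicate_keys(text):
    duplicates = []

    def collect(pairs):
        # pairs holds every key/value of one object, repeats included.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=collect)
    return duplicates


print(find_duplicate_keys(raw))  # prints ['0']
```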