Parallel Map-Reduce Pattern

Map-reduce is a common way of solving problems in functional programming. You have a list of items, you iterate through them to process them, and then you take the set and summarize them. We call this map-reduce because the iteration step is mapping values to new values and the summarize step reduces the number of values.

In Raku, map-reduce is a common programming pattern:

my $fibonacci = (1, 1, * + * ... *);
my $double-sum = $fibonacci[^100].grep(*.is-prime).map(* * 2).reduce(* + *);

This creates a Seq with the sequence operator. This is the classic Fibonacci sequence. We then take the first 100 elements of Fibonacci, filter out any non-primes, double the primes, and the sum the values.

That’s a weird operation, but demonstrates the kind of tasks you would perform with the map-reduce pattern. We take a sequence of data (the first 100 elements of the Fibonacci sequence, in this case), we filter the data to keep only the prime numbers, we double the values, and then we add them together to get a final sum.

In this case, the answer is not a difficult calculation and probably instantaneous, but what if we needed to do that operation for the first 4,000 instead? That will likely take seconds on a typical system today. As the .grep and .map must iterate over each value to filter and transform, there’s no particular reason we have to do those operations sequentially for every value.¹

Raku provides tools that let you parallelize this task in different ways with only a small change. Consider this variation:

my $fibonacci = (1, 1, * + * ... *);
my $double-sum = $fibonacci[^100].race.grep(*.is-prime).map(* * 2).reduce(* + *);

By inserting the .race at the beginning, we tell Raku to perform the operation in parallel. It will split the task into parts, run the parts in separate tasks, which will be scheduled on separate threads. On my system, that operation runs 2-times to 3-times faster than the first.

Raku provides a couple different strategies for parallelizing map-reduce tasks.

The .race method breaks the operation up to perform the work in batches that are processed concurrently. It does nothing to guarantee that the order of the original items is preserved, though. The items are just returned as the threads finish.
The .hyper method is very similar to .race, but it guarantees that the order of the items from the original list is preserved. This means processing on later parts of the list will be held up if parts earlier in the list take longer. It is not quite as a efficient, but if you need to preserve the order, .hyper will guarantee that the order is preserved.

Each of these operations take a :batch and :degree parameter in case you want to customize the way the work is broken up and executed. Raku attempts to pick reasonable defaults, but tuning these for your particular setup will probably yield some improvements when you need to eek out even better performance.

The :degree option selects how many workers to start. For CPU bound work, it is generally best to pick a number equal to the number of CPU cores available. There’s no reason to go higher because you’ll never run more than that many jobs at once in such a case. However, if the work involves waiting on the disk or network, it’s very likely your code will be paused for milliseconds or longer waiting for IO. In those cases, it can be wise to increase :degree to several times the number of CPUs to account for the wait time.

The :batch option decides how to break the work up. A large number is useful when the work to be done is fast. This will keep your throughput high. A small number, even down to 1, is reasonable when the work is long or you want to get each result as soon as you can.

So, with that in mind, we could further tune the work above like this:

my $fibonacci = (1, 1, * + * ... *);
my $double-sum = $fibonacci[^4000].race(:batch(1000), :4degree).grep(*.is-prime).map(* * 2).reduce(* + *);

In this case, tuning doesn’t make a large difference on my 4-core laptop, but when there are more than 4 cores on your system, it is likely that tuning will help some.

So, any time you have a task where you need to iterate through items and operate on them and have CPU time to spare to speed them up, consider employing .hyper or .race in your code.

Cheers.

Raku does some amount of optimization already for this operation. For example, it does not perform a separate loop for .grep and .map. These operations will be chained together in a sequence so there’s essentially only a single iteration being performed. ↩︎

Parallel Map-Reduce Pattern

TOC

Pages

Tags

Links