Parallel Map-Reduce Pattern
Map-reduce is a common way of solving problems in functional programming. You have a list of items, you iterate through them to transform each one, and then you summarize the resulting set. We call this map-reduce because the iteration step maps values to new values and the summarizing step reduces the number of values.
In Raku, map-reduce is a common programming pattern:
my $fibonacci = (1, 1, * + * ... *);
my $double-sum = $fibonacci[^100].grep(*.is-prime).map(* * 2).reduce(* + *);
This creates a Seq with the sequence operator; it is the classic Fibonacci sequence. We then take the first 100 elements of Fibonacci, filter out any non-primes, double the primes, and then sum the values.
That’s a weird operation, but it demonstrates the kind of tasks you would perform with the map-reduce pattern. We take a sequence of data (the first 100 elements of the Fibonacci sequence, in this case), we filter the data to keep only the prime numbers, we double the values, and then we add them together to get a final sum.
In this case, the answer is not a difficult calculation and is probably instantaneous, but what if we needed to do that operation for the first 4,000 elements instead? That will likely take seconds on a typical system today. Since .grep and .map must iterate over each value to filter and transform it, there’s no particular reason we have to perform those operations sequentially for every value.[1]
Raku provides tools that let you parallelize this task in different ways with only a small change. Consider this variation:
my $fibonacci = (1, 1, * + * ... *);
my $double-sum = $fibonacci[^100].race.grep(*.is-prime).map(* * 2).reduce(* + *);
By inserting the .race at the beginning, we tell Raku to perform the operation in parallel. It splits the task into parts and runs those parts as separate tasks, which are scheduled on separate threads. On my system, that operation runs two to three times faster than the first.
Raku provides a couple different strategies for parallelizing map-reduce tasks.
- The .race method breaks the operation up to perform the work in batches that are processed concurrently. It does nothing to guarantee that the order of the original items is preserved, though; the items are simply returned as the threads finish.
- The .hyper method is very similar to .race, but it guarantees that the order of the items from the original list is preserved. This means processing on later parts of the list will be held up if earlier parts take longer. It is not quite as efficient, but if you need to preserve the order, use .hyper.
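A small sketch can make the difference visible. This is a hypothetical illustration: the exact out-of-order result from .race will vary from run to run, while .hyper always matches the input order.

```raku
# .hyper preserves input order; .race returns results as workers finish.
my @ordered   = (1..10).hyper(:batch(1), :4degree).map(* ** 2);
my @unordered = (1..10).race(:batch(1), :4degree).map(* ** 2);

say @ordered;    # always the squares 1 through 100, in order
say @unordered;  # the same values, but possibly in a different order
```

The tiny :batch(1) here is deliberate: with one item per batch, any scheduling jitter between workers shows up as reordering in the .race result.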
Each of these methods takes a :batch and a :degree parameter in case you want to customize the way the work is broken up and executed. Raku attempts to pick reasonable defaults, but tuning these for your particular setup will probably yield some improvement when you need to eke out even better performance.
The :degree option selects how many workers to start. For CPU-bound work, it is generally best to pick a number equal to the number of CPU cores available. There’s no reason to go higher because you’ll never run more than that many jobs at once in such a case. However, if the work involves waiting on the disk or network, it’s very likely your code will be paused for milliseconds or longer waiting on IO. In those cases, it can be wise to increase :degree to several times the number of CPUs to account for the wait time.
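As a minimal sketch of the IO-bound case, imagine work items that each spend most of their time waiting (simulated here with sleep; the item names and timings are placeholders, not a real API):

```raku
my @items = 'a' .. 'p';                            # 16 placeholder work items
my @results = @items.race(:batch(1), :16degree).map(-> $item {
    sleep 0.1;                                     # stand-in for a slow network call
    "$item fetched";
});
# With 16 workers the waits overlap, so the whole map takes roughly
# 0.1 seconds instead of the 1.6 seconds a sequential loop would need.
```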
The :batch
option decides how to break the work up. A large number is useful
when the work to be done is fast. This will keep your throughput high. A small
number, even down to 1, is reasonable when the work is long or you want to get
each result as soon as you can.
So, with that in mind, we could further tune the work above like this:
my $fibonacci = (1, 1, * + * ... *);
my $double-sum = $fibonacci[^4000].race(:batch(1000), :4degree).grep(*.is-prime).map(* * 2).reduce(* + *);
In this case, tuning doesn’t make a large difference on my 4-core laptop, but when there are more than 4 cores on your system, it is likely that tuning will help some.
So, any time you have a task where you need to iterate through items and operate on them, and you have CPU time to spare to speed things up, consider employing .hyper or .race in your code.
Cheers.
1. Raku does some amount of optimization for this operation already. For example, it does not perform a separate loop for .grep and .map. These operations are chained together in a sequence, so there’s essentially only a single iteration being performed. ↩︎