Parallel Loop Execution
Iteration is slow. If you have N things to process in a loop, the loop has to work through all N of them, one after another. Slow. Sometimes, though, that’s the only way to solve a problem.
For example, let’s consider the case where we have a log file with one JSON record per line, and we want a command that reads each line, parses the JSON on that line, and summarizes it by showing the timestamp and message:
use JSON::Fast;
my $log-file = 'myapp.log'.IO;
for $log-file.lines -> $line {
    my %data = from-json($line);
    say "%data<timestamp> %data<message>";
}
If you have multiple cores on your system (and who doesn’t in 2019?), you can actually speed this up a little bit with a small change:
use JSON::Fast;
my $log-file = 'myapp.log'.IO;
race for $log-file.lines -> $line {
    my %data = from-json($line);
    say "%data<timestamp> %data<message>";
}
The race prefix added to a for loop will result in the items being iterated
as quickly as possible on the available cores. On my machine, for a short
10,000-line log with only these two fields in it, this results in about a 25%
time savings. However, this comes with a consequence: the original order of the
lines is no longer preserved. In some cases this might not matter, but in
others it does.
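To make the trade-off concrete, here is a minimal sketch that uses a plain list of numbers as a hypothetical stand-in for the log lines (the names @items and @squares are made up for illustration). It collects the results of a race loop and checks that every item was processed exactly once, without assuming anything about the order they come back in. On a small input like this the order may well come out unchanged; the point is only that you should not rely on it.

```raku
# Hypothetical stand-in data: numbers instead of log lines.
my @items = 1..20;

# Collect the value of each iteration instead of printing inside the loop.
my @squares = race for @items -> $n {
    $n * $n;
}

# Same number of results as inputs, whatever order they finished in.
say @squares.elems;

# Sorting both sides removes any dependence on the order race produced.
say @squares.sort.join(',') eq @items.map({ $_ * $_ }).sort.join(',');
```

If the order matters for the final output, sorting the collected results on some key (a timestamp, say) is one way to restore it after the fact.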
Now, there is another prefix we could use that preserves order, called hyper.
However, in this particular case, it won’t work. Why? Because hyper only
guarantees that the results will be returned in order, but here we are printing
each result from inside the loop body as the code is run, so the lines still
come out in whatever order the threads get to them. This is something to be
very careful of whenever working with these keywords.
However, this is easy to fix. You just need to eliminate the side-effects and make your for loop functional:
use JSON::Fast;
my $log-file = 'myapp.log'.IO;
my $output-lines = hyper for $log-file.lines -> $line {
    my %data = from-json($line);
    "%data<timestamp> %data<message>";
}
.say for @$output-lines;
Now, we get most of the speedup from parallelizing the parsing of JSON lines,
but we can output in the same order as the original file. This works because the
output of a for loop with a hyper or race prefix works just like do: the
result is a sequence that we can iterate. In this case, it’s a HyperSeq, which
makes sure Raku handles the multi-threading bits correctly.
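The ordering guarantee can be checked without a log file or a JSON module. The sketch below uses generated strings as a hypothetical stand-in for log lines (@lines and @out are illustrative names) and compares the result of a hyper loop against the same transformation done sequentially.

```raku
# Hypothetical stand-in data: 500 generated strings instead of log lines.
my @lines = (1..500).map({ "line $_" });

# No side effects in the body; the last expression is the iteration's result.
my @out = hyper for @lines -> $line {
    $line.uc;
}

# The results line up one-to-one with the inputs, in the original order.
say @out.elems;
say @out.join("\n") eq @lines.map({ .uc }).join("\n");
```

Assigning the loop to an array, as done here, simply iterates the resulting sequence into it, which has the same effect as the @$output-lines step above.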
Cheers.