
Parallel Loop Execution


Iteration is slow. If you have N things to process in a loop, your loop takes N iterations to get through them. Slow. Sometimes, though, that's the only way to solve a problem.

For example, let’s consider the case where we have a JSON log and we want a command to read each line, parse the JSON on that line, and summarize it, showing the timestamp and message:

use JSON::Fast;
my $log-file = 'myapp.log'.IO;
for $log-file.lines -> $line {
  my %data = from-json($line);
  say "%data<timestamp> %data<message>";
}

If you have multiple cores on your system (and who doesn’t in 2019?), you can actually speed this up a little bit with a small change:

use JSON::Fast;
my $log-file = 'myapp.log'.IO;
race for $log-file.lines -> $line {
  my %data = from-json($line);
  say "%data<timestamp> %data<message>";
}

The race prefix added to any for loop causes the items to be iterated as quickly as possible across the available cores. On my machine, running this against a short 10,000-line log containing only these two fields yields about a 25% time savings. However, this comes with a consequence: the original order of the lines is no longer preserved. In some cases that won’t matter, but in others it will.
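
To see what that means with a toy example (not part of the log-parsing code above), a race loop over a plain range prints its values in whatever order the worker threads finish their batches:

race for 1..1000 -> $n {
  # Each batch is handled by whichever worker is free, so the says
  # can come out interleaved and out of order.
  say $n;
}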

Now, there is another prefix that does preserve order, called hyper. However, in this particular case, it won’t help. Why? Because hyper only guarantees that the results of the loop come out in order, but here we are printing from inside the loop body as a side-effect, so the says still run in whatever order the workers get to them. This is something to be very careful of whenever working with these keywords.
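
Here is the same pitfall in a toy sketch: the say below is a side-effect inside the block, so even with hyper there is no guarantee about the order in which it prints:

hyper for 1..1000 -> $n {
  # hyper keeps the loop's *results* in order, but this say is a
  # side-effect and still runs in whatever order the workers reach it.
  say $n;
}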

However, this is easy to fix. You just need to eliminate the side-effects and make your for loop functional:

use JSON::Fast;
my $log-file = 'myapp.log'.IO;
my $output-lines = hyper for $log-file.lines -> $line {
  my %data = from-json($line);
  # The block's last expression becomes this iteration's result
  "%data<timestamp> %data<message>";
}
.say for @$output-lines;   # print the results, now in the original order

Now we get most of the speedup from parallelizing the JSON parsing, but we can still print the output in the same order as the original file. This works because a for loop with a hyper or race prefix behaves just like one with do: the result is a sequence that we can iterate. In this case, it’s a HyperSeq, which makes sure Raku handles the multi-threading bits correctly.
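
If you prefer the method form, the same pipeline can also be written with .hyper and .map. This is just a sketch of the equivalent code; the batch and degree adverbs are optional tuning knobs shown with example values:

use JSON::Fast;
my $log-file = 'myapp.log'.IO;
my @output = $log-file.lines
  .hyper(batch => 64, degree => 4)   # returns a HyperSeq
  .map(-> $line {
    my %data = from-json($line);
    "%data<timestamp> %data<message>";
  });
.say for @output;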

Cheers.
