The Divide and Conquer Pattern
Let’s consider now how to solve a big problem with concurrency. If you have an algorithmic problem with lots of data that needs to be processed, you want to maximize the amount of load you can place on the available CPU cores to process it as quickly as possible. To demonstrate how we can go about this in Raku, we will consider Conway’s Game of Life, played on an effectively infinite game board.^{1}
Let’s start by defining Conway’s Game of Life, just in case you haven’t run across it before. The Game of Life is a simulation invented by British mathematician John Conway. It is played on a simple grid where each square is called a cell. Each cell may be either alive or dead during a given turn. Each cell has 8 neighbors, which are the cells directly above, below, left, right, and each of the 4 diagonals. To determine the state of a cell for the next turn, you perform the following checks using the state of the current cell and its neighbors from the current turn using these rules:
 Any live cell with fewer than two live neighbors dies.
 Any live cell with two or three neighbors lives on.
 Any live cell with more than three neighbors dies.
 Any dead cell with three neighbors comes to life.
If you want more details about this, you should check out the Wikipedia article on Conway’s Game of Life.
I have implemented Conway’s Game of Life in a nice long program with graphics (or text) output and all. However, it’s about 400 lines of code, so I’m not going to include all of it here. You can check out the ongoing evolution of this project on my github.
The simulator has a role named Game::Life::Player
which defines the
requirements for the player object. This object is responsible for performing
the rules of the game. Specifically, the .nextturnfor
method is given an
immutable copy of the current game board, a set of bounds, and a mutable copy of
the next game board to write to. It is responsible for turning the current board
into the board for the next turn according to the rules just mentioned.
Here is a copy of that from the Game::Life::Player::Basic
implementation,
which is basically the simplest way to go about this:
role Game::Life::Player {
...
method nextturnforcell(
Int:D $x,
Int:D $y,
Board:D $current,
Board:D $next,
) {
# Is the cell currently live?
my $live = $current.cell($x, $y);
# How many live neighbors does it currently have?
my $neighbors = [+] $current.neighbors($x, $y);
# If alive and has too many or too few neighbors, die.
if $live && !(2 <= $neighbors <= 3) {
return $next.kill($x, $y);
}
# if dead and has the right number of neighbors, come to life.
elsif !$live && $neighbors == 3 {
return $next.raise($x, $y);
}
else {
return Nil;
}
}
}
class Game::Life::Player::Basic does Game::Life::Player {
...
method nextturnfor(
Int:D $l,
Int:D $t,
Int:D $r,
Int:D $b,
Board:D $current,
Board:D $next,
) {
for $l..$r > $x {
for $t..$b > $y {
self.nextturnforcell($x, $y, $current, $next);
}
}
}
}
The implementation is simply iterating through every cell within the boundary
and running .nextturnforcell
on it. This method is implemented in the
.role
and just implements the rules as they apply to a single cell.
Easy.^{2}
Iterating through this in a single chunk is going to take a long time even for relatively small playing fields. To improve this situation, we can divide the work up into reasonable sized chunks and process each in a separate task. With multiple threads, we should be able to cut the time required to do the work down by up to a factor of N, where N is the number of cores available for computation. In reality, you will get somewhat less than that, but we should definitely be able to improve speed this way.
How might we do it? Here’s one possible solution that makes sure we never process chunks that are larger than 20 by 20, making for about 400 computations in a row. Gaining maximum efficiency requires some tuning, so a given system might do better with different numbers, but you get the idea.
Here’s an implementation of parallelnextturnfor
that’s part of the
Game::Life::Player::DivideAndConquer
player class:
class Game::Life::Player::DivideAndConquer is Game::Life::Player::Basic {
...
method parallelnextturnfor(
Int:D $l,
Int:D $t,
Int:D $r,
Int:D $b,
Board:D $current,
Board:D $next,
) {
my @jobs = gather {
if $r  $l > 20 {
my $m = ceiling($l + ($r  $l)/2);
#dd $l, $m, $r;
take start self.parallelnextturnfor($l, $t, $m  1, $b, $current, $next);
take start self.parallelnextturnfor($m, $t, $r, $b, $current, $next);
}
elsif $b  $t > 20 {
my $m = ceiling($t + ($b  $t)/2);
#dd $t, $m, $b;
take start self.parallelnextturnfor($l, $t, $r, $m  1, $current, $next);
take start self.parallelnextturnfor($l, $m, $r, $b, $current, $next);
}
else {
take start self.nextturnfor($l, $t, $r, $b, $current, $next);
}
}
await Promise.allof(@jobs);
}
}
This takes the same input as before, but if there are too many columns to
process, we cut the work in half by columns. If the columns are reasonable, but
there are too many rows, we cut the work in half by rows. If the size is just
right, we use the nextturnfor
we inherited from Game::Life::Player::Basic
.
Whether we split into two tasks or just do the work for some section of cells,
we use start
blocks to schedule the work and then await
the outcome.
Subdividing this way means we create a hierarchy of tasks that could subdivide
and subdivide again. The Raku scheduler will then schedule the tasks to run as
threads come available.
On my 2015 Macbook Pro, the game runs through 200 turns of the Gosper glider gun in around 35 seconds using 100% CPU when running sequentially. The same program runs around 2025 seconds at close to 300% CPU when running in parallel. It would probably be higher if I didn’t also have the rendering task occasionally using CPU to redraw the graphics window. But what fun would that be?
So, that’s a concurrency pattern you can employ when you have multiple cores available and an algorithm that lends itself to being broken up into parts.
Cheers.

I am not at all claiming this is an efficient way overall to simulate Conway’s Game of Life, but I am demonstrating how to take one version of that simulator and making it more efficient with the use of concurrency. ↩︎

This is grossly inefficient as large sections of the board are very likely to be blank. As I already mentioned, though, my goal here is not to implement an efficient player, but to show how we can improve efficiency using concurrency. ↩︎