Sterling has too many projects

Blogging about Raku programming, microcontrollers & electronics, 3D printing, and whatever else...

The Divide and Conquer Pattern

Let’s consider now how to solve a big problem with concurrency. If you have an algorithmic problem with lots of data that needs to be processed, you want to maximize the amount of load you can place on the available CPU cores to process it as quickly as possible. To demonstrate how we can go about this in Raku, we will consider Conway’s Game of Life, played on an effectively infinite game board.1

Let’s start by defining Conway’s Game of Life, just in case you haven’t run across it before. The Game of Life is a simulation invented by British mathematician John Conway. It is played on a simple grid where each square is called a cell. Each cell may be either alive or dead during a given turn. Each cell has 8 neighbors, which are the cells directly above, below, left, right, and each of the 4 diagonals. To determine the state of a cell for the next turn, you perform the following checks using the state of the current cell and its neighbors from the current turn using these rules:

  1. Any live cell with fewer than two live neighbors dies.
  2. Any live cell with two or three neighbors lives on.
  3. Any live cell with more than three neighbors dies.
  4. Any dead cell with three neighbors comes to life.

If you want more details about this, you should check out the Wikipedia article on Conway’s Game of Life.

I have implemented Conway’s Game of Life in a nice long program with graphics (or text) output and all. However, it’s about 400 lines of code, so I’m not going to include all of it here. You can check out the ongoing evolution of this project on my github.

The simulator has a role named Game::Life::Player which defines the requirements for the player object. This object is responsible for performing the rules of the game. Specifically, the .next-turn-for method is given an immutable copy of the current game board, a set of bounds, and a mutable copy of the next game board to write to. It is responsible for turning the current board into the board for the next turn according to the rules just mentioned.

Here is a copy of that from the Game::Life::Player::Basic implementation, which is basically the simplest way to go about this:

role Game::Life::Player {
    ...
    method next-turn-for-cell(
        Int:D $x,
        Int:D $y,
        Board:D $current,
        Board:D $next,
    ) {
        # Is the cell currently live?
        my $live      = $current.cell($x, $y);

        # How many live neighbors does it currently have?
        my $neighbors = [+] $current.neighbors($x, $y);

        # If alive and has too many or too few neighbors, die.
        if $live && !(2 <= $neighbors <= 3) {
            return $next.kill($x, $y);
        }

        # if dead and has the right number of neighbors, come to life.
        elsif !$live && $neighbors == 3 {
            return $next.raise($x, $y);
        }

        else {
            return Nil;
        }
    }
}

class Game::Life::Player::Basic does Game::Life::Player {
    ...
    method next-turn-for(
        Int:D $l,
        Int:D $t,
        Int:D $r,
        Int:D $b,
        Board:D $current,
        Board:D $next,
    ) {
        for $l..$r -> $x {
            for $t..$b -> $y {
                self.next-turn-for-cell($x, $y, $current, $next);
            }
        }
    }
}

The implementation is simply iterating through every cell within the boundary and running .next-turn-for-cell on it. This method is implemented in the .role and just implements the rules as they apply to a single cell. Easy.2

Iterating through this in a single chunk is going to take a long time even for relatively small playing fields. To improve this situation, we can divide the work up into reasonable sized chunks and process each in a separate task. With multiple threads, we should be able to cut the time required to do the work down by up to a factor of N, where N is the number of cores available for computation. In reality, you will get somewhat less than that, but we should definitely be able to improve speed this way.

How might we do it? Here’s one possible solution that makes sure we never process chunks that are larger than 20 by 20, making for about 400 computations in a row. Gaining maximum efficiency requires some tuning, so a given system might do better with different numbers, but you get the idea.

Here’s an implementation of parallel-next-turn-for that’s part of the Game::Life::Player::DivideAndConquer player class:

class Game::Life::Player::DivideAndConquer is Game::Life::Player::Basic {
    ...
    method parallel-next-turn-for(
        Int:D $l,
        Int:D $t,
        Int:D $r,
        Int:D $b,
        Board:D $current,
        Board:D $next,
    ) {
        my @jobs = gather {
            if $r - $l > 20 {
                my $m = ceiling($l + ($r - $l)/2);
                #dd $l, $m, $r;

                take start self.parallel-next-turn-for($l, $t, $m - 1, $b, $current, $next);
                take start self.parallel-next-turn-for($m, $t, $r, $b, $current, $next);
            }

            elsif $b - $t > 20 {
                my $m = ceiling($t + ($b - $t)/2);
                #dd $t, $m, $b;

                take start self.parallel-next-turn-for($l, $t, $r, $m - 1, $current, $next);
                take start self.parallel-next-turn-for($l, $m, $r, $b, $current, $next);
            }

            else {
                take start self.next-turn-for($l, $t, $r, $b, $current, $next);
            }
        }

        await Promise.allof(@jobs);
    }
}

This takes the same input as before, but if there are too many columns to process, we cut the work in half by columns. If the columns are reasonable, but there are too many rows, we cut the work in half by rows. If the size is just right, we use the next-turn-for we inherited from Game::Life::Player::Basic.

Whether we split into two tasks or just do the work for some section of cells, we use start blocks to schedule the work and then await the outcome. Subdividing this way means we create a hierarchy of tasks that could subdivide and subdivide again. The Raku scheduler will then schedule the tasks to run as threads come available.

On my 2015 Macbook Pro, the game runs through 200 turns of the Gosper glider gun in around 35 seconds using 100% CPU when running sequentially. The same program runs around 20–25 seconds at close to 300% CPU when running in parallel. It would probably be higher if I didn’t also have the rendering task occasionally using CPU to redraw the graphics window. But what fun would that be?

So, that’s a concurrency pattern you can employ when you have multiple cores available and an algorithm that lends itself to being broken up into parts.

Cheers.


  1. I am not at all claiming this is an efficient way overall to simulate Conway's Game of Life, but I am demonstrating how to take one version of that simulator and making it more efficient with the use of concurrency.  ↩

  2. This is grossly inefficient as large sections of the board are very likely to be blank. As I already mentioned, though, my goal here is not to implement an efficient player, but to show how we can improve efficiency using concurrency.  ↩