
Breaking Down Async Problems

| 2351 words | 12 minutes | raku advent-2019

One challenge I’ve often faced when writing asynchronous code is trying to figure out how to break the problem down in a reasonable way. How far do I go in breaking up my code? How many steps do I want to take? How do I deal with tasks that branch or fork? How do I deal with the interdependencies I’ve created? My hope in this article is to give some guidelines I’ve learned and some tools for answering these questions.

This advent calendar post will be focused on async problems. Later in the calendar, I come back and consider the same with a focus on concurrency.

The Special Character of Async Problems

This problem is different in character, but not different in real substance when compared to more traditional programming problems. Just as you do when you write software that needs to be broken up into pieces using functions, methods, and subroutines, you largely do the same with async.

So what is the special “character” that sets these async problems apart? Well, what makes an async program async? It is the separation between call and result. You initiate work, and the code that handles the results runs whenever two factors are true:

  1. You are ready to handle the work and
  2. The result is available.

Therefore, the special character of async problems is that you want to break down the work in such a way that both of those conditions are true as often as possible, so that your code is ready to process a result at the moment the result is ready.

That is the only special consideration you need when looking at breaking down async problems. Otherwise, your programming problems are all typical programming problems. That said, let’s now consider some practical considerations for various types of async coding methods in Raku.

Use react Blocks

My first piece of practical advice is to always use a [react](https://docs.raku.org/language/concurrency#index-entry-react) block whenever you need to gather your work together. A react block is the perfect place to coordinate multiple asynchronous processes that work together.

As an example, I use a Raku program to render this web site statically. The tool I developed for doing this has a mode in it called build-loop which watches for changes to files and rebuilds the site when those changes occur. In production, it monitors a socket which gets pinged whenever the sync tool detects a change to the master git repo. In development, it uses IO::Notification to watch for changes on disk and also runs a micro-web server so I can serve those files in a way that emulates the deployed system.

It has a master react block which looks something like this:

react {
    my $needs-rebuild = True;
    with $notify-port {
        whenever IO::Socket::Async.listen('127.0.0.1', $notify-port) -> $conn {
            # manage a connection to set $needs-rebuild on ping
        }
    }

    with $server-port {
        whenever IO::Socket::Async.listen('127.0.0.1', $server-port) -> $conn {
            # micro-web server for developer mode here
        }
    }

    whenever Supply.interval($interval, :$delay) {
        if $needs-rebuild {
            $needs-rebuild = False;
            build-site();
        }

        once {
            # configure IO::Notification to set $needs-rebuild on change
        }
    }
}

I’ve left out a lot of the details, but this should give you a feel for it. I am able to coordinate different means by which I can discover changes that cause a site rebuild. I have a tool for keeping the number of rebuilds to only as often as $interval seconds so groups of changes don’t re-trigger builds endlessly. I can simultaneously run a small web server for serving content in developer mode. And I’m doing all of this in the same event loop using a single thread.

The nice thing is that for every whenever within a react, we can share variables and state without ever having to worry about thread safety. The blocks may or may not be running on the same thread, but in any case, Raku guarantees they won’t run concurrently.

Therefore, a react block is perfect for coordinating various tasks. Almost every async program I write has a main loop like this in it somewhere. If tasks are strongly independent, I may have one event loop for each group of tasks, with each react block running within a start. For example, I could have one event loop for handling graphics updates and another for running a networking back-end.
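A minimal sketch of that split might look like this (the event sources and handler routines here are hypothetical, stand-ins for whatever your application actually does):

```raku
# Two independent event loops, each on its own thread via start.
my $graphics = start react {
    whenever Supply.interval(1/60) {
        update-display();   # hypothetical per-frame handler
    }
}

my $network = start react {
    whenever IO::Socket::Async.listen('127.0.0.1', 8080) -> $conn {
        handle-connection($conn);   # hypothetical back-end handler
    }
}

await $graphics, $network;
```

Each react block still gets its one-block-at-a-time guarantee internally, but the two loops no longer compete with each other for turns.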

Prefer Pipelines

Whenever you break down your problem you will often have a choice of creating a Supplier object and feeding data into it or pipelining. If you can pipeline, then you should pipeline. The simplest example of a pipeline is the .map method on Supply:

my $original = Supply.interval(1);
my Supply $plus-one = $original.map(* + 1);

Doing this vastly simplifies your processing. It clearly demonstrates the dependency one task has on the one before it. It is easy to read and follow. It will save you many headaches.

See the documentation for Supply for other similar mapping functions that are built in. One of my favorites is .lines for turning a Supply that emits strings into a Supply that emits those strings broken up by newlines.
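For instance, .lines will buffer partial lines across emitted chunks, which is exactly what you want when reading from a socket or pipe. A small sketch:

```raku
# Chunk boundaries need not align with line boundaries;
# .lines reassembles them before emitting.
my $chunks = Supply.from-list("first li", "ne\nsecond line\n");
$chunks.lines.tap: { .say };   # prints "first line", then "second line"
```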

The Cro services platform formalizes this idea of pipelines into transformations. Almost the entire system is one pipeline from request to response, where each step of the way transforms input one step closer to the eventual output. This is a very robust means of handling async processing.

Make Anything a Supply

If you are constructing a list, you can make that list using a supply block. This works best with non-trivial processes or when you have the need to reuse the Supply frequently.

my $primes = supply {
    for 1...* -> $n {
        emit $n if $n.is-prime;
    }
}

In the case of a trivial bit of processing like the supply block above and in a situation where you only need to tap it once, the simpler method may be to turn a Seq into a Supply by calling .Supply on the sequence:

my $primes = (1...*).grep(*.is-prime).Supply;

That latter example is functionally equivalent to the first and, in my opinion, much easier to read and follow. However, keep the supply block in hand for when you need to generate a reusable Supply or one based on non-trivial logic.

Use Supplier for Splits and Joins

When you have a set of objects coming in that require different processing, you can insert an if statement to handle each case inline, or you can re-emit those items to be processed in separate streams. If the processing is non-trivial, consider using a separate Supplier object for each type of processing. Then, use one more Supplier to join the streams back together if necessary.

This is similar to deciding whether to use a separate subroutine or not for a given problem with multiple solutions. Instead of separate subs, you can use separate whenevers.

Consider this problem where we have a combined log and we want to treat error objects differently from access objects:

react {
    my Supplier $emitter .= new;
    my Supplier $error .= new;
    my Supplier $access .= new;

    whenever $emitter.Supply { .say }
    whenever $error.Supply -> %e {
        $emitter.emit: "%e<timestamp> Error %e<code>: %e<message>";
    }
    whenever $access.Supply -> %a {
        $emitter.emit: "%a<timestamp> Access: %a<url>";
    }
    whenever $log.Supply.lines -> $line {
        given $line.&from-json {
            when so .<type> eq 'error' { $error.emit: $_ }
            when so .<type> eq 'access' { $access.emit: $_ }
            default { die "invalid log type" }
        }
    }
}

The reason splitting and joining can be better is that it can be a little easier to read and follow as each whenever is focused on a single task. In a case where one branch involves a longer process and the other branch involves a shorter process, it also allows you to consider how to best optimize each task separately.

On Demand vs. Live Supplies

You should be aware of the difference between the kinds of supplies. The differences are somewhat subtle, and the two kinds can be used somewhat interchangeably.

  1. A live supply is created using the Supplier class. There is a single stream of events that are received by the current taps on the associated Supply object. If there are no taps, the events are not processed. If there are N taps, the .emit method of Supplier blocks until every tap has finished processing that event.

  2. An on-demand supply is created using a supply block or by calling the .Supply method on a list. Each tap of the Supply is effectively a separate process, receiving all the items generated by that supply object from start to end. The code that generates each item in the supply runs anew for each tap and, again, each emit in the supply block blocks until that tap completes.

Essentially, a live supply uses a fan-out architecture while on-demand is really just a variation of Seq in how it behaves. I think of on-demand supplies as being just that, an adapter to make functions that return sequences work with whenever blocks.
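The difference is easiest to see side by side:

```raku
# On-demand: each tap re-runs the generator from the beginning.
my $on-demand = supply { emit $_ for 1..3 };
$on-demand.tap: { print "a$_ " };   # a1 a2 a3
$on-demand.tap: { print "b$_ " };   # b1 b2 b3 -- a fresh run

# Live: taps only see events emitted after they attach.
my Supplier $s .= new;
$s.Supply.tap: { print "c$_ " };
$s.emit($_) for 1..3;               # c1 c2 c3
```

Had we attached that live tap after the emits instead, it would have seen nothing at all, which is precisely the bootstrapping problem the next section is about.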

Avoid Supplier::Preserving

Then, there is Supplier::Preserving. Some think of this as a middle ground between the two types. However, the semantics of this object are identical to a live supply, but with one exception: the object buffers events emitted when there are no taps and immediately dumps those objects into the first tap that comes along.

Therefore, it is primarily a convenience in cases where it can be difficult to initialize your taps before you begin emitting. For example:

my Supplier::Preserving $msg .= new;
$msg.emit($_) for ^10;
$msg.Supply.tap: { .say };

Even though the tap happens after emitting to $msg, the program will print the numbers 0 through 9.

The problem is that Supplier::Preserving has risks associated with it, such as ballooning memory or long iterations over old data when first tapped. Instead, you should prefer to use Supplier and make sure all of your taps are in place before emitting.

my Supplier $msg .= new;
$msg.Supply.tap: { .say }
$msg.emit($_) for ^10;

Or just accept missing a few events at the start. In some cases, you might actually want to use a Channel instead.
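A Channel gives you the buffering without the fan-out semantics: values queue up until a consumer takes them, so a late consumer misses nothing. A small sketch:

```raku
# Values sent before any consumer exists are queued, not lost.
my Channel $ch .= new;
$ch.send($_) for ^10;
$ch.close;

react {
    whenever $ch -> $n { say $n }   # prints 0 through 9, then react exits
}
```

Unlike Supplier::Preserving, each value goes to exactly one consumer, which is often what you actually wanted from a work queue anyway.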

There are cases where Supplier::Preserving is handy, so make use of it as needed. I’ve just found it to be an easy crutch for proper bootstrapping when I’m being lazy, and in most cases it annoys me as time goes on.

Break up Long Running whenever Blocks

What is reasonable for your task may vary, but remember that code running inside one whenever block of a react block will prevent all the others from running. A react block is really just a thin veneer over the old-fashioned event loop, where any sub-task can starve the others of processing time.

For example, consider the react block I mention for the build-loop tool above. When the build-site() routine runs, my web server cannot refresh. Is that okay?

  1. It is a development process so I can tolerate some oddity in how the web server runs.
  2. I’m the only developer.
  3. It means that my web site waits until the site finishes building to refresh.
  4. I’d prefer to wait and only see fresh content.

Sounds like a win for me.

In production, I wouldn’t tolerate it. When it comes to web content, old content now is almost always better than the freshest content if the fresh content will take more than a few milliseconds to build. In that case, I would set up a separate web server thread. In this particular case, there’s no application server at all, just static content, so that’s not necessary.

That’s the sort of trade-off you have to weigh when designing your whenever blocks. If a whenever block runs long, the other blocks are put off. If that’s a bad thing, break that whenever up into a series of smaller whenever blocks chained together. Each time a whenever block in a long-running process finishes, a potentially starving task gets an opportunity to take its turn.
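One way to chain blocks like this is to re-emit each unit of work through a Supplier, so the event loop gets a turn between steps. This is a sketch only; $rebuild-requests, $http-requests, and the routines here are hypothetical:

```raku
react {
    my Supplier $work .= new;

    # Instead of rebuilding everything in one long whenever...
    whenever $rebuild-requests {
        $work.emit($_) for files-to-rebuild();
    }

    # ...each file becomes its own short-lived event.
    whenever $work.Supply -> $file {
        rebuild-one($file);
    }

    # This block now gets a turn between individual rebuild steps.
    whenever $http-requests -> $req {
        serve($req);
    }
}
```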

If a task like this is still problematic, you might need to move it to another thread via a start block.

Batch Short Tasks

Alternatively, even trivial tasks involve a certain amount of overhead as the react block switches between them. If a task is super fast, consider using the .batch method on Supply to loop over groups of elements and avoid switching tasks as often. The .batch method is handy because it lets you break up a problem by both a time delay and a number of elements. This lets your program spend more time doing real work and less time doing the busy work of deciding which task to schedule next.
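In practice that looks something like this, where $events and process-all stand in for your own supply and handler:

```raku
# Emit a group whenever 100 items have arrived or half a second
# has passed, whichever comes first.
whenever $events.batch(:elems(100), :seconds(0.5)) -> @group {
    process-all(@group);   # one scheduling turn per group, not per item
}
```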

Avoid Sleep

If you are in a react block, you do not want to call sleep unless your purpose is to block all execution on the current thread. Instead, you should prefer using await to pause your task. If you do this, your react block can continue handling events until the await completes. If you need to wait for a number of seconds, you can do this:

await Promise.in(10); # sleep 10 seconds
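In context, inside a whenever, that pause lets the rest of the loop keep running. A sketch, with $jobs and the routines being hypothetical:

```raku
react {
    whenever $jobs.Supply -> $job {
        run-job($job);
        await Promise.in(10);   # other whenevers keep running meanwhile
        follow-up($job);        # resumes here ten seconds later
    }
}
```

Had this used sleep instead of await, every other whenever in the react block would have stalled for those ten seconds.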

Beware of Deadlocks

Even though Raku’s interfaces are composable, it is still possible to end up with deadlocks if you use them inappropriately. Everything inside a react block is guaranteed to run sequentially. This means that if you expect two whenever blocks to run simultaneously, you will be disappointed when the code stops abruptly. I mention this because I run into this problem from time to time: even though I know a react block enforces the one-block-at-a-time rule, I still occasionally imagine that multiple whenever blocks could run at the same time.

If you really need that, it is easy to fix. Just put a start block inside a whenever block and you can have two pieces of code running simultaneously.
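The pattern is to start the work and then use a nested whenever on the resulting Promise, so the result is handled back under the react block's protection. A sketch, where $requests and expensive-work are hypothetical:

```raku
react {
    whenever $requests.Supply -> $req {
        # expensive-work runs on another thread; this whenever
        # returns immediately instead of blocking the loop.
        whenever start expensive-work($req) -> $result {
            say "done: $result";   # runs sequentially with other blocks
        }
    }
}
```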

Conclusion

That does it for now. In a few days, we will take up this conversation again, but instead of async, we will consider the guidelines for divvying up work for concurrent processing.

Cheers.

The content of this site is licensed under Attribution 4.0 International (CC BY 4.0).