Breaking Down Async Problems
 
        
        
One challenge I’ve often faced when writing asynchronous code is trying to figure out how to break the problem down in a reasonable way. How far do I go in breaking up my code? How many steps do I want to take? How do I deal with tasks that branch or fork? How do I deal with the interdependencies I’ve created? My hope in this article is to give some guidelines I’ve learned and some tools for answering these questions.
This advent calendar post focuses on async problems. Later in the calendar, I will come back and consider the same questions with a focus on concurrency.
The Special Character of Async Problems
This problem is different in character, but not different in real substance when compared to more traditional programming problems. Just as you do when you write software that needs to be broken up into pieces using functions, methods, and subroutines, you largely do the same with async.
So what is the special “character” that sets these async problems apart? Well, what makes an async program async? It is the separation between call and result. You initiate work, and the code that handles the result runs once two conditions are true:
- You are ready to handle the work and
- The result is available.
Therefore, the special character of async problems is that you want to arrange the work so that both of those conditions are true as often as possible: your code should be ready to process a result at the moment the result becomes available.
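As a minimal sketch of that separation between call and result, here is a Promise created with start and later collected with await. The arithmetic is only a placeholder for real work.

```raku
# Call: start kicks off the work and returns a Promise immediately.
my $promise = start { 6 * 7 };

# ... other work could happen here while the result is pending ...

# Result: await hands back the value once it is available.
my $answer = await $promise;
say $answer; # 42
```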
That is the only special consideration you need when looking at breaking down async problems. Otherwise, your programming problems are all typical programming problems. That said, let’s now consider some practical considerations for various types of async coding methods in Raku.
Use react Blocks
My first piece of practical advice is to always use a
[react](https://docs.raku.org/language/concurrency#index-entry-react) block
whenever you need to gather your work together. A react block is the perfect
place to coordinate multiple asynchronous processes that work together.
As an example, I use a Raku program to render this web site statically. The tool
I developed for doing this has a mode in it called build-loop which watches
for changes to files and rebuilds the site when those changes occur. In
production, it monitors a socket which gets pinged whenever the sync tool
detects a change to the master git repo. In development, it uses
IO::Notification to watch for
changes on disk and also runs a micro-web server so I can serve those files in a
way that emulates the deployed system.
It has a master react block which looks something like this:
react {
    my $needs-rebuild = True;
    with $notify-port {
        whenever IO::Socket::Async.listen('127.0.0.1', $notify-port) -> $conn {
            # manage a connection to set $needs-rebuild on ping
        }
    }
    with $server-port {
        whenever IO::Socket::Async.listen('127.0.0.1', $server-port) -> $conn {
            # micro-web server for developer mode here
        }
    }
    whenever Supply.interval($interval, :$delay) {
        if $needs-rebuild {
            $needs-rebuild = False;
            build-site();
        }
        once {
            # configure IO::Notification to set $needs-rebuild on change
        }
    }
}
I’ve left out a lot of the details, but this should give you a feel for it. I am
able to coordinate different means by which I can discover changes that cause a
site rebuild. I have a way to limit rebuilds to at most once every
$interval seconds so groups of changes don’t re-trigger builds endlessly. I
can simultaneously run a small web server for serving content in developer mode.
And I’m doing all of this in a single event loop, with only one block running at a time.
The nice thing is that for every whenever within a react, we can share
variables and state without ever having to worry about thread safety. The blocks
may or may not be running on the same thread, but in any case, Raku guarantees
they won’t run concurrently.
Therefore, a react block is perfect for coordinating various tasks together.
Almost every async program I write has a main loop like this in it somewhere. If
tasks are strongly independent, I may have one event loop for each group of
tasks, with the react block running within a
start.
For example, I could have one event loop for handling graphics updates and another
for running a networking back-end.
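Here is a rough sketch of that arrangement, with one react block per start. The "graphics" and "network" loops are hypothetical stand-ins that only record what they saw; a real program would do actual work in each whenever.

```raku
# Each start block gets its own independent event loop.
my $graphics = start {
    my @frames;
    react {
        # Stand-in for a frame timer: three ticks, then done.
        whenever Supply.interval(0.05).head(3) -> $tick {
            @frames.push: "frame $tick";
        }
    }
    @frames.join(', ');
}

my $network = start {
    my @packets;
    react {
        # Stand-in for incoming network data.
        whenever Supply.from-list(<a b c>) -> $p {
            @packets.push: "packet $p";
        }
    }
    @packets.join(', ');
}

# Each react block ends when its supplies are done, keeping the Promise.
my $frames  = await $graphics;
my $packets = await $network;
say $frames;
say $packets;
```

State inside each react block stays private to that loop, so the two loops never need to coordinate unless you explicitly pass data between them.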
Prefer Pipelines
Whenever you break down your problem you will often have a choice of creating a
Supplier object and feeding data into
it or pipelining. If you can pipeline, then you should pipeline. The simplest
example of a pipeline is the
.map method on
Supply:
my $original = Supply.interval(1);
my Supply $plus-one = $original.map(* + 1);
Doing this vastly simplifies your processing. It clearly demonstrates the dependency one task has on the one before it. It is easy to read and follow. It will save you many headaches.
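Pipelines can also be chained. This small sketch filters and then transforms a finite Supply; the input range is arbitrary and chosen only so the result is easy to check.

```raku
my $numbers = Supply.from-list(1..10);

# Each step depends only on the step before it.
my $even-squares = $numbers.grep(* %% 2).map(* ** 2);

my @seen;
react {
    whenever $even-squares -> $n {
        @seen.push: $n;
    }
}
say @seen; # [4 16 36 64 100]
```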
See the documentation for Supply for other similar mapping functions that are
built in. One of my favorites is
.lines for turning a
Supply that emits chunks of text into a Supply that emits that text one line at a time.
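To illustrate .lines, this sketch feeds in text chunks that do not line up with line boundaries, the way data from a socket or process often arrives. The chunk contents are made up for the example.

```raku
# Chunks split mid-line on purpose.
my $chunks = Supply.from-list("first li", "ne\nsecond", " line\nthird line");

my @lines;
react {
    # .lines reassembles the chunks and emits whole lines.
    whenever $chunks.lines -> $line {
        @lines.push: $line;
    }
}
.say for @lines;
```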
The Cro services platform formalizes this idea of a pipeline into transformations. Almost the entire system is one pipeline from request to response where each step of the way transforms the input one step closer to the eventual output. This is a very robust means of handling async processing.
Make Anything a Supply
If you are constructing a list, you can make that list using a
supply
block.  This works best with non-trivial processes or when you have the need to
reuse the Supply frequently.
my $primes = supply {
    for 1...* -> $n {
        emit $n if $n.is-prime;
    }
}
In the case of a trivial bit of processing like the supply block above and in
a situation where you only need to tap it once, the simpler method may be to
turn a Seq into a Supply by calling
.Supply on the sequence:
my $primes = (1...*).grep(*.is-prime).Supply;
That latter example is functionally equivalent to the first and, in my opinion,
much easier to read and follow. However, keep supply ready for when you need
to generate a reusable Supply or one based on non-trivial logic.
Use Supplier for Splits and Joins
When you have a set of objects coming in that require different processing, you
can insert an if statement to handle each case here or you can re-emit those
items to be processed in separate streams. If the processing is non-trivial,
consider using a separate Supplier object for each type of processing. Then,
use one more Supplier to join the streams back together if necessary.
This is similar to deciding whether to use a separate subroutine or not for a
given problem with multiple solutions. Instead of separate subs, you can use
separate whenevers.
Consider this problem where we have a combined log and we want to treat error objects differently from access objects:
react {
    my Supplier $emitter .= new;
    my Supplier $error .= new;
    my Supplier $access .= new;
    whenever $emitter.Supply { .say }
    whenever $error.Supply -> %e {
        $emitter.emit: "%e<timestamp> Error %e<code>: %e<message>";
    }
    whenever $access.Supply -> %a {
        $emitter.emit: "%a<timestamp> Access: %a<url>";
    }
    whenever $log.Supply.lines -> $line {
        given $line.&from-json {
            when so .<type> eq 'error' { $error.emit: $_ }
            when so .<type> eq 'access' { $access.emit: $_ }
            default { die "invalid log type" }
        }
    }
}
The reason splitting and joining can be better is that it can be a little easier
to read and follow as each whenever is focused on a single task. In a case
where one branch involves a longer process and the other branch involves a
shorter process, it also allows you to consider how to best optimize each task
separately.
On Demand vs. Live Supplies
You should be aware of the difference between the two kinds of supplies. The differences are somewhat subtle, and in many cases the two kinds can be used interchangeably.
- A live supply is created using the Supplier class. There is a single stream of events, which is received by whatever taps are currently attached to the associated Supply object. If there are no taps, the events are not processed. If there are N taps, the .emit method of Supplier blocks until every tap has finished processing that event.
- An on-demand supply is created using a supply block or by calling the .Supply method on a list. Each tap of the Supply is effectively a separate process, receiving all the items generated by that supply object from start to end. The code that generates the items runs again for each tap and, again, the emit in the supply block blocks until that single tap completes.
Essentially, a live supply uses a fan-out architecture while on-demand is really
just a variation of Seq in how it behaves. I think of on-demand supplies as
being just that, an adapter to make functions that return sequences work with
whenever blocks.
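The difference is easiest to see side by side. This sketch taps each kind of supply twice; the variable names are only for illustration.

```raku
# On-demand: the supply block runs from the start for every tap.
my $on-demand = supply { emit $_ for 1..3 };
my (@first, @second);
$on-demand.tap: { @first.push: $_ };
$on-demand.tap: { @second.push: $_ };
say @first;  # [1 2 3]
say @second; # [1 2 3]

# Live: one shared stream; a tap only sees what is emitted after it attaches.
my Supplier $supplier .= new;
my $live = $supplier.Supply;
my (@early, @late);
$live.tap: { @early.push: $_ };
$supplier.emit(1);
$live.tap: { @late.push: $_ };
$supplier.emit(2);
say @early;  # [1 2]
say @late;   # [2]
```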
Avoid Supplier::Preserving
Then, there is
Supplier::Preserving. Some
think of this as a middle ground between the two types. However, the semantics
of this object are identical to a live supply, but with one exception: the
object buffers events emitted when there are no taps and immediately dumps those
objects into the first tap that comes along.
Therefore, it is primarily a convenience in cases where it can be difficult to initialize your taps before you begin emitting. For example:
my Supplier::Preserving $msg .= new;
$msg.emit($_) for ^10;
$msg.Supply.tap: { .say };
Even though the tap happens after emitting to $msg, the program will print the
numbers 0 through 9.
The problem is that Supplier::Preserving has risks associated with it, such as
ballooning memory or long iterations over old data when first tapped.  Instead,
you should prefer to use Supplier and make sure all of your taps are in place
before emitting.
my Supplier $msg .= new;
$msg.Supply.tap: { .say };
$msg.emit($_) for ^10;
Or simply accept that you may miss a few events at the start. In some cases, you might
actually want to use a Channel instead.
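As a sketch of the Channel alternative: a Channel is a queue, so values sent before any receiver is attached wait in the queue, and each value is delivered to exactly one receiver. The variable names mirror the example above and are otherwise arbitrary.

```raku
my Channel $msg .= new;

# Send before anyone is listening; the values queue up.
$msg.send($_) for ^10;
$msg.close;

my @received;
react {
    # whenever accepts a Channel and drains it until it is closed.
    whenever $msg -> $n {
        @received.push: $n;
    }
}
say @received; # [0 1 2 3 4 5 6 7 8 9]
```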
There are cases where Supplier::Preserving is handy, so make use of it as
needed. I’ve just found it to be an easy crutch for proper bootstrapping when
I’m being lazy, but in most cases it annoys me as time goes on.
Break up Long Running whenever Blocks
What is reasonable for your task may vary, but remember that inside of a
react block, code running in one whenever block will prevent all others from running. A
react block is really just a thin veneer on the old fashioned event loop where
any sub-task can starve the others of processing time.
For example, consider the react block I mention for the build-loop tool
above. When the build-site() routine runs, my web server cannot refresh. Is
that okay?
- It is a development process so I can tolerate some oddity in how the web server runs.
- I’m the only developer.
- It means that my web site waits until the site finishes building to refresh.
- I’d prefer to wait and only see fresh content.
Sounds like a win for me.
In production, I wouldn’t tolerate it. When it comes to web content, serving old content now is almost always better than making visitors wait for the freshest content if the build takes more than a few milliseconds. In that case, I would set up a separate web server thread. In this particular case, there’s no application server at all, just static content, so that’s not necessary.
That’s the sort of trade-off you have to decide when designing your whenever
blocks. If a whenever block runs long, the other blocks are put off. If that’s
a bad thing, break that whenever up into a series of smaller whenever blocks
which are chained together. Each time you finish a whenever block in some long
running process is an opportunity for a potentially starving task to take its
turn.
If a task like this is still problematic, you might need to move it to another
thread via a start block.
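One way to do that, sketched here under simplifying assumptions, is to hand the slow call to start and let a whenever receive the result. The build-site sub below is a hypothetical stand-in that just sleeps; it is not the real routine from my tool.

```raku
# Hypothetical stand-in for a slow build step.
sub build-site() { sleep 0.3; 'site built' }

my @events;
react {
    # This timer keeps firing while the build runs on another thread.
    whenever Supply.interval(0.05).head(4) -> $tick {
        @events.push: "tick $tick";
    }
    # start moves the slow call to the thread pool; the whenever receives
    # the result back inside the react block once the Promise is kept.
    whenever start { build-site() } -> $status {
        @events.push: $status;
    }
}
.say for @events;
```

Because both pushes to @events happen inside whenever blocks of the same react, they never run at the same time, so the shared array needs no locking.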
Batch Short Tasks
Alternately, trivial tasks involve a certain amount of overhead for the react
block to switch between. If a task is super fast, you might want to consider
using the .batch method on
Supply to allow you to loop over groups of elements to avoid switching tasks
as often.  The .batch method is handy because it will let you break up a
problem on both a time delay and a number of elements.  This will let your
program spend more time doing real work and less time doing the busy work of
deciding which task to schedule next.
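A minimal sketch of batching by element count follows; the group size of 4 is arbitrary. The .batch method also accepts a :seconds argument for time-based grouping, which is not shown here.

```raku
my @sums;
react {
    # Handle up to four items per whenever invocation instead of one.
    whenever Supply.from-list(1..10).batch(:elems(4)) -> @group {
        @sums.push: @group.sum;
    }
}
say @sums; # [10 26 19]
```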
Avoid Sleep
If you are in a react block, you do not want to call
sleep unless your purpose is to
block all execution on the current thread. Otherwise, you should prefer using an
await to pause your task. If you do
this, your react block can continue handling events until the await
completes. If you need to add an await for a number of seconds, you can do
this:
await Promise.in(10); # sleep 10 seconds
Beware of Deadlocks
Even though Raku’s interfaces are composable, it is still possible to end up
with deadlocks if you use them inappropriately. Anything inside a react block
is guaranteed to run in a sequential manner. This means that if you expect two
whenever blocks to be able to run simultaneously, you will be disappointed
when the code stops abruptly. I mention this because I run into this problem
from time to time. Even though I know a react block enforces that one-thread-at-a-time
rule, I still manage to imagine that multiple whenever blocks could
run at the same time every now and then.
If you really need that, it is easy to fix. Just put a start block inside a
whenever block and you can have two pieces of code running simultaneously.
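Here is one way that might look, with a nested whenever collecting each result back into the react block. The sleep and the multiplication are placeholders for real work.

```raku
my @results;
react {
    whenever Supply.from-list(1..3) -> $job {
        # Each start block runs on the thread pool, so the three jobs
        # can overlap. The nested whenever brings each result back
        # into the react block, where it is handled one at a time.
        whenever start { sleep 0.1; $job * 10 } -> $result {
            @results.push: $result;
        }
    }
}
say @results.sort; # (10 20 30)
```

The results may arrive in any order, which is why the output is sorted before printing.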
Conclusion
That does it for now. In a few days, we will take up this conversation again, but instead of async, we will consider guidelines for divvying up work for concurrent processing.
Cheers.
