Breaking Down Async Problems
One challenge I’ve often faced when writing asynchronous code is figuring out how to break the problem down in a reasonable way. How far do I go in breaking up my code? How many steps do I want to take? How do I deal with tasks that branch or fork? How do I deal with the interdependencies I’ve created? My hope in this article is to share some guidelines I’ve learned and some tools for answering these questions.
This advent calendar post focuses on async problems. Later in the calendar, I will come back and consider the same questions with a focus on concurrency.
The Special Character of Async Problems
This problem is different in character, but not different in real substance when compared to more traditional programming problems. Just as you do when you write software that needs to be broken up into pieces using functions, methods, and subroutines, you largely do the same with async.
So what is the special “character” that sets these async problems apart? Well, what makes an async program async? It is the separation between call and result. You initiate work, and the code that handles the result runs whenever two conditions are true:
- You are ready to handle the work and
- The result is available.
Therefore, the special character of async problems is that you want to break down the work in such a way that both of those conditions are true as often as possible, so that your code is ready to process a result at the moment the result is ready to be processed.
That is the only special consideration you need when looking at breaking down async problems. Otherwise, your programming problems are all typical programming problems. That said, let’s now look at some practical advice for the various async coding tools in Raku.
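As a minimal illustration of that separation between call and result, here is a sketch using start and await. The summing task and the variable names are only placeholders chosen for the example.

```raku
# Initiate the work; start returns a Promise right away.
my $promise = start { (1..1000).sum };

# The caller is free to do something else in the meantime.
my $status = 'working on other things';

# Handle the result once we are ready for it and it is available.
my $total = await $promise;
say "$status, then got $total";
```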
Use react Blocks
My first piece of practical advice is to always use a [react](https://docs.raku.org/language/concurrency#index-entry-react) block whenever you need to gather your work together. A react block is the perfect place to coordinate multiple asynchronous processes that work together.
As an example, I use a Raku program to render this web site statically. The tool I developed for doing this has a mode in it called build-loop which watches for changes to files and rebuilds the site when those changes occur. In production, it monitors a socket which gets pinged whenever the sync tool detects a change to the master git repo. In development, it uses IO::Notification to watch for changes on disk and also runs a micro-web server so I can serve those files in a way that emulates the deployed system.
It has a master react block which looks something like this:
react {
    my $needs-rebuild = True;

    with $notify-port {
        whenever IO::Socket::Async.listen('127.0.0.1', $notify-port) -> $conn {
            # manage a connection to set $needs-rebuild on ping
        }
    }

    with $server-port {
        whenever IO::Socket::Async.listen('127.0.0.1', $server-port) -> $conn {
            # micro-web server for developer mode here
        }
    }

    # Supply.interval takes the initial delay as its second positional argument
    whenever Supply.interval($interval, $delay) {
        if $needs-rebuild {
            $needs-rebuild = False;
            build-site();
        }

        once {
            # configure IO::Notification to set $needs-rebuild on change
        }
    }
}
I’ve left out a lot of the details, but this should give you a feel for it. I am able to coordinate the different means by which I can discover changes that cause a site rebuild. I can limit rebuilds to at most once every $interval seconds so groups of changes don’t re-trigger builds endlessly. I can simultaneously run a small web server for serving content in developer mode. And I’m doing all of this in a single event loop, with only one piece of code running at a time.
The nice thing is that for every whenever within a react, we can share variables and state without ever having to worry about thread safety. The blocks may or may not be running on the same thread, but in any case, Raku guarantees they won’t run concurrently.
Therefore, a react block is perfect for coordinating various tasks together. Almost every async program I write has a main loop like this in it somewhere. If tasks are strongly independent, I may have one event loop for each group of tasks, with each react block running within a start block. For example, I could have one event loop for handling graphics updates and another for running a networking back-end.
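Here is a rough sketch of that arrangement with two independent event loops, each a react block running inside its own start. The "graphics" and "network" loops are stand-ins invented for the example; a timer and a short list play the role of real event sources.

```raku
my @graphics;
my @network;

# Each group of tasks gets its own event loop, scheduled on the thread pool.
my $graphics-loop = start react {
    whenever Supply.interval(0.01) -> $frame {
        @graphics.push("frame $frame");
        done if $frame >= 2;
    }
};

my $network-loop = start react {
    whenever Supply.from-list(<ping pong>) -> $packet {
        @network.push("got $packet");
    }
};

# Wait for both loops to finish.
await $graphics-loop, $network-loop;
say @graphics;
say @network;
```

Each array is only touched by its own loop, so the two loops never need to coordinate with one another.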
Prefer Pipelines
Whenever you break down your problem, you will often have a choice between creating a Supplier object and feeding data into it, or pipelining. If you can pipeline, then you should pipeline. The simplest example of a pipeline is the .map method on Supply:
my $original = Supply.interval(1);
my Supply $plus-one = $original.map(* + 1);
Doing this vastly simplifies your processing. It clearly demonstrates the dependency one task has on the one before it. It is easy to read and follow. It will save you many headaches.
See the documentation for Supply for other similar mapping functions that are built in. One of my favorites is .lines for turning a Supply that emits chunks of text into a Supply that emits that text one line at a time.
The Cro services platform formalizes this idea of a pipeline of transformations. Almost the entire system is one pipeline from request to response, where each step of the way transforms the input one step closer to the eventual output. This is a very robust means of handling async processing.
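A slightly longer pipeline might look like the following sketch. The chunked input is made up to imitate text arriving in arbitrary pieces, as it might from a socket, and the filter and transformation are arbitrary choices for illustration.

```raku
# Text arriving in arbitrary chunks.
my $chunks = Supply.from-list("alpha\nbe", "ta\ngam", "ma\n");

# Each stage depends only on the stage before it:
# reassemble whole lines, keep the longer ones, then transform them.
my $shouted = $chunks.lines.grep(*.chars > 4).map(*.uc);

my @seen;
react {
    whenever $shouted -> $line { @seen.push($line) }
}
say @seen;
```

No stage needs to know where its input came from or where its output is going, which is what makes the chain easy to rearrange.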
Make Anything a Supply
If you are constructing a list, you can make that list using a supply block. This works best with non-trivial processes or when you have the need to reuse the Supply frequently.
my $primes = supply {
    for 1...* -> $n {
        emit $n if $n.is-prime;
    }
}
In the case of a trivial bit of processing like the supply block above and in a situation where you only need to tap it once, the simpler method may be to turn a Seq into a Supply by calling .Supply on the sequence:
my $primes = (1...*).grep(*.is-prime).Supply;
That latter example is functionally equivalent to the first and, in my opinion, much easier to read and follow. However, keep supply ready for when you need to generate a reusable Supply or one based on non-trivial logic.
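To show what reuse looks like, here is a small sketch in which one supply block feeds two different consumers. The range is capped at 20 purely so the example terminates, and the squaring stage is an arbitrary second use.

```raku
# A finite variation on the prime generator.
my $small-primes = supply {
    for 1..20 -> $n {
        emit $n if $n.is-prime;
    }
}

my @plain;
my @squared;
react {
    # Each whenever taps the supply separately, so the block runs once per tap.
    whenever $small-primes { @plain.push($_) }
    whenever $small-primes.map(-> $p { $p * $p }) { @squared.push($_) }
}
say @plain;
say @squared;
```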
Use Supplier for Splits and Joins
When you have a set of objects coming in that require different processing, you can insert an if statement to handle each case in place, or you can re-emit those items to be processed in separate streams. If the processing is non-trivial, consider using a separate Supplier object for each type of processing. Then, use one more Supplier to join the streams back together if necessary.
This is similar to deciding whether to use a separate subroutine or not for a given problem with multiple solutions. Instead of separate subs, you can use separate whenevers.
Consider this problem where we have a combined log and we want to treat error objects differently from access objects:
react {
    # Assumes from-json is imported (for example from JSON::Fast) and
    # that $log is the incoming log stream.
    my Supplier $emitter .= new;
    my Supplier $error .= new;
    my Supplier $access .= new;

    # Join: everything ends up in one output stream.
    whenever $emitter.Supply { .say }

    whenever $error.Supply -> %e {
        $emitter.emit: "%e<timestamp> Error %e<code>: %e<message>";
    }

    whenever $access.Supply -> %a {
        $emitter.emit: "%a<timestamp> Access: %a<url>";
    }

    # Split: route each log entry to the stream for its type.
    whenever $log.Supply.lines -> $line {
        given $line.&from-json {
            when so .<type> eq 'error' { $error.emit: $_ }
            when so .<type> eq 'access' { $access.emit: $_ }
            default { die "invalid log type" }
        }
    }
}
The reason splitting and joining can be better is that it can be a little easier to read and follow as each whenever is focused on a single task. In a case where one branch involves a longer process and the other branch involves a shorter process, it also allows you to consider how to best optimize each task separately.
On Demand vs. Live Supplies
You should be aware of the difference between the two kinds of supplies. The differences are somewhat subtle, and in many cases the two can be used interchangeably.
- A live supply is created using the Supplier class. There is a single stream of events that is received by the current taps on the associated Supply object. If there are no taps, the events are not processed. If there are N taps, the .emit method of Supplier blocks until every tap has finished processing that event.
- An on-demand supply is created using a supply block or by calling the .Supply method on a list. Each tap of the Supply is effectively a separate process, receiving all the items generated by that supply object from start to end. The code that generates the items is run again for each tap and, again, the emit in the supply block blocks until the single tap completes.
Essentially, a live supply uses a fan-out architecture while on-demand is really just a variation of Seq in how it behaves. I think of on-demand supplies as being just that, an adapter to make functions that return sequences work with whenever blocks.
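The difference is easiest to see side by side. In this sketch (the values and variable names are arbitrary), a second tap on an on-demand supply still sees everything, while a tap on a live supply only sees what is emitted after it attaches.

```raku
# On-demand: every tap re-runs the block and sees every value.
my $on-demand = supply { emit $_ for 1..3 };
my @first;
my @second;
$on-demand.tap: { @first.push($_) };
$on-demand.tap: { @second.push($_) };

# Live: taps only see what is emitted while they are attached.
my Supplier $live .= new;
my @early;
my @late;
$live.emit('dropped');                  # no taps yet, so nobody sees this
$live.Supply.tap: { @early.push($_) };
$live.emit('one');
$live.Supply.tap: { @late.push($_) };
$live.emit('two');

say "on-demand: @first[] / @second[]";
say "live:      @early[] / @late[]";
```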
Avoid Supplier::Preserving
Then, there is Supplier::Preserving. Some think of this as a middle ground between the two types. However, the semantics of this object are identical to a live supply, but with one exception: the object buffers events emitted when there are no taps and immediately dumps those objects into the first tap that comes along.
Therefore, it is primarily a convenience in cases where it can be difficult to initialize your taps before you begin emitting. For example:
my Supplier::Preserving $msg .= new;
$msg.emit($_) for ^10;
$msg.Supply.tap: { .say };
Even though the tap happens after emitting to $msg, the program will print the numbers 0 through 9.
The problem is that Supplier::Preserving has risks associated with it, such as ballooning memory or long iterations over old data when first tapped. Instead, you should prefer to use Supplier and make sure all of your taps are in place before emitting.
my Supplier $msg .= new;
$msg.Supply.tap: { .say };
$msg.emit($_) for ^10;
Or just accept that you may miss a few events at the start. In some cases, you might actually want to use a Channel instead.
There are cases where Supplier::Preserving is handy, so make use of it as needed. I’ve just found it to be an easy crutch in place of proper bootstrapping when I’m being lazy, and in most cases it annoys me as time goes on.
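For comparison, here is a sketch of the Channel alternative. A Channel is a queue: values sent before anyone is listening wait until they are received, and each value goes to exactly one receiver. It still holds unreceived values in memory, so it suits cases where you really do want every item processed once rather than broadcast.

```raku
my Channel $msg .= new;

# Values sent before anyone is listening wait in the queue.
$msg.send($_) for ^10;
$msg.close;    # lets the whenever below finish once the queue is drained

my @received;
react {
    whenever $msg -> $n { @received.push($n) }
}
say @received;
```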
Break up Long Running whenever Blocks
What is reasonable for your task may vary, but remember that code running inside one whenever block of a react block will prevent all the others from running. A react block is really just a thin veneer on the old-fashioned event loop, where any sub-task can starve the others of processing time.
For example, consider the react block I mentioned for the build-loop tool above. When the build-site() routine runs, my web server cannot refresh. Is that okay?
- It is a development process so I can tolerate some oddity in how the web server runs.
- I’m the only developer.
- It means that my web site waits until the site finishes building to refresh.
- I’d prefer to wait and only see fresh content.
Sounds like a win for me.
In production, I wouldn’t tolerate it. When it comes to web content, old content now is almost always better than the freshest content later if building it is going to take more than a few milliseconds. In that case, I would set up a separate web server thread. In this particular case, there’s no application server at all, just static content, so that’s not necessary.
That’s the sort of trade-off you have to decide when designing your whenever blocks. If a whenever block runs long, the other blocks are put off. If that’s a bad thing, break that whenever up into a series of smaller whenever blocks which are chained together. Each time you finish a whenever block in some long-running process is an opportunity for a potentially starving task to take its turn.
If a task like this is still problematic, you might need to move it to another thread via a start block.
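Moving work to another thread might be sketched like this. The slow-build routine is a hypothetical stand-in for an expensive job, simulated here with a sleep; the timer stands in for whatever else the react block needs to keep servicing.

```raku
my @events;

# Hypothetical stand-in for an expensive job such as a site build.
sub slow-build() { sleep 0.3; 'site built' }

react {
    # The long-running work goes to the thread pool instead of running
    # inside a whenever block.
    my $build = start { slow-build() };

    # This keeps firing while the build is in progress.
    whenever Supply.interval(0.05) {
        @events.push('tick');
    }

    # The result comes back into the react block as just another event.
    whenever $build -> $result {
        @events.push($result);
        done;
    }
}
say @events;
```

The code inside the start block runs outside the protection of the react block, so it should avoid touching state shared with the whenever blocks and hand its result back through the Promise instead.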
Batch Short Tasks
On the other hand, switching between tasks involves a certain amount of overhead for the react block, which matters when the tasks themselves are trivial. If a task is super fast, you might want to consider using the .batch method on Supply to allow you to loop over groups of elements to avoid switching tasks as often. The .batch method is handy because it will let you break up a problem on both a time delay and a number of elements. This will let your program spend more time doing real work and less time doing the busy work of deciding which task to schedule next.
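As a small sketch of batching by element count, the following groups a stream of ten numbers into batches of up to four. The numbers and the summing are arbitrary; a :seconds argument can be given as well to also cut batches on a time limit.

```raku
my @sums;
react {
    # One whenever invocation now handles up to four items at a time.
    whenever Supply.from-list(1..10).batch(:elems(4)) -> @group {
        @sums.push(@group.sum);
    }
}
say @sums;
```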
Avoid Sleep
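If you are in a react block, you do not want to call sleep unless your purpose is to block all execution on the current thread. Otherwise, you should prefer using an await to pause your task. If you do this, the thread is handed back to the scheduler and can do other work until the await completes. Keep in mind that the whenever block containing the await has still not finished, so the one-at-a-time rule keeps the other whenever blocks of the same react waiting. When those need to keep handling events during the delay, give the delayed work its own whenever on the Promise. If you need to add an await for a number of seconds, you can do this: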
await Promise.in(10); # sleep 10 seconds
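When the delay exists only to schedule something for later, the Promise can also be handed to its own whenever, which makes the timer just another event in the react block. A minimal sketch, with arbitrary short timings so it finishes quickly:

```raku
my @events;
react {
    # Other events keep being handled while the timer is pending.
    whenever Supply.interval(0.05) {
        @events.push('tick');
    }

    # Fires once after 0.2 seconds; nothing waits inside a whenever block.
    whenever Promise.in(0.2) {
        @events.push('delayed work');
        done;
    }
}
say @events;
```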
Beware of Deadlocks
Even though Raku’s interfaces are composable, it is still possible to end up with deadlocks if you use them inappropriately. Anything inside a react block is guaranteed to run in a sequential manner. This means that if you expect two whenever blocks to be able to run simultaneously, you will be disappointed when the code stalls. I mention this because I run into this problem from time to time. Even though I know a react block enforces that one-thread-at-a-time rule, every now and then I still manage to imagine that multiple whenever blocks could run at the same time.
If you really need that, it is easy to fix. Just put a start block inside a whenever block and you can have two pieces of code running simultaneously.
Conclusion
That does it for now. In a few days, we will take up this conversation again, but instead of async, we will consider guidelines for divvying up work for concurrent processing.
Cheers.