Sterling has too many projects

Blogging about Raku programming, microcontrollers & electronics, 3D printing, and whatever else...

Thread Safe Structures Without Locking

As I have said several times before in this calendar, it is always best to avoid sharing state between running threads. Again, however, here is yet another way to share state, when you need to do it.

A few days ago, we considered monitors as a mechanism for creating a thread-safe object. Let’s consider the following monitor:

class BankBalanceMonitor {
    has UInt $.balance = 1000;
    has Lock $!lock .= new;

    method deposit(UInt:D $amount) {
        $!lock.protect: { $!balance += $amount };
    }

    method withdraw(UInt:D $amount) {
        $!lock.protect: { $!balance -= $amount };
    }
}

The day after that we considered the compare-and-swap operation, a.k.a. cas, and how to use it with any scalar variable in Raku. By using cas, we can actually create thread safe objects without using locks at all.

Thus, we can rewrite the above class as a lock-free data structure like this:

class BankBalanceLockFree {
    has UInt $.balance = 1000;

    method deposit(UInt:D $amount) {
        cas $!balance, -> $current { $current + $amount };
    }

    method withdraw(UInt:D $amount) {
        cas $!balance, -> $current { $current - $amount };
    }
}

That’s it. Same protections, but now we’ve made use of the scalar CAS operation instead. This can be more efficiently than locking. But why?

Locks have a cost at the beginning and end every time the lock is encountered. Add to this the fact that the every critical section is a bottle neck where a multi-threaded system must become single-threaded for a moment. Whereas CAS has no particularly expensive operations, but might cause the critical section to re-run multiple times.

Let’s consider the extremes of two variables in our system: contention and run time. Contention is a generic term describing the number of threads needing to work in the critical section at once. Run time here describes how long it takes to run the operation inside the critical section.

If an operation has low contention and short run time, CAS is almost certain to perform better. Locks have high overhead at start and end, whereas CAS is going to have almost no overhead. With low contention we might have to repeat an operation every now and then, but the operation is fast, so it doesn’t matter.

If an operation has high contention and short run time, CAS is still likely to win. You could end up with a thread or two having to repeat the operation several times, but with a larger number of threads a lock’s enforcement of a single-threaded bottleneck does not scale well.

If an operation has low contention and long run time, CAS might be a loser. If the critical section really must take hundreds of milliseconds or even longer, repeats are likely to be more costly. It may be worth A/B testing to see which wins, though.

If an operation has high contention and long run time, locks may win. At this point, however, it becomes less clear if your operation is really scalable across multiple threads at all. The bottleneck of locking for a long time on many competing threads essentially reduces your application to single-threaded. It might be time to consider how you can speed up the operation or do it in a way that doesn’t involve shared state.

Cheers.