I'm waiting for the day when 'async' will be the default function type and 'await' will be the default type of function call.
If anywhere down the callstack a function needs to await something it has to become an async function. And this needs to be done to the whole callstack recursively. So over time more and more functions of every codebase turn into async functions.
Continuation passing style means passing along your continuation to your callee, and is the implementation of the async model. But normal calls do pass their continuation: the return address, pushed on the stack, is the continuation function, and the stack itself is the scope, and together they are a closure. Make the stack a first class value that can be switched, and you get to continuations - and that's what the runtime gets with green threads.
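The direct-style vs CPS distinction above can be sketched in a few lines of Python. This is an illustrative toy (the function names are made up, not from any library): in CPS the "what happens next" that a normal call leaves implicit on the stack becomes an explicit function argument.

```python
# Ordinary direct style: the continuation (the return address) is implicit.
def add_direct(a, b):
    return a + b

# Continuation-passing style: the caller hands its "what happens next"
# to the callee as an explicit function argument k.
def add_cps(a, b, k):
    k(a + b)

def mul_cps(a, b, k):
    k(a * b)

# Compute (2 + 3) * 4 by chaining continuations instead of returning.
results = []
add_cps(2, 3, lambda s: mul_cps(s, 4, results.append))
assert results == [20]
```

Note that nothing ever "returns" a useful value in the CPS version; control only flows forward through continuations, which is why a naive implementation needs heap-allocated frames.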
It means much harder interop because you can't have native frames on your stack. But you have the same problem with async.
Green threads are actually very different from CPS, both from an implementation and a semantics point of view. In a CPS model the continuation is first class and is available at every single call point. In practice this would require heap-allocating every single function frame, which is expensive and a huge problem for interoperability (a smart compiler can turn it back into a traditional stack if the continuation is only used in traditional ways).
With the normal call/return model, the implicit continuation in a call stack can only be accessed via return. Green threads do allow accessing the continuation at specific yield points (although most green thread implementations do not expose it to the user), but because the yield points are much fewer than every single call, a whole stack can be allocated and used in one go. There aren't in fact many issues with interoperability, as foreign functions or even OS calls can be accommodated on this stack.
Async is a tradeoff: basically the programmer is responsible for marking functions that are to be CPS-transformed and that need their stack frame reified. This way no full stack is needed, nor are there the performance and interoperability issues of full CPS. You end up with the blue/red functions issue, though.
If you want call/cc (for repeated executions of the continuation) you need a first class stack. A CPS execution model can use green threads right up until you want that explicit continuation in hand and split the stack. In most implementations of CPS the continuation is not in fact available for the user to execute: stack-oriented programming is the mental model, until you pull out call/cc or something similar.
Note that you don't get the continuation in hand for the async-everywhere advocated by GP, only blocking operations get the continuation, and most of those are implemented by the runtime.
One thing I like about Go is that there is no dichotomy between sync and async. Every function call presents a sync interface, but under the hood all I/O is async. I really miss this in Python, where every library has an async twin, usually maintained by someone unaffiliated with the sync library.
Async being a “default” function type doesn’t make much sense unless you’re doing io. If you’re just computing, you often can’t proceed without the result and want to proceed as soon as you have the result—i.e. blocking. Both are critical to getting good performance.
I’m curious whether discerning which is the best type could be automated with static or tracing analysis.
True! However, I believe rust has a separate, distinct yield/generator functionality coming that should address those needs separately. I think off-hand it’d be pretty easy to wrap them in a stream, or to join a stream into a generator, if you did want to compose them.
I think it should be possible to design an async system that is generically async, i.e. whether a given function with no special annotations or syntax is async (including all of the async-able calls that function makes transitively) is entirely determined by the top-level caller at compile time. The called function doesn't need any special syntax, and in fact doesn't even need to consider being async at all: the async-capability bubbles up based on what other async-able functions it calls. Only the very ends of the chain: at the bottom doing IO via system calls, or at the very top where request handlers are first dispatched (for example) should need to think about being asynchronous or synchronous at all; everything in-between is compiled for whatever the top wants.
To allow for separate compilation you need to compile each function twice though: a CPS version and a 'classic' one. This can be wasteful. You also need annotations in the object file describing the frame size of each CPS function. At some point this was seriously considered for C++, and I'm sure some language implementations actually do it.
Or you go with cactus stacks which have their own set of issues.
> you need to compile each function twice ... This can be wasteful.
More wasteful than writing each function twice? See C# for endless examples of libraries that have manual duplicate X() and XAsync() for every single method. If you compare the two they almost always have the exact same body, except the async version has "async" and "await" peppered in and calls duplicate XAsync methods (which are implemented the same, except they call XAsync methods... you might be seeing a pattern here).
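The same duplication shows up in Python library code. A minimal sketch with hypothetical names (read_config, read_file, etc. are invented for illustration): the two bodies are line-for-line identical except for the async/await keywords and which twin they call.

```python
import asyncio

def parse(data):
    return data.strip()

# Sync twin.
def read_file(path):
    return "sync: " + path          # stand-in for real blocking I/O

def read_config(path):
    data = read_file(path)
    return parse(data)

# Async twin: same body, plus async/await, calling the async twin below.
async def read_file_async(path):
    await asyncio.sleep(0)          # stand-in for real non-blocking I/O
    return "async: " + path

async def read_config_async(path):
    data = await read_file_async(path)
    return parse(data)

print(read_config("app.toml"))                       # sync: app.toml
print(asyncio.run(read_config_async("app.toml")))    # async: app.toml
```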
If I'm not mistaken most Rust doesn't use separate compilation and is built from source. In this case the compiler can treat it just like normal generics and simply not compile a version until it's used. And even with separate compilation for, e.g., a library, a very basic LTO would trivially trim out the unused methods when compiled into a final binary. And if you don't want a 2x library size, you can just use an escape hatch to turn it off at the crate/impl/function level, and it's still better for you than the status quo where you have to manually add a bunch of tokens and junk that the compiler already knows how to add precisely.
If anything is wasted it's human effort because we're wasting our time doing something the computer can do better and faster.
> "annotations in the object file to describe the frame size"
Wouldn't you need this anyway for anything async, even if it's manual? This seems like an argument against async in general, not automatic async.
I quite dislike the current async trend, so you do not have to sell me the alternatives. I was just describing the status quo.
I think that a good compromise would be implicitly inferring the 'async-ness' of template function instantiations (or whatever they are called in your language of choice) based on a magic continuation parameter, which would also get around the separate compilation issue (assuming the language does generic monomorphization).
Right, that's exactly how I would expect this to be implemented. The lowest-level IO functions (at the level of making syscalls) mark themselves as supporting async: e.g. "asyncable", which means they can be called either sync or async (which may actually be implemented with different OS APIs). Every function that calls one of these functions sees the results directly as if it were synchronous, but is also marked as "asyncable". This "mark" continues up the call chain, completely transparently to the users (unless they use an escape hatch). Near the very top of the program (the main function, a main event loop, your HTTP server thread, etc.), the code chooses to call the first function either synchronously or asynchronously and decides details like which executor to use. If your "main" function calls it async, the entire call stack is compiled to use the async versions, all the way down to the syscall functions.
The point is that only at the very top of the call stack or the very bottom should you ever need to care or think about whether code is async, because 99% of the time (every bit of code between main and syscalls... which is almost all of it) that's the only place you should need to care. This can be completely automatic because the compiler already knows where to insert await/async, it only needs to know when to use async (decided by main) and what to call at the end (decided by the syscall funcs).
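One way to approximate this "asyncness decided at the top" idea in today's languages is the sans-IO pattern: write the logic once as a generator that yields I/O requests, and let the top-level caller drive it either synchronously or asynchronously. A toy sketch (all names are invented for illustration, and the "I/O" is just uppercasing a string):

```python
import asyncio

# The "business logic" is written once, with no idea whether it will
# run sync or async: it yields I/O requests and receives results.
def fetch_greeting(name):
    data = yield ("read", name)       # request I/O from the driver
    return "hello, " + data

# Synchronous driver: the top of the stack chose "sync".
def run_sync(gen_fn, *args):
    gen = gen_fn(*args)
    result = None
    try:
        while True:
            op, arg = gen.send(result)
            result = arg.upper()          # pretend blocking I/O
    except StopIteration as stop:
        return stop.value

# Asynchronous driver: identical logic, but the top chose "async".
async def run_async(gen_fn, *args):
    gen = gen_fn(*args)
    result = None
    try:
        while True:
            op, arg = gen.send(result)
            await asyncio.sleep(0)        # pretend non-blocking I/O
            result = arg.upper()
    except StopIteration as stop:
        return stop.value

print(run_sync(fetch_greeting, "ada"))                  # hello, ADA
print(asyncio.run(run_async(fetch_greeting, "ada")))    # hello, ADA
```

Only the drivers at the very top and the I/O at the very bottom know which mode is in play; fetch_greeting never does.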
Technically you do not; as you pointed out, you can use the CPS version for everything, but it is suboptimal, especially for interoperability with other languages and the OS.
You’d still need to specify and/or have a default executor, neither of which is a trivial task to design. Somewhat ironically, the Go runtime is the closest to this, and it has the most explicit invocation of async behavior of the languages that have seriously tried to tackle this problem.
One of the biggest advantages of await is that it's explicit. Otherwise literally any function call can block, and whether or not it blocks is dependent on the body of the function so you can end up with races appearing in random parts of your code because you update a package. It's bad. At that point you might as well just use threads and blocking calls.
Data races like that can only happen when you share mutable state across logical flows of control. Don't share mutable state, regardless of your concurrency framework. I don't see how explicit vs implicit yield points change this fact. Either way there's precisely one logical flow of control that should own your data. In a language like Rust this is strongly enforced, regardless.
Furthermore, in any complex asynchronous I/O app most functions will end up being tagged async. The only ones that won't are simple leaf functions that wouldn't implicitly yield anyhow. If someone has the bad idea to, e.g., put a yield point in non-obvious leaf functions as part of some kind of hack (e.g. logging, tracing), they're gonna do it in the async/await case too, because they're already convinced it has value. Having to drop a few more annotations here or there won't stop them from breaking the app.
Lastly, the bugs that do occur in these sorts of cases usually have to do with unpredictable latencies violating implicit or accidental ordering assumptions. async/await doesn't mitigate that at all because latencies are just as unpredictable. The solution, as always, is to avoid these ordering dependencies by not sharing mutable state.
Rust worked this way earlier in its development. The keyword as others mention is “green threads”. It moved away from this because this model has some overheads, and not every application is a socket server. Languages with this feature include Go, Haskell and Erlang.
Having made "async" the default, you've just returned to regular threading, which is what we should have stuck with all along. The thread model is actually pretty useful, the pitfalls are well understood, and the tooling very mature.
Threading sucks. It's a situation where you have some external process (the operating system) deciding when different pieces of work should be woken up, with no way of feeding this back (so it just bases it on relatively naive schedulers).
In practice, most of your threads are in one form of wait loop or another and you've just got polling both inside the threads and with the scheduler.
Have a look at Erlang if you want a better model :) the erlang "processes" (different to os processes) can intelligently only wake up when there is work for them to do.
For a language to efficiently use cores, it really needs to include its own scheduling.
To pick some nits and generally elaborate, Erlang's VM also has a scheduler; it's not the presence or lack of a scheduler, it's how efficient it is and what guarantees it allows the programmer to make about their system. For example, OS schedulers are typically pre-emptive, which means your OS thread can get interrupted anywhere. On the other hand, Go's scheduler (I'm using Go because I'm more familiar with it than with Erlang) only allows context switching at well-defined points in your program. Further, operating system threads have more overhead than threads in Go (presumably also Erlang) because they have a fixed stack size (yes, I know this isn't true for all OSes).
>only allows context switching at well-defined points in your program.
Erlang works the same way. The VM scheduler will only context switch on a function call. for or while loops don't exist in Erlang which means there is no risk of blocking the scheduler.
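The "context switching only at well-defined points" idea can be sketched with generators acting as green threads and a tiny round-robin scheduler (a toy illustration, not how Go or Erlang are implemented): a task can only be preempted at its yield points, never mid-statement.

```python
from collections import deque

# Each "green thread" is a generator; `yield` is the only point where
# the scheduler is allowed to switch to another task.
def worker(name, steps, log):
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield                      # cooperative switch point

def run(tasks):
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)             # run until the task's next yield
            ready.append(task)     # reschedule at the back of the queue
        except StopIteration:
            pass                   # task finished

log = []
run([worker("a", 2, log), worker("b", 2, log)])
print(log)   # ['a:0', 'b:0', 'a:1', 'b:1']
```

Everything between two yields is guaranteed to run without interleaving, which is exactly the guarantee a pre-emptive OS scheduler cannot give.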
You can have userspace threads (or even hybrid m:n threading). They went out of fashion in the last decade for many reasons but were fairly common in the past.
The idea is that cooperative userspace task switching can be faster than kernel-space task switching (a handful of cycles vs thousands). So moving the scheduler into userspace seems a natural evolution. But now you lose the ability to run on multiple CPUs, as from the kernel's point of view the application is a single thread. The next step is to run n userspace schedulers, one for each hardware CPU, each scheduler running a number of userspace threads (thus m:n). Effectively this creates a two-level scheduler (one in the kernel, one in userspace), and some OSes have custom APIs (look for scheduler activations) that allow the two schedulers to cooperate, allowing full preemption of user threads in all circumstances.
The reason that m:n threading went out of style is that, for CPU-bound tasks (where you want to run exactly as many threads as there are CPUs), it is just useless overhead, while for the hundreds-of-thousands-of-IO-bound-threads scenario, the cost of stack switching is dominated by cache misses anyway, and the cost of calling into the kernel is amortized by the fact that IO requires a call into elevated privileges anyway. At the same time, kernel thread scheduling has become very fast, and userspace threads, which require a whole stack of their own, are not significantly more lightweight than kernel threads.
The modern async model is a compromise. On one side, the 'threads', consisting of a single stack frame, are very lightweight; on the other side, there is no generic userspace scheduler, but scheduling is fully controlled by the application.
You're right, in theory, if your select() or epoll() doesn't have a timeout. And the threads aren't doing any active work. But if you're optimising for that state, then you don't care about inefficiencies of one model or another.
There are also costs in setting and checking those locks etc. Sure, you can build solutions to optimise a broken model, which we've done over decades (with quite a bit of success!), but it doesn't make the model less broken.
I feel like you're making a very interesting point but it seems a bit broad and abstract for me to justify in my head. Could you clarify what you mean?
If you want easier concurrency with higher overhead, that's exactly what OS threads do. The OS kernel is the "event loop" in this case, and they're highly optimized for this, though there is some overhead which is hard to avoid.
Rust has gone back and forth on this. Pre-1.0 versions of Rust had a "green threads" mode. It was removed in favor of OS threads because it wasn't worth the complexity.
Go is a language in which "all functions are async" and it's pretty cool. But it needs a runtime, and interop with C / C++ has some real overhead related to switching to a C-style stack. Rust did not want to make those compromises, it wanted to be appropriate for bare-metal or kernel programming, and for libraries used by big high-performance C++ applications. In these cases, Go's model does not work well.
I personally use Go a lot, it's great for network services and command-line tools, and that's mostly what I do.
This is a very even-handed and fair description of the tradeoffs involved, something that is all-too-often missing in this space. Thanks for describing it so well.
I don't think there's anything about Go's concurrency model that precludes bare metal; I think it's just that the maintainers didn't want to support a bare metal runtime. In fact, I recall at least one other project that modified Go's runtime to support exactly this. I'm also not sure how much of the C interop overhead is due to the different stack models and how much is due to GC concerns or other things. But generally I agree with you--in practice Rust is well suited for these types of tasks and Go is not.
This is the situation in Scheme, Erlang and Go (although the latter two strongly bind continuations with a specific, built-in scheduling), and what we're doing in Java as part of Project Loom. All Java functions will be able to run inside delimited continuations without modification. Continuations/coroutines are thus a purely dynamic rather than a syntactic entity.
I think that being async-by-default makes sense for any new high-level language written these days, but the machinery required to make that happen is probably always going to be magic enough that there will be some space for low-level languages that will find value in giving the user choice between synchronous and asynchronous operations.
Alright, so I do computational plasma physics simulations for a living. Given that our problems often reach sizes which require teraflop levels of parallelism to get done before I retire (that's my scale; I'm sure you're aware of the petaflop sims the fluid/engineering guys do), we do need concurrency, but it's done by chopping up the domain, literally the volume of space we simulate, onto processors. Now, we could possibly do async at some level (sub-node level), but we absolutely need to be synchronized globally because we are simulating real-life physics and we need to maintain causality, especially when we pass information between nodes. Say a wave front passes across one node boundary into an adjacent one: waves must move in a finite and causal manner, of course, just as if you watched a ripple on the surface of a puddle.
If we did the whole problem async (async by default as I interpret it) it would require so much bookkeeping to keep causality that would make it unreasonably difficult. For things that require causality like that, we are lucky that the default mode of computation is synchronous because it fits my problem domain perfectly.
That's why I say "not everything is web" because while async maps well to problems like a server-client type application, it doesn't map well to all problems like my problem for example. Also, as someone else said, sync is easier to reason about for problems that don't require a lot of parallelism.
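The lock-step structure described above can be sketched with threads and a barrier (a toy stand-in for MPI-style domain decomposition; the names are invented): each worker advances its subdomain one timestep, then everyone waits at the barrier before boundary data would be exchanged, which preserves causality across subdomains.

```python
import threading

STEPS = 3
WORKERS = 2
barrier = threading.Barrier(WORKERS)
log = []
lock = threading.Lock()

# Hypothetical domain-decomposition loop: each thread owns one
# subdomain, and no one starts step n+1 until everyone finished step n.
def simulate(rank):
    for step in range(STEPS):
        with lock:
            log.append((step, rank))   # advance the local subdomain
        barrier.wait()                  # global sync: exchange halos here

threads = [threading.Thread(target=simulate, args=(r,)) for r in range(WORKERS)]
for t in threads: t.start()
for t in threads: t.join()

# Every step's work completes before any work from the next step begins.
assert sorted(log) == [(s, r) for s in range(STEPS) for r in range(WORKERS)]
```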
At worst, you'll have to explicitly wait for all your async calls to have completed and produce results. I suppose normally the implicit wait occurs on attempt to access the result. I also suppose you have to do it anyway in a multi-processor system, and you can't be doing it on a single core.
At best, some of your CPUs would be able to run calculations for two especially fast-to-compute pieces of volume space, while some other CPUs would be busy computing a particularly gnarly block of the volume space.
The problem is that async adds a lot of overhead, and while the mythical sufficiently good compiler could remove it, in practice it is a lot of work for little benefit.
For those CPU-bound jobs that do not fit the classical OpenMP-style scheduling, a more dynamic async-style scheduling might be appropriate (Cilk-style work stealing, for example), but the async granularity is hardly ever the function boundary.
They are a real problem in desktop apps when you have to shut down the app gracefully. If you call an async function during shutdown you can't really tell when the async chain is done and you have nothing to wait for.
Just dealt with that in a .NET app that uses a 3rd-party SDK that heavily uses async/await. I had to build a whole layer of state management just to handle shutdowns. Simple threading would have been much easier.
It's different with server side code. There async/await is very nice.
That just sounds like the API wasn't designed to support async/await rather than an inherent limitation. Something like Promise.all [1] is all you need to wait for multiple promises if the API had a way to accept a promise instead of a synchronous callback.
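The asyncio analogue of Promise.all is asyncio.gather, which lets any caller (a shutdown path included) wait for a whole batch of pending operations in one place. A minimal sketch with invented names:

```python
import asyncio

async def flush(name):
    await asyncio.sleep(0)        # stand-in for real async work
    return f"{name} flushed"

async def main():
    # Wait for all pending operations at once, Promise.all-style.
    results = await asyncio.gather(flush("cache"), flush("log"))
    return results

print(asyncio.run(main()))   # ['cache flushed', 'log flushed']
```

Like Promise.all, gather preserves the order of its arguments in the result list regardless of completion order.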
The same problem would happen with a promise. In a desktop app the user clicks "Quit". If you now have to call an async function, the Quit handler will return and some time later the await portion of the code gets called. From the Quit handler you can't tell when the awaited work has finished, so there is no clear point at which you can shut down the app.
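One way to make "when is the async chain done" answerable is to keep handles to every task you spawn, so the Quit handler has something concrete to await. A minimal asyncio sketch (App, spawn, quit are hypothetical names, not a real framework):

```python
import asyncio

class App:
    def __init__(self):
        self.tasks = set()            # every in-flight async chain
        self.saved = []

    def spawn(self, coro):
        # Track the task so shutdown can wait for it later.
        task = asyncio.ensure_future(coro)
        self.tasks.add(task)
        task.add_done_callback(self.tasks.discard)
        return task

    async def quit(self):
        # The Quit handler now has something concrete to wait for.
        await asyncio.gather(*self.tasks, return_exceptions=True)

async def save_state(app):
    await asyncio.sleep(0)            # stand-in for real async work
    app.saved.append("state")

async def main():
    app = App()
    app.spawn(save_state(app))
    await app.quit()                  # returns only when all chains finish
    return app.saved

print(asyncio.run(main()))   # ['state']
```

This is essentially manual structured concurrency; without the tracking set, the spawned chain is fire-and-forget and the shutdown race the parent describes comes back.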
They are, but calling a function asynchronously takes a lot longer than synchronously. For small functions the execution time could easily be increased by more than 100%. It's something that only really makes sense if the function will do (slow) I/O or a lot of computation, and the program has other things it could do in the meantime.
Making every function call asynchronous is likely to make most programs a lot slower.
Control is another reason. You relinquish control the more moving parts there are. If you ever looked into the tokio stack you realise there are a Lot Of Stuff going on.
Which doesn't mean it's bad. I'm excited about tokio and rust's async story. But I love that I get to choose.
This seems like the next logical step. A language where all functions are async/await.
And it’ll likely pull in some nice features from academia, like building a dependency graph of async statements so it can automatically reorder them to get optimal concurrency.
There's a huge advantage to not everything being async/await. Knowing exactly when you yield control can give you atomicity and more determinism for free.
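With explicit await, everything between two awaits is atomic with respect to other tasks on the same event loop, so a read-modify-write needs no lock as long as no await falls inside it. A small asyncio illustration of that free atomicity (toy names, single-threaded event loop assumed):

```python
import asyncio

counter = 0

async def bump(times):
    global counter
    for _ in range(times):
        tmp = counter          # read
        tmp += 1               # modify
        counter = tmp          # write: no await inside the read-modify-
        await asyncio.sleep(0) # write, so no other task can interleave it

async def main():
    await asyncio.gather(bump(1000), bump(1000))

asyncio.run(main())
print(counter)   # 2000, without any lock
```

If the runtime could yield implicitly at any call, an innocent-looking call between the read and the write could break this, which is the determinism the parent comment is pointing at.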
According to a famous quote, the source of which escapes me, "await does not wait for anything, and async is not asynchronous." (This could be specific to C# though.)