where are the best posts/articles/books/chapters to help understand how to use core.async to build data processing pipelines? I've found this one: https://adambard.com/blog/stream-processing-core-async/ but would like to find some more (and/or hear opinions about this one)
Clojure Applied has a chapter on that
@alexmiller cool. I'll review that.
If I understand correctly, doing IO in a go block is bad because it would block the thread that the go block is using. That means that if you want to read from a database (for example), you could use `core.async/thread` to spawn a new thread, then park the go block while waiting on the spawned thread's result using `<!`. Is this correct?
yes
async/thread just gets the work off the go block threadpool, so you can use something else to do that as well
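A minimal sketch of the pattern just described, assuming a hypothetical blocking function `fetch-user` that stands in for a real database read. The blocking work runs on a thread created by `thread`, and the go block only parks on the returned channel:

```clojure
(require '[clojure.core.async :as a :refer [go thread <! <!!]])

;; Hypothetical stand-in for a blocking database read.
(defn fetch-user [id]
  (Thread/sleep 50) ; simulate blocking IO
  {:id id :name (str "user-" id)})

(defn fetch-user-async
  "Runs the blocking call on a thread created by core.async/thread and
  returns a channel that will receive the result."
  [id]
  (thread (fetch-user id)))

;; The go block parks on <! while the other thread does the blocking work,
;; so no go block pool thread is tied up during the sleep.
(def result-ch
  (go
    (let [user (<! (fetch-user-async 42))]
      (:name user))))

;; <!! blocks the calling (non-go) thread until the go block finishes.
(println (<!! result-ch))
```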
Right
Does `thread` have an unbounded thread pool?
yes
If it does, then you’d have to manage it yourself, right?
If you just spawn thread unmanaged, you could use up all your resources I would assume?
it depends, but yes, it is often useful to limit that in some way, which is one reason you might use something other than core.async/thread (at work we have something similar that uses a threadpool we control)
but we have a few places where a singleton go loop sometimes calls async/thread and waits on the result, so by construction it can't create many threads
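A rough sketch of that "bounded by construction" idea, not the code referred to above. The names `slow-io`, `start-worker`, `jobs`, and `results` are made up for illustration, and the two atoms exist only to show that a single go loop never has more than one `thread` call in flight:

```clojure
(require '[clojure.core.async :as a])

(def in-flight (atom 0))      ; blocking calls currently running
(def max-in-flight (atom 0))  ; highest value in-flight ever reached

;; Hypothetical blocking operation that records its own concurrency.
(defn slow-io [x]
  (let [n (swap! in-flight inc)]
    (swap! max-in-flight max n)
    (Thread/sleep 10)
    (swap! in-flight dec)
    (* x 10)))

(defn start-worker
  "A single go loop: takes a job, hands it to a/thread, parks until the
  result arrives, forwards it, and only then takes the next job."
  [in out]
  (a/go-loop []
    (if-some [job (a/<! in)]
      (do (a/>! out (a/<! (a/thread (slow-io job))))
          (recur))
      (a/close! out))))

(def jobs (a/chan 10))
(def results (a/chan 10))
(start-worker jobs results)

(doseq [x (range 5)] (a/>!! jobs x))
(a/close! jobs)

;; a/into yields the collected vector once results is closed.
(def processed (a/<!! (a/into [] results)))
(println processed @max-in-flight)
```

Because the loop waits on each result before reading the next job, the number of live threads cannot grow with the size of the input.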
there's nothing special about the threads created by thread
Ok, but in the case where, for example, traffic dictates the number of spawned threads, you’d definitely need a pool
use any thread pool you like, just communicate via channels
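One way this could look, as a sketch only: a fixed-size `java.util.concurrent` executor does the blocking work and hands results back on channels. The pool size of 4 and the names `io-pool`, `submit-blocking`, and `squares-via-pool` are assumptions for illustration, and error handling is left out:

```clojure
(require '[clojure.core.async :as a])
(import '(java.util.concurrent Executors ExecutorService))

;; Assumption: a pool of 4 threads; size it for your own workload.
(def ^ExecutorService io-pool (Executors/newFixedThreadPool 4))

(defn submit-blocking
  "Runs the zero-argument function f on pool and returns a channel that
  receives f's result (if non-nil) and is then closed."
  [^ExecutorService pool f]
  (let [ch (a/chan 1)]
    (.execute pool
              ^Runnable
              (fn []
                (try
                  (when-some [v (f)]
                    (a/>!! ch v)) ; buffer of 1, so this put does not block
                  (finally
                    (a/close! ch)))))
    ch))

(defn squares-via-pool
  "Submits one simulated blocking task per x, then collects the results
  in order from a go block that only parks."
  [xs]
  (let [chs (mapv (fn [x]
                    (submit-blocking io-pool
                                     #(do (Thread/sleep 20) (* x x))))
                  xs)]
    (a/go-loop [chs chs acc []]
      (if-let [ch (first chs)]
        (recur (rest chs) (conj acc (a/<! ch)))
        acc))))

(def squares (a/<!! (squares-via-pool (range 8))))
(println squares)

;; The pool's threads are non-daemon, so shut it down when finished.
(.shutdown io-pool)
```

With 8 tasks and 4 threads, the extra tasks wait in the executor's queue rather than creating more threads.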
All right, sounds good, then I have one more question..
`thread` is just a helper and does the extra return channel thing - there's almost nothing there
by lines of code, it's mostly the binding conveyance
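A small illustration of those two things, the return channel and the binding conveyance. The dynamic var `*request-id*` is a made-up name, and the plain `Thread` is included only for contrast:

```clojure
(require '[clojure.core.async :as a])

(def ^:dynamic *request-id* "none")

;; thread conveys the caller's dynamic bindings to the new thread and
;; returns a channel that receives the body's value.
(def seen-by-thread
  (binding [*request-id* "req-7"]
    (a/<!! (a/thread *request-id*))))

;; A plain java.lang.Thread started the same way sees only the root value.
(def seen-by-plain-thread
  (let [p (promise)]
    (binding [*request-id* "req-7"]
      (doto (Thread. (fn [] (deliver p *request-id*))) .start))
    @p))

;; The return channel is closed after the single result is delivered,
;; so a second take yields nil.
(def two-takes
  (let [ch (a/thread :done)]
    [(a/<!! ch) (a/<!! ch)]))

(println seen-by-thread seen-by-plain-thread two-takes)
```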
(I might be misunderstanding some concepts) Let’s say you limit your thread pool to X amount of threads. If all X threads are busy you will have to wait for a thread to free up. How is this different from bumping up the go loop thread pool to X amount of go loop threads?
if you did that, you'd easily have code that succeeds in local / staging / tests and fails under real load
for one thing
Why is that?
I assume because you can’t replicate the load, but how is that different from testing a regular thread pool?
because you can starve the thread pool for go blocks; if that happens faster / at lower usage, you can catch it more easily
go blocks can do coordination faster and cheaper than a thread pool if used correctly - because they context switch without system calls
(or at least can)
the main thing is the go block threadpool is a shared resource
other libraries, other parts of your code, etc. may want to use it, so if you are blocking it, that is a problem
Ah I see
That makes sense
But doesn’t that mean that if no libraries are using core.async, and you manage the go loop pool, then it would work the same as managing your own thread pool?
Hypothetically
sort of, it is a complicated threadpool where tasks run for a bit, then get put on the back of the queue, which is more complicated to manage than a threadpool that pulls a task and runs it to completion
assuming you are actually running go blocks doing channel stuff
which if you aren't, there is no reason to use the go block pool at all
Go blocks doing channel stuff?
reading and writing to channels
I don't have proof, but my personal theory is that the go block thread pool was intentionally shrunk as an anti-foot-gun measure, to lead core.async users toward the kind of constructions that actually benefit in any way from core.async
when a go block reads from a channel, the continuation of the block is added as a callback to the channel, and the go block stops running so some other go block can run on that thread, and once something is written to that channel the callback is put back on the queue
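To make the parking behaviour concrete, here is a sketch that starts 1000 go blocks, each waiting on its own channel. The count and the names `inputs`, `results`, and `total` are arbitrary choices for illustration. Since a parked go block does not hold a thread, all of them can be waiting at once on a pool far smaller than 1000 threads:

```clojure
(require '[clojure.core.async :as a])

(def n 1000)
(def inputs (vec (repeatedly n a/chan))) ; one unbuffered channel per go block
(def results (a/chan n))                 ; large enough that >! never parks

;; Start n go blocks. Each one parks at <! until its channel gets a value,
;; releasing its pool thread in the meantime.
(doseq [[i ch] (map-indexed vector inputs)]
  (a/go (a/>! results (+ i (a/<! ch)))))

;; Writing to each channel puts the matching continuation back on the queue.
(doseq [ch inputs]
  (a/>!! ch 1))

;; Collect all n results and add them up.
(def total
  (a/<!! (a/go-loop [k 0 sum 0]
           (if (< k n)
             (recur (inc k) (+ sum (a/<! results)))
             sum))))

(println total)
```

Swapping the parking `<!` for a blocking `<!!` inside those go blocks would instead tie up a pool thread per waiting block, which is the starvation scenario discussed earlier.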
Right
But if you didn’t do that then there wouldn’t really be a point to using core async, right? It’s all about channel communication
yep
but I dunno, you seem to be asking wild blue sky questions
haha sorry
I’m just trying to understand
But thanks everyone for answering, it’s all a lot clearer now
(The blue sky)
Erlang really had a big impact on the way I look at concurrency. So trying to really get into the Clojure way