core-async

2019-12-17T09:47:19.115Z

where are the best posts/articles/books/chapters to help understand how to use core.async to build data processing pipelines? I've found this one: https://adambard.com/blog/stream-processing-core-async/ but would like to find some more (and/or hear opinions about this one)

alexmiller 2019-12-17T14:06:50.115400Z

Clojure Applied has a chapter on that

2019-12-17T17:09:55.115900Z

@alexmiller cool. I'll review that.

kwrooijen 2019-12-17T19:14:35.116200Z

If I understand correctly, using IO in a go block is bad because it would block the thread that the go block is using. Meaning that if you would want to read from a database (for example) you could use core.async/thread to spawn a new thread. Then park the go block while listening to the spawned thread using <!. Is this correct?

2019-12-17T19:16:20.116400Z

yes
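
A minimal sketch of the pattern just confirmed, assuming `clojure.core.async` is on the classpath. `fetch-user` is a hypothetical stand-in for a blocking database read, not a real API. The blocking call runs on a thread created by `async/thread`, and the go block parks on the returned channel with `<!`.

```clojure
(require '[clojure.core.async :as a :refer [go thread <! <!!]])

;; Hypothetical blocking call standing in for a database read.
(defn fetch-user [id]
  (Thread/sleep 50)
  {:id id :name (str "user-" id)})

(defn fetch-user-async
  "Runs the blocking call off the go block pool.
  Returns a channel that will receive the result."
  [id]
  (thread (fetch-user id)))

(def user-name
  (go
    ;; <! parks this go block, so the go pool thread is free
    ;; to run other go blocks while the read is in flight.
    (let [user (<! (fetch-user-async 42))]
      (:name user))))

;; Outside a go block, <!! blocks the calling thread until the value arrives.
(<!! user-name)
```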

2019-12-17T19:20:07.117900Z

async/thread just gets the work off the go block threadpool, so you can use something else to do that as well

kwrooijen 2019-12-17T19:20:25.118600Z

Right

kwrooijen 2019-12-17T19:20:43.119200Z

Does thread have an unbounded thread pool?

2019-12-17T19:20:51.119600Z

yes

kwrooijen 2019-12-17T19:20:52.119700Z

If it does, then you’d have to manage it yourself, right?

kwrooijen 2019-12-17T19:21:57.121500Z

If you just spawn threads unmanaged, you could use up all your resources I would assume?

2019-12-17T19:21:58.121700Z

it depends, but yes it is often useful to limit that in some way, which is one reason you might use something other than core.async/thread (at work we have something similar that uses a threadpool we control)

2019-12-17T19:23:20.122800Z

but we have a few places where a singleton go loop sometimes calls async/thread and waits on the result, so by construction it can't create many threads

alexmiller 2019-12-17T19:24:14.123500Z

there's nothing special about the threads created by thread

kwrooijen 2019-12-17T19:24:21.123700Z

Ok, but in the case where for example traffic dictates the number of spawned threads, you’d definitely need a pool

alexmiller 2019-12-17T19:24:32.124100Z

use any thread pool you like, just communicate via channels
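
One way this could look, as a sketch only: a fixed-size `java.util.concurrent` pool runs the blocking work and hands the result back on a channel. The names `io-pool` and `submit-blocking` and the pool size of 4 are assumptions for illustration. Unlike `async/thread`, this sketch does not convey dynamic bindings.

```clojure
(require '[clojure.core.async :as a])
(import '(java.util.concurrent Executors ExecutorService))

;; Hypothetical pool for blocking work; the size 4 is arbitrary.
(defonce ^ExecutorService io-pool (Executors/newFixedThreadPool 4))

(defn submit-blocking
  "Runs the zero-argument function f on pool. Returns a channel that
  receives f's result (if non-nil) and is then closed."
  [^ExecutorService pool f]
  (let [out (a/chan 1)]
    (.execute pool
              (fn []
                (try
                  (let [v (f)]
                    ;; nil cannot be put on a channel, so a nil result
                    ;; shows up to the reader as a closed channel.
                    (when (some? v)
                      (a/put! out v)))
                  (finally
                    ;; Close even if f throws, so readers never wait forever.
                    (a/close! out)))))
    out))

;; A go block parks on the result while the pool thread does the blocking work.
(a/go
  (println (a/<! (submit-blocking io-pool
                                  (fn [] (Thread/sleep 50) :done)))))
```

When all pool threads are busy, extra tasks wait in the executor's queue, so the number of threads stays bounded no matter how much traffic arrives.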

kwrooijen 2019-12-17T19:25:01.125Z

All right, sounds good, then I have one more question..

alexmiller 2019-12-17T19:25:13.125500Z

thread is just a helper and does the extra return channel thing - there's almost nothing there

2019-12-17T19:25:35.126200Z

it's mostly the binding conveyance, by lines of code

kwrooijen 2019-12-17T19:27:23.128200Z

(I might be misunderstanding some concepts) Let’s say you limit your thread pool to X amount of threads. If all X threads are busy you will have to wait for a thread to free up. How is this different from bumping up the go loop thread pool to X amount of go loop threads?

2019-12-17T19:28:27.129Z

if you did that, you'd easily have code that succeeds in local / staging / tests and fails under real load

2019-12-17T19:28:30.129300Z

for one thing

kwrooijen 2019-12-17T19:28:46.129500Z

Why is that?

kwrooijen 2019-12-17T19:29:12.130400Z

I assume because you can’t replicate the load, but how is that different from testing a regular thread pool?

2019-12-17T19:29:13.130500Z

because you can starve the thread pool for go blocks, if it happens faster / at lower usage, you can catch it easier

2019-12-17T19:29:48.131200Z

go blocks can do coordination faster and cheaper than a thread pool if used correctly - because they context switch without system calls

2019-12-17T19:29:53.131500Z

(or at least can)

2019-12-17T19:30:02.131700Z

the main thing is the go block threadpool is a shared resource

2019-12-17T19:30:23.132200Z

other libraries, other parts of your code, etc may want to use it, so if you are blocking it that is a problem

kwrooijen 2019-12-17T19:30:43.132400Z

Ah I see

kwrooijen 2019-12-17T19:30:48.132600Z

That makes sense

kwrooijen 2019-12-17T19:33:07.134600Z

But doesn’t that mean that if no libraries are using core.async, and you manage the go loop pool, then it would work the same as managing your own thread pool?

kwrooijen 2019-12-17T19:33:22.134800Z

Hypothetically

2019-12-17T19:34:43.136100Z

sort of, it is a complicated threadpool where tasks run for a bit, then get put on the back of the queue, which is more complicated to manage than a threadpool that pulls a task and runs it to completion

2019-12-17T19:35:58.137Z

assuming you are actually running go blocks doing channel stuff

2019-12-17T19:36:10.137300Z

which if you aren't, there is no reason to use the go block pool at all

kwrooijen 2019-12-17T19:37:21.137500Z

Go blocks doing channel stuff?

2019-12-17T19:37:39.138Z

reading and writing to channels

2019-12-17T19:38:47.139900Z

I don't have proof but my personal theory is that the go block thread pool was intentionally shrunk as an anti-foot-gun measure, to lead core.async users toward the kind of constructions that actually benefit in any way from core.async

2019-12-17T19:39:16.140100Z

when a go block reads from a channel, the continuation of the block is added as a callback to the channel, and the go block stops running so some other go block can run on that thread, and once something is written to that channel the callback is put back on the queue
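
A small sketch of what that parking behaviour allows, assuming `clojure.core.async` is available. The count of 1000 is arbitrary. Far more go blocks than pool threads can be waiting on channels at once, because a parked block holds no thread.

```clojure
(require '[clojure.core.async :as a])

(def n 1000)

;; One unbuffered channel per go block.
(def chans (vec (repeatedly n a/chan)))

;; Each go block waits on its own channel. While it waits it is parked,
;; not occupying a thread from the go block pool.
(def results
  (mapv (fn [c] (a/go (inc (a/<! c)))) chans))

;; Writing a value makes the waiting go block runnable again.
(doseq [[i c] (map-indexed vector chans)]
  (a/put! c i))

;; Collect every result from the calling (non-go) thread.
(def total (reduce + (map a/<!! results)))
```

Each block receives its index `i` and returns `(inc i)`, so `total` is the sum of 1 through 1000.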

kwrooijen 2019-12-17T19:40:14.140300Z

Right

kwrooijen 2019-12-17T19:40:56.140800Z

But if you didn’t do that then there wouldn’t really be a point to using core async, right? It’s all about channel communication

2019-12-17T19:41:13.141Z

yep

2019-12-17T19:41:57.141700Z

but I dunno, you seem to be asking wild blue sky questions

kwrooijen 2019-12-17T19:42:07.142Z

haha sorry

kwrooijen 2019-12-17T19:42:12.142200Z

I’m just trying to understand

kwrooijen 2019-12-17T19:42:51.142900Z

But thanks everyone for answering, it’s all a lot clearer now

kwrooijen 2019-12-17T19:42:58.143100Z

(The blue sky)

kwrooijen 2019-12-17T19:44:58.144600Z

Erlang really had a big impact on the way I look at concurrency. So trying to really get into the Clojure way