core-async

niveauverleih 2020-07-09T16:59:07.023300Z

I read this article about using core.async and transducers to read and process a csv file: https://www.javacodegeeks.com/2017/12/gettin-schwifty-clojures-core-async.html The author closes on a slightly disappointed note regarding the performance. I wonder if there is something that could be done to optimize their code. Also, I read elsewhere that it's better to use a blocking put for IO.

2020-07-09T17:04:17.023900Z

on a quick skim it looks like they are doing IO inside a go block, which is a very bad idea

2020-07-09T17:04:55.025Z

go isn't a mechanism for faster throughput, a dedicated thread pool is much better at that, it's a mechanism for async coordination, which this task hardly needs
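To illustrate the point about keeping IO out of go blocks, here is a minimal sketch that reads lines on a dedicated thread via `async/thread` and hands them over with the blocking put `>!!`. The function name `lines->chan` and its parameters are hypothetical and not taken from the article.

```clojure
(require '[clojure.core.async :as async]
         '[clojure.java.io :as io])

(defn lines->chan
  "Hypothetical helper: reads `source` line by line on a dedicated thread
  (not a go block) and puts each line on a new channel with a blocking put.
  Closes the channel at end of input or on error."
  [source buf-size]
  (let [out (async/chan buf-size)]
    (async/thread
      (try
        (with-open [rdr (io/reader source)]
          (doseq [line (line-seq rdr)]
            ;; >!! parks the real thread, which is fine outside a go block
            (async/>!! out line)))
        (finally
          (async/close! out))))
    out))
```

Downstream go blocks can then take from the returned channel with `<!` without ever touching the reader themselves.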

alexmiller 2020-07-09T17:05:01.025200Z

everything about this is imo a weird approach to force something into using every core.async construct

➕ 3
alexmiller 2020-07-09T17:05:59.026200Z

it is probably much simpler and faster to just write a tight sequential loop
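A tight sequential loop could look roughly like the sketch below. It assumes a simple comma-separated file with a header row and no quoted fields; `sum-column` and the choice of summing one column are made up for illustration and are not what the article computes.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn sum-column
  "Hypothetical example: sums the numeric column at index `col`, skipping
  the header line. Naive split on commas, so no quoted-field support."
  [source col]
  (with-open [rdr (io/reader source)]
    ;; transduce consumes the lazy line-seq eagerly, inside with-open
    (transduce
      (comp (drop 1)
            (map #(str/split % #","))
            (map #(Double/parseDouble (nth % col))))
      +
      0.0
      (line-seq rdr))))
```

No channels or threads are involved: one pass, one reader, one accumulator.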

alexmiller 2020-07-09T17:07:15.028200Z

if you truly want to parallelize it, you probably want to memory map it or use RandomAccessFile, break it into n chunks, then do that same tight loop. the first part of that is somewhat complicated interop (and needs to take into account finding "line" breaks)
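The "break it into n chunks" step might be sketched as follows, using `java.io.RandomAccessFile` to nudge each tentative boundary forward to the next line break. The names `next-line-start` and `chunk-ranges` are hypothetical, the byte-at-a-time scan is unbuffered and only meant to show the idea, and it assumes `\n` line endings.

```clojure
(import '(java.io RandomAccessFile))

(defn next-line-start
  "Offset of the first byte after the next newline at or after `pos`,
  or the file length if no newline follows."
  [^RandomAccessFile raf ^long pos]
  (let [len (.length raf)]
    (.seek raf pos)
    (loop []
      (let [b (.read raf)]
        (cond
          (neg? b) len                      ; hit end of file
          (= b 10) (.getFilePointer raf)    ; 10 is \n
          :else    (recur))))))

(defn chunk-ranges
  "Splits the file at `path` into at most `n` contiguous [start end) byte
  ranges, each ending just after a line break (or at end of file)."
  [path n]
  (with-open [raf (RandomAccessFile. (str path) "r")]
    (let [len  (.length raf)
          step (max 1 (quot len n))]
      (loop [start 0, acc []]
        (if (>= start len)
          acc
          (let [end (if (= (count acc) (dec n))
                      len  ; last chunk takes whatever remains
                      (next-line-start raf (min len (+ start step))))]
            (recur (long end) (conj acc [start end]))))))))
```

Each range can then be handed to its own worker running the same sequential loop over just those bytes.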

dpsutton 2020-07-09T17:07:42.029100Z

ghadi spoke a bit about something like this in slack a while ago. using a custom pipeline iteration and a file walker pump to saturate cores. i made a gist out of it but would love to see a proper blog post about it

💯 1
dpsutton 2020-07-09T17:07:51.029500Z

alexmiller 2020-07-09T17:07:56.030Z

or you could just write like 2 lines of awk

➕ 2
alexmiller 2020-07-09T17:08:47.030900Z

well the ghadi stuff above is eventually probably coming to clojure and core.async and there will be some bloggy things when we get to that

dpsutton 2020-07-09T17:09:10.031600Z

cool. really enjoy everything he shares

ghadi 2020-07-09T17:09:10.031700Z

(that's not the iteration stuff)

alexmiller 2020-07-09T17:09:13.031800Z

or maybe I'm conflating

alexmiller 2020-07-09T17:09:18.032100Z

yeah, sorry nvm!

dpsutton 2020-07-09T17:10:17.033700Z

the gist has comments below that explain wiring it together that are super helpful in getting an idiomatic core async pipeline up and running doing tons of work safely

ghadi 2020-07-09T17:10:19.033800Z

the stuff excerpted above takes a filesystem walker, and pipelines over the stream of files, shelling out to a process for each file

ghadi 2020-07-09T17:10:49.034300Z

producer <> consumer , where the consumer is pipelined
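The producer/consumer shape described here can be approximated with `pipeline-blocking`. This is an independent sketch, not the gist being discussed: `process-files` is a hypothetical name, and `f` stands in for whatever blocking per-file work is needed (such as shelling out). `f` must not return nil, since channels cannot carry nil.

```clojure
(require '[clojure.core.async :as async]
         '[clojure.java.io :as io])

(defn process-files
  "Hypothetical sketch: walks `root`, runs blocking `f` on every regular
  file with up to `n` calls in flight, and returns a vector of results."
  [root n f]
  (let [files   (async/chan n)
        results (async/chan n)]
    ;; Producer: walk the tree on its own thread; the bounded channel
    ;; provides backpressure when the consumers fall behind.
    (async/thread
      (doseq [file (file-seq (io/file root))
              :when (.isFile ^java.io.File file)]
        (async/>!! files file))
      (async/close! files))
    ;; Consumer: n threads apply f; results closes when files is drained.
    (async/pipeline-blocking n results (map f) files)
    (async/<!! (async/into [] results))))
```

The call blocks until every file has been processed, so it belongs on an ordinary thread rather than inside a go block.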

ghadi 2020-07-09T17:11:38.034700Z

with short core operations, it's important to batch

ghadi 2020-07-09T17:11:50.035100Z

I think the article linked above probably misses that

ghadi 2020-07-09T17:12:21.035600Z

seems like it unconditionally fans out even with short ops
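One way to batch short operations before fanning out is to put a `partition-all` transducer on an intermediate channel, so each pipeline worker receives a vector of items instead of a single one. The function name `batched-pipeline` and its parameters are assumptions made for this sketch.

```clojure
(require '[clojure.core.async :as async])

(defn batched-pipeline
  "Hypothetical sketch: applies cheap, non-nil-returning `f` to every item
  from channel `in`, handing each of `n` workers `batch-size` items at a
  time. Returns the output channel, which yields individual results."
  [n batch-size f in]
  (let [batches (async/chan n (partition-all batch-size))
        out     (async/chan n)]
    ;; Group incoming items; the final partial batch is flushed on close.
    (async/pipe in batches)
    ;; Each worker maps f over a whole batch, then the batch is flattened.
    (async/pipeline n out (mapcat #(mapv f %)) batches)
    out))
```

The channel hand-off cost is then paid once per batch rather than once per item, which matters when `f` itself is only a few microseconds of work.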

ghadi 2020-07-09T17:13:04.036Z

as alex says, it's a bit kitchen-sinky

dpsutton 2020-07-09T17:14:19.036900Z

As an aside, I can delete or make that gist private if you don't like me copying and preserving you like that

ghadi 2020-07-09T17:14:26.037100Z

no it's fine 🙂

👍 1
ghadi 2020-07-09T17:14:37.037500Z

if I put it out there, I put it out there 🙂

ghadi 2020-07-09T17:15:02.038100Z

@alexmiller may be worth considering making CompletableFuture interop with channels better

ghadi 2020-07-09T17:16:00.039100Z

@hiredman has a gist about it, and L8-23 above are a manual adaptation of CF -> channel
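For readers without the gist at hand, a manual CF -> channel adaptation can be sketched with only public API: `CompletableFuture.whenComplete` plus `async/put!` on a promise channel. This is an independent illustration, not the gist's code; `cf->chan` is a hypothetical name.

```clojure
(require '[clojure.core.async :as async])
(import '(java.util.concurrent CompletableFuture)
        '(java.util.function BiConsumer))

(defn cf->chan
  "Hypothetical helper: returns a promise channel that receives the
  future's value, or the Throwable it failed with. A nil result closes
  the channel instead, since channels cannot carry nil."
  [^CompletableFuture cf]
  (let [ch (async/promise-chan)]
    (.whenComplete cf
      (reify BiConsumer
        (accept [_ v ex]
          (cond
            (some? ex) (async/put! ch ex)
            (some? v)  (async/put! ch v)
            :else      (async/close! ch)))))
    ch))
```

A caller in a go block can then `<!` the returned channel and check whether the value is a `Throwable`.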

ghadi 2020-07-09T17:16:51.039400Z

2020-07-09T17:19:41.041Z

the gist is likely incorrect, because the handler lock is mostly a noop; outside of alts, handlers depend on the channel lock

2020-07-09T17:22:59.042300Z

not sure why it is usually a noop, performance? but it makes things like extending the protocols to existing types annoying