I read this article about using core.async and transducers to read and process a CSV file: https://www.javacodegeeks.com/2017/12/gettin-schwifty-clojures-core-async.html The author closes on a slightly disappointed note regarding the performance. I wonder if there is something that could be done to optimize their code. Also, I read elsewhere that it's better to use a blocking put for IO.
on a quick skim it looks like they are doing IO inside a go block, which is a very bad idea
go isn't a mechanism for faster throughput, a dedicated thread pool is much better at that, it's a mechanism for async coordination, which this task hardly needs
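To illustrate the point about keeping IO out of go blocks, here is a minimal sketch that reads a file on a dedicated thread via `a/thread` and uses the blocking put `>!!`. The name `lines->chan` and its `buf-size` parameter are hypothetical, not taken from the article.

```clojure
(require '[clojure.core.async :as a]
         '[clojure.java.io :as io])

(defn lines->chan
  "Reads `path` line by line on a dedicated thread (not a go block) and
  puts each line on a new buffered channel with the blocking >!!.
  The channel is closed at end of file or on error."
  [path buf-size]
  (let [out (a/chan buf-size)]
    (a/thread
      (try
        (with-open [rdr (io/reader path)]
          (doseq [line (line-seq rdr)]
            ;; Blocking put: parks this real thread, which is fine here,
            ;; and gives back-pressure against slow consumers.
            (a/>!! out line)))
        (finally
          (a/close! out))))
    out))
```

A caller could collect everything with `(a/<!! (a/into [] (lines->chan "data.csv" 64)))`, where `data.csv` is a placeholder path.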
everything about this is imo a weird approach to force something into using every core.async construct
it is probably much simpler and faster to just write a tight sequential loop
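A tight sequential loop could look roughly like the sketch below: one pass over the lines with a transducer and no channels at all. The function `sum-column` is a made-up example task, and the naive comma split assumes there are no quoted fields.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn sum-column
  "Sums the numeric column at index `col` of a comma-separated file in a
  single sequential pass. Assumes no header row and no quoted fields or
  embedded commas."
  [path col]
  (with-open [rdr (io/reader path)]
    ;; transduce is eager, so the reader is fully consumed before it closes.
    (transduce
      (comp (map #(str/split % #","))
            (map #(Double/parseDouble (nth % col))))
      +
      0.0
      (line-seq rdr))))
```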
if you truly want to parallelize it, you probably want to memory map it or use RandomAccessFile, break it into n chunks, then do that same tight loop. the first part of that is somewhat complicated interop (and needs to take into account finding "line" breaks)
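The chunking step described above might be sketched as follows, using `RandomAccessFile` rather than a memory map. All function names here are hypothetical. The sketch assumes `\n` line endings, UTF-8 text, and chunks small enough to fit in a single byte array.

```clojure
(require '[clojure.string :as str])
(import '(java.io RandomAccessFile))

(defn next-line-start
  "Offset of the first byte after the next newline at or after `pos`,
  or the file length if no newline follows."
  [^RandomAccessFile raf pos]
  (let [len (.length raf)]
    (.seek raf pos)
    (loop []
      (let [b (.read raf)]
        (cond
          (neg? b) len                     ; hit end of file
          (= b 10) (.getFilePointer raf)   ; 10 is \n
          :else    (recur))))))

(defn chunk-bounds
  "Splits the file into at most `n` [start end) byte ranges, each of which
  ends on a line boundary."
  [path n]
  (with-open [raf (RandomAccessFile. ^String path "r")]
    (let [len  (.length raf)
          step (max 1 (quot len n))
          cuts (->> (range 1 n)
                    (map #(next-line-start raf (min len (* step %))))
                    (cons 0)
                    dedupe
                    vec)                   ; realized before raf closes
          cuts (if (= (peek cuts) len) cuts (conj cuts len))]
      (mapv vector cuts (rest cuts)))))

(defn chunk-lines
  "Reads one [start end) range and returns its lines."
  [path [start end]]
  (with-open [raf (RandomAccessFile. ^String path "r")]
    (let [buf (byte-array (- end start))]
      (.seek raf start)
      (.readFully raf buf)
      (str/split-lines (String. buf "UTF-8")))))
```

Each range can then be handed to its own worker, for example `(pmap #(count (chunk-lines path %)) (chunk-bounds path 4))`. Scanning byte by byte for the newline is slow in general, but here it only runs once per chunk boundary.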
ghadi spoke a bit about something like this in slack a while ago. using a custom pipeline iteration and a file walker pump to saturate cores. i made a gist out of it but would love to see a proper blog post about it
or you could just write like 2 lines of awk
well the ghadi stuff above is eventually probably coming to clojure and core.async and there will be some bloggy things when we get to that
cool. really enjoy everything he shares
(that's not the iteration stuff)
or maybe I'm conflating
yeah, sorry nvm!
the gist has comments below that explain wiring it together that are super helpful in getting an idiomatic core async pipeline up and running doing tons of work safely
the stuff excerpted above takes a filesystem walker, and pipelines over the stream of files, shelling out to a process for each file
producer <> consumer , where the consumer is pipelined
with short core operations, it's important to batch
I think the article linked above probably misses that
seems like it unconditionally fans out even with short ops
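One way to read the batching advice above is sketched below: group items with `partition-all` before handing them to `pipeline`, so each parallel task does a batch of work instead of a single short operation. The name `batched-pipeline` and its parameters are hypothetical, and this is not the code from the gist.

```clojure
(require '[clojure.core.async :as a])

(defn batched-pipeline
  "Applies `f` to every item from channel `in`, `batch-size` items per
  parallel task, and returns a channel of individual results in input
  order. `parallelism` must be a positive integer."
  [in f batch-size parallelism]
  (let [;; Items are grouped into vectors as they enter this channel.
        batches (a/chan parallelism (partition-all batch-size))
        ;; `cat` flattens each vector of results back into single items.
        out     (a/chan parallelism cat)]
    (a/pipe in batches)
    ;; One pipeline task handles a whole batch, amortizing coordination cost.
    (a/pipeline parallelism out (map #(mapv f %)) batches)
    out))
```

With a batch size of 1 this degenerates to fanning out every item individually, which is the pattern the comments above warn against for short operations.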
as alex says, it's a bit kitchen-sinky
As an aside, I can delete or make that gist private if you don't like me copying and preserving you like that
no it's fine
if I put it out there, I put it out there
@alexmiller may be worth considering making CompletableFuture interop with channels better
@hiredman has a gist about it, and L8-23 above are a manual adaptation of CF -> channel
the gist is likely incorrect, because the handler lock is mostly a noop; outside of alts handlers depend on the channel lock
not sure why it is usually a noop, performance? but it makes things like extending the protocols to existing types annoying
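For reference, a manual CompletableFuture to channel adaptation can be sketched with only the public `put!` and `promise-chan` API, which sidesteps the handler lock question because it does not extend any channel protocols. This is not the gist discussed above, and `cf->chan` is a hypothetical name. Since `nil` cannot be put on a channel, a `nil` result closes the channel instead.

```clojure
(require '[clojure.core.async :as a])
(import '(java.util.concurrent CompletableFuture)
        '(java.util.function BiConsumer))

(defn cf->chan
  "Returns a promise-chan that receives the value of `cf`, or the Throwable
  if it completed exceptionally. A nil result closes the channel."
  [^CompletableFuture cf]
  (let [out (a/promise-chan)]
    (.whenComplete cf
      (reify BiConsumer
        (accept [_ v ex]
          (cond
            (some? ex) (a/put! out ex)   ; deliver the failure as a value
            (some? v)  (a/put! out v)
            :else      (a/close! out)))))
    out))
```

For example, `(a/<!! (cf->chan (CompletableFuture/completedFuture 42)))` yields `42`. Callers have to check whether the taken value is a `Throwable`, which is a design choice of this sketch rather than a core.async convention.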