assume I want to use the output from a channel in a context which requires a collection; would it be ok (or a bad idea) to do something like this:
(defn seq!!
  "Returns a (blocking!) lazy sequence read from a channel."
  [c]
  (lazy-seq
    (when-some [v (a/<!! c)]
      (cons v (seq!! c)))))

(for [x (seq!! c)] (dostuff x))
laziness over IO is usually the wrong thing, but if the usage of c is implemented consistently with lazy behaviors in mind, I guess it could be ok?
what's the context that needs a collection here?
is there some other way to use the values from a channel in a context which requires a coll / seq?
a complex for comprehension digging into the innards of a deeply nested data structure
can the process consuming the channel also consume the result of that for?
yes
move it into a transducer?
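a minimal sketch of that suggestion, reusing the `dostuff` name from the snippet above as a stand-in for whatever the for body was doing: the transformation rides on the channel itself as a transducer, so values come off already transformed and no lazy seq is needed

```clojure
(require '[clojure.core.async :as a])

;; hypothetical stand-in for the work done in the for body
(defn dostuff [x] (* x 10))

;; the transformation lives on the channel as a transducer,
;; so every value put on c is transformed before it is taken off
(def c (a/chan 16 (map dostuff)))

(a/onto-chan! c [1 2 3])               ; producer side; closes c when done
(def result (a/<!! (a/into [] c)))     ; => [10 20 30]
```

the consumer then just drains a channel of finished values instead of wrapping it in a blocking seq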
my rule of thumb is to do side effects as close to the top level as possible (and channel ops are side effects); the lower down the stack the IO goes, the more code needs to account for the ways it can go wrong
so instead of abstracting the IO (as seq or transducer), I'd prefer to have a top level function explicitly doing IO, and then using functional abstractions in the process of doing so
guess I'm struggling to understand how to apply that statement to my problem (and that totally does not mean I don't agree, just working on assimilating). Context: I'm writing a function which essentially does crawler-like HTTP requests. It first makes X requests of type A, where each response (after digging into the data structure) can lead to requests of type B. I currently have two for comprehensions: one for responses of type A, one for responses of type B, where the number of B requests depends on the data returned in the A responses. All requests should be async. Maybe there is a better way to model this. I figured I could mine an A response, create one or more B requests for each A request, and have them report to a channel when they complete; that way the second for comprehension can consume them as they become available... but maybe there is a better way to think about this.
this is partly an actual problem, partly me trying to learn the right way to do things in clojure
you can do all of this directly with a "pending requests" channel (read from it to find endpoints to ingest, write to it when you find more endpoints to ingest) and N workers
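a rough sketch of that shape, where `fetch` and `extract-links` are hypothetical stand-ins for the HTTP call and the dig-into-the-response logic, and the buffer sizes are arbitrary:

```clojure
(require '[clojure.core.async :as a])

;; "pending requests" pattern: N workers read endpoints from one
;; channel and write any newly discovered endpoints back onto it
(defn crawl [seed-urls n-workers fetch extract-links]
  (let [pending (a/chan 1024)
        results (a/chan 1024)]
    (a/onto-chan! pending seed-urls false)  ; false: don't close, workers add more
    (dotimes [_ n-workers]
      (a/thread                             ; real threads, blocking IO is fine
        (loop []
          (when-some [url (a/<!! pending)]
            (let [resp (fetch url)]
              (doseq [u (extract-links resp)]
                (a/>!! pending u))          ; write back what you find
              (a/>!! results resp))
            (recur)))))
    results))
```

note this sketch deliberately leaves out termination: deciding when `pending` is truly drained is exactly the feedback-loop caveat with crawlers that comes up below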
I don't see what the lazy-seq / list comprehension abstraction does that would help you here
ignoring the async aspects for a second, a for comprehension seemed to me to fit the conditional extraction of data from the deeply nested responses well, so I was hoping to keep that
still mulling this over...
for is a complicated macro which, I'm pretty sure, generates anonymous functions, so it isn't going to play well with the go macro
crawlers in general can be problematic with core.async because they create a feedback loop if you are not careful
the returned structure is a deep tree of collections and maps. If I drop the first level of the for comprehension (which currently deals with a coll of requests), then the fors only care about the result of one request; they could be consumers of a "result channel" and thus live outside the go/async parts, leaving only the queuing of endpoints and the making of requests to core.async
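that split could look something like this sketch, where the response keys (`:groups`, `:entries`, `:interesting?`, `:payload`) are made-up placeholders for the real nested structure: the for sees exactly one response at a time, pulled off the result channel outside any go block

```clojure
(require '[clojure.core.async :as a])

;; consumer of the "result channel": plain for over one deeply
;; nested response at a time, no channel ops inside the for itself
(defn consume-results [results handle]
  (loop []
    (when-some [resp (a/<!! results)]
      (doseq [item (for [grp   (:groups resp)        ; hypothetical keys
                         entry (:entries grp)
                         :when (:interesting? entry)]
                     (:payload entry))]
        (handle item))
      (recur))))
```

the loop ends when the result channel is closed, so the producer side decides when crawling is done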
how come I always feel like a total tool when I try to use core.async : )
if you never use go blocks, you can do whatever you want
the rules about blocking and io only matter for the shared resource of the threadpool that go blocks run on, if you are using real threads that belong to you, go nuts, block your seqs on channels
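concretely, a toy sketch reusing the seq!! from the top of the thread: with the producer on a dedicated a/thread (a real JVM thread) and the consumer on the caller's own thread, the blocking reads are harmless

```clojure
(require '[clojure.core.async :as a])

;; same seq!! as above: a blocking lazy seq over a channel
(defn seq!!
  [c]
  (lazy-seq
    (when-some [v (a/<!! c)]
      (cons v (seq!! c)))))

(def c (a/chan))

(a/thread                        ; real thread, blocking puts are fine
  (doseq [v [1 2 3]] (a/>!! c v))
  (a/close! c))

(def total (reduce + (seq!! c))) ; => 6
```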
currently futures and no go blocks
I will go meditate on this in a corner for a while. Thanks a ton for all the input!