Here’s a code challenge that, if resolved, will help me test a library I am writing: how can one write an infinitely deep data structure in Clojure? Infinitely wide data structures are common. And it’s not even too difficult to generate a recursive data structure infinitely (but it never returns…). My goal is to find the analogue of the lazy seq, but in depth. An example would be “a list of a list of a list of a list of …”. A characteristic of this data structure is that I could iterate first
on it forever.
Won't something like
(defn lazy-tree
[]
(lazy-seq
(list
(lazy-tree)
(lazy-tree))))
satisfy this condition?
It stack overflows, though
I think any such data structure will stack overflow.
(defn x [] [(x)]) (x)
Not if it's lazy, then maybe you can walk it
seems like my lazy-tree doesn't stack overflow as long as you don't print anything
(def f (first (lazy-tree)))
^ this is fine
it's fine as long as you don't do anything with f
which is pretty useless ;)
You guys are fast. This was my first attempt, but it burns up the CPU…: ((fn f [] (lazy-seq (f))))
But attempting to get even the first element locks up. I’m not sure why, since the very first element is not aggressively computed (you can call class
on the return of that fn call above).
Let's ask the big question - why?
I’m OK with blowing out the stack on printing or attempting to walk infinitely deep - that’s kinda the point of this structure. I want to be satisfied that my code lazily walks an infinitely deep data structure.
Just as first
can force realization of the first of an infinitely wide data structure, I want to prove that my accessor ( a variant of a zipper) forces instantiation of only a finite number of recursions of an infinitely nested data structure.
@ben.sless ' attempt and mine have a common shape: a recursive call to a lazy seq with a single element that is a call to a lazy…
Actually, my example is both infinitely deep and infinitely wide
Yep. Mine is not, and just infinitely deep is OK.
its width increases exponentially
I’ve already got the width test working.
but what's the difference between an infinite lazy seq and an infinitely deep seq? an infinite lazy sequence is like a linked list. Isn't a linked list deep? or do you consider it wide?
But the infinitely deep part seems like it should work. But it clearly does not
I think the difference can best be explained with an example accessor: (iterate first data)
should run infinitely long on an infinitely deep structure. It would not (necessarily) on an infinitely wide lazy seq.
Upon further reflection, here is my litmus test: (iterate first data)
should neither error nor return only on an infinitely deep data structure while (iterate rest data)
should neither error nor return only on an infinitely wide data structure (both modulo stack overflow or OOM conditions).
still haven't answered why
at first reading this is classic https://xyproblem.info
Why is easy: to test that a zipper-like constructor does not accidentally walk infinitely deep (and I know it does now; it’s not trivial to fix and it’s easy to regress - thus a unit test seems prudent).
…plus it seems like, for all its elegant handling of laziness, Clojure would have a tool for lazy recursive (in depth) data structures.
several of the code snippets above will generate recursive data structures
But none seem to be able to do so in finite time.
whether or not you can print them is a separate concern
I do not understand that assertion
Maybe I missed something…. checking my first solution again…
Nope. Mine (for sure) and Ben’s (I think - more testing required) both cannot pass the test of (class (first data))
(class data)
=> LazySeq
(class (first data))
=> ….
Looks like @ben.sless ' solution (`lazy-tree`) passes the “Access the first element lazily” test…
Here’s a slightly simplified version (only infinite vertically) of Ben’s lazy-tree
that does indeed solve the problem:
(def lazy-tree
((fn f [] (lazy-seq
(list
(f))))))
Thanks, Ben.
Correctly blows out the stack when you print (iterate first lazy-tree)
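As a sanity check, here is a self-contained sketch (re-defining the tree from above): walking n levels with first realizes only n nodes and stays lazy, with no overflow.

```clojure
;; Infinitely deep, lazily: each level is a one-element list whose
;; element is another lazy level.
(def lazy-tree
  ((fn f [] (lazy-seq (list (f))))))

;; 100 applications of `first` force exactly 100 levels and stay lazy:
(class (nth (iterate first lazy-tree) 100))
;; => clojure.lang.LazySeq
```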
Can we auth with maven directly in deps.edn? https://clojure.org/reference/deps_and_cli#_maven_authenticated_repos
Basically I want the setting to go with the project, instead of residing in $HOME
I have the following function which recursively reads all forms in a file:
(import '[java.io PushbackReader])
(require '[clojure.java.io :as io])
(defn read-all
  [file]
  (with-open [rdr (-> file io/file io/reader PushbackReader.)]
    (loop [forms []]
      ;; use an ::eof sentinel rather than swallowing exceptions,
      ;; so nil/false forms don't end the read early
      (let [form (read rdr false ::eof)]
        (if (= ::eof form)
          forms
          (recur (conj forms form)))))))
It works well, however, I want to read maps within the forms as sorted-map
. One inelegant way I can do that is to switch all instances of {:bla "bla"}
in file
to (sorted-map :bla "bla")
before sending it to read-all
.
(spit "test.json" (into (sorted-map) (sort (zipmap (range 50 0 -1) (repeat 1)))))
(read-all "test.json") ;; [{7 1, 20 1, 27 1, ...}] ;; Darn!
Can you think of a better way to get the items in hash-maps in the order they are given? Is it possible with clojure.edn/read
and using the :readers
opt?
maps have no order
if you want to post-process them into a sorted-map, that is possible
if you know that the keys are sortable ahead of time
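A sketch of that post-processing idea (assumes all keys within any one map are mutually comparable; sort-all-maps is an invented name):

```clojure
(require '[clojure.walk :as walk])

;; Rebuild every map in a form as a sorted-map, leaving everything else alone.
(defn sort-all-maps [form]
  (walk/postwalk #(if (map? %) (into (sorted-map) %) %) form))

(sort-all-maps [{7 1, 1 1, 4 1}]) ; => [{1 1, 4 1, 7 1}]
```

You would run this over the vector returned by read-all.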
I'd love to be able to read {4 1, 2 3, 6 7}
from a file into an array-map
so the order given is preserved.
@endrebak85 The array map threshold is an implementation detail. Just store a seq of tuples.
Or use e.g. clj-commons/ordered
https://clojurians.slack.com/archives/C03S1KBA2/p1615222143272600
what are you really trying to do by preserving written order?
I am creating a little DSL where {}
preserves the order written. So when I read a snippet of a file written in my DSL, stuff enclosed in braces becomes a seq of tuples (as per @borkdude's suggestion).
(def example-dsl-snippet "{1 1 2 1 3 1 4 1 5 1 6 1 7 1 8 1 9 1 10 1 11 1 12 1}")
(read-string example-dsl-snippet) ;; this loses order: {7 1, 1 1, 4 1, 6 1, 3 1, 12 1, 2 1, 11 1, 9 1, 5 1, 10 1, 8 1}
I'd love a function that could read a string describing a map so it becomes (seq '([1 1] [2 1] [3 1] ...))
, not a map.
@endrebak85 Is that DSL used from Clojure programs? Then that may be very confusing
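One crude way to get that shape (a naive sketch; read-ordered-pairs is an invented name, and it breaks on nested braces or braces inside strings): read the braced contents back as a vector, which does preserve order, then pair up the elements.

```clojure
(defn read-ordered-pairs [s]
  ;; strip the outer { }, re-read as a vector, then partition into pairs
  (let [inner (subs s 1 (dec (count s)))]
    (partition 2 (read-string (str "[" inner "]")))))

(read-ordered-pairs "{4 1, 2 3, 6 7}") ; => ((4 1) (2 3) (6 7))
```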
yeah, you just can't assume order for maps like that, so you can't use maps for a dsl like that
you could also store the order within the map {:a 1 :b 2 :order [:a :b]}
Yes, but the DSL is basically json
. It is not meant for Clojure programmers, but people with experience with a Python DSL called https://snakemake.readthedocs.io/.
For those users, this would be expected behavior.
But you want to parse this using a Clojure parser?
But this would require my users to add the order above. I guess preprocessing the file to switch {-1 5 0 3}
into (seq '([-1 5] [0 3]))
is the best way to do it. Or just use a vec of [kw val]
tuples.
Yes. I guess I should just use a vector of tuples: [[kw val] [kw2 val2] ...]
instead
If this language is only used from say Python, but you want to parse / use it from Clojure (for whatever reason), I would just roll my own parser for this and not try to parse this as EDN
Thanks for the suggestions and help btw.
This is what I thought too, but an answer with 11 upvotes on Stack Overflow suggested using array-map
so I thought perhaps I was mistaken.
Afaik array-map is just an optimization detail
why do we have [vswap! swap! alter send] for [volatile atom ref agent]?
is there a reason for nonexistence of a polymorphic name for different ref-like-values?
or perhaps i am not aware of that protocol?
There is no protocol for atoms in Clojure, only interfaces (IAtom and IAtom2)
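For illustration, the kind of polymorphic update being asked about is easy enough to sketch yourself (IMutate/mutate! are invented names; this is exactly the abstraction whose value is debated below):

```clojure
(defprotocol IMutate
  (mutate! [r f]))

(extend-protocol IMutate
  clojure.lang.Atom
  (mutate! [a f] (swap! a f))
  clojure.lang.Volatile
  (mutate! [v f] (vswap! v f))
  clojure.lang.Agent
  (mutate! [a f] (send a f))
  clojure.lang.Ref
  (mutate! [r f] (alter r f))) ; only valid inside a dosync

(mutate! (atom 1) inc)         ; => 2
(dosync (mutate! (ref 1) inc)) ; => 2
```

Note the return values and failure modes still differ per type, which is the crux of the discussion that follows.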
Yeah, 8 seems to be the magic cutoff on my machine:
user=> (into (array-map) (into (sorted-map) (zipmap (range 9 0 -1) (repeat 1))))
{7 1, 1 1, 4 1, 6 1, 3 1, 2 1, 9 1, 5 1, 8 1}
user=> (into (array-map) (into (sorted-map) (zipmap (range 8 0 -1) (repeat 1))))
{1 1, 2 1, 3 1, 4 1, 5 1, 6 1, 7 1, 8 1}
Q about maintaining backward compatibility: if I have some protocol that declares three functions and may have additional user-supplied implementations out in the wild, and I add a fourth function to that protocol (and implement it in my code), could that break users' existing code?
My immediate reaction is that "changing a protocol" is breaking, since the protocol is part of the public API -- but if folks only provide a partial implementation and never call the unimplemented functions anyway, it wouldn't affect them?
"changing a protocol" is breaking -- this isn't quite precise. Does adding the new method break existing users? -> no, so it's not breaking
unless there is something about the way the unimplemented method is used -- it's tricky stuff
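A small sketch of why adding a method is usually non-breaking (Doable/do-old/do-new are invented names): an extender that only implements the old methods keeps working, and only an actual call to the new method fails.

```clojure
(defprotocol Doable
  (do-old [x])
  (do-new [x])) ; pretend do-new was just added in a new release

;; a pre-existing, partial third-party implementation:
(extend-protocol Doable
  String
  (do-old [s] (str "old:" s)))

(do-old "x") ; => "old:x" -- unaffected by the new method
;; (do-new "x") would throw, but only if someone actually calls it
```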
(I'm agonizing over whether to extend next.jdbc/execute-batch!
the same way execute!
et al operate so you can pass a connectable or sourceable instead of just a PreparedStatement
)
ostensibly something is calling the unimplemented method
because they have different semantics? ... refs need to be used in a transaction. Agents are asynchronous. Volatiles and Atoms have different thread guarantees. I'm not sure that anyone should be pretending that they're the same to write to ... only that you can deref
them to read
it's not about pretending they are the same
it's about having an abstract operator with a unique name that has different semantics based on the value it is called with
Abstractions are about pretending a group of things are the same in a specific way. Clojure's reference types are the same in that they can always be read without a significant cost (unlike a remote actor which would require a network request, for example) because they're in ram and (ideally) wrap a persistent data structure. But they all have different semantics for writing to, so what's the point of an abstract function? Isn't that just a common name that obscures the important difference between why you're using one reference type over another?
I'm not sure I understand why that would be a positive thing
obscures it for whom? why would an expression have to locally demonstrate the semantics of its execution if the semantics doesn't matter locally?
at the local point where swap!/vswap!/... is called, the knowledge about the type of the ref does not matter
after all, is the ref itself not telling you that difference?
(circle-area circle) (square-area square)
Might also depend on how much weight one puts behind Hyrum's Law
is there an actual api doc anywhere for next.jdbc? not seeing it on the readme
ah, nvm I found it
so you're talking about adding -execute-batch to Executable and calling it from execute-batch! ?
are there known implementors of Executable outside next.jdbc?
Let's say there was a a protocol for writes of ref-like things.
Then the underlying interface would be "megamorphic" which makes function inlining less likely
Not sure if this is the precise reason though, but megamorphic site:clojure.org
returns some hits
> why would an expression have to locally demonstrate the semantics of its execution if the semantics doesn't matter locally?
Because it matters in the execution of the program. If it's an agent, then I can't deref it and expect it to have the value I've just sent it, or if it's a ref I have to have a transaction open to make a change.
> at the local point where swap!/vswap!/... is called, the knowledge about the type of the ref does not matter
I'm not sure I can agree with that. I think at that point it very much matters. My send to an agent has completed - what can I do with that agent, vs the equivalent on an atom?
> after all, is the ref itself not telling you that difference?
Yes, the constructors are different. But so are the update semantics, and also the reasons that an update might fail or be retried. These differences are what would be obscured by an abstraction that allowed you to easily exchange one for another. It would also need to take into account things like transactions, asynchrony, livelocks and deadlocks (for example), and complicate the ease of use of things like atoms. Also, how would it deal with alter vs commute? Would it need to support both update strategies on all reference types?
a quick skim through http://grep.app leads me to believe the answer is no
I think history is also a reason that this doesn't exist. The STM predates protocols in Clojure.
Yeah, that's the specific case here but I was also sort of asking in the more general case about whether expanding the list of functions in a protocol is accretive or breaking, and under which circumstances.
(and thanks for checking http://grep.app for implementations -- I hadn't gotten far enough down the path to do that work)
I would have to think harder about whether this is a breaking change in the face of compilation and all the scenarios there but from a source pov, I think you are only going to call the new method if someone passes you a connectable/sourceable to execute-batch! (and those weren't possible before other than PreparedStatement, which presumably you will fix up)
Yes, the intention is to try to do this in a purely accretive way -- so new uses become possible, without breaking any existing uses.
I guess I should add a more obvious link to the API docs then...
in general, adding a method to an interface in Java is NOT considered a change that breaks binary compatibility
your take would have made sense to me if one of swap!/vswap!/alter/send had implementations for more than one ref-type
(mentioning b/c protocols have a backing Java interface ofc)
https://docs.oracle.com/javase/specs/jls/se11/html/jls-13.html#jls-13.5.3
given that protocols already handle the case of missing protocol methods, I think from a Clojure POV, it would just be seen in the same way
I'm not sure I understand why that makes more sense ... can you explain further, please?
(done; also link to older docs under old group ID)
so I want to say - this is not a breaking change in that sense
thx
Thanks, Alex. That's good feedback and justification, and makes me feel more comfortable about it.
Alternatively, you can explicitly use array maps. The order will be preserved. Do note, however, that assoc
, dissoc
, and any other function that returns a new map may change its type to a hash map.
it's hard for me to see any case in what you describe where something that used to work now does not work. merely that new additional things may need to be added to make external extensions work with the new api
and I don't even see any cases of that in the wild
all these functions are always called in the presence of the ref-like value, so there is no confusion there
with respect to the alter/commute issue: in the absence of knowledge, it would call whichever operation is more generic (expects less structure from the data)
Hello folks, just a quick question. I have a ring handler that does some long I/O-operation. I want it to start the operation, and then to return immediately with a URL, which a client can poll for to see progress.
The progress is stored in an atom.
Now I simply wrap the long operation in a future
, but is that idiomatic? If I never deref the future, would that be a problem, ie. would that mean that even though the future goes out of scope, it would still linger/run somewhere?
yes
a future is another thread, it has a life of its own once started regardless of being deref'ed or not
ok, so if that doseq that’s in there finishes, the thread will be destroyed?
sure
ok thanks
(a future actually runs in a threadpool, so in theory instead of being destroyed the thread may run another future)
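The pattern under discussion, sketched (the handler shape and the /progress URL are invented): the future keeps running on its pool whether or not anyone derefs it, and the poll endpoint just reads the atom.

```clojure
(def progress (atom 0))

(defn start-job! []
  ;; fire and forget: the future runs to completion on its own
  (future
    (doseq [_ (range 10)]
      ;; the long I/O step would go here
      (swap! progress inc)))
  ;; return immediately; the client polls the progress URL
  {:status 202, :headers {}, :body "/progress"})
```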
what if like 8 requests or so come in at roughly the same time
multithreading man
its a thing
I know haha 🙂 ok
i was just wondering if it could fill up the threadpool
in general it is not good practice to just fire off futures willy nilly, because usually you will end up wanting more control over their execution, but if you don't want more control then there is nothing wrong with it
thanks ok
I’ll use the well-let’s-refactor-if-it-goes-boom approach then
(ie. my definition of pragmatic ;-))
@kah0ona alternatively you could put the work on a queue and have a worker that constantly consumes work from that queue
but I've used the "`future` and forget" approach myself a lot as well ;)
yeah, that’ll be the approach i’ll take when everything has screeched to a halt, if that ever happens 🙂 but it’s a quite infrequently used endpoint, so i’m winging it
> all these functions are always called in the presence of the ref-like value, so there is no confusion there
I think that's what I'm getting at. There is confusion. Everything about calling one of these update functions is different, apart from the fact that they all do some form of state management.
> with respect to the alter/commute issue: in the absence of knowledge, it would call whichever operation is more generic (expects less structure from the data)
alter updates a ref in a compare-and-swap kind of way (but consistently within the transaction), whereas commute allows for commutable operations to be reordered if there's a transactional conflict. This isn't supported by any of the other reference types and is an example of where the semantics of updating a ref differ from an atom, for example.
it can definitely fill up the threadpool - that is a thing
but the abstraction is still what you want
there's also the https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executor.html interface from the JVM with a few options that provide more control
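For example (a sketch using plain java.util.concurrent interop; the pool size of 4 is arbitrary): a fixed-size pool bounds how many jobs run at once, and extra submissions queue up instead of spawning new threads.

```clojure
(import '(java.util.concurrent Executors ExecutorService TimeUnit))

(def ^ExecutorService pool (Executors/newFixedThreadPool 4))

;; at most 4 of these run concurrently; the rest wait in the pool's queue
(.submit pool ^Runnable (fn [] (comment "long I/O here")))

;; on shutdown:
;; (.shutdown pool)
;; (.awaitTermination pool 5 TimeUnit/SECONDS)
```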
you cannot "fill up" the future threadpool, it will grow without limit
(and shrink automatically)