beginners

Getting started with Clojure/ClojureScript? Welcome! Also try: https://ask.clojure.org. Check out resources at https://gist.github.com/yogthos/be323be0361c589570a6da4ccc85f58f.
2020-10-16T01:39:25.027500Z

where does clj/clojure download dependencies?

marshall 2020-10-16T01:42:58.027900Z

your Maven ~/.m2 directory (specifically ~/.m2/repository)

marshall 2020-10-16T01:43:18.028300Z

it also caches deps in your local project under .cpcache

marshall 2020-10-16T01:43:22.028500Z

@jeeq ^

2020-10-16T01:45:45.028800Z

Ah. Thank you @marshall

alexmiller 2020-10-16T01:47:04.029500Z

It caches the computed classpath in .cpcache (not deps)

marshall 2020-10-16T01:47:32.029700Z

oh right, sorry 🙂

ozzloy 2020-10-16T03:06:11.031200Z

http://http-kit.github.io/migration.html#reload why does it suggest using #'all-routes instead of just all-routes? if wrap-reload reloads the whole namespace on every change, then

ozzloy 2020-10-16T03:07:15.032100Z

won't all-routes contain its new definition by the time -main is also reloaded?

ozzloy 2020-10-16T03:08:04.032900Z

is this just for repl purposes? so you can redefine all-routes at the repl and not necessarily reload the entire namespace?

ozzloy 2020-10-16T03:10:54.034700Z

also, why is there ring-reload but not (at least i couldn't find it) something like http-kit-reload? why is reloading the namespace done at the ring level and not the server level?

seancorfield 2020-10-16T03:25:24.037600Z

@ozzloy_clojurians_net My recommendation is: do not use any of these auto-reload things. Just learn how to write code that's amenable to the REPL -- which is why you'd write #'all-routes because that's a Var reference which introduces a layer of indirection so if you re-`defn` the function (via the REPL) the new definition will be picked up immediately.
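A minimal sketch of that indirection with no http-kit involved: `start-server` below is a hypothetical stand-in that just keeps whatever handler it is given, once as the plain value of `all-routes` and once as the Var `#'all-routes`. The `defn` is then re-evaluated as you would at the REPL.

```clojure
;; Hypothetical stand-in for a server: it only remembers the handler it was given.
(defn start-server [handler]
  (fn [request] (handler request)))

(defn all-routes [request]
  {:status 200 :body "v1"})

;; Value slot: captures the fn object that all-routes holds right now.
(def server-with-value (start-server all-routes))
;; Var: invoking a Var calls whatever its root binding is at call time.
(def server-with-var (start-server #'all-routes))

;; Simulate re-evaluating the defn at the REPL.
(defn all-routes [request]
  {:status 200 :body "v2"})

(:body (server-with-value {})) ;; => "v1", still the old function value
(:body (server-with-var {}))   ;; => "v2", looked up through the Var
```

Only the Var-backed server picks up the new definition without being restarted.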

William Skinner 2020-10-16T13:36:07.078100Z

This gave me an aha moment @seancorfield thank you

Matthew Pettis 2020-10-16T14:11:02.078300Z

I've recently been trying to figure out the difference between using a symbol and a var reference in a REPL, and how each interacts with redefining a function -- are there any resources that dive into that a bit more? That would probably help me understand the var indirection mechanism better too...

seancorfield 2020-10-16T16:41:04.079400Z

@matthew.pettis https://clojure.org/guides/repl/enhancing_your_repl_workflow#writing-repl-friendly-programs talks specifically about #'

Matthew Pettis 2020-10-16T20:16:55.094300Z

Thank you! I read that link. A follow-up question... in the 4 code examples they have at that link/anchor, is #3 not REPL-friendly because the value of print-number-and-wait is inlined into the future call, and so cannot be redirected, while #2 is REPL-friendly because print-number-and-wait is not inlined into anonymous functions, and has to be looked up upon every invocation of the anonymous function? #4 seems to work on that same principle, in that the var has to be looked up and resolved upon every invocation, and so if you change what's in the var, it will use the new value you redef the var to after the change... does this sound correct?

Matthew Pettis 2020-10-16T20:19:59.094500Z

If this is so, I can see why this is REPL-friendly for development. I can also see that if you want to keep your functions as pure as possible, you will not use #' to call functions inside of other functions because that exposes your function to the possibility of becoming impure, as now your function depends on vars, which are mutable, rather than the function values they would get...

seancorfield 2020-10-16T20:49:06.094700Z

Re: #3 -- correct; #2 -- this works because the function appears in the "call" slot, not in a "value" slot, and calls are always dereferenced (so the current binding of the function is always used); #4 -- works because what is passed in is a Var: the "box" that contains the function's value, so when you invoke a Var it always resolves to the current binding.

seancorfield 2020-10-16T20:49:58.094900Z

Purity really isn't relevant here. Whether a function is pure or impure is about side-effects.

seancorfield 2020-10-16T20:51:07.095100Z

Passing #'some-func and (fn [x] (some-func x)) are pretty much equivalent from a REPL redefinition p.o.v. You could also say (var some-func) (since that's what #' is shorthand for).
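A small illustration of the three cases, assuming default compilation (no direct linking); `greet` is just an example name.

```clojure
(defn greet [x] (str "hello " x))

;; Value slot: greet is evaluated once, here, to the current fn object.
(def by-value greet)
;; Var: invoking the Var dereferences it on every call.
(def by-var #'greet)                  ; same as (var greet)
;; Call slot inside a fn: greet is looked up each time the lambda runs.
(def by-lambda (fn [x] (greet x)))

(defn greet [x] (str "goodbye " x))   ; simulate a REPL redefinition

(by-value "a")  ;; => "hello a"
(by-var "a")    ;; => "goodbye a"
(by-lambda "a") ;; => "goodbye a"
```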

seancorfield 2020-10-16T20:53:47.095300Z

About the only time you will be "mutating" a Var is via redefinition in the REPL -- so it's not like #' is going to make your code impure because someone is nefariously modifying top-level bindings behind the scenes. There are a handful of valid uses for alter-var-root, for example, but Clojure programmers don't treat Vars like other languages treat "variables".

seancorfield 2020-10-16T20:54:04.095500Z

Hope that helps clarify @matthew.pettis?

Matthew Pettis 2020-10-16T20:54:53.095700Z

The 'call-slot' and 'value-slot' distinctions are very helpful, thanks. I was thinking about purity not in the sense of side effects, but in the sense that if you call a function twice with the same arguments in a program, you should get the same result. This is not the case if, for #2 or #4, you change out the definition of print-number-and-wait between calls. So I am probably abusing the idea of "pure function", but those are the notions I have about them...

seancorfield 2020-10-16T20:56:11.095900Z

Given the prevalence of 'call-slot' usage, almost all code would be "impure" by your definition 🙂

seancorfield 2020-10-16T20:57:11.096100Z

(defn foo [x]
  (+ (bar x) 13))
that would be susceptible to people redefining the bar function in between calls to foo -- but that's not how we think about it.

seancorfield 2020-10-16T20:58:36.096300Z

user=> (defn bar [x] (* 2 x))
#'user/bar
user=> (defn foo [x] (+ (bar x) 13))
#'user/foo
user=> (foo 1)
15
user=> (defn bar [x] (* 3 x))
#'user/bar
user=> (foo 1)
16
user=> 
But this is the behavior we want in the REPL/while developing.

Matthew Pettis 2020-10-16T21:00:50.096500Z

Which is why it is really helpful now to be aware of that distinction. I really do think that helps me think about immutability, at least which things are and are not immutable. I've read that 'functions are values', but now I'm not sure what to think when that means that things in the call slot can point to different things. What makes a function a value if in the program, sub-components in the call-slot can get rebound and change the behavior of a function?

seancorfield 2020-10-16T21:01:05.096700Z

Note that there is also a compiler option called "direct linking" which effectively prevents the 'call-slot' indirection. Clojure itself is compiled that way, so you can't redefine core functions on the fly, but you can also compile all your own code that way (we do it as part of AOT-compiling our code when we build applications for deployment as uberjar files: but it does have the "downside" that you can no longer patch code running live in production via a REPL, which, yes, we do occasionally for one process where we do not compile it with direct linking).
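For reference, a sketch of how direct linking is commonly enabled in a Leiningen build: the `clojure.compiler.direct-linking` system property is set while AOT-compiling the uberjar, so it affects the code compiled in that step. The project name, version, and profile layout are placeholders. A specific Var can opt out with `^:redef` metadata, and `^:dynamic` Vars are never direct-linked.

```clojure
;; project.clj (sketch; name, version and dependencies are placeholders)
(defproject my-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.1"]]
  :profiles {:uberjar {:aot :all
                       ;; Direct linking applies to code compiled while this
                       ;; property is set, i.e. during AOT compilation here.
                       :jvm-opts ["-Dclojure.compiler.direct-linking=true"]}})
```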

Matthew Pettis 2020-10-16T21:03:07.096900Z

yep, that makes sense to want that REPL behavior. I am definitely not a purist on function purity ( 🙂 ), but I just want to make sure I have a solid grasp of what immutability means for values when it comes to functions, and what that means when sub-components can change and change a function's behavior (when a function is a value).

Matthew Pettis 2020-10-16T21:05:03.097100Z

so, to be precise, in your foo/bar example above, I'd figure that the function that foo points at is an immutable value, as functions are values (right?), but by redefining bar, you changed the behavior of a function value...

seancorfield 2020-10-16T21:22:42.097400Z

Right, in particular, (defn bar [x] (* 2 x)) is shorthand for (def bar (fn [x] (* 2 x))) -- so (fn [x] (* 2 x)) is the value here (actually an object with an .invoke() method) and bar is a symbol that is bound to a Var and the content of that Var is a reference to the actual value.

seancorfield 2020-10-16T21:23:42.097600Z

Then when you have (defn bar [x] (* 3 x)) you get a new value (fn [x] (* 3 x)) and because bar is already bound to a Var, the content is updated to be a reference to the new value.

seancorfield 2020-10-16T21:24:19.097800Z

So bar's binding to the Var is essentially immutable, and each of the different function values are immutable. Only the Var itself is mutable.

seancorfield 2020-10-16T21:25:20.098Z

See https://clojure.org/reference/vars for a long description of Var and its siblings.

seancorfield 2020-10-16T21:26:46.098200Z

So def either creates a Var (with a reference to the value) and binds the symbol to it (if no such binding existed) or it just updates the Var to contain a reference to the new value.
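A REPL sketch of the symbol -> Var -> value relationship, reusing `bar` from the example above: re-running `defn` keeps the same Var and only replaces its contents, while a previously captured function value stays as it was.

```clojure
(defn bar [x] (* 2 x))

(def old-bar bar)        ; value slot: captures the current fn object
(def bar-var #'bar)      ; the Var (the "box") itself

(defn bar [x] (* 3 x))   ; re-def: same Var, new root value

(identical? bar-var #'bar) ;; => true, the symbol still maps to the same Var
(old-bar 1)                ;; => 2, the old fn value is unchanged
(bar-var 1)                ;; => 3, the Var now holds the new fn
(= old-bar bar)            ;; => false, two distinct fn values
```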

Matthew Pettis 2020-10-16T21:32:02.098400Z

Makes sense. So, to restate what you said, the symbol -> var mapping is immutable (`bar` to a particular Var), and any function is a value, and further, any given Var can change what function (value) it points to. I think I get this all (via this discussion, thanks). The rest I guess is more philosophical, and not practical, but of interest to me -- it seems more appropriate to call functions 'values' in the case that you have 'direct linking' in force, when the behavior of a function value truly cannot be changed. Again, I am not at all versed in type theory to really grok values... I'm just trying to map out all of the online and book descriptions of what a value is to how it is being used here.

Matthew Pettis 2020-10-16T21:32:32.098600Z

In a very practical sense, though, the call-slot/value-slot distinction is a huge thing to have learned.

Matthew Pettis 2020-10-16T21:34:30.098800Z

One more clarification -- once def binds a symbol to a Var, it stays bound to that Var for the life of the program, correct? Except in the cases where Vars are shadowed with a binding form?

seancorfield 2020-10-16T22:56:03.099500Z

@matthew.pettis Sorry, was deep in code... unless you explicitly unbind the symbol in that ns, yes, the def binding stays bound to the same Var. You can ns-unmap a binding and you can also remove a namespace completely.
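A sketch of what `ns-unmap` changes, assuming the forms are evaluated in order in one namespace (e.g. pasted into a REPL): after unmapping, the next `def` interns a brand-new Var, and anything still holding the old Var keeps the old value.

```clojure
(def x 1)
(def x-var #'x)

;; Remove the symbol -> Var mapping from the current namespace.
(ns-unmap *ns* 'x)

;; A fresh def now interns a new Var under the same symbol.
(def x 2)

(identical? x-var #'x) ;; => false
@x-var                 ;; => 1, the old Var (still referenced here) keeps its value
x                      ;; => 2
```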

Matthew Pettis 2020-10-16T22:56:50.099700Z

no apologies necessary -- thanks, I forgot about ns-unmap...

seancorfield 2020-10-16T22:57:22.099900Z

There's a subtle issue around def vs defonce and reloading namespaces (`def` will recompute the value and update the Var on reloading a ns, defonce will not). Again, tho', you can still remove the ns to force defonce to recompute.
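The `def`/`defonce` difference in miniature; re-evaluating the same top-level forms stands in for reloading the namespace, and the atom names are illustrative.

```clojure
(def counter-def (atom 0))
(defonce counter-once (atom 0))

(swap! counter-def inc)
(swap! counter-once inc)

;; Simulate reloading the namespace: the same top-level forms run again.
(def counter-def (atom 0))      ; recomputed, so the old state is lost
(defonce counter-once (atom 0)) ; skipped, the Var already has a root value

@counter-def  ;; => 0
@counter-once ;; => 1
```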

seancorfield 2020-10-16T22:58:41.100200Z

And then binding changes the contents of the Var box (not the symbol binding) and restores it later (to the previous value) -- but there's a subtlety there in terms of thread local bindings etc.

seancorfield 2020-10-16T23:00:17.100400Z

And then there's with-redefs which affects multiple threads (and therefore is not thread-safe).

seancorfield 2020-10-16T23:00:41.100600Z

(`binding` can only be used with Vars that have been declared as ^:dynamic)
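A sketch contrasting `binding` and `with-redefs`; `*level*`, `bar`, and the `in-new-thread` helper are illustrative names. A raw `java.lang.Thread` is used on purpose, because Clojure's own `future` conveys the caller's dynamic bindings and would hide the thread-local behavior.

```clojure
(def ^:dynamic *level* :info)
(defn current-level [] *level*)

(defn in-new-thread
  "Runs f on a plain java.lang.Thread (no binding conveyance) and returns f's result."
  [f]
  (let [p (promise)
        t (Thread. (fn [] (deliver p (f))))]
    (.start t)
    (.join t)
    @p))

;; binding: thread-local to the current thread, undone when the form exits.
(binding [*level* :debug] (current-level))               ;; => :debug
(current-level)                                          ;; => :info
(binding [*level* :debug] (in-new-thread current-level)) ;; => :info

;; with-redefs: swaps the Var's root value, so every thread sees it.
(defn bar [x] (* 2 x))
(with-redefs [bar (fn [_] 0)] (in-new-thread #(bar 5)))  ;; => 0
(bar 5)                                                  ;; => 10

;; binding only works on ^:dynamic Vars; bar is not dynamic.
(try (binding [bar inc] (bar 1))
     (catch IllegalStateException _ :not-dynamic))       ;; => :not-dynamic
```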

ozzloy 2020-10-16T03:26:03.037800Z

cool. good to have confirmation

ozzloy 2020-10-16T03:26:30.038600Z

yeah, i suppose leaving the #' doesn't hurt the reload and does allow for redefining at the repl

seancorfield 2020-10-16T03:26:34.038700Z

I see folks get into all sorts of trouble with reloading namespaces... 😞

ozzloy 2020-10-16T03:27:48.039400Z

makes sense

ozzloy 2020-10-16T03:30:11.041Z

i could see that being annoying to troubleshoot, and the behavior being surprising. something in figwheel caught me with reloading... i think it was with an on-click thing...

ozzloy 2020-10-16T03:30:47.041400Z

thanks for the responses, @seancorfield you've been helpful for me a few times

ozzloy 2020-10-16T03:49:56.043Z

if i have the same file in both resources and target, then the one from target wins. is it up to me to make sure i don't have 2 files with the same name on those different paths?

ozzloy 2020-10-16T03:50:15.043300Z

"same file" -> file with the same name

2020-10-16T04:01:05.045Z

It is up to you

2020-10-16T04:02:41.047500Z

Target is the kind of scratch space lein uses for generated stuff like the results of builds (jars, uberjars, etc.); you shouldn't be putting anything in there

ozzloy 2020-10-16T04:05:38.048800Z

yeah, i'm worried about creating a file that gets shadowed by the build process and then wondering what i did wrong when the page doesn't load right. might be difficult to tell that that's what's going on when it happens

seancorfield 2020-10-16T04:17:54.050700Z

@ozzloy_clojurians_net lein is probably putting files from resources into target. I don't use lein any more so I never have to worry about target folder but, in general, never put anything in target yourself and then just ignore it 🙂

ozzloy 2020-10-16T04:20:20.051400Z

i'm not using lein either. figwheel.main puts stuff in target though

ozzloy 2020-10-16T04:21:01.051700Z

actually, idk what's putting stuff into target, but i think it's figwheel.main

ozzloy 2020-10-16T04:22:41.053Z

same thing applies to src and resources though. and relying on myself to know that there's a resources/a/b and a src/a/b is a known buggy process

ozzloy 2020-10-16T04:24:03.054700Z

but knowing that that is a potential thing is good enough for me. i was curious if there was some tool to address this. sounds like there is not. i can live with that
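Not an existing tool, but a small REPL helper can list every classpath entry that provides a given resource path, which would surface a resources/a/b vs src/a/b clash. `resource-urls` and `shadowed?` are hypothetical names; the helper relies only on `ClassLoader.getResources`, and the first URL in its result is typically the one `clojure.java.io/resource` returns.

```clojure
(defn resource-urls
  "Every classpath entry providing the resource at `path`, in class-loader
  search order. clojure.java.io/resource typically returns only the first."
  [path]
  (-> (Thread/currentThread)
      (.getContextClassLoader)
      (.getResources path)
      (enumeration-seq)
      (vec)))

(defn shadowed?
  "True when more than one classpath entry provides `path`."
  [path]
  (> (count (resource-urls path)) 1))

;; e.g. (resource-urls "public/index.html")
```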

seancorfield 2020-10-16T04:24:30.055100Z

Are you using figwheel.main via the Clojure CLI then? https://figwheel.org/ shows lein and clj invocations (but I'm not doing ClojureScript so I haven't tried Figwheel -- I've only used shadow-cljs a bit).

ozzloy 2020-10-16T04:24:52.055500Z

yep

ozzloy 2020-10-16T04:26:15.056800Z

well... i'm using cider, so ... i think it's using clj under the hood. there's no project.clj and there is a deps.edn in my projects so far.

ozzloy 2020-10-16T04:27:46.057900Z

yep, cider calls "clojure"

dpsutton 2020-10-16T04:28:51.059700Z

how is this affecting you? (the actual command will be at the top of your repl). figwheel needs to compile your files. and for each namespace.cljs you'll end up with a namespace.js, namespace.js.map and namespace.cljs in the compiled out directory for figwheel to serve and hot load

ozzloy 2020-10-16T04:32:10.062200Z

@dpsutton at the moment, this is hypothetical. i don't currently have a resources/a/b AND a src/a/b. so ... it's affecting me by making me worry about a future me that has a hard to diagnose bug. and present me doesn't think that guy is likely to exist, so doesn't care too much.

dpsutton 2020-10-16T04:34:19.064100Z

not sure what to say. every cljs project will have compiled files which mimic the source tree during dev. it's never been a problem for me. those files are almost always gitignored (as compilation output should be) and therefore never a problem with ag/grep. Not sure what issue you think you'll have but i haven't had it in 4 years of clojurescript development

ozzloy 2020-10-16T04:39:22.068100Z

yeah, you're right. it will almost certainly not be an issue. and it's nice to have a concrete example of it never coming up in 4 years

J Crick 2020-10-16T06:23:47.069200Z

Thanks for all of the valuable feedback. I know my question was very broad, still... Some great resources and perspectives. I really appreciate it!

J Crick 2020-10-16T06:28:35.069300Z

Just looking over Clojure Applied. That seems to be right in line with what I was looking for. Thank you!

Jim Newton 2020-10-16T07:23:24.071100Z

what is everyone using to profile their programs? i.e. to see where the time is being spent. My experience over 35 years of programming is that you're usually surprised where the time is being spent.

Jim Newton 2020-10-16T07:25:12.072800Z

Jim Newton's 3 rules (of thumb) of programming: 1. every unoptimized program can be doubled in speed, (this rule is not recursive) 2. every untested program has a bug. this is especially true for one line programs. 3. some problems are really hard, but the problems are less hard if you take a break and have some ice cream.

Jim Newton 2020-10-22T06:29:14.345800Z

I'm trying to experiment with https://github.com/clojure-goes-fast/clj-async-profiler . The documentation says

;; The resulting flamegraph will be stored in /tmp/clj-async-profiler/results/
;; You can view the SVG directly from there or start a local webserver:

(prof/serve-files 8080) ; Serve on port 8080
If I start a web server, how can I view the graph? I'll need some URL right? which URL?

Jim Newton 2020-10-22T06:35:12.346100Z

@jumar what is OOM?

vlaaad 2020-10-22T06:38:55.346300Z

localhost:8080

vlaaad 2020-10-22T06:39:19.346500Z

I usually just open the /tmp/clj-async-profiler folder

vlaaad 2020-10-22T06:39:50.346700Z

OOM = out of memory

Jim Newton 2020-10-22T06:43:09.346900Z

oic

Jim Newton 2020-10-22T06:43:57.347100Z

So here is the file I generated. I don't really understand how to interpret the results. can someone help?

Jim Newton 2020-10-22T06:44:06.347500Z

what is it telling me?

Jim Newton 2020-10-22T06:47:49.347800Z

There's not much clue about why I'm getting out of heap space?

197/500: trying (:and :epsilon :empty-set :sigma :sigma)
198/500: trying (:or (:cat (:cat (:cat :empty-set) (:cat :empty-set)) (:* :epsilon) (:and (:not (:cat)) (:* (:not (:or))))) (:* (:or :epsilon (:cat (= a)))) (:not (:not (:or :sigma))) (satisfies decimal?))
199/500: trying (:cat (:+ (:not (:and (:and)))) (:or (:not :epsilon) (:not :epsilon) (:+ (:and (:cat)))) (:not (:cat (:or (:or)) :sigma)) (:or :epsilon (:or (:not :sigma) (:and (:or))) (:cat (:cat (:+ (:? (:cat)))) (:? (:? (:* (:? (:not (:* (:cat))))))))))
Execution error (OutOfMemoryError) at clojure-rte.util/fixed-point (util.clj:216).
Java heap space

clojure-rte.rte-core=> 

Jim Newton 2020-10-22T06:49:58.348Z

However, fortunately it does seem that clj-async-profiler dumps its results even if the expression being profiled encounters an OutOfMemoryError exception.

vlaaad 2020-10-22T07:01:44.348200Z

it’s a flame chart showing where CPU is being spent

vlaaad 2020-10-22T07:02:27.348400Z

there is a lot of GC in native code (around 90% of CPU)

vlaaad 2020-10-22T07:02:58.348600Z

tower on the left is your code, you can click on nodes to focus on them

Jim Newton 2020-10-22T07:04:07.348800Z

I wonder whether anyone might be keen to help me look into this? Particularly someone who can run the tools on a non-Mac?

git clone https://gitlab.lrde.epita.fr/jnewton/clojure-rte.git
cd clojure-rte
git checkout 7088418cb7078032309959ebfd01a88afbe7f380
lein repl
(require 'clojure-rte.rte-tester)
(clojure-rte.rte-tester/-main)

vlaaad 2020-10-22T07:04:08.349Z

I clicked in the middle of the tower so it showed only that, and it seems a lot of CPU in user code is spent in clojure.core/memoize

Jim Newton 2020-10-22T07:04:34.349200Z

hmm, even when I comment out all the calls to memoize, I still get out of memory error

Jim Newton 2020-10-22T07:05:49.349400Z

in my experience with various lisps, you need an allocation profiler to find what is overburdening the garbage collector. I suppose that's the same with clojure ?

vlaaad 2020-10-22T07:05:59.349600Z

more specifically, a lot of CPU regarding memoize is in clojure-rte.util/fixed-point

Jim Newton 2020-10-22T07:07:49.349900Z

I think the memoizing of fixed-point was an attempt to fix the problem. there is a recursive function which tries to keep simplifying an expression until it stops changing.

Jim Newton 2020-10-22T07:08:29.350100Z

I think this function often simplifies the same almost-leaf-level expressions .

Jim Newton 2020-10-22T07:12:59.350400Z

In my testing code I generate lots of random expressions and give them to the simplifier. What I see is that the more expressions I give, the more likely the OOM error is. Either my memoization is really causing the problem (which I doubt, because I see the same problem when I comment it out) or I'm allocating too much for the Java GC to keep up, which I also doubt because the Java GC is arguably the best in the history of the world.

Jim Newton 2020-10-22T07:14:53.350700Z

After every simplification there should be no remaining allocation.

vlaaad 2020-10-22T07:15:29.350900Z

is there a possibility of endless loop of simplification that just grows and grows?

Jim Newton 2020-10-22T07:16:49.351100Z

this is indeed possible, and I've asked myself that. I don't think it is the case. Here is my reasoning: I print the expression before simplifying. When I get an OOM error, I can then simplify the offending expression in a fresh REPL and it simplifies without problem.

Jim Newton 2020-10-22T07:18:23.351300Z

although putting a debug counter in the fixed-point function which warns or errors after 10000 iterations might not be a bad idea.
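A sketch of that iteration limit as a standalone function; `fixed-point-with-limit` is a hypothetical name and is not the `fixed-point` in `clojure-rte.util`, whose signature isn't shown here. It stops when `(= x (f x))` and otherwise throws once the limit is reached, putting the last value in `ex-data`.

```clojure
(defn fixed-point-with-limit
  "Applies f repeatedly, starting from x, until the result stops changing.
  Throws ex-info after max-iter applications instead of looping forever."
  ([f x] (fixed-point-with-limit f x 10000))
  ([f x max-iter]
   (loop [x x
          i 1]
     (let [y (f x)]
       (cond
         (= x y)         y
         (>= i max-iter) (throw (ex-info "fixed point not reached"
                                         {:iterations i :last-value y}))
         :else           (recur y (inc i)))))))

(fixed-point-with-limit #(quot % 2) 100) ;; => 0
```

Keeping the last value in `ex-data` means a runaway case fails fast with something inspectable rather than filling the heap.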

Jim Newton 2020-10-22T07:23:47.351500Z

I wrote the simplifier functions in a very wasteful way, assuming the GC was good. I.e., in some cases it copies a list using (map ...) and then tests whether it got a different result. I could change those to first try to find an element of the list which would change under the map, and only then allocate a new list. This would mean I iterate over the list twice (trading speed for space). I hoped to avoid that as it makes the code uglier.

Jim Newton 2020-10-22T07:24:24.351700Z

However, that is shooting in the dark, since I really don't know the culprit without an allocation profile.
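One way to write the "check before allocating" idea from the message above, as a hypothetical helper: it returns the original collection untouched when `f` changes nothing, at the cost of calling `f` a second time on some elements.

```clojure
(defn map-if-changed
  "Like (map f xs), but returns xs itself, allocating nothing new, when f
  leaves every element unchanged (compared with =). Elements up to the
  first changed one are passed to f twice."
  [f xs]
  (if (some #(not= % (f %)) xs)
    (map f xs)
    xs))
```

Because the unchanged case returns the very same object, a caller can use the cheap `identical?` check instead of a deep `=` comparison.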

Jim Newton 2020-10-22T07:34:24.351900Z

I am making an assumption after asking clojure experts. The assumption is that if I compute a function such as (fn ...) within a (let ...) which allocates lots of temporary variables whose values are huge allocations, but the returned function does not explicitly reference any of those huge allocations, then the GC will deallocate them. For example:

(defn create-funny-function [n f g]
   (let [x (allocate-huge-list n)
         y (count (filter f x))]
     (fn [] (g y))))
In this case create-funny-function allocates a huge list and returns a function which only references an integer which is the length of that list, but does not reference the list itself. Will the GC deallocate x or not? I am supposing that it will.

Jim Newton 2020-10-22T07:42:47.352100Z

I should be able to construct an experiment to confirm this supposition: create longer and longer lists of the return values of create-funny-function until OOM, then do it again with a different size of allocate-huge-list. If the list of functions is shorter when allocate-huge-list is longer, that would falsify the claim, right?

jumar 2020-10-22T10:14:16.355700Z

What specific OOM error do you get, and how do you configure memory for the JVM (especially the max heap size)? Did you try the flag I suggested? (`-XX:+HeapDumpOnOutOfMemoryError`; look e.g. here: https://stackoverflow.com/questions/542979/using-heapdumponoutofmemoryerror-parameter-for-heap-dump-for-jboss)

2020-10-22T14:59:38.372Z

JVM GCs are often quite good, but they cannot free memory that is still being referenced, of course. I believe you should only get OOM exception if there is more still-referenced memory than the configured max heap size when you started the JVM (or the default max heap size the JVM calculated by default when it started, if you did not specify one). I don't think any JVMs give that exception because you are allocating memory "too quickly" that it cannot keep up -- instead your program slows down when GC is running a lot.

2020-10-22T15:10:17.373500Z

Regarding your create-funny-fn above, I have not looked at the JVM byte code generated by the Clojure compiler for that to confirm, but I have looked at a version of that JVM byte code that was decompiled to Java source code (that gives less certain results, because in some cases the Clojure compiler produces JVM byte code that have no good representation in Java source code). It appears that the function (fn [] (g y)) is given references to g and y when the JVM object representing that function is created, but not to x. If that is true, then I do not see anything that could be holding on to a reference to x there.
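A way to check that reading without decompiling, relying on an implementation detail of current Clojure versions: each closed-over local becomes an instance field on the fn's generated class. The `(doall (range n))` below is a stand-in for the hypothetical `allocate-huge-list`, and `closed-over-fields` is an illustrative helper.

```clojure
(defn create-funny-function [n f g]
  (let [x (doall (range n))   ; stand-in for the hypothetical allocate-huge-list
        y (count (filter f x))]
    (fn [] (g y))))

(defn closed-over-fields
  "Names of the non-static fields on a fn object's class. In current Clojure
  versions these are the locals the fn closes over (an implementation detail)."
  [fn-obj]
  (->> (.getDeclaredFields (class fn-obj))
       (remove #(java.lang.reflect.Modifier/isStatic
                  (.getModifiers ^java.lang.reflect.Field %)))
       (map #(.getName ^java.lang.reflect.Field %))
       (set)))

(closed-over-fields (create-funny-function 1000 even? inc))
;; expected to contain "g" and "y", but not "x"
```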

2020-10-22T15:20:45.374Z

In your create-funny-fn, if you passed it a function f that was memoized, and memoized in a way that it never removed entries from its cache, then f's memoization cache size will grow proportionally to the number of distinct elements in the list x

2020-10-22T15:21:07.374200Z

and retain references to those elements of x
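A sketch of that effect with `clojure.core/memoize`, whose cache is never evicted: every distinct argument (and its result) stays reachable for as long as the memoized function itself is reachable. `simplify-all` is a hypothetical name showing one way to scope a cache to a single top-level call; for a recursive function, the recursive calls would also have to go through the local memoized fn, which this sketch does not show.

```clojure
(def calls (atom 0))

(def memo-square
  (memoize (fn [x] (swap! calls inc) (* x x))))

;; Call it twice for each of 1000 distinct arguments.
(dotimes [_ 2]
  (doseq [x (range 1000)] (memo-square x)))

@calls ;; => 1000: each argument was computed once, and all 1000 entries
       ;;    stay in memo-square's cache for as long as memo-square is reachable

(defn simplify-all
  "Hypothetical: memoizes f only for the duration of one call, so the cache
  becomes garbage once the result is returned."
  [f exprs]
  (let [mf (memoize f)]
    (mapv mf exprs)))
```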

2020-10-22T15:24:27.374400Z

Again, I have not used it myself for this purpose, but YourKit Java Profiler advertises having a memory profiler that can help analyze the current non-GC'ed objects in a running JVM. Sure, it can be a pain to learn new tools, and it is difficult to know in advance whether they will end up saving you more time or costing you more time, ...

Jim Newton 2020-10-22T16:08:26.376700Z

@jumar, Re "Did you try the flag I suggested? (`-XX:+HeapDumpOnOutOfMemoryError`)", I don't know how to do what you're suggesting. Is that something in the project.clj file?

Jim Newton 2020-10-22T16:10:17.377200Z

@andy.fingerhut, my experimentation seems to confirm what you're saying. If I make the function also reference x, then memory fills up an order of magnitude quicker.

jumar 2020-10-23T09:27:21.398Z

If you run your app via java ... (as JAR e.g.) then you just do java -XX:+HeapDumpOnOutOfMemoryError ... With leiningen you can use :jvm-opts: https://github.com/technomancy/leiningen/blob/master/sample.project.clj#L295
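A sketch of the `:jvm-opts` entry for this, modeled on the line Andy uses later in the thread; the version string is a placeholder, and `-XX:HeapDumpPath` is optional. Without it, HotSpot writes `java_pid<pid>.hprof` to the JVM's working directory, which for `lein repl` is the project root. These options apply to the project JVM that `lein repl` starts, not to Leiningen's own JVM.

```clojure
;; project.clj (sketch; version and the other keys are placeholders)
(defproject clojure-rte "0.1.0-SNAPSHOT"
  ;; ... your existing keys ...
  :jvm-opts ["-Xmx512m"                        ; smaller heap, so the dump stays smaller
             "-XX:+HeapDumpOnOutOfMemoryError" ; write an .hprof file on OOM
             "-XX:HeapDumpPath=/tmp"])         ; optional: directory for the dump
```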

Jim Newton 2020-10-23T10:10:42.400900Z

@jumar I installed the flag as you suggest, and then triggered the out-of-memory error. But I don't find any heap dump file anywhere.

2020-10-23T16:06:08.414700Z

Is your code published somewhere that others could try it out, e.g. to confirm whether the flag is set up in a way that it is actually being used when the JVM is started? Or have you used a command like ps axguwww | grep java while your process is running to confirm that the command line option is present?

2020-10-23T16:06:56.414900Z

I suspect most JVMs implement that option, but there are many different JVMs from different providers, and I don't plan to check if they all do. What is the output of java -version on your system?
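From inside the REPL you can also check which options the running JVM actually received, using the standard `java.lang.management` API; `jvm-input-args`, `jvm-flag-present?`, and `jvm-summary` are illustrative names.

```clojure
(import '(java.lang.management ManagementFactory))

(defn jvm-input-args
  "The options this JVM was actually started with (e.g. from :jvm-opts)."
  []
  (vec (.getInputArguments (ManagementFactory/getRuntimeMXBean))))

(defn jvm-flag-present? [flag]
  (boolean (some #{flag} (jvm-input-args))))

(defn jvm-summary []
  {:java-version      (System/getProperty "java.version")
   :max-heap-mb       (quot (.maxMemory (Runtime/getRuntime)) (* 1024 1024))
   :heap-dump-on-oom? (jvm-flag-present? "-XX:+HeapDumpOnOutOfMemoryError")})

;; (jvm-summary) at the REPL shows the version, max heap, and whether the flag arrived.
```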

Jim Newton 2020-10-23T19:53:03.423400Z

git clone https://gitlab.lrde.epita.fr/jnewton/clojure-rte.git
cd clojure-rte
git checkout 72fe231debcc45095a78fcecc07310bcbf725071
lein repl
(require 'clojure-rte.rte-tester)
(clojure-rte.rte-tester/test-oom)

Jim Newton 2020-10-23T19:53:41.423600Z

@andy.fingerhut i've tried to prepare a branch for you

Jim Newton 2020-10-23T20:01:53.423900Z

if you'd like to commit to this repo, I can give you permission. I just need your email address.

2020-10-23T20:02:04.424100Z

I tried those steps on a macOS 10.14.6 system, Leiningen version 2.9.3, AdoptOpenJDK 11.0.4, and it gives an error when clj-async-profiler tries to do some initialization. Not sure if you saw that already and worked around it, or perhaps you are using a different JDK version that works better with this.

2020-10-23T20:03:09.424300Z

Ah, I avoid that error with AdoptOpenJDK 8 -- will try with that.

Jim Newton 2020-10-23T20:09:26.425100Z

you can comment out the profiler code, you'll still get the oom error

Jim Newton 2020-10-23T20:10:15.425600Z

[geminiani:~/Repos/clojure-rte] jimka% lein repl
nREPL server started on port 64093 on host 127.0.0.1 - nrepl://127.0.0.1:64093
REPL-y 0.4.4, nREPL 0.7.0
Clojure 1.10.0
OpenJDK 64-Bit Server VM 11.0.7+10
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
    Exit: Control+D or (exit) or (quit)
 Results: Stored in vars *1, *2, *3, an exception in *e

2020-10-23T20:18:23.426700Z

Confirmed I get OOM with those commands after a while.

2020-10-23T20:18:45.426900Z

Will try adding the extra -XX option mentioned above, and maybe reduce the heap size a bit so the resulting heap before OOM is smaller

2020-10-23T20:23:53.427300Z

OOM exception occurred after I added that option, and it created a file named java_pid27470.hprof in the clojure-rte directory, i.e. the root directory of the project where I ran the lein repl command.

2020-10-23T20:24:20.427500Z

The only change I made was in the project.clj file, which I changed the line containing :jvm-opts to the following: :jvm-opts ["-Xmx512m" "-XX:+HeapDumpOnOutOfMemoryError"]

2020-10-23T20:28:41.429Z

Even though I changed the max heap size to 512 Mbytes, the dump file created was nearly 1 Gbyte in size, probably due to some extra data it writes that isn't simply a copy of what is in memory at the time of the exception

2020-10-23T20:55:45.431400Z

It is pretty easy with a free tool like jhat to determine that most of the memory is occupied by objects of class clojure.lang.Cons and clojure.lang.LazySeq, but I do not yet know if there is a convenient way to determine where in the code most of those objects are being allocated from.

2020-10-23T21:56:12.433700Z

Out of curiosity in trying to narrow down the place where OOM occurs, I tried putting a cl-format call inside of the function -canonicalize-pattern-once. It prints many times merely because of doing require on any of several of your namespaces. Is that intentional?

2020-10-23T21:57:00.433900Z

And it probably has nothing to do with the OOM, but it is pretty unusual to have so many namespaces that use in-ns and then defn things in other namespaces. Not sure if you felt you needed that for some reason, or just prefer it.

2020-10-23T22:47:35.434700Z

Do you expect canonicalize-pattern to take the same amount of work on the same expression given to it no matter what calls have been made before? Or is there state left behind from each one that can affect how much future calls do?

2020-10-23T22:49:21.434900Z

I ask because I can run (clojure-rte.rte-tester/-main) in a REPL session, see the OOM and the output of this uncommented print statement in my copy: (cl-format true "canonicalizing:~%") , copy and paste the last expression printed before the OOM, quit that REPL, start a new one, and do only the following:

2020-10-23T22:50:07.435100Z

(require 'clojure-rte.rte-tester :verbose)
(require '[clojure-rte.rte-core :as co] :verbose)
(def maybe-troublesome-pattern3
  '(:cat :empty-set (member a b c a b c) (:cat :empty-set (:not :empty-set) (:* (:* (:and)))) :epsilon))
(co/canonicalize-pattern maybe-troublesome-pattern3)

2020-10-23T22:50:57.435600Z

and it typically returns very quickly, with very few calls to canonicalize-pattern, whereas when it was going OOM, it makes millions of calls to canonicalize-pattern

2020-10-23T23:02:04.436200Z

It seems like for the same parameter value, in some situations calling canonicalize-pattern goes into infinite recursion, but in other situations, it does not.

2020-10-24T05:05:30.437500Z

Here is a file of REPL inputs I used on a slightly modified version of your code, with a little bit of extra code to count how many times a few functions were called, and print out debug output on every 100,000 calls to canonicalize-pattern, which before an OutOfMemory exception occurs, is called millions of times: https://github.com/jafingerhut/clojure-rte/blob/andy-extra-debugs2/doc/scratch.clj
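The exact instrumentation is in the linked file; as a rough, hypothetical sketch of the same idea, a counting wrapper can be installed over an existing function with `alter-var-root`. Because recursive calls inside a `defn` go through the Var (absent direct linking), they are counted too. `wrap-counting` is an illustrative helper and `simplify` is a toy stand-in, not the real `canonicalize-pattern`.

```clojure
(defn wrap-counting
  "Hypothetical debugging helper: returns a fn that behaves like f but
  increments `counter` on every call and prints a line every `every` calls."
  [f counter label every]
  (fn [& args]
    (let [n (swap! counter inc)]
      (when (zero? (mod n every))
        (println label "call" n "first-arg size:"
                 (when (coll? (first args)) (count (first args))))))
    (apply f args)))

;; Toy recursive stand-in (NOT the real canonicalize-pattern).
(defn simplify [xs]
  (if (and (seq xs) (= (first xs) (second xs)))
    (simplify (rest xs))   ; goes through the Var, so the wrapper sees it
    xs))

(def simplify-calls (atom 0))

;; Sets the root to (wrap-counting old-root simplify-calls "simplify" 2).
(alter-var-root #'simplify wrap-counting simplify-calls "simplify" 2)

(simplify [1 1 1 2]) ;; returns (1 2) and prints one progress line
@simplify-calls      ;; => 3: the outer call plus two recursive calls
```

Against the real code this would presumably be `(alter-var-root #'co/canonicalize-pattern wrap-counting (atom 0) "canonicalize-pattern" 100000)`, with `co` the alias from the snippet above; with direct linking enabled, callers would not see the wrapper.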

2020-10-24T05:07:03.438Z

It seems that some inputs can sometimes cause canonicalize-pattern to call itself recursively with lists that approximately double in length, looking similar to this: https://github.com/jafingerhut/clojure-rte/blob/andy-extra-debugs2/doc/repl-session1.txt#L3004

2020-10-24T05:08:23.438600Z

Because of the random generation of inputs, it doesn't always happen in 500 runs, but it usually does. The common theme I saw when it does cause OOM is that the parameter to canonicalize-pattern is a list ending with many repetitions of the subexpression (:* :sigma), or something that contains that.

2020-10-24T05:09:11.439400Z

I do not know why your code can call itself with lists about twice as long as those in a recent call. Hopefully this extra debug output gives you some ideas about where that might be happening in your code. Eventually the list gets so long that you run out of memory.

2020-10-24T05:11:24.439600Z

I made a copy of your repo on my GitHub account. You can clone it and check out the branch I created with my additions:

git clone https://github.com/jafingerhut/clojure-rte
cd clojure-rte
git checkout andy-extra-debugs2
lein repl

(require 'clojure-rte.rte-tester)
(clojure-rte.rte-tester/-main)

Jim Newton 2020-10-24T07:50:19.444300Z

@andy.fingerhut no doubt that is really useful information. the curious thing though is that even if certain lists trigger the out of memory error in the random tests, the same lists cause no such problem when I call it directly from the repl.

Jim Newton 2020-10-24T07:54:02.444600Z

About the multiple occurrences of (:* :sigma): there is a reduction step in the canonicalization code that recognizes these duplications and reduces them in different ways depending on the context in which they appear. This is mathematically elegant, but in terms of computation effort it may need to be refactored to reduce sooner, or to avoid generating them in the first place.

Jim Newton 2020-10-24T09:06:22.444800Z

I can also think about how to reduce the number of re-allocations of (:* :sigma); this is a pair that appears extremely often. Rather than reallocating it in the code, I could define it as a constant and reference it rather than copying it.

2020-10-24T11:33:13.448100Z

I did not attempt to learn why an expression sometimes goes into an infinite loop and sometimes finishes quickly. If you think the code is a deterministic function of its input, clearly it is not, for some reason.

Jim Newton 2020-10-24T11:42:25.448300Z

It is really curious to me that I get the OOM error trying to canonicalize an expression that canonicalizes very easily when I try it stand-alone. For example, on a recent attempt I got OOM on the expression (:cat (:+ :sigma) (:? (:* (:cat (:and)))) (:or (:+ (:+ (:cat))) (:+ (:and :sigma)) (:cat (:? :epsilon) :sigma)) :epsilon), but when I call canonicalize-pattern-once 5 times, it reduces to the concise form (:cat :sigma (:* :sigma)). It really looks to me like something is filling up memory before it gets to 118/500.

118/500: trying (:cat (:+ :sigma) (:? (:* (:cat (:and)))) (:or (:+ (:+ (:cat))) (:+ (:and :sigma)) (:cat (:? :epsilon) :sigma)) :epsilon)
Execution error (OutOfMemoryError) at clojure-rte.rte-core/canonicalize-pattern (rte_construct.clj:613).
Java heap space

clojure-rte.rte-core=> (clojure-rte.rte-core/canonicalize-pattern-once '(:cat (:+ :sigma) (:? (:* (:cat (:and)))) (:or (:+ (:+ (:cat))) (:+ (:and :sigma)) (:cat (:? :epsilon) :sigma)) :epsilon))
(:cat (:cat :sigma (:* :sigma)) (:* :sigma) :epsilon)
clojure-rte.rte-core=> (clojure-rte.rte-core/canonicalize-pattern-once '(:cat (:cat :sigma (:* :sigma)) (:* :sigma) :epsilon))
(:cat :sigma (:* :sigma) (:* :sigma) :epsilon)
clojure-rte.rte-core=> (clojure-rte.rte-core/canonicalize-pattern-once '(:cat :sigma (:* :sigma) (:* :sigma) :epsilon))
(:cat :sigma (:* :sigma) :epsilon)
clojure-rte.rte-core=> (clojure-rte.rte-core/canonicalize-pattern-once '(:cat :sigma (:* :sigma) :epsilon))
(:cat :sigma (:* :sigma))
clojure-rte.rte-core=> (clojure-rte.rte-core/canonicalize-pattern-once '(:cat :sigma (:* :sigma)))
(:cat :sigma (:* :sigma))
clojure-rte.rte-core=>

2020-10-24T12:08:12.448700Z

I have seen the last expression before OOM occurred be at least 5 different expressions, and all of them I tried in a fresh REPL finished quickly. That behavior isn't unique to one input expression.

2020-10-24T12:32:09.450Z

Filling up memory before calling canonicalize-pattern would not explain why it seems to go into an infinite loop, though.

2020-10-24T12:32:38.450900Z

It is creating longer and longer lists, doubling in size, when it goes on

2020-10-24T12:32:49.451300Z

OOM.

2020-10-24T12:33:40.452500Z

When the same expression does not go oom it does not call itself with those expressions that double in size

2020-10-24T12:35:53.452700Z

I do not know the reason for the different behavior in different circumstances, but I noticed you are using binding, and often lazy sequences. Are you aware that those often combine unpredictably?
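A minimal sketch of that interaction, using a hypothetical dynamic Var *scale* (not from clojure-rte): a lazy sequence created inside binding but realized after it exits sees the root value, while doall realizes it while the binding is still in effect.

```clojure
(def ^:dynamic *scale* 1)

(defn scaled [xs]
  ;; map is lazy, so *scale* is only read when each element is realized
  (map #(* % *scale*) xs))

;; Returned unrealized from inside binding; realized later with the root value.
(def lazy-result
  (binding [*scale* 10]
    (scaled [1 2 3])))

;; doall realizes every element while the binding is still in effect.
(def eager-result
  (binding [*scale* 10]
    (doall (scaled [1 2 3]))))

(println lazy-result)   ; prints (1 2 3)
(println eager-result)  ; prints (10 20 30)
```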

2020-10-24T12:40:00.452900Z

The definition of type-equivalent? might not be relevant for the behavior of canonicalize-pattern, but it seems to be written in a way that suggests you know that subtype? is not a pure function. Why is it written that way?

2020-10-24T12:47:46.453100Z

Not sure if you prefer these things not to be pure functions of their inputs, but trying to make them so should make their behavior more predictable.

2020-10-24T13:26:50.453500Z

I have a strong suspicion that the cause and effect relationship here is NOT: "high memory usage causes canonicalize-pattern to behave badly", but instead "`canonicalize-pattern` calling itself with list lengths that double, indefinitely, leads to high memory usage".

2020-10-24T13:28:37.453700Z

I can write a trivial function that calls another function with a list of length 1, then 2, then 4, then 8, etc., and you would easily reason "don't do that, you will run out of memory". Determine why canonicalize-pattern sometimes does that, and prevent it, and you will almost certainly solve the OOM problem.

Jim Newton 2020-10-24T16:12:42.454100Z

What do you mean by subtype? not being a pure function? Do you mean the fact that it binds the *subtype?-default* function? I could probably refactor that away now that I better understand the problem I was originally trying to solve. The issue is that sometimes it cannot be determined whether a subtype relationship holds. The 3rd argument specifies what to do in that case: return true, return false, or raise an exception. The caller of subtype? must specify which action to take. Off the top of my head, I can't think of any reason canonicalize-pattern would be calling type-equivalent? or subtype?.

Jim Newton 2020-10-24T16:13:56.454400Z

yes, if I can identify why canonicalize-pattern is doubling the length of its argument on recursive calls, that would indeed sound like an error.

Jim Newton 2020-10-24T16:37:06.455Z

Yes, there are indeed scenarios where dynamic variables do not play well with lazy sequences, and vice versa. I've tried to unlazify the lazy sequences, but I'm sure I've missed some of them. Clojure tries really hard to make sequences lazy.

2020-10-24T16:48:09.455200Z

(doall <expr>) is one general-purpose way to force any <expr> that returns a lazy sequence to realize all of its elements eagerly, without having to define separate eager versions of functions like filter, map, etc.

2020-10-24T16:50:09.455400Z

Regarding my comment about subtype?, look at your definition of type-equivalent?. It calls subtype? twice with the same parameters, once with delay wrapped around it, once without, and then compares the return values of the two. If subtype? were pure, there doesn't seem to be any point to doing something like that. Did you write type-equivalent? believing that subtype? does not always return the same value given the same arguments?

2020-10-24T17:08:04.455600Z

The :post condition expression in your function subtype? will always be logical true, because it is an expression that returns a function, and functions like all other things that are neither nil nor false are logical true in Clojure, so that :post condition will never assert, ever.

2020-10-24T17:09:22.455800Z

Closer to what you probably intended would be :post [(#{true false :dont-know} %)], but that is also probably not what you want, because looking up false in a set like that returns the found element, false, and that would cause an assertion failure for the post-condition.

2020-10-24T17:10:31.456Z

Likely what you actually want there, if you ever want it to catch returning a value other than true, false, or :dont-know, is :post [(contains? #{true false :dont-know} %)]
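A small sketch of the three cases discussed above (the function names are hypothetical): a set used as a function returns the found element, a fn object in :post is always truthy, and contains? is a correct membership test.

```clojure
;; A set used as a function returns the matching element itself,
;; so looking up false yields false even though false is a member.
(def answers #{true false :dont-know})

(println (answers false))            ; prints false
(println (contains? answers false))  ; prints true

;; A fn expression in :post is just a truthy object, so this never fails.
(defn never-checked [v]
  {:post [(fn [r] (contains? answers r))]}
  v)

;; contains? tests membership without the false/nil pitfall.
(defn checked [v]
  {:post [(contains? #{true false :dont-know} %)]}
  v)

(println (never-checked :oops))  ; prints :oops, no AssertionError
(println (checked false))        ; prints false
```

With assertions enabled (the default), (checked :oops) throws an AssertionError.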

2020-10-24T17:12:13.456200Z

That is unlikely related to your OOM issue -- just something I noticed while looking around.

Jim Newton 2020-10-25T09:53:24.474500Z

When I look at the :post function, maybe I don't understand the semantics of :post. What I'd like to do is assert that the value returned from subtype? is explicitly true, false, or :dont-know. I think I may be confused about using sets as membership tests. I'll change the post function to:

(fn [v] (member v '(true false :dont-know)))
I already have a member function in my utils library, defined as follows. Perhaps I should replace the final call to some with a (loop ... recur) that checks equivalence until it finds a match? I suspect that would be faster than rebuilding a singleton set and then checking set membership many times. I suspect a small (loop ... recur) would compile very efficiently, right?
(defn member
  "Determines whether the given target is an element of the given sequence."
  [target items]
  (boolean (cond
             (nil? target) (some nil? items)
             (false? target) (some false? items)
             :else (some #{target} items))))
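For comparison, a loop/recur variant (hypothetically named member-loop). Comparing with = handles nil and false targets the same way as any other value, so the special cases are unnecessary, and recur jumps back to the top of the loop without growing the stack.

```clojure
(defn member-loop [target items]
  (loop [s (seq items)]
    (cond
      (nil? s) false                  ; ran out of elements
      (= target (first s)) true       ; found a match
      :else (recur (next s)))))       ; keep scanning
```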

Jim Newton 2020-10-25T09:57:20.474700Z

@andy.fingerhut You commented: "It calls `subtype?` twice with the same parameters, once with `delay` wrapped around it, once without, and then compares the return values of the two." Thanks for finding that. I believe that is a bug. It's great to have a second set of eyes look at code. It should call subtype? within the delay with the arguments reversed, i.e., two types are equivalent if each is a subtype of the other. But don't check the second inclusion if the first is known to be false, because such a call may be compute-intensive and unnecessary. Looks like I'm missing something in my unit tests. :thinking_face: The semantics of type-equivalent? are: if either s1 or s2 is false, return false (the types are not equivalent). If both s1 and s2 are true, return true. Otherwise, call the given default function and return its return value if it returns.
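A sketch of those semantics under stated assumptions: subtype? is passed in and returns true, false, or :dont-know, and if-unknown is a hypothetical zero-argument fallback. This is not the clojure-rte code; toy-subtype? exists only for illustration.

```clojure
(defn type-equivalent-sketch? [subtype? t1 t2 if-unknown]
  (let [s1 (subtype? t1 t2)
        s2 (delay (subtype? t2 t1))]   ; reversed arguments, computed only if forced
    (cond
      (false? s1) false                ; second check skipped entirely
      (false? @s2) false
      (and (true? s1) (true? @s2)) true
      :else (if-unknown))))

;; Toy subtype relation for illustration only.
(defn toy-subtype? [a b]
  (cond (= a b) true
        (= [a b] [:int :number]) true
        (= [a b] [:number :int]) false
        :else :dont-know))
```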

Jim Newton 2020-10-25T10:50:35.480Z

@andy.fingerhut WRT your comment: >>> (doall <expr>) is one general purpose way to force any `<expr>` that returns a lazy sequence, to realize all of its elements eagerly, without having to define separate eager versions of functions like `filter`, `map`, etc. I don't completely follow. My eager versions of filter, map, etc. simply call doall as you suggest. Are you suggesting that it's better to just inline the call to doall? As a second point, I don't think doall really forces all lazy sequences, only the top-level one. For example, if I have a lazy sequence of lazy sequences, then as I understand it, doall will give me a non-lazy sequence of lazy sequences. Unless I misunderstand, if I want to use dynamic variables, then I have to find every place in my code that produces a lazy sequence and somehow force it with doall.

2020-10-25T14:48:24.484400Z

You are correct that doall forces the top level sequence, not nested ones.
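A quick way to see this at the REPL is realized?. The force-deep helper below is hypothetical: a minimal sketch that recursively forces nested seqs and leaves vectors, maps, and other values as they are.

```clojure
;; doall realizes only the outer sequence; the inner ones stay lazy.
(def nested (doall (map (fn [i] (map inc (range i))) (range 3))))

(def outer-realized? (realized? nested))     ; true
(def inner-realized (mapv realized? nested)) ; [false false false]

(defn force-deep [x]
  (if (seq? x)
    (doall (map force-deep x))  ; force this level and, recursively, nested seqs
    x))

(def forced (force-deep nested))
(println (every? realized? forced))          ; prints true
```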

2020-10-25T14:50:41.484600Z

Set literals are constructed only once by the compiler's generated code, if they contain only constants, I believe. I would expect set containment to be faster than either member or some or an explicit loop, since sets use hash maps to check for membership and thus do not iterate over all elements, but for a 3-element set I doubt you will notice much difference in the context of your application.

2020-10-25T14:54:19.484800Z

I've done a few experiments putting debug prints in a few places here and there trying to determine why the code sometimes creates lists that get twice as long, but I don't have any good clues yet. I doubt I will spend much more time on it. I suspect there is some kind of mutable data structure being used somewhere, but that is just a guess without evidence.

2020-10-26T00:02:17.496600Z

I may have found the root cause of the problem: Your implementation and use of ldiff relies on identical? for equality testing of two lists, but Clojure's sequences are not Common Lisp sequences of cons cells.

2020-10-26T00:03:17.496800Z

Lazy sequences consist of objects that can be mutated in place, but depending on the operations you perform on them, all you should really ever count on is = (value equality), or you are asking for subtle problems, IMO.

2020-10-26T00:04:33.497Z

Clojure = does have a short-cut quick test that if two given objects are identical?, it quickly returns true

2020-10-26T01:00:11.497600Z

Thus your attempted use of first-repeat, ldiff, and concat to either remove one element from a list, or return the same list, can actually return a longer list than given.
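A small illustration of why an identity-based ldiff is fragile: rest on a list returns the same tail object each time, but rest on a vector builds a fresh seq object on every call, so identical? fails even though = succeeds.

```clojure
;; rest on a list returns the same tail object every time ...
(def l '(1 2 3))
(println (identical? (rest l) (rest l)))   ; prints true

;; ... but rest on a vector creates a new seq object on each call.
(def v [1 2 3])
(println (identical? (rest v) (rest v)))   ; prints false
(println (= (rest v) (rest v)))            ; prints true
```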

2020-10-26T01:19:20.497800Z

Here is a proposed fix that avoids the use of ldiff, instead using a function dedupe-by-f that I wrote by making small changes to Clojure's built-in dedupe function: https://github.com/jafingerhut/clojure-rte/commit/6ee771559e344ed612be0b20cd2ba7bbc46dec79

2020-10-26T01:19:40.498Z

I have run -main with 500 random tests multiple times with no OOM exception, with only those changes.

2020-10-26T01:20:31.498200Z

At least starting from your commit 50683709b7fa29ea7b53fe607a7376a7b1c32bb2, not your latest code. But it probably applies just as well to your latest version.

2020-10-26T01:23:33.498400Z

In general, I would think three or four times, very carefully, before ever relying on identical? in Clojure. There might be cases in Java interop where you need to know whether two JVM objects are the same object in memory, but in Clojure about the only time I recall seeing it used to good effect is to create a unique "sentinel" object that is guaranteed not to be identical? with any other existing object, and then using identical? to check whether that sentinel object was returned, as a "not found" kind of situation.
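A minimal sketch of that sentinel idiom (the lookup function is hypothetical): a private object that cannot be identical? to any value stored in the map distinguishes "key absent" from "key present with a nil value". In everyday code, find or contains? would also do this.

```clojure
(def ^:private not-found (Object.))

(defn lookup
  "Returns [true v] if k is present in m (even when v is nil), else [false nil]."
  [m k]
  (let [v (get m k not-found)]
    (if (identical? v not-found)
      [false nil]
      [true v])))
```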

2020-10-26T02:25:24.498600Z

I have not done anything to examine the code in your file bdd.clj other than to search all of your source files to look for other occurrences of identical?. They might be perfectly safe as you use them there (not easy to tell from a quick glance), or they might have the same danger of bugs lurking there, too.

Jim Newton 2020-10-26T07:33:55.499300Z

@andy.fingerhut Re: "Thus your attempted use of `first-repeat`, `ldiff`, and `concat` to either remove one element from a list, or return the same list, can actually return a longer list than given." This is really interesting. I don't get exactly the same results as you, but your suggestion seems to improve the situation. I don't completely understand the issue with ldiff. Could you give an example of where it fails? (More below...) When I replaced the call to identical? with a call to = (as you suggested), one of my OOM errors went away, but other tests still exhibit the OOM error. With respect to the following code, which uses ldiff: when I also replaced concat with concat-eagerly, other OOM errors in my test suite seemed to go away, but when I run the tests again and again, they still occur.

(let [ptr (first-repeat operands (fn [a b]
                                   (and (= a b)
                                        (*? a))))]
  (if (empty? ptr)
    false
    (let [prefix (cl/ldiff operands ptr)]
      (cons :cat (concat-eagerly prefix (rest ptr))))))

Jim Newton 2020-10-26T07:42:07.499600Z

More about ldiff: the purpose of ldiff, as you may already know, is that if you've already identified a tail of a list that satisfies some predicate, you want to copy the leading portion of the list. In the case of cons cells, you can retrace the list doing pointer comparisons. Your claim is that in the case of lazy lists, these pointer comparisons won't work. Is this because the tail might be some sort of lazy object, and evaluating it modifies the sequence, replacing the tail with a different object that is no longer identical? to the previous tail? I'd love to see an example.

Jim Newton 2020-10-26T08:00:01.499800Z

does backquote create a lazy sequence?

Jim Newton 2020-10-26T08:13:27.000100Z

I've updated my member function to work on non-lists

(defn member
  "Determines whether the given target is an element of the given sequence."
  [target items]
  (boolean (cond
             (empty? items) false
             (nil? target) (some nil? items)
             (false? target) (some false? items)
             :else (reduce (fn [acc item]
                                   (if (= item target)
                                     (reduced true)
                                     false)) false items))))
Arguably, I really only need the call to reduce, not the calls to boolean, cond, some, nil?, false?, and empty?.

2020-10-26T11:44:48.003600Z

I have a different suggested change that does not use ldiff at all, but rather removes an element from a list with a different approach completely. You could try that to see if you still get OOM errors.

2020-10-26T11:45:04.003800Z

The change I linked earlier that uses a new function dedupe-by-f

2020-10-26T11:54:19.004100Z

I do not yet have a short example that shows ldiff returning a surprising value because it uses identical?, but it very repeatably did so in the context of running -main.

2020-10-26T12:13:26.005800Z

Turns out I was able to find a small example of ldiff behaving not as desired with identical?:

;; This commit SHA is in Jim Newton's clojure-rte repository
;; and is the one just before he made a change to the ldiff function
;; to use = instead of identical?
;; git checkout 302628d04af207d4969396533456376b4c80e263

(require '[clojure-rte.cl-compat :as cl])
(require '[clojure-rte.util :as util])

(def l1 '[(:* :a) (:* :b) (:* :b) (:* :b) (:* :a)])

(defn rm-first-repetition [coll pred]
  (let [ptr (util/first-repeat coll pred)]
    (if (empty? ptr)
      false
      (let [prefix (cl/ldiff coll ptr)]
        (println "prefix=" prefix)
        (concat prefix (rest ptr))))))

(rm-first-repetition l1 =)

;; Running the sequence of expressions above in a fresh REPL, I see:
;; prefix= [(:* :a) (:* :b) (:* :b) (:* :b) (:* :a)]
;; ((:* :a) (:* :b) (:* :b) (:* :b) (:* :a) (:* :b) (:* :b) (:* :a))

2020-10-26T12:15:23.006Z

The issue with using identical? there is not only because of lazy sequences -- it can probably occur if the collection you are dealing with is anything except a list of Cons cells. You can have lists of Cons cells in Clojure, but they tend to occur only if you know you are explicitly constructing them in your code, and sequences produced by core functions like filter, map, etc. typically do not.
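One identity-free alternative, sketched here with a hypothetical name: if tail is known to be a suffix of coll, the prefix length follows from the counts alone, so it works the same for lists, vectors, and lazy seqs. It assumes tail really is a suffix (it cannot detect otherwise), and count walks a seq, so it is O(n).

```clojure
(defn prefix-before-tail [coll tail]
  ;; number of leading elements = total length minus suffix length
  (take (- (count coll) (count tail)) coll))
```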

2020-10-26T12:20:41.006200Z

With the latest version of your clojure-rte repository (commit fde964831c9907e7b87f791afeb43f18686d5f56, which is after you modified ldiff to use =, plus other changes), macOS 10.14.6, Oracle JDK 1.8.0_192, I can do lein repl, (require 'clojure-rte.rte-tester), then (clojure-rte.rte-tester/test-oom) 20 times in a row, with no OOM occurring, even if I change the project.clj file to use -Xmx64m instead of the -Xmx1g you have there now.

Jim Newton 2020-10-26T14:08:37.056800Z

Ahhh, is the problem that util/first-repeat assumes its input is a list, in which case it returns a cons cell, but if its input is a vector, it returns a copy of a tail of the vector?

Jim Newton 2020-10-26T14:18:15.057Z

Can I do the computation more easily? What I want to do is ask whether there are two consecutive elements of a sequence that satisfy a given binary predicate. If so, remove exactly one of them; if not, return false.

Jim Newton 2020-10-26T14:21:30.057400Z

This is what my code currently does. In the code, operands is the tail of a sequence that has already been verified to start with :cat; that's why I cons :cat back on at the end.

(let [ptr (first-repeat operands (fn [a b]
                                    (and (= a b)
                                         (*? a))))]
   (if (empty? ptr)
       false
       (let [prefix (cl/ldiff operands ptr)]
         (cons :cat (concat-eagerly prefix (rest ptr))))))

Jim Newton 2020-10-26T14:38:59.058Z

Here's what I'm trying. I think it could probably be done more cryptically but more efficiently computation-wise with a call to reduce/reduced

Jim Newton 2020-10-26T14:39:12.058200Z

(defn remove-first-duplicate [test seq]
  (loop [seq seq
         head ()]
    (cond (empty? seq)
          false

          (empty? (rest seq))
          false

          (test (first seq) (second seq))
          [(reverse head) (rest seq)]

          :else
          (recur (rest seq)
                 (cons (first seq) head)))))

2020-10-26T15:01:03.058600Z

I sent a link with a proposed change earlier that does not use ldiff at all, but instead a new function dedupe-by-f that is a modified version of Clojure's built-in dedupe. Here is the link to that proposed change again: https://github.com/jafingerhut/clojure-rte/commit/6ee771559e344ed612be0b20cd2ba7bbc46dec79

2020-10-26T15:01:37.058800Z

There are more straightforward ways to write dedupe-by-f that are easier to understand for me and most Clojure readers than the way dedupe is implemented (using transducer machinery).
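For instance, a dedupe generalization can be written with plain reduce. This is a hypothetical sketch of the idea, not the code in the linked commit: it drops an element whenever (same? last-kept element) is true, so unlike remove-first-duplicate it removes every consecutive repeat, not just one.

```clojure
(defn dedupe-with [same? coll]
  (reduce (fn [acc x]
            (if (and (seq acc) (same? (peek acc) x))
              acc            ; treat x as a repeat of the last kept element
              (conj acc x)))
          []
          coll))
```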

2020-10-26T15:04:04.059Z

Your remove-first-duplicate is one way. Most Clojure programmers do not typically use the accumulate-and-reverse technique, because Clojure vectors make it efficient to append things at the end, but it looks perfectly correct, except I think you should return not [(reverse head) (rest seq)] but (concat (reverse head) (rest seq))
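A sketch of that suggestion (the name remove-first-duplicate-v is hypothetical): accumulate the prefix in a vector, and on the first adjacent pair satisfying test, return the prefix concatenated with everything after the dropped element.

```clojure
(defn remove-first-duplicate-v [test coll]
  (loop [head []
         s (seq coll)]
    (cond
      (nil? (next s)) false                  ; fewer than 2 elements left
      (test (first s) (second s))
      (concat head (rest s))                 ; drop (first s), keep the rest
      :else
      (recur (conj head (first s)) (next s)))))
```

On the five-element l1 vector from the ldiff example, this returns four elements, ((:* :a) (:* :b) (:* :b) (:* :a)), rather than eight.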

2020-10-26T15:06:26.059200Z

Interactively writing small test cases for new functions can help quickly catch things like that, versus trying to debug them in the context of the entire application

Jim Newton 2020-10-26T15:10:22.059400Z

Yes. I'll write some test cases for remove-first-duplicate before I try to refactor it.

2020-10-26T15:10:55.059600Z

It is simple enough that 2 or 3 small test cases should give good confidence it is working

Jim Newton 2020-10-26T15:11:47.059800Z

Question about appending to the end: is appending to the end of a vector n times a linear or an n^2 operation in Clojure? Because cons-ing to the beginning and reversing is definitely a linear operation, right?

Jim Newton 2020-10-26T15:12:52.060Z

The problem, I think, in my previous implementation wasn't really a problem of testing the small function. In fact I explicitly wrote the function to work on lists, but used it on non-lists in the context of ldiff.

Jim Newton 2020-10-26T15:14:35.060200Z

@andy.fingerhut. Many thanks for all your help. That's kind of you.

2020-10-26T15:16:59.060400Z

Like Common Lisp, cons-ing on the beginning of an accumulator and reversing is definitely linear.

2020-10-26T15:17:53.060600Z

conj-ing to the end of a vector is O(1) 31/32 of the time, and O(log_32 N) 1/32 of the time, so O(log_32 N), but often called "effectively constant time", in a loose sense.
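Both approaches side by side, with hypothetical helper names. Note that conj adds at the end of a vector but at the front of a list.

```clojure
(defn collect-with-list [xs]
  (loop [acc () s (seq xs)]
    (if s
      (recur (cons (first s) acc) (next s))  ; prepend: O(1)
      (reverse acc))))                       ; one final O(n) reverse

(defn collect-with-vector [xs]
  (loop [acc [] s (seq xs)]
    (if s
      (recur (conj acc (first s)) (next s))  ; append at end: effectively O(1)
      acc)))

(println (conj [1 2] 3))    ; prints [1 2 3]
(println (conj '(1 2) 3))   ; prints (3 1 2)
```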

2020-10-26T15:19:20.060800Z

I was surprised by the behavior of ldiff once I found out that was the cause, and only realized that identical? is the likely cause of trouble from years of Clojure experience, with the end result the summarized statement that identical? is almost never what you want to use.

2020-10-26T15:20:41.061Z

One of the issues with having a language like Clojure that has so much clear influence in its design from Common Lisp, is the differences it (intentionally by design) has that can trip up someone with Common Lisp deep knowledge.

Jim Newton 2020-10-26T15:21:27.061200Z

we learn by doing things the wrong way. we=mankind. it's painful, and some don't survive.

2020-10-26T15:21:46.061400Z

I probably spent a bit too long figuring out the ldiff thing -- I can get a bit obsessed sometimes when a problem gets me wondering what is going on.

2020-10-26T15:23:10.061600Z

The long part wasn't figuring out why ldiff and how it was used might cause the problem -- it was narrowing the problem down to that part of the many lines of code

Jim Newton 2020-10-26T15:23:19.061800Z

ldiff is a rarely used CL function, but when it is useful, it is quite useful.

Jim Newton 2020-10-26T15:24:21.062Z

On the other hand, I'm still wondering whether the dynamic variables are causing problems with the lazy sequences.

2020-10-26T15:25:24.062200Z

Most of your uses of binding that I looked at seemed to bind the same values the dynamic Vars already had before the binding. Those should not cause problems. The trouble comes when binding binds a different value.

2020-10-26T15:25:37.062400Z

But I did not exhaustively check every use of binding that way.

2020-10-26T15:26:56.062600Z

If you don't mind passing extra explicit parameters around instead of using dynamic Vars, the extra parameter 'plumbing' can be annoying, but it lets you use lazy sequences without worrying about that interaction.
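A sketch of the explicit-parameter approach, with a hypothetical dynamic Var *scale*. For comparison it also shows clojure.core's bound-fn, which captures the bindings in effect when the function is created, so later realization still sees them.

```clojure
(def ^:dynamic *scale* 1)

;; Explicit parameter: the value travels with the lazy seq, no dynamic lookup.
(defn scaled-by [scale xs]
  (map #(* % scale) xs))

;; bound-fn conveys the bindings in effect at creation time.
(defn scaled-bound [xs]
  (map (bound-fn [x] (* x *scale*)) xs))

;; Both results are realized after the binding exits.
(def explicit-result (binding [*scale* 10] (scaled-by *scale* [1 2 3])))
(def bound-result (binding [*scale* 10] (scaled-bound [1 2 3])))

(println explicit-result)  ; prints (10 20 30)
(println bound-result)     ; prints (10 20 30)
```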

2020-10-16T07:27:13.072900Z

I do not know how common it is relative to other profiling tools, but I have heard good reviews of YourKit Java Profiler: https://www.yourkit.com/java/profiler/features/

2020-10-16T07:28:07.073200Z

I do not believe it has anything Clojure-specific built into it, so you will need a bit of practice in figuring out how to parse the names of JVM classes created by the Clojure compiler.

2020-10-16T07:28:54.073400Z

They have reduced pricing options (and maybe free?) for open source and educational developers

vlaaad 2020-10-16T07:57:03.073900Z

https://github.com/clojure-goes-fast/clj-async-profiler

Jim Newton 2020-10-16T10:32:55.076600Z

@vlaaad, it looks like it's better to put this dependency in ~/.lein/profiles.clj rather than in the project's project.clj, right? It is only supported on Linux and Mac, not Windows, so having the dependency in my project would prevent any Windows user from loading the project. Is that correct?

Jim Newton 2020-10-16T10:36:34.076800Z

clojure-rte.rte-core=> (require '[clj-async-profiler.core :as prof])
Execution error (FileNotFoundException) at clojure-rte.rte-core/eval13090 (form-init3039314401959532405.clj:1).
Could not locate clj_async_profiler/core__init.class, clj_async_profiler/core.clj or clj_async_profiler/core.cljc on classpath. Please check that namespaces with dashes use underscores in the Clojure file name.

Jim Newton 2020-10-16T10:44:24.077Z

I filed https://github.com/clojure-goes-fast/clj-async-profiler/issues/19 asking for help.

vlaaad 2020-10-16T11:37:20.077300Z

Yeah, I would say all dev-only stuff belongs to user profiles outside of repo. You use your tools, I use my tools, we don’t have to dump them all in the project we work on.
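A sketch of such a user-level Leiningen profile. The <version> placeholder must be replaced with a current clj-async-profiler release; the JVM flag is the one the clj-async-profiler README asks for on JDK 9 and later.

```clojure
;; ~/.lein/profiles.clj: dev-only tooling kept outside the project repo.
{:user {:dependencies [[com.clojure-goes-fast/clj-async-profiler "<version>"]]
        ;; lets the profiler attach to the running JVM on JDK 9+
        :jvm-opts ["-Djdk.attach.allowAttachSelf"]}}
```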

Mario C. 2020-10-16T18:25:24.091400Z

A few days ago I asked about an OutOfMemoryError: Metaspace exception and what could be causing it. I was pointed in the general direction that the code could be using eval. After a few days of research, I believe I know what is going on, but I am not quite certain I am right. To give some context, we have an API that takes in inputs and runs them on a model. The model does some calculations and returns its results. These models are written in a DSL, but at the end of the day it's just Clojure code (a bunch of macros). The models are not in the same project as the API; they are in their own project. When we deploy, we package up the src/models directory into a .tar.gz and upload it to an S3 bucket. In the Jenkins job, that tar file is downloaded, unpacked, and moved into the /resource directory. We create a war file and upload it to S3. When a request comes in, it contains inputs plus a model-name. The API looks at the model-name, looks up the file (a .clj file), slurps the contents, and loads them into the process via load-string. I downloaded the war file and examined it, and noticed that the API's code is present as .class files in the WEB-INF/classes/ directory, but the model code is still .clj files. Does this sound like something that would essentially act like eval, but on a much greater scale, causing metaspace to fill up and barf?

Mario C. 2020-10-16T18:27:42.092600Z

Another question: if I call load-string on the same Clojure file twice, does that cause two different classes to be created, even though the file is the same?

Ronny Li 2020-10-16T18:39:49.093500Z

I've been following the figwheel documentation at https://figwheel.org/docs/npm.html for using NPM modules, and it says I should write my `dev.cljs.edn` as follows:

{
 :main demo.core
 :target :bundle
 :bundle-cmd {:none ["npx" "webpack" "--mode=development" :output-to "-o" :final-output-to]}
 }
When I build dev `clj -A:dev` I see that `:output-to` and `:final-output-to` are replaced with these values:
[Figwheel] Bundling: npx webpack --mode=development target/public/cljs-out/dev/main.js -o target/public/cljs-out/dev/main_bundle.js
This seems weird to me because my `index.html` looks for `dev-main.js` (using a hyphen instead of a slash):
<script src="cljs-out/dev-main.js"></script>
Does anyone know why this discrepancy is happening?

Jim Newton 2020-10-16T18:48:34.093800Z

Does async-profiler's model require that I run something that finishes cleanly in order to profile it? One problem I have is that some of my randomly generated test cases occasionally run out of memory and abort. If I run this code within (prof/profile ...), will the profiler show me anything?

2020-10-16T18:51:59.094Z

If I understand correctly, clj-async-profiler is for profiling function/method calls that end. I believe that YourKit lets you attach to a running JVM at any time and start collecting performance data from it on both memory and run-time, and can do so while the code is running, even if it eventually ends up crashing, running out of memory, or into an infinite loop.

aaron-santos 2020-10-16T21:56:35.099100Z

I benefitted from YourKit's license for open source projects. The flamegraph functionality was very helpful in figuring out which parts of my Clojure project were slow.

sova-soars-the-sora 2020-10-16T23:30:20.101600Z

yeah hyphens get destroyed sometimes for file names, you may need to use an underscore / underline. Someone more knowledgeable might have a better answer

sova-soars-the-sora 2020-10-16T23:30:44.102100Z

i don't know exactly why but it is somewhat common and can be a head scratcher if you've not seen it before, could be what's going on there

2020-10-16T23:36:18.102200Z

Clojure does not try to remember what the previous contents of a file were and check whether they have changed.

2020-10-16T23:36:46.102400Z

When you load code containing defn forms, new classes are created.

2020-10-16T23:38:35.102600Z

load-string and load-file are similar in that they are effectively calling eval on every top level form of the loaded string/file.
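A small REPL experiment for the "two different classes" question, using a hypothetical model-fn: loading the same defn twice compiles it twice, and each load defines a new JVM class object (the class names are typically the same, but the classes are distinct). Classes like these are what accumulate in metaspace.

```clojure
(def src "(defn model-fn [] 1)")

(load-string src)
(def class-1 (class @(resolve 'model-fn)))

(load-string src)
(def class-2 (class @(resolve 'model-fn)))

(println (= class-1 class-2))   ; prints false
```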

2020-10-16T23:40:37.102800Z

If, in the context of your application, you know that a string you call load-string on is identical to one you called load-string on earlier, or that a file in the file system has not changed, you could perhaps detect that and avoid calling load-string on its contents again.
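A minimal sketch of that idea, with a hypothetical helper load-string-once that remembers every source string it has evaluated (swap-vals! needs Clojure 1.9 or later). It deliberately does nothing about the state-reset caveats around reloading.

```clojure
(defonce loaded-sources (atom #{}))

(defn load-string-once
  "Evaluates src with load-string only the first time this exact string is
  seen. Returns true if src was evaluated now, nil if it was loaded before."
  [src]
  ;; atomically record src and get the set as it was before this call
  (let [[before _] (swap-vals! loaded-sources conj src)]
    (when-not (contains? before src)
      ;; note: src stays recorded even if load-string throws
      (load-string src)
      true)))
```

Storing a digest of each string instead of the full text would use less memory for large model files.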

2020-10-16T23:41:23.103Z

Depending upon what you allow in those files, avoiding doing load-string on a file's contents might cause problems.

2020-10-16T23:43:00.103200Z

For example, if a file contained top-level forms like (def foo 1), and later code called alter-var-root on foo to change its root binding, then the final value of foo can be something other than 1. Calling load-string again on that file will re-evaluate (def foo 1), changing foo's value back to 1.

2020-10-16T23:44:04.103400Z

There could also be top level forms like (def my-atom (atom 1)), and then later swap! calls on my-atom that change the value stored within that atom. Later doing a load-string again on the contents of that file, even if it has not changed, will redefine my-atom to point at a freshly created atom that contains 1 again.
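A short sketch of both resets, with hypothetical model source text loaded into the current namespace: re-running load-string on unchanged source undoes both the alter-var-root change and the swap!.

```clojure
(def model-src "(def foo 1) (def my-atom (atom 1))")

(load-string model-src)
(alter-var-root (resolve 'foo) inc)     ; foo is now 2
(swap! @(resolve 'my-atom) inc)         ; the atom now holds 2

(load-string model-src)                 ; re-evaluates both defs
(println @(resolve 'foo))               ; prints 1
(println @@(resolve 'my-atom))          ; prints 1
```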

2020-10-16T23:44:49.103600Z

If you for some reason know that these kinds of things never happen, and doing load-string repeatedly on a file should end up in the same state, and running the code won't mutate any of that state, then you can safely skip doing load-string on the same file more than once.

2020-10-16T23:45:50.103800Z

I do not know of any static checker you could run on such a file that would be able to tell you whether a file does such mutations, or not. Doing so in general is not a thing that one can write a program to solve (it is as hard as the halting problem).

2020-10-16T23:47:32.104Z

@mario.cordova.862 ^^^