^ What Andy said above - especially for very short strings, you can have a lot of overhead compared to their "size" in the file. Hash maps also have a lot of overhead per element: https://github.com/plumatic/eng-practices/blob/master/clojure/20130926-data-representation.md
@ericihli Here is a transcript of a Clojure session that you can try out on your own machine for analyzing the in-memory size of Clojure data structures compared to their EDN representations, for small parts of your Clojure data: https://github.com/jafingerhut/cljol/blob/master/doc/README-clojure-map-size.md
It averages about 10 bytes of memory per character of EDN on the JVM version I tested with. There is a utility called cljol I wrote that you can see example use of in the file I linked that is good for showing the JVM objects and sizes for small Clojure collections.
That 10 bytes of memory per character of EDN is for your data file -- it could vary for different classes of data structures represented in EDN. For example, long keywords with many repetitions of the same keyword will share the storage of that one keyword name in Clojure/JVM.
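A quick REPL sketch of the keyword sharing mentioned above (the keyword and string contents are made up for illustration). Keywords are interned, so two occurrences of the same keyword are one object, while strings constructed separately are equal but distinct objects:

```clojure
;; Keywords are interned: the same keyword name always yields the same object.
(def k1 (keyword "some-long-keyword-name"))
(def k2 (keyword "some-long-keyword-name"))
(def same-keyword? (identical? k1 k2))   ; shared storage

;; Strings constructed separately are equal, but each has its own storage.
(def s1 (String. "some-long-string"))
(def s2 (String. "some-long-string"))
(def same-string? (identical? s1 s2))
(def equal-strings? (= s1 s2))
```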
I'd like to count, for each word in a given coll, how often each other word comes right after it. This reduce code does the job; is this the "Clojure way" of doing it, or would some other "high level" approach (no reduce: merge-with, map, etc.) be more "idiomatic"?
(->> ["A" "B" "A" "D" "A" "D"]
     (partition 2 1)
     (reduce (fn [res [fst scd]]
               (if (get res fst)
                 (update-in res [fst scd] (fnil inc 0))
                 (assoc res fst {scd 1})))
             {}))
=> {"A" {"B" 1, "D" 2}, "B" {"A" 1}, "D" {"A" 1}}
You only need the update-in inside the reducing function:
(->> ["A" "B" "A" "D" "A" "D"]
     (partition 2 1)
     (reduce (fn [res [fst scd]]
               (update-in res [fst scd] (fnil inc 0)))
             {}))
I would probably do it the same way
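For comparison, here is a sketch of the merge-with style the question asks about. The function name next-word-counts is made up for illustration; this is one possible shape, not necessarily more idiomatic than the reduce version:

```clojure
(defn next-word-counts
  "Hypothetical helper: maps each word to a map of {following-word count}."
  [words]
  (->> words
       (partition 2 1)                              ; consecutive pairs
       (map (fn [[fst scd]] {fst {scd 1}}))         ; one tiny nested map per pair
       (apply merge-with (partial merge-with +))))  ; sum counts at the inner level

(def counts (next-word-counts ["A" "B" "A" "D" "A" "D"]))
```

Note that with fewer than two words there are no pairs, and this version returns nil rather than {}.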
Given that I have to work with an existing Java library that expects a public Foo __myfoo on each class I want to work with, can I somehow transform bytecode for a defrecord-created class to support this use-case?
Which high-speed alternatives do we have to EDN for serialization and deserialization of clojure data structures?
Oh this is odd. I have 2 local projects using Clojure's CLI deps.edn. B depends on A with {:local/root "/.../lib-A"}. Yesterday, I was able to start a REPL from lib-B. Today I'm getting an error that one of lib-A's dependencies (nippy, one I just added today) can't be found on the classpath when I try to start a REPL for lib-B. Looking at the output of clj -Spath in lib-B, I see every other dependency that is in A and not in B. I just don't see this newly added one. Is there some sort of cache that is used, so that the CLI doesn't pick up on new dependencies linked together with :local/root?
high-speed probably means binary?
maybe fressian is what you want? https://github.com/clojure/data.fressian
https://groups.google.com/g/clojure/c/9ESqyT6G5nU/m/2Ne9X6JBUh8J Fressian (or nippy https://github.com/ptaoussanis/nippy) for fast.
Hm. https://clojure.org/reference/deps_and_cli#_operation
> Check cache
> The next several steps of this execution can be skipped if we have already computed the classpath and cached it. Classpath and the runtime basis files are cached in the current directory under .cpcache/. The key is based on the contents of the deps.edn files and some of the command arguments passed and several files will be cached, most importantly the classpath and runtime basis.
I deleted .cpcache and the REPL started fine on the next run.
The Clojure CLI tools do calculate and store cached classpaths, putting them into files in a .cpcache directory. You can delete that directory and try again, or IIRC use -Sforce to cause it to ignore the cache.
transit is used frequently and widely. I would say the lack of “activity” is because it is a specification, not an implementation. The clj impl is here: https://github.com/cognitect/transit-clj. Also, many libraries are ‘quiet’ when they’re stable and working well 🙂
IIRC nippy is fastest, but generates a lot of garbage, followed by Transit (built on top of Jackson), and finally Fressian.
no, although you can do this with deftype
the cache is not sensitive to changes in local/root project deps, so you will need to use -Sforce any time those change
thanks!
so garbage = increased data size?
No, they'll produce the same data in the end, but it seems like nippy might allocate a lot more intermediate objects. I don't think it's significant for small payloads but I saw people complain about it for big payloads (10s of MB upwards). It's best to benchmark in a running application to be sure. GC pressure can give you trouble
I read the details of an issue someone opened regarding the generated garbage, I don't think I'd worry about it
@ben.sless ok thanks!
is there a function to (dissoc obj :key) that returns both the new obj without :key and also the value that was held by :key? (let [[new-obj value] (dissoc2 obj :key)] ...
(juxt dissoc get)
oh of course, thank you
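Spelled out as a small sketch, using the dissoc2 name from the question (it is not a core function). juxt applies both functions to the same arguments and returns the results as a vector:

```clojure
(defn dissoc2
  "Hypothetical helper: returns [map-without-k value-that-was-at-k]."
  [m k]
  ((juxt dissoc get) m k))

(def result
  (let [[new-obj value] (dissoc2 {:a 1 :b 2} :a)]
    {:new-obj new-obj :value value}))
```

If the key is absent, the value half is nil and the map comes back unchanged.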
What would cause this unexpected behavior?
lib-A depends on lib-B with :local/root ".../lib-B".
lib-B repl starts up in about 5 seconds, and then evaluating the requires of lib-B that lib-A depends on takes about 40 seconds (they deserialize some big files from disk).
lib-A repl starts up in about 5 seconds if I remove the references to lib-B. With the references to namespaces in lib-B, the namespaces that take ~40 seconds to load when running the repl from lib-B, it takes about 6 minutes for a lib-A repl to even start.
Looking at the CPU sampling in a JVM profiler, I see the 6-minute load times happening inside the main thread when starting the slow lib-A repl, and the 40-second load times inside the nREPL-session-.... thread when starting the faster lib-B repl.
Is this related to the local/root dependency? Does clj evaluate every namespace of the dependency at bootup rather than only evaluating namespaces when they are required?
I'll try packaging lib-B as a jar and see if the behavior changes when I switch away from the local/root dependency.
What version of clojure?
There is (was?) a bug in some JVM versions where code loaded from static init blocks (which is how Clojure code tends to be loaded) wouldn't ever get JIT-compiled, which made load times of macro-heavy code really slow
Clojure has some mitigations for this, but only after some version I don't remember
1.10.1 has the mitigation
background reading -- but yeah if you have a user.clj beware
and btw various changes have landed in subsequent jvms, but Clojure 1.10.1+ is good to use regardless to buffer you from some of that stuff
Ah. I'm on 1.10. I'll try bumping it. Thanks!
@bronsa So looking more at indexing-push-back-reader (and source-logging-push-back-reader), it seems like it doesn't actually annotate the read expressions with source locations, unless I'm missing something.
There is get-row-number and get-column-number, but those seem to only return the current position of the reader.
So for something like:
(+ 1
   (+ 3 4))
It wouldn't be possible to get the source location of the inner (+ 3 4) expression. Or am I missing something?
i thought the reader used that information to mark the structures
So, I have the following in my repl:
> (def r (indexing-push-back-reader "(+ 1 2)
(+ 3 4)
(+ 5 6)")
> (def x (read r))
> x
(+ 1 2)
@dpsutton Am I doing something wrong? (Or are there non-printed attributes?)
Metadata is not printed by default. You can explicitly ask for metadata with the eponymous function or if you apropos for meta you’ll see there’s a dynamic var which will print it by default
On mobile so can’t give examples or links sorry
@dpsutton No worries. I can google for eponymous
thanks
the function is called meta, that's why he called it eponymous
@noisesmith Ah, okay, thanks.
AH, got it...thanks
> (meta y)
{:source "(+ 1 2)", :line 1, :column 1, :end-line 1, :end-column 8}
Sorry about the confusing phrasing :) and if you evaluate (apropos "meta") it will show you a few more functions and that dynamic var I was talking about
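A small sketch of the same idea using only clojure.core rather than tools.reader (so the keys differ from the source-logging output above: the core reader attaches :line and :column only). It assumes a LineNumberingPushbackReader over a made-up two-line string, and also binds *print-meta*, the dynamic var mentioned above:

```clojure
(import '(clojure.lang LineNumberingPushbackReader)
        '(java.io StringReader))

;; Read one form that spans two lines.
(def form
  (read (LineNumberingPushbackReader.
         (StringReader. "(+ 1\n   (+ 3 4))"))))

;; Location metadata is attached to each list, including the nested one.
(def outer-meta (meta form))
(def inner-meta (meta (nth form 2)))

;; With *print-meta* bound to true, pr-str includes the metadata.
(def printed (binding [*print-meta* true] (pr-str form)))
```

Here (:line outer-meta) is 1 and (:line inner-meta) is 2, so the inner (+ 3 4) does carry its own location.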
@dpsutton So, I hadn't known about the function apropos until now, that's very helpful, thanks.
Check out (dir clojure.repl) I think these functions are an underused way to really speed up your knowledge of clojure
> (dir clojure.repl)
nil
OHH
Okay, actually, that was ClojureScript.
Doing it in a proper clojure repl did work.
Thanks 🙂
This chat about :line and :column reminds me of an old curiosity I forgot to ask about. I noticed that positional metadata gets attached to lists and was wondering why:
> clj
Clojure 1.10.1
user=> (meta '(1 2))
{:line 1, :column 8}
user=> (meta '[1 2])
nil
to do error reporting
on form locations
ah! so it is on purpose! thanks @alexmiller!
So, I have a style question now: In Racket, I can write the following code:
(let loop ([s <seq>])
  (for ([i s])
    (when <condition>
      (loop i))))
But in Clojure, as far as I can tell, the loop macro cannot use recur inside of a doseq. So the only equivalent I can think of is:
(letfn [(loop [s]
          (for [i s]
            (when <condition>
              (loop i))))]
  (loop <seq>))
And I would like to know if there's any better (as in cleaner) way to do this? (I guess I 'could' just make my own named let macro, but that seems like a bit of a nuclear option.)
@leif Note that metadata cannot in Clojure be associated with keywords, numbers, or strings (and perhaps a few other things I'm forgetting), because Clojure uses the JVM types for those without changing them to be able to include metadata. Clojure collections can all have metadata, as can symbols, I think. So you cannot use tools.reader or Clojure's built-in reader to get line/col info for values that do not support metadata.
(fn lp [s] (for [i s] (when <condition> (lp i))))
@leif a labeled anonymous function helps simplify this
@andy.fingerhut That's fine for my case. Thanks for the heads up. 🙂
@noisesmith AH, yes it would. Thanks.
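A runnable sketch of the named-fn approach, under a couple of assumptions: the tree is nested sequential collections, the helper name visit-all is made up, and doseq replaces for, because for is lazy and would not actually run the nested calls unless its result is consumed:

```clojure
(defn visit-all
  "Hypothetical helper: depth-first walk that records every node it visits."
  [tree]
  (let [seen (atom [])]
    ((fn lp [node]
       (swap! seen conj node)
       (when (sequential? node)
         ;; doseq is eager, so the recursive calls really happen here
         (doseq [child node]
           (lp child))))
     tree)
    @seen))

(def visited (visit-all [1 [2 3]]))
```

The recursive call is not in tail position, so it consumes stack in proportion to the depth of the tree, which is usually fine for a tree walk.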
@leif btw that racket version won't do the tail recursion recur would do, as it's not a tail call
@noisesmith You are correct.
one advantage of having recur as special is it can error in non tail positions
But I wasn't looking for tail recursion in this case, as I want to do a depth first tree walk.
@leif in that case, see also clojure.walk
(walk/postwalk (fn [el] (if (condition? el) (f el) el)) tree)
> one advantage of having `recur` as special is it can error in non tail positions Oh, would you like that feature? (It would be pretty easy to do that in Racket.)
Or at least it used to be, I'd have to double check now that Matthew has changed the VM to one implemented in chez.
well, with racket the syntax for a tail call is the invocation itself - there's no syntactic element declaring "this should be stack optimized"
and it doesn't just optimize self calls
> @leif in that case, see also `clojure.walk` Nifty, thanks.
Not in the surface syntax, yes, but there is in the zo bytecode.
between tree-seq and clojure.walk/{postwalk,prewalk} you can replicate those lispy idioms of tree traversal, but with support for Clojure's larger variety of first-class data types
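A short sketch of both on a made-up nested structure that mixes vectors and a map: tree-seq to enumerate nodes depth-first, and postwalk to rebuild the structure with changed leaves:

```clojure
(require '[clojure.walk :as walk])

(def tree [1 [2 {:a 3}]])

;; tree-seq: lazily list every node, then keep only the numbers.
(def numbers (filter number? (tree-seq coll? seq tree)))

;; postwalk: rebuild the same shape with every number incremented.
(def incremented
  (walk/postwalk (fn [el] (if (number? el) (inc el) el)) tree))
```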
Yup, I saw that in the docs... it's super helpful, thanks. ^.^
Hello all, can someone let me know why :year does not have a value here? I am running the equivalent code in Java and it returns a value.
;; positional destructuring
(def date-regex #"(\d{2})\/(\d{2})\/(\d{4})")
(let [rem (re-matcher date-regex "12/02/1975 ")]
  (when (.find rem)
    (let [[g m d y h] rem]
      {:group g :month m :day d, :year y :nil_result h})))
This is from the book The Joy of Clojure, and the result is the following: {:group "12/02/1975", :month "12", :day "02", :year nil, :nil_result nil}
this isn't your error, but the \/ in your regex can be replaced with /
this issue is super weird:
(let [rem (re-matcher date-regex "12/02/1975 ")]
  (when (.find rem)
    ;; using macroexpand-1
    (let* [vec__63125 rem
           g (clojure.core/nth vec__63125 0 nil)
           m (clojure.core/nth vec__63125 1 nil)
           d (clojure.core/nth vec__63125 2 nil)
           y (clojure.core/nth vec__63125 3 nil)
           h nil]
      {:group g, :month m, :day d, :year y, :year2 (nth vec__63125 3) :nil_result h})))
> {:group "12/02/1975",
:month "12",
:day "02",
:year nil,
:year2 "1975",
:nil_result nil}
see the difference between (nth rem 3) and (nth rem 3 nil)
the fact that you can destructure a re-matcher after calling .find, but you can't call seq on it, is very odd
btw:
user=> (re-find date-regex "12/02/1975 ")
["12/02/1975" "12" "02" "1975"]
it's less code and it does exactly what's expected
though you may have some extra reason to use a matcher instead?
thank you, I am just following an example from the book The Joy of Clojure. In chapter 3 the author gives this example, and I was trying to understand it by comparing it to similar code in Java
seems like a bug in (nth <matcher-instance> <index> <not-found>).
• from the docs of .groupCount (https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#groupCount()): "Group zero denotes the entire pattern by convention. It is not included in this count."
• the check in https://github.com/clojure/clojure/blob/d07ef175c700329f7afbef8770332b6247a34a49/src/jvm/clojure/lang/RT.java#L968 in the nth implementation makes sure <index> is less than .groupCount
• the check should be for n <= m.groupCount()
yeah, off by one
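Until that is sorted out, one way to sidestep nth on the Matcher is to destructure the vector that re-groups returns after a successful .find. A minimal sketch, with parse-date as a made-up helper name and the regex written with plain / as suggested earlier:

```clojure
(def date-regex #"(\d{2})/(\d{2})/(\d{4})")

(defn parse-date
  "Hypothetical helper: returns a map of the match groups, or nil if no match."
  [s]
  (let [m (re-matcher date-regex s)]
    (when (.find m)
      ;; re-groups returns a plain vector: [whole-match group1 group2 group3]
      (let [[g mo d y] (re-groups m)]
        {:group g :month mo :day d :year y}))))

(def parsed (parse-date "12/02/1975 "))
```

Since re-groups hands back an ordinary vector, destructuring goes through the vector's own nth and never touches the Matcher branch discussed above.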
I've never seen nth used on a Matcher before
I'm not following the details here, but if there's a bug, please log it in ask