well you're comparing doing work and not doing work, and usually doing work has some cost, so yes
although kwarg destructuring in general is significantly faster than it was due to the other changes in the patch
given something like (defn foo [& {:as m}] (count m)), running (time (dotimes [_ 500000] (foo :a 1 :b 2))) a few times shows best case around 70 ms on 1.10.3 and 14 ms on 1.11.0-alpha1, vs (defn foo-m [m] (count m)) and (time (dotimes [_ 500000] (foo-m {:a 1 :b 2}))) of around 7.5 ms.
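For reference, the comparison above can be reproduced with something like this (absolute timings will of course vary by JVM, hardware, and warmup):

```clojure
;; kwarg destructuring: trailing key/value args are gathered into a map
(defn foo [& {:as m}] (count m))

;; baseline: caller passes the map directly, no destructuring work
(defn foo-m [m] (count m))

;; crude timing, as in the numbers quoted above
(time (dotimes [_ 500000] (foo :a 1 :b 2)))
(time (dotimes [_ 500000] (foo-m {:a 1 :b 2})))
```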
there's a lot of "it depends" in those numbers, but might give you some idea
I’m very honoured to announce the release of https://github.com/clojure/data.json/releases/tag/data.json-2.0.0 . Thanks to @alexmiller for inspiration and guidance during this work! This release introduces significant speed improvements in both reading and writing JSON, while still being a pure Clojure lib with no external dependencies. Using the benchmark data from jsonista we see the following improvements:
Reading:
• 10b from 1.4 µs to 609 ns (cheshire 995 ns)
• 100b from 4.6 µs to 2.4 µs (cheshire 1.9 µs)
• 1k from 26.2 µs to 13.3 µs (cheshire 10.2 µs)
• 10k from 292.6 µs to 157.3 µs (cheshire 93.1 µs)
• 100k from 2.8 ms to 1.5 ms (cheshire 918.2 µs)
Writing:
• 10b from 2.3 µs to 590 ns (cheshire 1.0 µs)
• 100b from 7.3 µs to 2.7 µs (cheshire 2.5 µs)
• 1k from 41.3 µs to 14.3 µs (cheshire 9.4 µs)
• 10k from 508 µs to 161 µs (cheshire 105.3 µs)
• 100k from 4.4 ms to 1.5 ms (cheshire 1.17 ms)
@slipset are you sure this is correct? > 10k from 508 µs to 161 µs (cheshire 105.3 ms)
161 µs vs 105.3 /ms/ ?
no
Fixed
Great work! Will rerun the benchmarks on jsonista repo with the new version
FWIW there are also some new patches in cheshire master (with new Jackson) so it would be good to run against cheshire master and not the currently released version
jsonista is still faster though 🙂
e.g. writing became 15% faster
running with the latest now.
Most people will be using the latest jar though
this is great! 😍
@slipset fantastic! Care to elaborate on the tricks you used to speed it up?
1. remove the dynamic vars and pass them explicitly as an options map
2. for reading, split string reading into two paths: the quick one (without any escapes) you do by passing an array slice to (String.), the slow one (with escapes and unicode and stuff) you still do with StringBuilder
3. for writing, don’t use format to construct unicode escapes
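Toy sketches of tricks 2 and 3 (illustrative names and shapes, not data.json’s actual internals):

```clojure
;; Trick 2 sketch: two paths for materializing a JSON string.
;; Fast path: the scanner saw no backslash between the quotes, so
;; the buffered chars can go straight to the String constructor.
(defn read-simple-string [^chars buf start len]
  (String. buf (int start) (int len)))

;; (the slow path, with escapes and unicode, still walks char by
;; char into a StringBuilder as before)

;; Trick 3 sketch: build \uXXXX escapes by indexing a hex-digit
;; table instead of calling (format "\\u%04x" c), which is slow.
(def ^String hex-digits "0123456789abcdef")

(defn unicode-escape [c]
  (let [c (int c)]
    (str "\\u"
         (.charAt hex-digits (bit-and (bit-shift-right c 12) 0xf))
         (.charAt hex-digits (bit-and (bit-shift-right c 8) 0xf))
         (.charAt hex-digits (bit-and (bit-shift-right c 4) 0xf))
         (.charAt hex-digits (bit-and c 0xf)))))
```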
The main trick though was to use the stuff in http://clojure-goes-fast.com
ie, profile, observe the results, form a hypothesis, create a fix 🙂
Yeah I’d taken a quick peek, and was mainly interested in hearing about 2 and 3. I entirely agree with the other advice too though! :thumbsup: :thumbsup:
I’ve noticed in the past that unicode processing is often the slow bit in parsing large amounts of data. Also the performance difference between InputStream and Reader is staggering… mainly I believe because Reader does that unicode stuff, and expands all characters into 16 bits. So was curious how you were alleviating that.
I’ve never tried parsing json, so know next to nothing about it; but I was trying to understand how you knew whether you needed to use unicode or not. I’m guessing you know you only need to handle unicode for strings inside the json?!, not the structure itself. Is that correct?!
I was looking at the commit that replaced dynamic vars with the options map. Couldn’t you have saved even more time if internal functions like read-object received key-fn and value-fn as arguments, instead of receiving the whole options map and performing a map get?
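A toy version of that suggestion (illustrative names, not data.json’s actual internals):

```clojure
;; Looking the function up in the options map on every entry:
(defn transform-key-opts [k opts]
  ((get opts :key-fn) k))

;; vs. hoisting the lookup to the caller and passing the function
;; itself, so inner code never touches the map:
(defn transform-key [k key-fn]
  (key-fn k))

;; the public entry point destructures the options exactly once
(defn parse-keys [ks {:keys [key-fn] :or {key-fn keyword}}]
  (mapv #(transform-key % key-fn) ks))
```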
data.json takes a Reader, I think @slipset just meant Unicode escapes inside strings
avoiding apply and merge could possibly also help
Another observation: couldn’t you capture the values of the dynamic vars in a map at the start of the public functions like write-str? Then you don’t get hit with the dynamic var cost, because you don’t access them repeatedly.
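That idea might look roughly like this (hypothetical var and function names, in the style of the pre-2.0.0 dynamic-var API):

```clojure
;; Hypothetical dynamic vars, as in the old data.json style
(def ^:dynamic *key-fn* name)
(def ^:dynamic *value-fn* (fn [_ v] v))

;; inner functions take the captured map and never deref a var
(defn- write-entry [k v captured]
  [((:key-fn captured) k) ((:value-fn captured) k v)])

;; the public function reads each dynamic var exactly once
(defn write-entries [m]
  (let [captured {:key-fn *key-fn* :value-fn *value-fn*}]
    (mapv (fn [[k v]] (write-entry k v captured)) m)))
```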
@nilern: Yes I know. I was alluding to that too. I mention InputStream/Reader as something observed in my own work, and in support of the general point that handling unicode is slow.
the old:
jsonista.jmh/encode-data-json :encode :throughput 5 406998.934 ops/s 152242.102 {:size "10b"}
jsonista.jmh/encode-data-json :encode :throughput 5 146750.626 ops/s 13532.113 {:size "100b"}
jsonista.jmh/encode-data-json :encode :throughput 5 28543.913 ops/s 5982.429 {:size "1k"}
jsonista.jmh/encode-data-json :encode :throughput 5 1994.604 ops/s 193.798 {:size "10k"}
jsonista.jmh/encode-data-json :encode :throughput 5 229.534 ops/s 3.574 {:size "100k"}
the new:
jsonista.jmh/encode-data-json :encode :throughput 5 1534830.890 ops/s 155359.246 {:size "10b"}
jsonista.jmh/encode-data-json :encode :throughput 5 341613.782 ops/s 26261.051 {:size "100b"}
jsonista.jmh/encode-data-json :encode :throughput 5 69673.326 ops/s 1647.625 {:size "1k"}
jsonista.jmh/encode-data-json :encode :throughput 5 5658.247 ops/s 999.701 {:size "10k"}
jsonista.jmh/encode-data-json :encode :throughput 5 581.924 ops/s 39.758 {:size "100k"}
=> 2.5x throughput improvement 🚀
jsonista:
jsonista.jmh/encode-jsonista :encode :throughput 5 6718559.441 ops/s 564494.417 {:size "10b"}
jsonista.jmh/encode-jsonista :encode :throughput 5 2021530.135 ops/s 227934.280 {:size "100b"}
jsonista.jmh/encode-jsonista :encode :throughput 5 358639.582 ops/s 33561.700 {:size "1k"}
jsonista.jmh/encode-jsonista :encode :throughput 5 32536.978 ops/s 8135.004 {:size "10k"}
jsonista.jmh/encode-jsonista :encode :throughput 5 2687.242 ops/s 185.516 {:size "100k"}
still much faster, but it’s 99% java.
Jackson (and simdjson) can do their own UTF-8 decoding while parsing from a byte stream. All the structural JSON characters are ASCII so yes Unicode is only really relevant inside strings.
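A quick way to see the ASCII point (a toy check, not how simdjson actually works):

```clojure
;; Every structural JSON character is ASCII, i.e. encodes as a
;; single byte < 0x80 in UTF-8. A byte-oriented parser can scan
;; raw bytes for structure and defer full UTF-8 decoding until it
;; materializes a string value.
(def json-structural [\{ \} \[ \] \: \, \"])

(defn single-utf8-byte? [ch]
  (< (int ch) 0x80))

(every? single-utf8-byte? json-structural)
```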
@nilern my initial patch had value-fn and key-fn passed as separate params, but that doesn’t really scale well (if you imagine passing more opts in the future). Also, the penalty from apply and array-map only shows on the smaller payloads, so it was probably worth the tradeoff.
(I think you meant @roklenarcic)
I most certainly did. Sorry.
has some slack weirdness happened in this thread?! Some comments appear to have disappeared and replies now appear out of context e.g. my comment above was in response to something @nilern said which has also vanished.
Nothing has vanished AFAICT. Try refreshing your browser?
:thumbsup: I’d done that, but doing it a second time seems to have fixed it.
Unfortunately, there is a bug in the above release wrt strings longer than 64 chars, so do not use version 2.0.0; rather, wait for 2.0.1 🥵
ouch... solidarity and hugops
The pure Clojure JSON ecosystem now rests on your shoulders slipset... take care!
new jmh-benchmarks on jsonista repo: https://github.com/metosin/jsonista#performance
stay strong! It's great effort. I personally would use pure library over java interop whenever possible.
Btw, I wondered if there is some JSON standard compliance test suite that these kinds of libs should be run against
independent of their implementation
excellent, I will post an issue at the cheshire side as well about this
as linked in that repo, i would highly recommend reading this blog post about json parsing ambiguities: http://seriot.ch/parsing_json.php
very cool!
Looks like some more % can be shaved off by using identical? and == instead of = where possible. Especially using identical?, as the documentation says "If value-fn returns itself" - can you assume it's the same object?
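The identical?-based check being discussed could look like this sketch of the sentinel pattern (not the library's actual code): when the contract is "return the same object to signal omit/no-change", a reference comparison suffices and is much cheaper than a potentially deep = walk.

```clojure
;; Sketch: value-fn signals "omit this pair" by returning the key
;; itself. identical? compares object references only, avoiding a
;; full structural = comparison of the returned value.
(defn apply-value-fn
  "Adds [k (value-fn k v)] to m, unless value-fn signalled 'omit'
  by returning the key itself (the sentinel convention)."
  [m k v value-fn]
  (let [out (value-fn k v)]
    (if (identical? out k)
      m
      (assoc m k out))))
```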
if anyone is interested in working on things like this, please join the club! would be happy to have help on this
data.json 2.0.1 is now available • Fix https://clojure.atlassian.net/browse/DJSON-37: Fix off-by-one error reading long strings, regression in 2.0.0
an excellent way to work on problems is to write them down in a trackable place like https://ask.clojure.org or jira (if you have access)
Changelog for 2.0.0 fyi: • Perf https://clojure.atlassian.net/browse/DJSON-35: Replace PrintWriter with more generic Appendable, reduce wrapping • Perf https://clojure.atlassian.net/browse/DJSON-34: More efficient writing for common path • Perf https://clojure.atlassian.net/browse/DJSON-32: Use option map instead of dynamic variables (affects read+write) • Perf https://clojure.atlassian.net/browse/DJSON-33: Improve speed of reading JSON strings • Fix https://clojure.atlassian.net/browse/DJSON-30: Fix bad test
@slipset Congrats on the fix 😅 Does this affect the benchmarks?
doubt it
you're right. I still need to open a jira on update-in's performance, too
I tested dtype-next's ffi generation with graal native and avclj. After a bit of work (a couple days) I can now generate a graal native executable that encodes video 🙂. https://github.com/cnuernber/avclj