announcements

Project/library announcements ONLY - use threaded replies for discussions. Do not cross post here from other channels. Consider #events or #news-and-articles for other announcements.
alexmiller 2021-03-19T06:15:09.092300Z

well you're comparing doing work and not doing work, and usually doing work has some cost, so yes

alexmiller 2021-03-19T06:19:30.092500Z

although kwarg destructuring in general is significantly faster than it was due to the other changes in the patch

alexmiller 2021-03-19T06:25:04.093Z

given something like (defn foo [& {:as m}] (count m)), running (time (dotimes [_ 500000] (foo :a 1 :b 2))) a few times shows a best case of around 70 ms on 1.10.3 and 14 ms on 1.11.0-alpha1. Compare that with (defn foo-m [m] (count m)) and (time (dotimes [_ 500000] (foo-m {:a 1 :b 2}))), which takes around 7.5 ms.
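This is a self-contained version of the comparison above, for anyone who wants to reproduce it at a REPL. The function names and the loop count come from the message. Repeating each measurement five times is an arbitrary choice so that JIT warm-up shows up in the early runs. Absolute numbers will vary with the JVM, the hardware and the Clojure version.

```clojure
;; Keyword arguments: the trailing args are collected into a seq and
;; destructured into the map m on every call.
(defn foo [& {:as m}]
  (count m))

;; Plain map argument: the caller passes a ready-made map.
(defn foo-m [m]
  (count m))

;; Run each benchmark several times; later runs usually reflect JIT-compiled
;; code, so compare the best timings rather than the first one.
(dotimes [_ 5]
  (time (dotimes [_ 500000] (foo :a 1 :b 2))))

(dotimes [_ 5]
  (time (dotimes [_ 500000] (foo-m {:a 1 :b 2}))))
```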

alexmiller 2021-03-19T06:31:27.093200Z

there's a lot of "it depends" in those numbers, but might give you some idea

slipset 2021-03-19T07:56:55.104100Z

I’m very honoured to announce the release of https://github.com/clojure/data.json/releases/tag/data.json-2.0.0 . Thanks to @alexmiller for inspiration and guidance during this work! This release introduces significant speed improvements in both reading and writing JSON, while still being a pure Clojure lib with no external dependencies. Using the benchmark data from jsonista, we see the following improvements:
Reading:
• 10b from 1.4 µs to 609 ns (cheshire 995 ns)
• 100b from 4.6 µs to 2.4 µs (cheshire 1.9 µs)
• 1k from 26.2 µs to 13.3 µs (cheshire 10.2 µs)
• 10k from 292.6 µs to 157.3 µs (cheshire 93.1 µs)
• 100k from 2.8 ms to 1.5 ms (cheshire 918.2 µs)
Writing:
• 10b from 2.3 µs to 590 ns (cheshire 1.0 µs)
• 100b from 7.3 µs to 2.7 µs (cheshire 2.5 µs)
• 1k from 41.3 µs to 14.3 µs (cheshire 9.4 µs)
• 10k from 508 µs to 161 µs (cheshire 105.3 µs)
• 100k from 4.4 ms to 1.5 ms (cheshire 1.17 ms)

👏 50
😍 10
borkdude 2021-03-19T08:10:47.105800Z

@slipset are you sure this is correct? > 10k from 508 µs to 161 µs (cheshire 105.3 ms)

borkdude 2021-03-19T08:11:05.106Z

161 µs vs 105.3 /ms/ ?

slipset 2021-03-19T08:11:43.106300Z

no

slipset 2021-03-19T08:12:47.106700Z

Fixed

ikitommi 2021-03-19T08:17:26.107Z

Great work! Will rerun the benchmarks on jsonista repo with the new version

👏 7
🙏 1
borkdude 2021-03-19T08:18:11.107600Z

FWIW there are also some new patches in cheshire master (with new Jackson) so it would be good to run against cheshire master and not the currently released version

slipset 2021-03-19T08:18:14.107800Z

jsonista is still faster though 🙂

borkdude 2021-03-19T08:18:24.108Z

e.g. writing became 15% faster

ikitommi 2021-03-19T08:27:01.108400Z

running with the latest now.

nilern 2021-03-19T08:48:51.109300Z

Most people will be using the latest jar though

2021-03-19T09:40:59.110600Z

this is great! 😍

2021-03-19T09:46:26.110900Z

@slipset fantastic! Care to elaborate on the tricks you used to speed it up?

slipset 2021-03-19T10:03:16.111500Z

1. Remove the dynamic vars and pass them explicitly as an options map.
2. For reading, split string reading into two paths: the quick one (no escapes) is done by passing an array slice to (String.); the slow one (with escapes, unicode and so on) still uses a StringBuilder.
3. For writing, don’t use format to construct unicode escapes.
The main trick, though, was to use the stuff in http://clojure-goes-fast.com, i.e. profile, observe the results, form a hypothesis, create a fix 🙂

❤️ 2
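A minimal sketch of tricks 2 and 3, assuming the input is already in a char array (data.json itself reads from a Reader, so its real code differs). read-json-string, read-escaped, append-unicode-escape and hex-digits are hypothetical names. The fast path builds the String directly from a slice of the buffer; only when a backslash shows up does it fall back to a StringBuilder. The escape writer emits \uXXXX through a hex-digit lookup instead of calling format.

```clojure
;; Hypothetical sketch, not data.json's implementation.

(defn- read-escaped
  "Slow path: copy the plain prefix [start, i) into a StringBuilder, then
  keep appending characters while decoding escape sequences."
  [^chars buf ^long start ^long i]
  (let [sb (StringBuilder.)]
    (.append sb buf (int start) (int (- i start)))
    (loop [i i]
      (let [c (aget buf i)]
        (cond
          (= c \") [(.toString sb) (inc i)]
          (= c \\)
          (let [e (aget buf (inc i))]
            (if (= e \u)
              ;; \uXXXX: parse the four hex digits after the 'u'
              (do (.append sb (char (Integer/parseInt (String. buf (int (+ i 2)) 4) 16)))
                  (recur (+ i 6)))
              (do (.append sb (char (case e
                                      \" \" \\ \\ \/ \/
                                      \b \backspace \f \formfeed
                                      \n \newline \r \return \t \tab)))
                  (recur (+ i 2)))))
          :else (do (.append sb c)
                    (recur (inc i))))))))

(defn read-json-string
  "Parse a JSON string body from buf, where start is the index just after the
  opening quote. Returns [string index-after-closing-quote]."
  [^chars buf ^long start]
  (loop [i start]
    (let [c (aget buf i)]
      (cond
        ;; Fast path: no escapes, so the result is one contiguous slice.
        (= c \") [(String. buf (int start) (int (- i start))) (inc i)]
        ;; First backslash: hand over to the slow path.
        (= c \\) (read-escaped buf start i)
        :else (recur (inc i))))))

(def ^:private ^String hex-digits "0123456789abcdef")

(defn append-unicode-escape
  "Append \\uXXXX for char c to out using a digit lookup instead of format."
  [^Appendable out c]
  (let [n (int c)]
    (.append out "\\u")
    (doseq [shift [12 8 4 0]]
      (.append out (.charAt hex-digits (int (bit-and (bit-shift-right n shift) 0xF)))))
    out))
```

Strings without escapes never touch a StringBuilder; the slow path only pays for one when a backslash actually appears.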
2021-03-19T10:11:07.111900Z

Yeah, I’d taken a quick peek, and was mainly interested in hearing about 2 and 3. I entirely agree with the other advice too, though! :thumbsup: :thumbsup: I’ve noticed in the past that unicode processing is often the slow bit in parsing large amounts of data. Also, the performance difference between InputStream and Reader is staggering… mainly, I believe, because Reader does that unicode stuff and expands all characters into 16 bits. So I was curious how you were alleviating that. I’ve never tried parsing JSON, so I know next to nothing about it, but I was trying to understand how you knew whether you needed to handle unicode or not. I’m guessing you only need to handle unicode for strings inside the JSON, not for the structure itself. Is that correct?!

roklenarcic 2021-03-19T10:27:01.112200Z

I was looking at the commit that replaced dynamic vars with options map. Couldn’t you have saved up even more time if internal functions like read-object received key-fn and value-fn as an argument instead of the whole options map and performing a map get?

nilern 2021-03-19T10:28:42.112600Z

data.json takes a Reader, I think @slipset just meant Unicode escapes inside strings

borkdude 2021-03-19T10:30:16.112800Z

avoiding apply and merge could possibly also help

☝️ 1
roklenarcic 2021-03-19T10:30:19.113Z

Another observation: couldn’t you capture the values of the dynamic vars in a map at the start of the public functions like write-str? Then you wouldn’t pay the dynamic var cost repeatedly, because each var would only be dereferenced once.
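A minimal sketch of that suggestion, using hypothetical names (*key-fn*, *prefix*, write-entry, write-entries-str): the public entry point reads each dynamic var once into a map, and the inner function only does map lookups. Since the values are captured on entry, a binding around the call still takes effect. (data.json 2.0.0 went further and replaced the dynamic vars with an explicit options map, per DJSON-32.)

```clojure
;; Hypothetical options that would otherwise be read from dynamic vars.
(def ^:dynamic *key-fn* name)   ; how map keys are rendered
(def ^:dynamic *prefix* "")     ; string written before each entry

(defn- write-entry
  "Inner function: reads options from the captured map, never from the vars."
  [^StringBuilder sb opts [k v]]
  (let [key-fn (:key-fn opts)]
    (.append sb ^String (:prefix opts))
    (.append sb ^String (key-fn k))
    (.append sb "=")
    (.append sb (str v))
    (.append sb "\n")))

(defn write-entries-str
  "Public entry point: dereferences each dynamic var exactly once."
  [m]
  (let [opts {:key-fn *key-fn* :prefix *prefix*}
        sb   (StringBuilder.)]
    (doseq [e m]
      (write-entry sb opts e))
    (str sb)))
```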

2021-03-19T10:31:03.113200Z

@nilern: Yes I know. I was alluding to that too. I mention InputStream/Reader as something observed in my own work, and in support of the general point that handling unicode is slow.

ikitommi 2021-03-19T10:31:05.113400Z

the old:

jsonista.jmh/encode-data-json  :encode  :throughput  5         406998.934   ops/s  152242.102    {:size "10b"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         146750.626   ops/s  13532.113     {:size "100b"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         28543.913    ops/s  5982.429      {:size "1k"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         1994.604     ops/s  193.798       {:size "10k"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         229.534      ops/s  3.574         {:size "100k"}

ikitommi 2021-03-19T10:31:14.113600Z

the new:

jsonista.jmh/encode-data-json  :encode  :throughput  5         1534830.890  ops/s  155359.246    {:size "10b"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         341613.782   ops/s  26261.051     {:size "100b"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         69673.326    ops/s  1647.625      {:size "1k"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         5658.247     ops/s  999.701       {:size "10k"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         581.924      ops/s  39.758        {:size "100k"}

ikitommi 2021-03-19T10:31:36.113800Z

=> 2.5x throughput improvement 🚀
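The summary figure can be checked against the throughput column of the two runs above. This is only arithmetic on the posted numbers; old-ops, new-ops and speedups are illustrative names, and rounding to two decimals is an arbitrary choice.

```clojure
;; Throughput (ops/s) for encode-data-json, copied from the old and new runs.
(def old-ops {"10b" 406998.934, "100b" 146750.626, "1k" 28543.913,
              "10k" 1994.604,   "100k" 229.534})
(def new-ops {"10b" 1534830.890, "100b" 341613.782, "1k" 69673.326,
              "10k" 5658.247,    "100k" 581.924})

(defn speedups
  "Ratio after/before per payload size, rounded to two decimal places."
  [before after]
  (into {}
        (for [[size b] before]
          [size (/ (Math/round (* 100.0 (/ (get after size) b))) 100.0)])))

(speedups old-ops new-ops)
;; => {"10b" 3.77, "100b" 2.33, "1k" 2.44, "10k" 2.84, "100k" 2.54}
```

So the per-size gain ranges from about 2.3x (100b) to about 3.8x (10b); 2.5x is a fair round figure for the larger payloads.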

ikitommi 2021-03-19T10:31:50.114100Z

jsonista:

jsonista.jmh/encode-jsonista   :encode  :throughput  5         6718559.441  ops/s  564494.417    {:size "10b"}
jsonista.jmh/encode-jsonista   :encode  :throughput  5         2021530.135  ops/s  227934.280    {:size "100b"}
jsonista.jmh/encode-jsonista   :encode  :throughput  5         358639.582   ops/s  33561.700     {:size "1k"}
jsonista.jmh/encode-jsonista   :encode  :throughput  5         32536.978    ops/s  8135.004      {:size "10k"}
jsonista.jmh/encode-jsonista   :encode  :throughput  5         2687.242     ops/s  185.516       {:size "100k"}

ikitommi 2021-03-19T10:32:15.114300Z

still much faster, but it’s 99% Java.

nilern 2021-03-19T10:33:37.114500Z

Jackson (and simdjson) can do their own UTF-8 decoding while parsing from a byte stream. All the structural JSON characters are ASCII so yes Unicode is only really relevant inside strings.

👍 1
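A small sketch of the UTF-8 point: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so a scanner working on raw bytes can find the closing quote of a JSON string without decoding anything, and then decode only that slice. string-end and decode-simple-string are hypothetical helpers that merely skip escaped bytes rather than decoding them; this is not how Jackson or data.json is implemented.

```clojure
(import '(java.nio.charset StandardCharsets))

(defn string-end
  "Given UTF-8 bytes and the index just after an opening quote, return the
  index of the closing quote. Bytes of multi-byte UTF-8 sequences are all
  >= 0x80 (negative as Java bytes), so they never match 0x22 or 0x5C."
  ^long [^bytes buf ^long start]
  (loop [i start]
    (let [b (aget buf i)]
      (cond
        (== b 0x22) i                 ; closing quote
        (== b 0x5C) (recur (+ i 2))   ; backslash: skip the escaped byte
        :else (recur (inc i))))))

(defn decode-simple-string
  "Decode the bytes between the quotes as UTF-8; escapes are NOT processed."
  [^bytes buf ^long start]
  (let [end (string-end buf start)]
    (String. buf (int start) (int (- end start)) StandardCharsets/UTF_8)))
```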
slipset 2021-03-19T10:34:53.114800Z

@nilern my initial patch had value-fn and key-fn passed as separate params, but that doesn’t really scale well (if you imagine passing more opts in the future). Also, the penalty from apply and array-map only shows on the smaller payloads, so it was probably worth the tradeoff.

slipset 2021-03-19T10:35:35.115100Z

https://clojure.atlassian.net/browse/DJSON-32

nilern 2021-03-19T10:35:58.115300Z

(I think you meant @roklenarcic)

slipset 2021-03-19T10:36:09.115500Z

I most certainly did. Sorry.

2021-03-19T10:39:36.115800Z

has some Slack weirdness happened in this thread?! Some comments appear to have disappeared and replies now appear out of context, e.g. my comment above was in response to something @nilern said which has also vanished.

nilern 2021-03-19T10:46:30.116Z

Nothing has vanished AFAICT. Try refreshing your browser?

2021-03-19T10:47:43.116200Z

:thumbsup: I’d done that, but doing it a second time seems to have fixed it.

slipset 2021-03-19T12:16:14.118Z

Unfortunately, there is a bug in the above release with strings longer than 64 chars, so do not use version 2.0.0; rather, wait for 2.0.1 🥵

❤️ 18
2021-03-19T12:18:25.118300Z

ouch... solidarity and hugops

borkdude 2021-03-19T12:19:01.118600Z

The pure Clojure JSON ecosystem now rests on your shoulders slipset... take care!

🙏 3
ikitommi 2021-03-19T12:21:17.119200Z

new jmh-benchmarks on jsonista repo: https://github.com/metosin/jsonista#performance

👍 4
littleli 2021-03-19T12:26:57.120Z

stay strong! It's a great effort. I personally would use a pure library over Java interop whenever possible.

☝️ 1
borkdude 2021-03-19T12:37:13.120400Z

Btw, I wondered if there is some JSON standard compliance test suite that these kinds of libs should be run against

👀 1
borkdude 2021-03-19T12:37:21.120600Z

independent of their implementation

steffan 2021-03-19T12:46:03.122300Z

Perhaps https://github.com/nst/JSONTestSuite

☝️ 1
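A sketch of how a library could be run against that suite. It relies on the repository's naming convention for the files in test_parsing: y_ files must be accepted, n_ files must be rejected, and i_ files may go either way. expected-outcome and run-suite are hypothetical; parse-fn stands for the parser under test (for example a function calling the library's read-str) and is assumed to throw on invalid input. Reading each file as UTF-8 text means the tests about invalid byte sequences are not exercised faithfully by this sketch.

```clojure
(require '[clojure.java.io :as io])

(defn expected-outcome
  "Classify a JSONTestSuite file by its name prefix."
  [^String file-name]
  (cond
    (.startsWith file-name "y_") :accept
    (.startsWith file-name "n_") :reject
    :else                        :either)) ; i_ files are implementation-defined

(defn run-suite
  "Run parse-fn over every file in dir and return the names of files whose
  outcome contradicts their prefix. Any Throwable (including a
  StackOverflowError on deeply nested input) counts as a rejection."
  [parse-fn dir]
  (for [f (.listFiles (io/file dir))
        :let [file-name (.getName ^java.io.File f)
              expected  (expected-outcome file-name)
              accepted? (try (parse-fn (slurp f :encoding "UTF-8"))
                             true
                             (catch Throwable _ false))]
        :when (or (and (= expected :accept) (not accepted?))
                  (and (= expected :reject) accepted?))]
    file-name))
```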
borkdude 2021-03-19T12:46:57.122600Z

excellent, I will post an issue about this on the cheshire side as well

NoahTheDuke 2021-03-19T13:23:33.124800Z

as linked in that repo, I would highly recommend reading this blog post about JSON parsing ambiguities: http://seriot.ch/parsing_json.php

😲 2
Ben Sless 2021-03-19T13:41:47.125600Z

very cool! Looks like some more % can be shaved off by using identical? and == instead of = where possible. Especially using identical?, as the documentation says "If value-fn returns itself" - can you assume it's the same object?
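A sketch of the pattern being discussed, assuming value-fn is called with the key and the value, and that returning the function itself means "omit this pair". assoc-property and drop-nils are hypothetical names. For function objects, = already falls back to reference equality, so identical? gives the same answer here; it just skips the generic equality dispatch.

```clojure
(defn assoc-property
  "Apply value-fn to one key/value pair and add the result to m, unless
  value-fn returned itself, which signals that the pair should be omitted."
  [m value-fn k v]
  (let [out (value-fn k v)]
    (if (identical? out value-fn) ; reference check: the same function object?
      m
      (assoc m k out))))

(defn drop-nils
  "Example value-fn: returns itself for nil values so they are dropped."
  [_k v]
  (if (nil? v) drop-nils v))

(reduce-kv (fn [m k v] (assoc-property m drop-nils k v))
           {}
           {:a 1 :b nil})
;; => {:a 1}
```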

alexmiller 2021-03-19T14:36:12.128400Z

if anyone is interested in working on things like this, please join the club! would be happy to have help on this

alexmiller 2021-03-19T14:38:18.128800Z

data.json 2.0.1 is now available
• Fix https://clojure.atlassian.net/browse/DJSON-37: Fix off-by-one error reading long strings, regression in 2.0.0

👍 4
❤️ 17
alexmiller 2021-03-19T14:39:39.128900Z

an excellent way to work on problems is to write them down in a trackable place like https://ask.clojure.org or jira (if you have access)

👍 1
alexmiller 2021-03-19T14:40:25.129300Z

Changelog for 2.0.0 fyi:
• Perf https://clojure.atlassian.net/browse/DJSON-35: Replace PrintWriter with more generic Appendable, reduce wrapping
• Perf https://clojure.atlassian.net/browse/DJSON-34: More efficient writing for common path
• Perf https://clojure.atlassian.net/browse/DJSON-32: Use option map instead of dynamic variables (affects read+write)
• Perf https://clojure.atlassian.net/browse/DJSON-33: Improve speed of reading JSON strings
• Fix https://clojure.atlassian.net/browse/DJSON-30: Fix bad test
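A sketch of why java.lang.Appendable is a more generic target than PrintWriter: StringBuilder and every java.io.Writer implement it, so one writing function can serve both an in-memory string and a stream without wrapping the target. write-json-str-array is a hypothetical helper that skips string escaping for brevity; it is not data.json's code.

```clojure
(defn write-json-str-array
  "Write a sequence of plain strings as a JSON array to any Appendable.
  No escaping is done, to keep the sketch short."
  [^Appendable out strs]
  (.append out \[)
  (loop [first? true, s (seq strs)]
    (when s
      (when-not first? (.append out \,))
      (-> out (.append \") (.append ^CharSequence (first s)) (.append \"))
      (recur false (next s))))
  (.append out \])
  out)

;; The same function targets an in-memory builder or a Writer.
(str (write-json-str-array (StringBuilder.) ["a" "b"]))
(let [w (java.io.StringWriter.)]
  (write-json-str-array w ["x"])
  (str w))
```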

borkdude 2021-03-19T14:49:59.129700Z

@slipset Congrats on the fix 😅 Does this affect the benchmarks?

alexmiller 2021-03-19T14:51:26.129900Z

doubt it

Ben Sless 2021-03-19T14:55:27.130300Z

you're right. I still need to open a jira on update-in's performance, too

chrisn 2021-03-19T18:49:28.134400Z

I tested dtype-next's ffi generation with graal native and avclj. After a bit of work (a couple days) I can now generate a graal native executable that encodes video 🙂. https://github.com/cnuernber/avclj

👍 9
🤯 18