clojure-europe

For people in Europe... or elsewhere... UGT https://indieweb.org/Universal_Greeting_Time
ordnungswidrig 2021-03-17T07:03:04.000200Z

Good morning!

ordnungswidrig 2021-03-17T07:03:27.001Z

@simongray wouldn't a zipper require the data all be in memory?

pez 2021-03-17T07:05:08.003900Z

Good morning! Did you see I made a macro? 😃

👍 1
dharrigan 2021-03-17T07:05:53.004600Z

macrotastic

❤️ 1
djm 2021-03-17T07:07:03.006200Z

👋

mccraigmccraig 2021-03-17T07:08:33.008200Z

månmån

simongray 2021-03-17T07:08:58.009Z

@ordnungswidrig I don't think zippers as a concept would necessitate keeping the entire data structure in memory. They are basically pointers inside a tree data structure with some options for local navigation. If you make a custom tree data structure that doesn't realise the contents of its branches until read time, then it won't keep everything in memory.
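
A minimal sketch of this with clojure.zip (the tree contents are illustrative): a zipper location is just an ordinary value pointing into the tree, and over a lazy seq navigation only realises the elements it actually touches:

```clojure
(require '[clojure.zip :as zip])

;; A zipper location is just a value: [current-node path-context],
;; i.e. a pointer into the tree with local navigation.
(def z (zip/vector-zip [1 [2 3] [4 [5]]]))

(-> z zip/down zip/node)                    ;; => 1
(-> z zip/down zip/right zip/down zip/node) ;; => 2

;; Over a lazy (here: infinite) seq, moving down only realises
;; what the navigation actually visits.
(-> (zip/seq-zip (map inc (range))) zip/down zip/node) ;; => 1
```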

simongray 2021-03-17T07:09:50.010Z

And here I'm talking about clojure.zip - I'm sure you can make a zipper that's more efficient than that implementation too.

simongray 2021-03-17T07:14:09.013600Z

But obviously it all depends on the data structure. Not sure how lazy something like Clojure's hashmap would be in practice.

simongray 2021-03-17T07:16:34.015900Z

If you're navigating a huge json file from a disk you will need some custom data structure magic.

simongray 2021-03-17T07:19:45.019800Z

But the search algorithm itself would be easy to write: just check sibling nodes and follow paths down where you need to. This can be parallelised too for searches, but I don't think it would work for transformations - at least I wouldn't be sure how to join the edits from a bunch of zippers.
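
A sketch of that search using clojure.zip's depth-first `zip/next` walk (tree and predicate are illustrative); since each location is a plain value, the walk can be paused and resumed just by holding on to a loc:

```clojure
(require '[clojure.zip :as zip])

;; Depth-first search: walk every location with zip/next and keep the
;; nodes matching a predicate. The result is lazy, so the walk stops
;; as soon as the consumer stops.
(defn find-nodes [zipper pred]
  (->> (iterate zip/next zipper)
       (take-while (complement zip/end?))
       (map zip/node)
       (filter pred)))

(find-nodes (zip/vector-zip [1 [2 3] [4 [5]]]) number?)
;; => (1 2 3 4 5)
```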

ordnungswidrig 2021-03-17T07:27:52.020800Z

"Lazy stream zippers" sounds like a 60s progressive rock band ❤️

🤘 3
thomas 2021-03-17T08:14:36.022100Z

moin moin

orestis 2021-03-17T08:27:31.022400Z

Morning

slipset 2021-03-17T08:33:52.024400Z

@simongray Don’t know too much about zippers, but some/most underlying json parsers eagerly parse the json into tokens. So while you could avoid creating all the maps/seqs, you still need to fit the whole json string/tokens in memory. What would be really cool would be a streaming/lazy tokenizer (which I believe is what @ordnungswidrig found yesterday)

synthomat 2021-03-17T08:34:46.024800Z

👋 ohai

slipset 2021-03-17T08:34:52.025100Z

ohai!

slipset 2021-03-17T08:36:16.027400Z

So imagine I had a 300GB json and I basically wanted `(get-in parsed-json [0 :name])`, the trick is to get the name of the first object without parsing the whole json string.
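
A toy sketch of that trick (heavily simplified: assumes well-formed input, string values only, no escaped quotes): scan characters from a Reader and stop as soon as the wanted value has been read, never touching the rest of the stream:

```clojure
;; Pull the "name" value out of the first object of a JSON array by
;; scanning characters one at a time, stopping at the first match.
(defn first-name [^java.io.Reader r]
  (letfn [(skip-to [ch]
            (loop []
              (let [c (.read r)]
                (when-not (or (= c -1) (= (char c) ch))
                  (recur)))))
          (read-string-token []
            (skip-to \")                       ;; opening quote
            (loop [sb (StringBuilder.)]
              (let [c (.read r)]
                (if (or (= c -1) (= (char c) \"))
                  (str sb)
                  (recur (.append sb (char c)))))))]
    (loop []
      (let [k (read-string-token)]
        (cond
          (= k "")     nil                     ;; EOF, key not found
          (= k "name") (read-string-token)     ;; the value right after
          :else        (do (read-string-token) ;; skip this key's value
                           (recur)))))))

(with-open [r (java.io.StringReader.
               "[{\"id\":\"1\",\"name\":\"ada\"},{\"name\":\"never read\"}]")]
  (first-name r))
;; => "ada"
```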

simongray 2021-03-17T08:36:43.028100Z

sure

simongray 2021-03-17T08:37:02.028900Z

that’s the data structure magic I was referring to 😉

slipset 2021-03-17T08:37:14.029300Z

With that comes the question of validating the json as well. You cannot validate it without reading the whole thing.

simongray 2021-03-17T08:37:25.029500Z

sure

slipset 2021-03-17T08:37:34.029800Z

(which may or may not be a problem)

simongray 2021-03-17T08:41:14.033100Z

but I guess you can interpret every level of the data structure you need as {key_1 <pointer to val_1>, … key_n <pointer to val_n>}. Then you need a zipper function that can realise a pointer as a piece of data. I guess the pointer could just be the linecount/charcount boundary of each val.

simongray 2021-03-17T08:42:21.034300Z

The point is just, with the right data structure, implementing a zipper for it is pretty straightforward, and zippers can be paused, resumed, rewound, and really made to go in any direction.

simongray 2021-03-17T08:44:56.036700Z

So basically you read through the entire contents of a json object (as text), registering the keys and the boundaries of the vals (your pointers). Then you can zip into any one of those vals in separate threads if you like and simply repeat the algorithm for the boundary contained by the pointer, e.g. “line 4/char 7 to line 6/char 12”.
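
A toy version of that pointer idea (illustrative names; assumes a flat object with unescaped string values — a real implementation would need a proper tokenizer and line/char bookkeeping): index the keys as character-offset boundaries, then realise a val only on demand:

```clojure
;; Index the top level of a JSON object as {key [start end]} character
;; offsets — the "pointers" — without building any parsed values.
(defn index-flat-object [^String s]
  (let [m (re-matcher #"\"([^\"]+)\"\s*:\s*\"([^\"]*)\"" s)]
    (loop [idx {}]
      (if (.find m)
        (recur (assoc idx (.group m 1) [(.start m 2) (.end m 2)]))
        idx))))

;; Realise a pointer as data only when it is actually navigated to.
(defn realise [^String s [start end]]
  (subs s start end))

(def json "{\"a\":\"x\",\"b\":\"y\"}")
(def idx (index-flat-object json))

(realise json (idx "b")) ;; => "y"
```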

simongray 2021-03-17T08:46:40.038500Z

I realise that I may just be cargoculting zippers since I really like using them 😛

simongray 2021-03-17T08:47:32.039900Z

to me the fact that you can write pausable tree navigation and transformation algorithms using such a simple tool is quite magic

slipset 2021-03-17T08:47:41.040200Z

Yes, I think I see that, but I was more thinking about the problem of an infinite json-stream (a json-based Turing machine?) or a json-stream which was too big to hold in memory, or whose stream was so slow that it didn’t make sense to read the whole thing to get the first element (https://tools.ietf.org/html/rfc2549)

simongray 2021-03-17T08:48:45.040400Z

i see

thomas 2021-03-17T09:00:46.046400Z

just thinking out loud here.... could shelling out to jq help here? it might do something clever (I don't know, but I guess easy to test)

dharrigan 2021-03-17T09:02:12.047500Z

jq does allow the --stream option

dharrigan 2021-03-17T09:02:24.047900Z

<https://stedolan.github.io/jq/manual/#Streaming>

ordnungswidrig 2021-03-17T09:02:50.048600Z

This is why I think regex state machines are how this might be implemented. E.g. a query like `give me the “order” object when any of the “orderitem/product/name” values contains “gizmo”` could be “compiled” into a state machine which collects order data until you can rule out that the pattern matches. This all happens on an event stream of json tokens: `:map :key "orders" :list :map :key "id" :string "123" :key "orderitems" :list :map :key "name" :string "Rumbling gizmo" :endmap :endlist :endmap :endlist :endmap`
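
A toy sketch of running a state machine over such a token event stream (the query here is boiled down to “the value following a given key”; the event vector is the one from the message above):

```clojure
;; A tiny state machine over a flat stream of JSON-ish token events:
;; once it has seen [:key k], it returns the next non-keyword token,
;; skipping type markers like :string along the way.
(defn value-after-key [k events]
  (loop [[e & more :as s] events, armed? false]
    (cond
      (empty? s)                          nil
      (and armed? (not (keyword? e)))     e
      (and (= e :key) (= (first more) k)) (recur (next more) true)
      :else                               (recur more armed?))))

(def events
  [:map :key "orders" :list :map :key "id" :string "123"
   :key "orderitems" :list :map :key "name" :string "Rumbling gizmo"
   :endmap :endlist :endmap :endlist :endmap])

(value-after-key "id" events)   ;; => "123"
(value-after-key "name" events) ;; => "Rumbling gizmo"
```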

dharrigan 2021-03-17T09:02:50.048700Z

never used it myself 'tho

ordnungswidrig 2021-03-17T09:03:01.049Z

@dharrigan that sounds like an interesting option

ordnungswidrig 2021-03-17T09:03:05.049200Z

but only json, not edn 😛

dharrigan 2021-03-17T09:03:13.049500Z

🙂

borkdude 2021-03-17T09:03:17.049600Z

PR to jet welcome for EDN :P

borkdude 2021-03-17T09:04:17.050100Z

Another idea: don't store your data in JSON but XML and use tools that already worked in the 00s ;)

1
jasonbell 2021-03-17T09:08:32.050300Z

Morning

simongray 2021-03-17T09:13:47.052400Z

@slipset BTW I starred this repo the other day: https://github.com/pangloss/fermor Haven’t looked that deep into it, but looks to me like it’s using some of the same buzzwords.

simongray 2021-03-17T09:17:40.052800Z

seems to use a Java dependency, though.

slipset 2021-03-17T09:21:27.053100Z

Thanks!

agigao 2021-03-17T10:41:58.053300Z

Morning!

ordnungswidrig 2021-03-17T10:44:03.053700Z

@simongray I like this code comment from the examples:

;; This version is a very direct port of the above query and in a fermor system
;; would never pass code review. It has all of the guts of the query hanging
;; out. Instead we can trivially create a domain model and work at a much higher
;; level of abstraction.

ordnungswidrig 2021-03-17T10:54:45.054400Z

hmmm one could also implement a jackson parser for EDN. 🙂 This should unlock JsonPath on EDN I guess. :thinking_face: Not sure if that’s a nifty idea though.

borkdude 2021-03-17T11:06:02.054800Z

Maybe one can already use JsonPath on transit? :thinking_face:

raymcdermott 2021-03-17T13:23:08.055100Z

morning

ordnungswidrig 2021-03-17T15:49:11.056900Z

@borkdude you mean translating jsonpath to a jsonpath expression that would match on transit?

ordnungswidrig 2021-03-17T15:49:22.057200Z

Sounds super hard because of the statefulness of transit 🙂

borkdude 2021-03-17T15:52:31.057400Z

yes, hard

pez 2021-03-17T21:09:10.059500Z

Playing with writing a macro is quite amazing. I am actually manipulating Clojure code like the data it is. I’ve been telling people that Clojure is homoiconic and sort of understood what it means, but now I realize what it actually means. 😃
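
The classic tiny demonstration of that (example macro is mine, not pez’s): the macro receives its argument as a plain list and rebuilds it with ordinary list functions before the compiler ever sees it:

```clojure
;; Code is data: the argument arrives as the literal list (1 + 2),
;; and we rearrange it with plain list manipulation.
(defmacro infix
  "Rewrite (infix (1 + 2)) into (+ 1 2)."
  [[a op b]]
  (list op a b))

(infix (1 + 2))                 ;; => 3
(macroexpand '(infix (1 + 2)))  ;; => (+ 1 2)
```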

4
😄 1