rewrite-clj

https://github.com/clj-commons/rewrite-clj
pez 2019-11-05T09:08:10.083400Z

Forgive me for stupid questions. I am quite unfamiliar with what is going on in rewrite-clj and only have the vaguest idea about zippers and stuff. Eventually I will fix that, but right now I am just a user by proxy and I wonder about how you guys think about parsing things that are not strictly EDN. My use cases are pretty printing, and syntax highlighting and such. Here’s an example:

(comment 
 (do
   (def db-uri "datomic:mem://foo")
   (d/create-database db-uri)
   (def conn (d/connect db-uri))
   @(d/transact conn [{:db/doc "Hello world"}])))
Evaluating the (do... form here gives back something like this:
{:db-before datomic.db.Db@d612a1f8, :db-after datomic.db.Db@298c26ad, :tx-data [#datom[13194139534316 50 #inst "2019-11-05T09:00:57.870-00:00" 13194139534316 true] #datom[17592186045421 62 "Hello world" 13194139534316 true]], :tempids {-9223301668109598141 17592186045421}}
Trying to pretty-print that with anything other than clojure.core/pprint blows up, for somewhat different reasons, depending on which pretty printer is tried. But if rewrite-clj(s) is involved it blows up on the db values there, trying to parse the stuff after the @ as a number. On some level that makes sense, but from a Calva user perspective it most often doesn’t.

pez 2019-11-05T09:10:45.085400Z

So, my question. Is there a way today to get rewrite-clj to not totally croak on that input? And if there isn’t, do you think it would be something worth trying to add?

pez 2019-11-05T09:18:26.089700Z

Another thing that confuses me is that when trying to pretty print the result above, nREPL server side, all pretty printers I have tried, except clojure.core/pprint, start to evaluate the database values, so with the query above, zprint, for instance, prints out the whole database. Twice. This isn’t happening in rewrite-clj, right?

borkdude 2019-11-05T09:49:46.090Z

@pez db values? can you give an example of the error?

pez 2019-11-05T10:01:00.091500Z

The db values are those keyed by :db-before and :db-after in the above result.

borkdude 2019-11-05T10:10:06.091800Z

This is what I'm seeing with rewrite-clj:

$ clj -Sdeps '{:deps {rewrite-clj {:mvn/version "RELEASE"}}}'
Clojure 1.10.1
user=> (require '[rewrite-clj.parser :as p])
nil
user=> (p/parse-string "{:db-before datomic.db.Db@d612a1f8, :db-after datomic.db.Db@298c26ad, :tx-data [#datom[13194139534316 50 #inst \"2019-11-05T09:00:57.870-00:00\" 13194139534316 true] #datom[17592186045421 62 \"Hello world\" 13194139534316 true]], :tempids {-9223301668109598141 17592186045421}}")
Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
Invalid number: 298c26ad.

borkdude 2019-11-05T10:10:45.092100Z

This is the same error you get from clojure.core/read-string

borkdude 2019-11-05T10:10:56.092500Z

so the problem is that this value doesn't round-trip as valid clojure code

pez 2019-11-05T10:11:50.092700Z

Exactly.

borkdude 2019-11-05T10:12:55.093200Z

using clojure.pprint on this doesn't require you to parse the value from a string, because it can print the value directly from memory. that's why it works
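
A minimal sketch of that difference, assuming a plain `Object` as a stand-in for the Datomic db value (Datomic is not loaded here, and its values print differently from this stand-in). Printing from memory works, while reading the printed text back does not:

```clojure
(require '[clojure.pprint :as pp])

;; Stand-in for a map holding an opaque, non-readable object (assumption).
(def value {:db-after (Object.)})

;; Pretty-printing the in-memory value needs no parsing step.
(def pretty (with-out-str (pp/pprint value)))

;; Re-reading the printed text fails, so we catch the error and flag it.
(def reread
  (try
    (read-string (pr-str value))
    (catch Exception _ :unreadable)))
```

Here `pretty` holds the printed map and `reread` ends up as `:unreadable`, because the reader has no way to rebuild the object from its printed form.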

pez 2019-11-05T10:14:15.093800Z

Is rewrite-clj using read-string?

borkdude 2019-11-05T10:14:35.094200Z

rewrite-clj is designed to rewrite files with code/EDN in it

borkdude 2019-11-05T10:14:50.094600Z

so it parses the text, and then rewrites it and then writes it back as text

pez 2019-11-05T10:16:18.095900Z

Yeah. hence my question, if it would be something worth considering, to have a switch where it won’t croak when something doesn’t parse to a valid EDN type.

borkdude 2019-11-05T10:17:02.096200Z

if it isn't valid code, it should probably croak. it should not try to be better than clojure.core/read-string

borkdude 2019-11-05T10:17:58.097600Z

use the right tool for the right job, rewrite-clj isn't a pretty-printer, it's a rewriter from text -> text

pez 2019-11-05T10:18:48.098600Z

Agree somewhat. I’m just trying to avoid having to write my own parser. Which I have already done, as it happens, so I could probably solve this that way, but I would like to be able to use libraries built on top of rewrite-clj.

borkdude 2019-11-05T10:19:51.099400Z

why are you first printing it to a string and then trying to re-parse it, if the goal is to have pretty-printing?

pez 2019-11-05T10:19:59.099600Z

I know rewrite-clj isn’t a pretty printer, but it is used by pretty printers 😃

borkdude 2019-11-05T10:20:50.100700Z

it makes sense if you're trying to pretty-print text files (so code as you write it, re-formatting), but not for pretty-printing in memory values

pez 2019-11-05T10:21:53.101900Z

So, there are reasons. Trying to do this server side, I get other, unacceptable side effects (as mentioned above), so I am trying to solve it client side and then what I have is a string, because that is what nrepl gives me. Maybe it is time to look at the EDN transport there…

pez 2019-11-05T10:22:39.102600Z

Even if I wonder if it can be of help since it isn’t valid EDN to begin with…

borkdude 2019-11-05T10:23:06.103Z

rewrite-clj is concerned with code, not only EDN, but still same problem

pez 2019-11-05T10:23:33.103600Z

Yeah, Calva wants to use it for code as well.

borkdude 2019-11-05T10:23:43.103900Z

nREPL should give you access to the value as is, which you can then pprint, no serialization in between

borkdude 2019-11-05T10:23:54.104400Z

that's the only way to deal with this afaik

pez 2019-11-05T10:24:01.104600Z

Well, bencode is bencode.

borkdude 2019-11-05T10:24:15.104900Z

or try/catch and then fall back on something else

borkdude 2019-11-05T10:24:27.105200Z

print as is maybe

pez 2019-11-05T10:24:31.105300Z

That’s what I do now.
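
A rough sketch of the try/catch fallback described above, using only clojure.core and clojure.pprint. The helper name `pretty-or-raw` is hypothetical, and reading with `read-string` is an assumption made for illustration (a real client would more likely go through its own formatter):

```clojure
(require '[clojure.pprint :as pp])

(defn pretty-or-raw
  "Hypothetical helper: pretty-print the string `s` if it reads as a
  Clojure form, otherwise return `s` unchanged."
  [s]
  (try
    (binding [*read-eval* false]   ; never evaluate #= forms found in output
      (with-out-str (pp/pprint (read-string s))))
    (catch Exception _
      s)))                         ; fall back to printing as is

(def ok  (pretty-or-raw "{:a 1}"))
(def raw (pretty-or-raw "{:db-after datomic.db.Db@298c26ad}"))
```

The second call hits the same "Invalid number" reader error as shown earlier, so `raw` is simply the input string.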

pez 2019-11-05T10:27:41.106500Z

But pretty printing is not the only use case for me. I also do code formatting, using cljfmt. And it creates a bad UX when it refuses to format results like the above.

pez 2019-11-05T10:29:00.107600Z

In a way it sounds like cljfmt, zprint and other formatters/pretty printers should not be using rewrite-clj. Is that correctly understood?

borkdude 2019-11-05T10:29:21.108100Z

The above value cannot be parsed by Clojure itself so it should probably not occur in code. So I don't see a problem for cljfmt there?

borkdude 2019-11-05T10:30:09.109100Z

Not completely. Using rewrite-clj for pprinting is fine as long as you start with valid textual code

borkdude 2019-11-05T10:30:33.109700Z

Saved on disk. Serialized.

pez 2019-11-05T10:31:58.111400Z

Code files contain a lot of stuff. So what I often do is that I paste the result of some evaluation inside a (comment ...) form to have as a reference. For that usage pretty printing is very helpful.

borkdude 2019-11-05T10:32:54.112300Z

datomic.db.Db@298c26ad is something that Java prints when an Object doesn't have an overridden toString method. The one who serializes it could choose to serialize it as "<<object>>" so it will remain valid when parsed back. I think that's where it should/could be fixed.

borkdude 2019-11-05T10:33:22.112900Z

> Code files contain a lot of stuff.
Yes, but if that code cannot be read by clojure.core/read-string you're f*cked anyway.

borkdude 2019-11-05T10:34:20.113500Z

Just try to insert the above db value in one of your projects and then re-start your REPL.

borkdude 2019-11-05T10:34:56.113700Z

This is what I'm seeing for example:

pez 2019-11-05T10:36:21.115100Z

Yeah, I’ve seen that often, trying to load a file with one of those (comment invalid-form) in it. 😃

pez 2019-11-05T10:37:55.116400Z

So, I know that read-string doesn’t like it. I have tried a lot of things when trying to fix this problem in Calva, believe me. haha

borkdude 2019-11-05T10:38:27.117Z

it's a common source of frustration that pr-str doesn't always output strings that can be read back with read-string. Same with edn/read-string.

borkdude 2019-11-05T10:38:47.117400Z

Maybe you should not try to fix that problem 😉

pez 2019-11-05T10:41:13.118600Z

Haha, well, I like for Calva to be as useful as I can make it. So I turn many stones.

borkdude 2019-11-05T10:41:13.118700Z

user=> (read-string (pr-str {:a (symbol "foo bar")}))
Execution error at user/eval9 (REPL:1).
Map literal must contain an even number of forms

pez 2019-11-05T10:42:51.119100Z

However, rewrite-clj doesn’t complain. 😃

pez 2019-11-05T10:43:45.120Z

Would be horrible if it did, because then I couldn’t use cljfmt for my formatting at all.

pez 2019-11-05T10:44:51.120700Z

Anyway, I think I have the answer to my question. My search continues!

borkdude 2019-11-05T10:45:24.121100Z

The parser doesn't complain but:

user=> (rewrite-clj.node/sexpr (p/parse-string "{:a foo bar}"))
Execution error (IllegalArgumentException) at rewrite-clj.node.seq/map-node$fn (seq.clj:110).
No value supplied for key: bar

pez 2019-11-05T10:45:48.121300Z

Yeah, that’s good!

plexus 2019-11-05T11:14:15.121900Z

@pez I would look into extending print-method so it prints datomic.db.Db values in a readable way

plexus 2019-11-05T11:15:19.122300Z

say #datomic/db {:ts 1234567}

plexus 2019-11-05T11:16:32.123500Z

actually reading that back to a Db value is a different story, but at least you'll be able to process it with rewrite-clj, and you can define a dummy data reader so this syntax round trips, even if reading it doesn't yield an actual valid Db instance
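
A sketch of that idea, assuming a hypothetical `FakeDb` type in place of datomic.db.Db (Datomic is not loaded here) and a made-up `:ts` field. It prints the value as a tagged literal and registers a dummy data reader so the text round-trips as plain data:

```clojure
;; FakeDb is a hypothetical stand-in for datomic.db.Db.
(deftype FakeDb [ts])

;; Print the value as a tagged literal instead of the default unreadable form.
(defmethod print-method FakeDb [^FakeDb db ^java.io.Writer w]
  (.write w "#datomic/db ")
  (print-method {:ts (.-ts db)} w))

(def printed (pr-str {:db-after (FakeDb. 1234567)}))

;; Dummy data reader: keeps the map as plain data rather than rebuilding a Db.
(def read-back
  (binding [*data-readers* (assoc *data-readers* 'datomic/db identity)]
    (read-string printed)))
```

With this in place `printed` is readable text, and `read-back` is an ordinary map rather than a Db instance, which is the trade-off described above.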

borkdude 2019-11-05T11:20:46.124Z

that works, but as an IDE you can't possibly foresee all the kinds of objects that people are going to print

borkdude 2019-11-05T11:38:10.124600Z

maybe something like this could work:

(set! *warn-on-reflection* true)
(let [old-method (get-method print-method Object)]
  (defmethod print-method Object [v ^java.io.Writer w]
    (if old-method (old-method v w)
        (.write w (format "<<%s>>" (.getName (.getClass ^Object v)))))))
(deftype XYZ [])
(XYZ.) ;;=> <<user.XYZ>>
(remove-method print-method Object)

borkdude 2019-11-05T11:38:24.125100Z

not sure if/how that is going to interfere with existing print-methods...

borkdude 2019-11-05T11:38:33.125400Z

^ @plexus @pez

borkdude 2019-11-05T11:40:05.125800Z

maybe restoring the print-method after printing is best, so it doesn't interfere with other tooling

borkdude 2019-11-05T11:42:15.126100Z

so maybe:

(let [old-method (get-method print-method Object)]
  (defmethod print-method Object [v ^java.io.Writer w]
    (.write w (format "<<%s>>" (.getName (.getClass ^Object v)))))
  ;; pprint here
  ;; restore method
  (when old-method
    (defmethod print-method Object [v w]
      (old-method v w))))

borkdude 2019-11-05T11:49:46.126600Z

This is likely going to interfere with other threads that are running in the same JVM

borkdude 2019-11-05T12:59:17.127200Z

@pez Maybe you could also use print-dup. That throws when there is no valid method defined:

user=> (let [sw (java.io.StringWriter.)] (print-dup (XYZ.) sw) (str sw))
Execution error (IllegalArgumentException) at user/eval146 (REPL:1).
No method in multimethod 'print-dup' for dispatch value: class user.XYZ

pez 2019-11-05T15:27:33.128100Z

Thanks, both of you! I will experiment a bit and we’ll see if I can get this as user friendly as I’d like to.

2019-11-05T16:03:54.129900Z

@pez btw, regarding zippers, if you are interested, one of the most helpful videos i saw was plexus': https://lambdaisland.com/blog/2018-11-26-art-tree-shaping-clojure-zip -- i found the diagrams presented with the explanations to be particularly good.

lread 2019-11-05T16:09:18.131800Z

yes, I agree with @sogaiu, that is an excellent video. @pez, I also found https://clojure.org/reference/other_libraries#_zippers_functional_tree_editing_clojure_zip useful

lread 2019-11-05T16:10:47.133100Z

and also Tim Baldridge's videos are great (not entirely free, but also very inexpensive) https://tbaldridge.pivotshare.com/media/zippers-episode-1/11348/feature?t=0

2019-11-05T16:14:49.136Z

oh yes, i have found that following along with tim baldridge's presentations at the repl to be particularly instructive. though it is effort, i have usually found it to be worth it 👍

lread 2019-11-05T16:15:10.136500Z

yeah he's a smart guy and goes deep!

2019-11-05T16:15:36.136700Z

indeed!

2019-11-05T16:30:34.139400Z

on the note of zippers, not sure about how relevant it might be, but @martinklepsch noticed that in a java "port" of tree-sitter, there is a clojure api: https://github.com/JetBrains/jsitter/tree/master/clj -- it has some zipper stuff in it. apart from that though i wonder whether there might be some ideas / synergy for rewrite-clj*...

borkdude 2019-11-05T16:48:26.140300Z

I also enjoyed plexus' zipper video (small sidenote: my findings about performance were different than his conclusion at the end)

2019-11-05T16:56:51.140900Z

thanks for sharing the bit about your findings, that is interesting indeed

plexus 2019-11-05T19:48:01.142Z

Can you elaborate @borkdude? I don't remember what I said exactly about performance, besides the recommendation to use fast-zip

plexus 2019-11-05T19:49:10.143400Z

Also yay glad to see people find that talk useful. The viz code is on github somewhere, although I never got around to releasing it properly.

borkdude 2019-11-05T21:13:38.144200Z

@plexus https://www.youtube.com/watch?v=5Nm56YvTKZY&feature=youtu.be&t=2558 "one of the benefits of using zippers is the performance gain", although you state that's not the main reason why you would use them. I didn't find them to be more performant than manual editing (actually the opposite).

borkdude 2019-11-05T21:17:21.145100Z

But overall quite nice introduction to zippers, thanks for that.

2019-11-05T21:21:51.145900Z

just so i understand, he was referring to the modifications being done all together vs each one separately, right?

borkdude 2019-11-05T21:23:19.147100Z

I think so, but traversing with zippers also causes data structures to be modified, so I guess it depends on the amount of overhead vs. the amount of updates you're trying to accomplish. With typical things I was doing with rewrite-clj manual updates were much faster. I guess we need benchmarks.

2019-11-05T21:25:29.147700Z

it's often in the details i guess 🙂