rewrite-clj

https://github.com/clj-commons/rewrite-clj
reefersleep 2021-03-15T11:06:20.164700Z

So, I’m using the zip API to navigate through a file. I need to find a particular function call and edit it. I’ve found out how to do this, and I’m composing it into a context where I recursively do this to all files under a directory. Yay! However, there are problems. I’m using find-next to find the particular s-expression that I’m interested in, and in my predicate function, I’m using z/sexpr to be able to inspect the s-expression as a Clojure data structure. Some of the files under the directory has discard blocks at the root level, and this causes z/sexpr to throw. I could (and I’m probably going to) iterate through the nodes using z/right instead and interpret the type of node, or just catch, to figure out whether I’m at a discard node (or some other un-sexpr-able node) that I need to skip, and skip it. I’m curious that z/find-next doesn’t already have this option, though. I see that z/right, for example, internally has a call to skip-whitespace. Why not also have skip-unsexprable as an option somewhere in the call chain?

borkdude 2021-03-15T11:07:27.165200Z

I think z/sexpr should not throw when a node has unevals/discards

reefersleep 2021-03-15T11:08:03.166100Z

I see the rationale, though (I think! 🙂 ); what would you return in place of a discard node?

borkdude 2021-03-15T11:08:12.166500Z

it should be treated the same as whitespace: ignored

reefersleep 2021-03-15T11:08:17.166700Z

nil is misleading. It’s literally discarded, not returned as nil

reefersleep 2021-03-15T11:08:40.166900Z

hm, but what would you return?

reefersleep 2021-03-15T11:08:43.167200Z

oh

reefersleep 2021-03-15T11:08:49.167600Z

don’t visit them at all

reefersleep 2021-03-15T11:08:52.167800Z

actually skip them 🙂

borkdude 2021-03-15T11:09:01.168100Z

oh you are calling z/sexpr on an uneval node itself? not something with an uneval node inside of it?

reefersleep 2021-03-15T11:09:08.168300Z

exactly

reefersleep 2021-03-15T11:09:12.168600Z

the latter works fine

borkdude 2021-03-15T11:09:12.168700Z

yeah, don't do that :)

reefersleep 2021-03-15T11:09:15.168900Z

haha

reefersleep 2021-03-15T11:09:27.169100Z

I’m trying to work my way around it 🙂

borkdude 2021-03-15T11:09:38.169300Z

you can just look at the :tag of the node

borkdude 2021-03-15T11:09:41.169600Z

and then skip it

reefersleep 2021-03-15T11:09:51.169800Z

yeh, that’s what I figured. Interpret the type of node.

reefersleep 2021-03-15T11:10:08.170300Z

It forces me into oldschool looping rather than the neat find-next functionality 🙂

reefersleep 2021-03-15T11:10:39.171300Z

And I was wondering if that could maybe be an area that could improved. I don’t know the history of rewrite-clj, though

borkdude 2021-03-15T11:10:43.171500Z

you can also do a kind of postwalk using z/next maybe

borkdude 2021-03-15T11:11:48.171800Z

not sure if that helps over find-next, I never used the latter

borkdude 2021-03-15T11:12:07.172100Z

you should always be a bit careful with z/sexpr

borkdude 2021-03-15T11:12:12.172300Z

this has always been the case

borkdude 2021-03-15T11:12:38.172900Z

it can throw on invalid formed maps as well, e.g. {:a 1 :b}, there's just nothing rewrite-clj can or should do about this I think

reefersleep 2021-03-15T11:13:00.173200Z

I get that.

reefersleep 2021-03-15T11:13:35.173900Z

So you might actually generally have to have the try/catch safeguard

reefersleep 2021-03-15T11:14:13.174400Z

Depending on your scenario ofc

borkdude 2021-03-15T11:14:40.174900Z

yes, and/or first check with tag

lread 2021-03-15T12:08:30.176Z

@reefersleep there is an open issue on skipping uneval nodes https://github.com/clj-commons/rewrite-clj/issues/70, we are discussing a decent way to solve.

lread 2021-03-15T12:12:13.177700Z

I sometimes use zip API walk functions when I want to only visit certain nodes in a zipper.

reefersleep 2021-03-15T12:15:15.178300Z

@lee I’ll have a look! Meanwhile, I made my own find-next.

(defn sturdy-find-next
  ([zloc pred]
   (sturdy-find-next zloc z/right pred))
  ([zloc f pred]
   (if (z/end? zloc)
     nil
     (let [next-zloc (f zloc)]
       (cond (z/end? next-zloc)
             nil

             (= :uneval (n/tag (z/node next-zloc)))
             (recur next-zloc f pred)

             (pred next-zloc)
             next-zloc

             :else
             (recur next-zloc f pred))))))

reefersleep 2021-03-15T12:15:32.178500Z

(Just a quick WIP)

👍 1
reefersleep 2021-03-15T12:17:03.179400Z

I think the idea of further configurability of what is skipped is interesting, but of course needs some hammock time 🙂

reefersleep 2021-03-15T12:17:14.179700Z

(what @sogaiu mentions in the thread)

borkdude 2021-03-15T12:19:51.180400Z

@lee Without looking into the issue, so forgive my ignorance, why is uneval not just treated as whitespace? Because one might want to rewrite it right?

reefersleep 2021-03-16T09:01:41.005300Z

That’s what I thought, and I

reefersleep 2021-03-16T09:01:43.005500Z

m

reefersleep 2021-03-16T09:02:50.005700Z

I think I’m in the same state of mind as you; maybe it should be up to the user what they want to skip, and what not. In an ideal world, anyway. But I’ve no idea if that’d pollute the interface, how it’d impact performance and so on.

lread 2021-03-15T12:23:29.182800Z

I’m guessing the original authors just made a choice on what to skip. I see their point, it really is not technically whitespace.

borkdude 2021-03-15T12:23:48.183100Z

But same could be true for commas?

borkdude 2021-03-15T12:24:04.183600Z

What if I want to rewrite all commas into whitespace for example?

lread 2021-03-15T12:24:23.184100Z

But commas are technically Clojure whitespace.

borkdude 2021-03-15T12:24:46.184500Z

yes, I know, but you might want to get rid of those using a zipper in rewrite-clj

lread 2021-03-15T12:27:36.187800Z

It is a fuzzy area, I don’t think the default choice was wrong, and am thinking something configurable probably makes sense. The fact that rewrite-clj skips comments by default is likely why cljfmt does not format them!

borkdude 2021-03-15T12:28:05.188Z

The difference from a Clojure perspective between ;; and # is that # actually goes through the reader. Anything invalid in # will cause the reader to throw. E.g. #{:a}. So technically it doesn't belong in the "whitespace" category of things that are plainly ignored by the reader.

1
lread 2021-03-15T12:28:50.188500Z

Good observation, yeah

lread 2021-03-15T12:36:43.191100Z

To finish my ramble on cljfmt... it does also use raw movement functions (effectively right* next* etc) to examine and effect changes to whitespace.

lread 2021-03-15T13:39:57.198200Z

On a different note, I’m thinking I should probably start measuring how changes we make to rewrite-clj affect its performance. I know this can be a difficult thing to do in a meaningful way, so might not pursue. I’m particularly interested how rewrite-clj v1 compares to rewrite-cljs as I turfed some of the rewrite-cljs tunings in favour of a common code base. But I’m also generally interested. Not sure what a good strategy would be yet. Do folks know of any projects that do measure and compare performance? I do run 3rd party libs tests suits (ex. zprint, cljfmt, etc) against rewrite-clj v1. I wonder if somehow I could use the test suite runs to help me measure if rewrite-clj changes improve/degrade performance.

🙌 1
lread 2021-03-15T23:14:11.203400Z

I am planning to upgrade cljfmt to rewrite-clj v1 (if weavejester agrees). Since cljfmt uses rewrite-cljs, I decided to take a https://github.com/weavejester/cljfmt/issues/211#issuecomment-799813626. Result are to be taken with a grain of salt, but under my little test I am seeing rewrite-clj v1 is 20% slower than rewrite-cljs for cljfmt. I guessed there might be a hit and this seems not too horrible. We’ll see what others think.

👍 1
🙌 1
reefersleep 2021-03-16T09:01:01.004800Z

Tufte is great for manually working down towards bottlenecks. Auto-profiling sounds even better!

reefersleep 2021-03-16T09:36:55.005900Z

And, dipping into performance tuning for the easy wins feels awesome. You could timebox it and prevent wasting too much time :)

🙂 1
vemv 2021-03-16T10:12:20.006300Z

Yeah perf can be important. In a use case like clj-kondo, integrated with some sort of "squiggly lines" IDE functionality one wants a particularly tight feedback loop ...while for cljfmt it's less important IMO, as one is not reformatting files the whole time (especially for most IDEs, which basic formattings will match with cljfmt's)

lread 2021-03-16T10:50:43.008800Z

Good news, weavejester does not consider potential cljs slowdowns with rewrite-clj v1 as blockers, so I’ll proceed without optimizations for now.

👍 3
vemv 2021-03-15T23:47:54.204200Z

btw have you profiled execution looking for slow spots? btw I have a WIP lib that auto-profiles a whole codebase with https://github.com/ptaoussanis/tufte/ , so one can know the slow functions very quickly I could share a sample report if that helps