specter

Latest version: 1.1.3
sophiago 2018-05-18T22:38:22.000311Z

Question: I have a simple transform expression with a recursive navigator, very similar to the one in the example, that I'm using in a macro to navigate to every symbol in an AST and replace parts of certain ones (unfortunately by converting to and from strings and using regex, since I only need to replace part of these symbols). I'd also like it to jump one level up in the AST from each match, or more precisely from each group of matches with the same substring, and wrap the expression in a list at that point. Can anyone recommend a starting point for thinking about defining a relative path like this with Specter? Ideally I'd like to avoid the perf overhead of two separate transforms.

sophiago 2018-05-18T22:49:24.000311Z

Oh, and this also ignores the issue of wanting to group together multiple occurrences of the same substring. So the second transform should only ever add one symbol per substring, but the predicate would have to be modified to match on the collection above all other collections that contain that exact substring.

nathanmarz 2018-05-18T22:53:45.000256Z

@sophiago that first one can be simplified to (setval [TREE NAME #"p[1-9]"] "p" (macroexpand x))

sophiago 2018-05-18T22:54:14.000075Z

@sophiago uploaded a file: https://clojurians.slack.com/files/U2TCUSM2R/FASH2M1QB/-.clj

sophiago 2018-05-18T22:54:46.000021Z

@nathanmarz ah, thanks. I wrote this very late last night and was thinking I should have been able to do something like that

sophiago 2018-05-18T22:55:24.000156Z

I just posted an example of what that transform does with what I hope is a slightly better description of what I'd like to add to it

sophiago 2018-05-18T22:56:37.000225Z

(I also omitted mentioning that I'm using a fork of the compiler where the result would actually be valid, i.e. nested function literals where the % args shadow one another if necessary)

nathanmarz 2018-05-18T22:57:16.000063Z

not understanding what you're trying to do with that second transform

sophiago 2018-05-18T22:57:44.000144Z

Let me provide a more succinct and precise example

sophiago 2018-05-18T22:58:46.000217Z

There's one place in the macroexpansion above where the bound variable occurs twice: (*'(long (. Math pow 10 (first p__13958#))) (second p__13958#))

sophiago 2018-05-18T22:59:43.000016Z

So, ideally in the same transform, I'd like that to become: (fn* [p__13958#] (*'(long (. Math pow 10 (first p__13958#))) (second p__13958#)))

sophiago 2018-05-18T23:00:41.000221Z

The cases where the bound variables only appear once are simpler, although I'm still unsure how to handle them in one pass with Specter

nathanmarz 2018-05-18T23:01:06.000116Z

so you want to collect every unique symbol starting with "p" and wrap the expression with (fn* [<collected p's>] ...)?

sophiago 2018-05-18T23:02:56.000295Z

Yes, but only wrap the deepest subexpression where the symbols are exactly the same. But the numerals will never repeat in the input, so saying exact substrings is good enough.

nathanmarz 2018-05-18T23:03:50.000049Z

for wrapping once you identify the subexpression you can do it like:

(transform (collect TREE symbol? (selected? NAME #"p[1-9]"))
  (fn [psyms expr]
    `(fn* [~@(set psyms)] ~expr))
  data)

nathanmarz 2018-05-18T23:04:05.000212Z

assuming TREE goes to every leaf

sophiago 2018-05-18T23:04:22.000156Z

TREE goes to every coll currently

nathanmarz 2018-05-18T23:04:35.000038Z

as for "deepest subexpression where the symbols are exactly the same" that sounds like a more involved algorithm

nathanmarz 2018-05-18T23:04:55.000255Z

the same "p" symbol could exist deeply nested across multiple branches

sophiago 2018-05-18T23:05:02.000004Z

Right, that's much more involved than simply going one list up.

sophiago 2018-05-18T23:07:01.000128Z

I can make some assumptions based on what would constitute valid input. For example, in px_foo, foo will always be the same for every x. Similarly, all occurrences of each x will be grouped together in terms of depth in the entire AST (not sure whether that makes sense).

nathanmarz 2018-05-18T23:08:25.000067Z

not really

sophiago 2018-05-18T23:08:27.000183Z

So both those in the example I pulled out started with p3. There would never be a p3 at a higher level of nesting or below the level where there's a p4. I'm struggling for language to describe that. Like the "x" in "px" is monotonic with AST depth?

nathanmarz 2018-05-18T23:09:36.000004Z

you mean the depth of the root of the subexpression containing all instances of pn is monotonic with n?

sophiago 2018-05-18T23:09:37.000135Z

So when you mentioned multiple branches, I can actually ignore that. It's only a matter of variance in the number of lists you'd need to jump up for the symbols to be in the same one.

sophiago 2018-05-18T23:10:27.000225Z

"depth of the root of the subexpression containing all instances of pn is monotonic with n" => I believe this is exactly what I'm stating

sophiago 2018-05-18T23:10:57.000150Z

Just some confusion over your use of the term "root"

nathanmarz 2018-05-18T23:12:06.000102Z

the difference between:

[:a [1 [2]] [1 [2]]]

[:a [1 1] [[2] [2]]]

nathanmarz 2018-05-18T23:12:24.000002Z

in the first, the root for both is [:a [1 [2]] [1 [2]]]

nathanmarz 2018-05-18T23:12:43.000239Z

in the second, the root for "1" is [1 1] and for "2" [[2] [2]]

nathanmarz 2018-05-18T23:12:49.000125Z

is that what you're getting at?

sophiago 2018-05-18T23:13:05.000041Z

Yes. You can assume it's like your second example.

sophiago 2018-05-18T23:14:16.000252Z

To backtrack a bit (no pun intended) in (setval [TREE NAME #"p[1-9]"] "p" (macroexpand x)) will "p" be made into a symbol or do I need to add that?

sophiago 2018-05-18T23:14:41.000169Z

I hate treating symbols as strings, but seems unavoidable here 😕

sophiago 2018-05-18T23:16:05.000138Z

To be clear, the transformation at that level is p3__13962# => p__13962#

nathanmarz 2018-05-18T23:21:56.000042Z

it replaces the substring within the symbol

nathanmarz 2018-05-18T23:22:12.000136Z

behind the scenes NAME is extracting the string for the symbol's name, manipulating it, and then reconstructing the symbol
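A rough plain-Clojure sketch of the behaviour described above, for readers who want to see the string round trip spelled out. This is not Specter's implementation; `replace-in-symbol-name` is a hypothetical helper introduced only for illustration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not part of Specter): rewrite only the name part of a
;; symbol and rebuild the symbol with its original namespace (nil if it has none).
(defn replace-in-symbol-name
  [sym re replacement]
  (symbol (namespace sym)
          (str/replace (name sym) re replacement)))

(replace-in-symbol-name 'p3__13962# #"p[1-9]" "p")
;; => p__13962#
```

Note that the regex matches anywhere in the name, so a symbol that merely contains "p" followed by a digit would be rewritten as well.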

sophiago 2018-05-18T23:22:58.000214Z

Oh, that's awesome. Such a common pain point.

nathanmarz 2018-05-18T23:23:50.000140Z

as for the subexpression identification part, that seems more involved

sophiago 2018-05-18T23:23:56.000230Z

Also, in your second example, shouldn't [~@(set psyms)] be either ~@(into [] (set psyms)) or ~@(into [] (distinct psyms))?

sophiago 2018-05-18T23:24:30.000133Z

into or vec, I'd have to look

nathanmarz 2018-05-18T23:24:45.000175Z

(set psyms) uniques the symbols, and ~@ within [] splices them all into the vector

sophiago 2018-05-18T23:25:15.000281Z

Oh, I assumed you'd end up with a set inside a vector

nathanmarz 2018-05-18T23:25:28.000008Z

that would be if you did ~psyms

sophiago 2018-05-18T23:28:14.000041Z

I don't want to belabor that point, but you're saying psyms would evaluate to a set? My assumption was it would be like: [(set ["foo" "foo"])] => [#{"foo"}]

sophiago 2018-05-18T23:31:21.000100Z

Also, I think depending on how your navigator in the second example works, I may be able to just apply another transform on TREE to expr to replace the values, and that's all I need.

nathanmarz 2018-05-18T23:33:55.000099Z

oops meant to write ~(set psyms)

sophiago 2018-05-18T23:34:23.000214Z

Oh, okay. That makes more sense
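The difference between splicing and plain unquoting discussed above can be checked at a REPL with core Clojure alone. A small sketch; `psyms`, `spliced`, `nested`, and `wrap-in-fn` are names made up for this example.

```clojure
(def psyms '[foo foo bar])

;; ~@ splices the elements of the set into the surrounding vector literal,
;; giving a vector of the two distinct symbols (order follows the set's
;; iteration order, which is not guaranteed).
(def spliced `[~@(set psyms)])

;; ~ inserts the set itself as a single element: a set inside a vector.
(def nested `[~(set psyms)])

;; The wrapping step from the transform function, pulled out on its own.
(defn wrap-in-fn
  [psyms expr]
  `(fn* [~@(set psyms)] ~expr))

(wrap-in-fn '[p p] '(first p))
;; => (fn* [p] (first p))
```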

sophiago 2018-05-18T23:35:04.000060Z

I'm thinking through what a navigator for even this example would need to look like. It is a bit more involved than I initially assumed

sophiago 2018-05-18T23:37:24.000067Z

For it to work top-down in one traversal, it would have to backtrack after it either reaches the next x in depth or the bottom of the tree. That's the only way to make sure it adds in each symbol in a fn* [..] at the correct level.

nathanmarz 2018-05-18T23:38:00.000184Z

I've done similar things with dags before

sophiago 2018-05-18T23:38:29.000014Z

This all comes down to me not understanding Specter well enough to hack it out at this level of complexity

nathanmarz 2018-05-18T23:38:57.000145Z

in that algorithm each node has an id, and for everything I'm trying to find the lowest common root for, I look at the list of node ids from it up to the root

sophiago 2018-05-18T23:39:58.000107Z

I can't tell whether that means it would work for my purposes as stands or not

nathanmarz 2018-05-18T23:40:43.000179Z

you could do it with multiple passes by annotating all subexpressions with a generated id as metadata, then finding the path of ids to each instance of "p" symbols

nathanmarz 2018-05-18T23:41:04.000038Z

then with the paths it's easy to identify which subexpressions to target in another pass
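One way to picture the path-based approach is the plain-Clojure sketch below. It is written under a stated assumption: instead of generated ids attached as metadata, it uses each leaf's index path (its position at every level of nesting) as the path to the root. All function names are hypothetical, and Specter is not used.

```clojure
(defn paths-to
  "Index path (vector of positions) of every leaf in `form` satisfying `pred`.
  Only sequential collections are descended into; maps and sets count as leaves."
  ([pred form] (paths-to pred form []))
  ([pred form path]
   (cond
     (sequential? form) (mapcat (fn [i child] (paths-to pred child (conj path i)))
                                (range) form)
     (pred form)        [path]
     :else              [])))

(defn common-prefix
  "Longest shared prefix of two paths."
  [a b]
  (vec (map first (take-while (fn [[x y]] (= x y)) (map vector a b)))))

(defn lowest-common-root
  "Path of the deepest subexpression containing every leaf that satisfies
  `pred`, or nil when nothing matches."
  [pred form]
  (when-let [paths (seq (paths-to pred form))]
    (let [prefix (reduce common-prefix paths)]
      ;; With a single match the prefix is the leaf's own path,
      ;; so step up one level to the collection that holds it.
      (if (= 1 (count paths)) (vec (butlast prefix)) prefix))))

(defn get-at
  "Subexpression of `form` found by following the index path."
  [form path]
  (reduce nth form path))

;; The two shapes from earlier in the thread:
(def same-root  [:a [1 [2]] [1 [2]]])
(def split-root [:a [1 1] [[2] [2]]])

(get-at split-root (lowest-common-root #{1} split-root)) ;; => [1 1]
(get-at split-root (lowest-common-root #{2} split-root)) ;; => [[2] [2]]
(get-at same-root  (lowest-common-root #{1} same-root))  ;; => [:a [1 [2]] [1 [2]]]
```

For the AST case the predicate would test for symbols sharing one "pN" prefix, and a later pass would wrap the subexpression at the returned path.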

sophiago 2018-05-18T23:41:26.000123Z

Oh, the problem with your example is when to stop recursing on expr because if I replace all the way down I'll screw up the next match

sophiago 2018-05-18T23:43:19.000072Z

It does seem like multiple passes might be a good start. Then I can refactor to combine them. It's much easier if I can just have the navigator stop at the next px it sees.

sophiago 2018-05-18T23:44:31.000219Z

Another way to break it down would be to apply the second transform before the first and use just two passes. Then it can add in the bindings at common roots and eliminate the numerals in symbols in the next pass.

sophiago 2018-05-18T23:44:49.000027Z

That seems to make the most sense from how I'm grokking it now