Question: I have a simple transform
expression with a recursive navigator very similar to the one in the example that I'm using in a macro to navigate to every symbol in an AST and replace parts of certain ones (unfortunately converting to and from strings and using regex since I only need to replace part of these symbols). I'd also like it to jump one level up in the AST from each match, or more precisely group of matches with the same substring, and wrap it in a list at that point. Can anyone recommend a starting point for thinking about defining a relative path with specter like this? I would ideally like to avoid the perf overhead of two separate transform
s.
Oh, and this also ignores the issue of wanting to group together multiple occurrences of the same substring. So the symbol added in the second transform
should only ever be one, but the predicate would have to be modified to match on the collection above all other collections that contain that exact substring.
@sophiago that first one can be simplified to (setval [TREE NAME #"p[1-9]"] "p" (macroexpand x))
@sophiago uploaded a file: https://clojurians.slack.com/files/U2TCUSM2R/FASH2M1QB/-.clj
@nathanmarz ah, thanks. I wrote this very late last night and was thinking I should have been able to do something like that
I just posted an example of what that transform
does with what I hope is a slightly better description of what I'd like to add to it
(I also omitted mentioning I'm using a fork of the compiler where the result would actually be valid...i.e. nested function literals where %
shadow one another if necessary)
not understanding what you're trying to do with that second transform
Let me provide a more succinct and precise example
There's one place in the macroexpansion above where the bound variable occurs twice: (*'(long (. Math pow 10 (first p__13958#))) (second p__13958#))
So, ideally in the same transform
, I'd like that to become: (fn* [p__13958#] (*'(long (. Math pow 10 (first p__13958#))) (second p__13958#))
The cases where the bound variables only appear once are simpler, although I'm still unsure how to handle them in one pass with Specter
so you want to collect every unique symbol starting with "p" and wrap the expression with (fn* [<collected p's>] ...)
?
Yes, but only wrap the deepest subexpression where the symbols are exactly the same. But the numerals will never repeat in the input, so saying exact substrings is good enough.
for wrapping once you identify the subexpression you can do it like:
(transform (collect TREE symbol? (selected? NAME #"p[1-9]"))
(fn [psyms expr]
`(fn* [~@(set psyms)] ~expr)
)
data)
assuming TREE
goes to every leaf
TREE
goes to every coll currently
as for "deepest subexpression where the symbols are exactly the same" that sounds like a more involved algorithm
the same "p" symbol could exist deeply nested across multiple branches
Right, that's much more involved than simple one list up.
I can make some assumptions based on what would constitute valid input. For example, how px_foo means foo will always be the same for every x. Similarly, for each x: they'll be grouped together as far as depth in the entire AST (not sure whether that makes sense).
not really
So both those in the example I pulled out started with p3. There would never be a p3 at a higher level of nesting or below the level where there's a p4. I'm struggling for language to describe that. Like the "x" in "px" is monotonic with AST depth?
you mean the depth of the root of the subexpression containing all instances of pn is monotonic with n?
So when you mentioned multiple branches, I can actually ignore that. It's only a matter of variance in the number of lists you'd need to jump up for the symbols to be in the same one.
"depth of the root of the subexpression containing all instances of pn is monotonic with n" => I believe this is exactly what I'm stating
Just some confusion over your use of the term "root"
the difference between:
[:a [1 [2]] [1 [2]]]
[:a [1 1] [[2] [2]]]
in the first, the root for both is [:a [1 [2]] [1 [2]]]
in the second, the root for "1" is [1 1]
and for "2" [[2] [2]]
is that what you're getting at?
Yes. You can assume it's like your second example.
To backtrack a bit (no pun intended) in (setval [TREE NAME #"p[1-9]"] "p" (macroexpand x))
will "p" be made into a symbol or do I need to add that?
I hate treating symbols as strings, but seems unavoidable here 😕
To be clear, the transformation at that level is p3__13962# => p__13962#
it replaces the substring within the symbol
behind the scenes NAME
is extracting the string for the symbol's name, manipulating it, and then reconstructing the symbol
Oh, that's awesome. Such a common pain point.
as for the subexpression identification part, that seems more involved
Also, in your second example, shouldn't [~@(set psyms)]
be either ~@(into [] (set psyms))
or ~@(into [] (distinct psyms))
?
into
or vec
, I'd have to look
~@ within []
uniques and puts all the symbols into a vector
Oh, I assumed you'd end up with a set inside a vector
that would be if you did ~psyms
I don't want to belabor that point, but you're saying psyms
would evaluate to a set? My assumption was it would be like: [(set ["foo" "foo"])]
=> [#{"foo"}]
Also, I think depending on how your navigator in the second example works I may just be able to just apply another transform
on TREE
to expr
to replace the values and that's all I need.
oops meant to write ~(set psyms)
Oh, okay. That makes more sense
I'm thinking through what a navigator for even this example would need to look like. It is a bit more involved than I initially assumed
For it to work top-down in one traversal, it would have to backtrack after it either reaches the next x in depth or the bottom of the tree. That's the only way to make sure it adds in each symbol in a fn* [..]
at the correct level.
I've done similar things with dags before
This all comes down to me not understanding Specter well enough to hack it out at this level of complexity
in that algorithm each node has an id, and I look at the list of node ids to the root from everything I'm trying to find the lowest common root for
I can't tell whether that means it would work for my purposes as stands or not
you could do it with multiple passes by annotating a generated id metadata to all subexpressions, then find the path of ids to each instance of "p" symbols
then with the paths its easy to identify which subexpressions to target in another pass
Oh, the problem with your example is when to stop recursing on expr
because if I replace all the way down I'll screw up the next match
It does seem like multiple passes might be a good start. Then I can refactor to combine them. It's much easier if I can just have the navigator stop at the next px it sees.
Another way to break it down would be to apply the second transform
before the first and use just two passes. Then it can add in the bindings at common roots and eliminate the numerals in symbols in the next pass.
That seems to make the most sense from how I'm grokking it now