rewrite-clj

https://github.com/clj-commons/rewrite-clj
2020-08-27T10:10:56.081Z

so on the subject of official naming of things, in the realm of tagged literals, i want to make a distinction when communicating among the following three types of things:
1) #inst - the first bit
2) #inst "2014-05-19T19:12:37.925-00:00" - the whole thing
3) "2014-05-19T19:12:37.925-00:00" - the thing the first bit "applies to" (in the recursive case, this can be something from 2) iiuc)
so looking at official-ish docs: https://clojure.org/guides/weird_characters#tagged_literals my impression is that for 3), "literal value" might be a possible name. i see the text "the #js tag", so that supports the idea that examples of 1) might be "tag". however, the later parts of the text confuse me. i also see the text "We can use Clojure's read-string to read a tagged literal..." followed by example code, so that seems suggestive that for 2), "tagged literal" might be appropriate.
the confusing bits for me include:
* "A tagged literal tells the reader how to parse the literal value." - i might have thought that the second might have started with "A tag" instead.
* "an extremely common use of tagged literals is #js which" - again it seems like something along the lines "of tags is #js" would be more along my expectations.
for even more fun, consider the text via the link in that section: https://github.com/edn-format/edn#tagged-elements
> # followed immediately by a symbol starting with an alphabetic character indicates that that symbol is a tag.
so here, it seems like the # character is not part of the tag. ofc this is edn, so it's not exactly the same. any opinions on these bits?

2020-08-27T10:15:36.083300Z

i'm trying to work on finalizing some grammar-related bits because they essentially end up being a public api and i thought it would be helpful to get opinions before settling on things. i hope people here don't mind that i'm asking in this channel.

borkdude 2020-08-27T10:21:09.085600Z

The #foo bit is a tag. The thing that follows is a literal (although not restricted to EDN). So the literal is tagged, hence tagged literal.
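A minimal sketch of that split at the REPL, assuming Clojure 1.7 or later. Passing `tagged-literal` as the `:default` reader to `clojure.edn/read-string` keeps the tag and the thing it applies to as separate pieces; the `#foo` tag and the names `tl` and `d` are made up for illustration.

```clojure
(require '[clojure.edn :as edn])

;; With `tagged-literal` as the :default reader, an unknown tag is not
;; resolved; the tag and the form it applies to are kept side by side.
(def tl (edn/read-string {:default tagged-literal} "#foo \"bar\""))

(:tag tl)            ; => foo   (a symbol, without the leading #)
(:form tl)           ; => "bar" (what the tag applies to)
(tagged-literal? tl) ; => true

;; A built-in tag such as #inst is resolved by the reader into a value.
(def d (read-string "#inst \"2014-05-19T19:12:37.925-00:00\""))
(instance? java.util.Date d) ; => true
```

Note that the stored tag is the symbol `foo`, which lines up with the edn wording where the `#` character is not part of the tag.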

2020-08-27T10:59:22.086200Z

thanks. (for the log -- the following also seems to support that interpretation: https://clojure.org/reference/reader#tagged_literals)

2020-08-27T11:00:58.087700Z

we can also have things structured like this iiuc:

#tag1 #tag2 "fun-thing"
where #tag1 ends up "applying to" #tag2 "fun-thing". in this case, i wonder whether to refer to #tag2 "fun-thing" as a literal. it seems somehow not right to me.

borkdude 2020-08-27T11:05:14.088300Z

Tag2 x will be resolved into another literal first.

2020-08-27T11:05:29.088900Z

yes i understand how it works

borkdude 2020-08-27T11:05:34.089200Z

And that will be the literal tagged with tag1
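The nesting described here can be made visible with the same `:default tagged-literal` trick. This is only a sketch, assuming Clojure 1.7 or later; `#tag1` and `#tag2` are the made-up tags from the message above, and `outer` and `inner` are names chosen for illustration.

```clojure
(require '[clojure.edn :as edn])

;; The inner tagged literal is read first and becomes the form that the
;; outer tag applies to.
(def outer (edn/read-string {:default tagged-literal}
                            "#tag1 #tag2 \"fun-thing\""))
(def inner (:form outer))

(:tag outer)            ; => tag1
(tagged-literal? inner) ; => true
(:tag inner)            ; => tag2
(:form inner)           ; => "fun-thing"
```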

2020-08-27T11:05:53.089600Z

i am referring to how to refer to the pieces when talking about them.

2020-08-27T11:06:13.090Z

for example, one might have a term "taggee" to refer to what a tag applies to.

borkdude 2020-08-27T11:07:03.090700Z

That’s already what tagged literal is. It’s the thing being tagged

2020-08-27T11:07:29.091200Z

so then we have a situation where #tag2 "fun-thing" is both a literal and a tagged literal?

borkdude 2020-08-27T11:08:26.091700Z

It’s only a literal after being read

2020-08-27T11:08:31.091900Z

lol

2020-08-27T11:08:39.092200Z

when working on a grammar, there is no reading

2020-08-27T11:09:32.093100Z

from the perspective of the grammar, it makes sense to refer to the #tag2 "fun-thing" as some thing that the #tag1 applies to.

2020-08-27T11:09:57.093600Z

having a name for that makes it possible to talk about it without describing it repeatedly.

borkdude 2020-08-27T11:10:09.094Z

Taggee makes sense then

2020-08-27T11:10:14.094200Z

ok, thanks 🙂

lread 2020-08-27T15:06:56.095700Z

It’s hard to talk about parts of source code that don’t seem to have clear definitive terms. I do see tagged literals being described as “tag” and “form” https://insideclojure.org/2018/06/21/tagged-literal/. I suppose if you wanted to talk about the form in context, it could be the “tagged form”.

2020-08-28T09:00:16.111800Z

"tagged form" sounds like a good candidate.

2020-08-28T09:01:24.112Z

on a side note, any ideas about a similar thing for metadata? as in, what's the thing the metadata is supposed to apply to?

lread 2020-08-28T21:01:52.112700Z

It’s tricky, huh? If I look at the Clojure docs on the subject https://clojure.org/reference/metadata, it seems “metadata” is clear, but the thing it is describing has multiple terms. “Object” is used in descriptive text and APIs, but “data” is also used in text and also an attractive choice. “Object” might be more telling as metadata cannot be applied to all things - but really, that might be describing an implementation detail. So I think I personally prefer “data”. As for what to call the “metadata” and “data” together, I don’t know if a term is needed for that.
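A small sketch of the two pieces as the reader sees them, using only `read-string` and `meta`. The symbol `foo` and the metadata keys are arbitrary examples; the second part shows a case where the thing being annotated cannot carry metadata.

```clojure
;; Two metadata annotations stacked in front of the thing they apply to.
(def sym (read-string "^:private ^{:doc \"x\"} foo"))

sym        ; => foo  (the "data" / "object")
(meta sym) ; => {:doc "x", :private true}  (the "metadata")

;; Not everything can carry metadata: a number cannot, so the reader
;; throws instead of returning a value.
(def err (try (read-string "^:private 42")
              (catch Exception e e)))
(instance? Exception err) ; => true
```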

2020-08-28T22:34:59.112900Z

yes it is tricky - i realize now that unless you are working on a grammar much of this may seem irrelevant. it turns out that for grammar work, depending on how things are decomposed, what you might want a name for is impacted. as an example from parcera's current grammar, consider: https://github.com/carocad/parcera/blob/master/src/Clojure.g4#L66 here the name is metadata, but it refers to the combination of one or more pieces of metadata and the thing it applies to. in one of my tree-sitter grammar attempts, i tried to have the metadata bits live "inside" the thing it applies to and consequently (atm) don't have a need for the combined term for the tree-sitter grammar. however, if it turns out that's too slow or there is a need to change the grammar for another reason, i might need such a name. does that make sense? another place where the necessity for such names may surface is in the ui of editors - imagine an option for choosing highlighting or styling of specific portions of code (e.g. display what the discard #_ macro reader marker(?) applies to, or dim the metadata annotations(?) but not the thing it applies to, etc.) fwiw, with tree-sitter support in some editors coming about, these are not far-fetched possibilities. in any case, i appreciate the input and feedback i have received so far but perhaps this channel is not the best place to discuss this.

lread 2020-08-29T13:51:27.113900Z

I feel that the discussion is relevant to the channel. 🙂 Unless working on the individual parts of speech, maybe only the phrase, or the result of interpreting the phrase are important? I was just trying to figure out what to call :: in the context of an auto-resolved namespace map. Clojure docs don’t seem to explicitly give :: a name in itself. I see that parcera calls it https://github.com/carocad/parcera/blob/1c513564fa6549fef7d39477b8bc359380f1942e/src/Clojure.g4#L126.

2020-08-29T14:59:58.114200Z

i ended up choosing to call that auto_resolve_marker. i also considered auto_resolve_mark. i think "sigil" is a term i've heard in at least the perl community, but wasn't so sure of its appropriateness, so opted against auto_resolve_sigil. auto_resolve_symbol also seemed like it might be confusing.

2020-08-29T15:51:27.114400Z

on a (sorry, long so feel free to skip 🙂 ) side note, my current impression is that the situation parcera is in differs from tree-sitter's in at least one important way. the current parcera implementation uses antlr and then arranges for hiccup output based on that as a kind of multi-stage process (https://github.com/carocad/parcera/blob/1c513564fa6549fef7d39477b8bc359380f1942e/src/clojure/parcera/core.cljc#L69-L70). consequently, it has the option of changing the names of things in the antlr result to hiccup transition. more concretely, Clojure.g4's names (the antlr ones) in a way don't matter so much (to the user of the hiccup result) because there is an option of changing them during the "translation" to hiccup. (although that would probably reduce performance somewhat so may be one wouldn't want to.) unfortunately for the tree-sitter case, afaiu, once one decides on names within grammar.js (the typical tree-sitter grammar file), one is in a way stuck with those names -- those names show up in the parse tree which a consumer has access to (so one might think of the names as being part of a public api). the reason this is relevant is that within parcera's Clojure.g4 file, one can choose names that make sense "in-context". for example, here: https://github.com/carocad/parcera/blob/1c513564fa6549fef7d39477b8bc359380f1942e/src/Clojure.g4#L110 one can see the name var_quote. that name is within the context of dispatch which is in turn in the context reader_macro. so if one has the grammar's structure in mind, the name is not too confusing. however, for the tree-sitter case, if one were to choose that kind of name, all you get as an end-user from the parse tree is var_quote (which might suggest #'), which is less clear than var_quote_form outside of the context of a grammar file (which is the case when all you have is the parse tree). 
the issue is somewhat compounded by the total number of names of things in the grammar (each pair of terms being the potential source of confusion) in addition to some terminological gaps and confusion (which afaict are partly inherited from lisp communities). atm, for tree-sitter i am trying to choose names that can stand on their own (when compared to each of the other names) to ease learning and reduce confusion when reasoning for the user of the parse tree (editor-tooling folks being one prime target). thus a name like "form" would likely invite confusion if used to describe something specific to tagged literals or metadata in a tree-sitter parse tree. come to think of it, many of the other tree-sitter grammars i've looked at do tend to use rather long specific names...perhaps the aforementioned reasoning went into such decisions in some of those cases. so if you made it this far, i hope that explanation made it a bit clearer where i'm coming from 😅
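To illustrate the "rename during translation" option, here is a hypothetical sketch that is not taken from parcera: it walks a hiccup-style tree and swaps short, context-dependent node names for names that stand on their own. The `renames` map, the `rename-nodes` function, and the sample tree are all assumptions made for the example.

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical mapping from an in-context grammar name to a name that is
;; still clear when all one has is the parse tree.
(def renames {:var_quote :var_quote_form})

(defn rename-nodes
  "Renames the keyword at the head of each hiccup-style node, leaving
  names without an entry in `renames` untouched."
  [tree]
  (walk/postwalk
   (fn [x]
     (if (and (vector? x) (keyword? (first x)))
       (update x 0 #(get renames % %))
       x))
   tree))

(rename-nodes [:code [:var_quote [:symbol "foo"]]])
;; => [:code [:var_quote_form [:symbol "foo"]]]
```

A tree-sitter consumer has no such translation step, which is why the names in grammar.js end up acting like a public api.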

lread 2020-08-29T18:55:17.115700Z

I found that interesting, thanks for sharing!

2020-08-30T02:25:50.115900Z

thanks for tolerating these emissions 🙂 btw, i am still trying to understand better your recent remark:
> Unless working on the individual parts of speech, maybe only the phrase, or the result of interpreting the phrase are important?
would you mind elaborating on that at some point? possibly with an example?

lread 2020-08-31T15:48:18.116100Z

I guess I might have been getting a bit philosophical. 🙂 I simply meant that the Clojure core team didn’t really need to give a specific name to :: (a part of speech) when they were describing, for example, auto-resolved current namespace maps #::{:a 2} (a phrase).
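For reference, a short sketch of what `::` does in both places, assuming Clojure 1.9 or later for the namespace map syntax (written `#::{...}` in reader syntax). The names `current-ns`, `kw`, and `m` are made up, and the results are compared against the current namespace rather than hard-coded, since they depend on where the code is read.

```clojure
;; Name of whatever namespace this code is being read in, e.g. "user".
(def current-ns (str (ns-name *ns*)))

;; :: in front of a keyword name resolves it against the current namespace.
(def kw (read-string "::a"))
(= kw (keyword current-ns "a")) ; => true

;; #:: in front of a map does the same for its unqualified keyword keys.
(def m (read-string "#::{:a 2}"))
(= m {(keyword current-ns "a") 2}) ; => true
```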

2020-08-31T21:54:13.116300Z

ah, thanks for the explanation. that specific thing is interesting because it seems to be indirectly referred to on the reference page for the reader:
> Symbols beginning or ending with ':' are reserved by Clojure. A symbol can contain one or more non-repeating ':'s.
https://clojure.org/reference/reader#_reader_forms

2020-08-31T22:56:49.116900Z

i am guessing, but it seems to me that constraint might be partly due to trying to avoid a collision with the use of :: for auto-resolving. if one were physically speaking about this, it might be easier to discuss if :: had a name. perhaps "double colon" might be something that might get used if one were speaking, but if you type "double colon" it really doesn't seem to have a whole lot of benefit over ::. so i wonder if the medium of typical discussion about these things (and for that matter, the number of people who might actually converse about it) might affect whether something gets a name and further the type of name something might get. in typical grammars (e.g. for antlr or tree-sitter), there are constraints about what you can name things (e.g. restricted characters) -- typically similar to those in programs so perhaps you tend to end up with words (i.e. :: wouldn't be a valid name).

lread 2020-09-01T15:45:05.119500Z

Yep, good points. My first crack at supporting namespaced maps uses the illegal Clojure symbol ::. My next version won’t do that. 🙂

2020-09-01T21:57:14.121700Z

he he 🙂

2020-09-02T05:03:52.124700Z

btw, thanks to recent discussions here plus investigations, i realized i wasn't really clearly aware of this: https://clojure.org/reference/reader#_deftype_defrecord_and_constructor_calls_version_1_3_and_later it looks like they can be recognized just like tagged-literals. indeed tagged literals and these things both seem to be handled by CtorReader in LispReader.java. most grammars i've seen refer to tagged literals (or tags) with no apparent mention of this other thing. i have been using tagged_literal (or something involving tag) as a name for the construct, but wonder whether i should somehow sneak in ctor. fwiw, there was some discussion about this here: https://github.com/carocad/parcera/issues/83
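A sketch of the record constructor syntax going through the reader, assuming the default `*read-eval*` of true. The record `Fun` and the names `class-name`, `r1`, and `r2` are illustrative; the class name is looked up at runtime so the example does not depend on which namespace it is loaded in.

```clojure
(defrecord Fun [a b])

;; Fully qualified class name, e.g. "user.Fun" when defined in `user`.
(def class-name (.getName Fun))

;; Positional form: # followed by the class name and a vector of field values.
(def r1 (read-string (str "#" class-name "[1 2]")))

;; Map form: # followed by the class name and a map of field values.
(def r2 (read-string (str "#" class-name "{:a 1, :b 2}")))

(= r1 r2 (->Fun 1 2)) ; => true
```

Both strings have the same overall shape as a tagged literal (a `#`, a symbol, then a form), which is what makes them hard to tell apart at the grammar level.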

lread 2020-09-02T10:42:47.126400Z

hmmm... thanks! I’ll have to check if/how rewrite-clj handles this.

2020-09-02T11:10:19.126600Z

btw, when i looked at the official docs, it wasn't clear to me where spaces, discards, comments and such could go. i did some experimentation and my current conclusion is that it's the same as tagged literals. here's a small repl session snippet:

user=> (defrecord Fun [a b])
user.Fun
user=> # user.Fun [1 2]
#user.Fun{:a 1, :b 2}
(note the space between # and user.Fun as well as the space between user.Fun and [1 2]) i don't know how much sense the following would make, but fwiw, here's how the current tree-sitter grammar handles it: https://github.com/sogaiu/tree-sitter-clojure/blob/master/grammar.js#L418-L430

2020-09-03T06:37:09.127400Z

re: names used in speech -- happened to be watching this rh talk and heard him refer to #= as "sharp equals" around here: https://youtu.be/I5iNUtrYQSM?t=3734

borkdude 2020-08-27T15:16:15.096500Z

I've also heard the name "reader tag". I think it's important to realize they are processed at read-time, which is before macro-expansion time
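A minimal sketch of the read-time point, using a made-up `#my/upper` tag bound through `*data-readers*` (the tag is namespaced because unqualified tags are reserved for Clojure). The names `result` and `form` are illustrative. Nothing is evaluated here; the reader function has already run by the time `read-string` returns.

```clojure
(require '[clojure.string :as string])

;; The data reader function is applied by the reader itself.
(def result
  (binding [*data-readers* {'my/upper string/upper-case}]
    (read-string "#my/upper \"hi\"")))

result ; => "HI"

;; Inside a larger form, the tagged literal is already replaced in the
;; data structure that read-string returns, before any macroexpansion or
;; evaluation could take place.
(def form
  (binding [*data-readers* {'my/upper string/upper-case}]
    (read-string "(first #my/upper \"hi\")")))

form ; => (first "HI")
```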

lread 2020-08-27T15:29:02.098Z

Good point, and the term “reader tag” is in doc strings https://github.com/clojure/clojure/blob/38bafca9e76cd6625d8dce5fb6d16b87845c8b9d/src/clj/clojure/core.clj#L7781

2020-08-27T21:57:14.100Z

thanks for sharing these bits. maybe i should consider collecting these sorts of things somewhere for general public reference.