so on the subject of official naming of things, in the realm of tagged literals, i want to make a distinction when communicating among the following three types of things:
1) #inst
- the first bit
2) #inst "2014-05-19T19:12:37.925-00:00"
- the whole thing
3) "2014-05-19T19:12:37.925-00:00" - the thing the first bit "applies to" (in the recursive case, this can be something from 2) iiuc)
so looking at official-ish docs: https://clojure.org/guides/weird_characters#tagged_literals
my impression is that for 3), "literal value" might be a possible name.
i see the text "the #js tag", so that supports the idea that examples of 1) might be "tag". however, the later parts of the text confuse me.
i also see the text "We can use Clojure's read-string to read a tagged literal..." followed by example code, so that seems suggestive that for 2), "tagged literal" might be appropriate.
the confusing bits for me include:
* "A tagged literal tells the reader how to parse the literal value." - i would have expected the sentence to start with "A tag" instead.
* "an extremely common use of tagged literals is #js which" - again it seems like something along the lines "of tags is #js" would be more along my expectations.
for even more fun, consider the text via the link in that section: https://github.com/edn-format/edn#tagged-elements
> # followed immediately by a symbol starting with an alphabetic character indicates that that symbol is a tag.
so here, it seems like the # character is not part of the tag. ofc this is edn, so it's not exactly the same.
any opinions on these bits?
i'm trying to work on finalizing some grammar-related bits because they essentially end up being a public api and i thought it would be helpful to get opinions before settling on things. i hope people here don't mind that i'm asking in this channel.
The #foo bit is a tag. The thing that follows is a literal (although not restricted to EDN). So the literal is tagged, hence tagged literal.
thanks. (for the log -- the following also seems to support that interpretation: https://clojure.org/reference/reader#tagged_literals)
we can also have things structured like this iiuc:
#tag1 #tag2 "fun-thing"
where #tag1 ends up "applying to" #tag2 "fun-thing". in this case, i wonder whether to refer to #tag2 "fun-thing" as a literal. it seems somehow not right to me.
Tag2 x will be resolved into another literal first.
yes i understand how it works
And that will be the literal tagged with tag1
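(for the log: a minimal sketch of that resolution order, using clojure.edn/read-string with its :readers option. the tags my/tag1 and my/tag2 and the recording-reader helper are made up for illustration; each reader only records what it was handed.)

```clojure
(require '[clojure.edn :as edn])

;; Records every [tag value] pair in the order the readers are invoked.
(def calls (atom []))

;; Hypothetical helper: builds a reader fn for `tag` that logs its argument
;; and wraps it in a map so the nesting stays visible.
(defn recording-reader [tag]
  (fn [value]
    (swap! calls conj [tag value])
    {:tag tag :value value}))

(def result
  (edn/read-string
    {:readers {'my/tag1 (recording-reader 'my/tag1)
               'my/tag2 (recording-reader 'my/tag2)}}
    "#my/tag1 #my/tag2 \"fun-thing\""))

;; The inner reader runs first; the outer reader receives its result.
(println @calls)
(println result)
```

so from the reader's side, what my/tag1 gets is whatever my/tag2 produced, while from a grammar's side the thing following #my/tag1 is still the whole #my/tag2 "fun-thing".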
i am referring to how to refer to the pieces when talking about them.
for example, one might have a term "tagee" to refer to what a tag applies to.
That’s already what tagged literal is. It’s the thing being tagged
so then we have a situation where #tag2 "fun-thing" is both a literal and a tagged literal?
It’s only a literal after being read
lol
when working on a grammar, there is no reading
from the perspective of the grammar, it makes sense to refer to the #tag2 "fun-thing" as some thing that the #tag1 applies to.
having a name for that makes it possible to talk about it without describing it repeatedly.
Taggee makes sense then
ok, thanks 🙂
It’s hard to talk about parts of source code that don’t seem to have clear definitive terms. I do see tagged literals being described as “tag” and “form” https://insideclojure.org/2018/06/21/tagged-literal/. I suppose if you wanted to talk about the form in context, it could be the “tagged form”.
"tagged form" sounds like a good candidate.
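(for the log: the "tag" and "form" naming also shows up in clojure.core itself. a small sketch, assuming Clojure 1.7+ where tagged-literal exists; the tags my/tag1 and my/tag2 are made up and have no readers registered, so the :default fn receives them.)

```clojure
(require '[clojure.edn :as edn])

;; With no reader registered for a tag, :default is called with the tag
;; symbol and the already-read form. `tagged-literal` just keeps both pieces.
(def outer
  (edn/read-string {:default tagged-literal}
                   "#my/tag1 #my/tag2 \"fun-thing\""))

(println (:tag outer))          ; the tag symbol, without the leading #
(println (:form outer))         ; the tagged form, here itself a tagged literal
(println (:form (:form outer))) ; the innermost form, a plain string
```

note that the keys are :tag and :form, and that the :form of the outer value is the inner tag plus its form as one unit.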
on a side note, any ideas about a similar thing for metadata? as in, what's the thing the metadata is supposed to apply to?
It’s tricky, huh? If I look at the Clojure docs on the subject https://clojure.org/reference/metadata, it seems “metadata” is clear, but the thing it is describing has multiple terms. “Object” is used in descriptive text and APIs, but “data” is also used in text and is also an attractive choice. “Object” might be more telling as metadata cannot be applied to all things - but really, that might be describing an implementation detail. So I think I personally prefer “data”. As for what to call the “metadata” and “data” together, I don’t know if a term is needed for that.
yes it is tricky - i realize now that unless you are working on a grammar much of this may seem irrelevant. it turns out that for grammar work, what you might want a name for depends on how things are decomposed. as an example from parcera's current grammar, consider: https://github.com/carocad/parcera/blob/master/src/Clojure.g4#L66 here the name is metadata, but it refers to the combination of one or more pieces of metadata and the thing it applies to. in one of my tree-sitter grammar attempts, i tried to have the metadata bits live "inside" the thing it applies to and consequently (atm) don't have a need for the combined term for the tree-sitter grammar. however, if it turns out that's too slow or there is a need to change the grammar for another reason, i might need such a name. does that make sense? another place where the necessity for such names may surface is in the ui of editors - imagine an option for choosing highlighting or styling of specific portions of code (e.g. display what the discard #_ macro reader marker(?) applies to, or dim the metadata annotations(?) but not the thing it applies to, etc.) fwiw, with tree-sitter support in some editors coming about, these are not far-fetched possibilities. in any case, i appreciate the input and feedback i have received so far but perhaps this channel is not the best place to discuss this.
I feel that the discussion is relevant to the channel. 🙂
Unless working on the individual parts of speech, maybe only the phrase, or the result of interpreting the phrase are important?
I was just trying to figure out what to call :: in the context of an auto-resolved namespace map. Clojure docs don’t seem to explicitly give :: a name in itself. I see that parcera calls it https://github.com/carocad/parcera/blob/1c513564fa6549fef7d39477b8bc359380f1942e/src/Clojure.g4#L126.
i ended up choosing to call that auto_resolve_marker. i also considered auto_resolve_mark. i think "sigil" is a term i've heard in at least the perl community, but wasn't so sure of its appropriateness, so opted against auto_resolve_sigil. auto_resolve_symbol also seemed like it might be confusing.
on a (sorry, long so feel free to skip 🙂 ) side note, my current impression is that the situation parcera is in differs from tree-sitter's in at least one important way.
the current parcera implementation uses antlr and then arranges for hiccup output based on that as a kind of multi-stage process (https://github.com/carocad/parcera/blob/1c513564fa6549fef7d39477b8bc359380f1942e/src/clojure/parcera/core.cljc#L69-L70).
consequently, it has the option of changing the names of things in the transition from the antlr result to hiccup. more concretely, Clojure.g4's names (the antlr ones) in a way don't matter so much (to the user of the hiccup result) because there is an option of changing them during the "translation" to hiccup. (although that would probably reduce performance somewhat so maybe one wouldn't want to.)
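(a rough sketch of what such a renaming step could look like on a hiccup-style tree. this is not parcera's code; the tree shape, the rename-table and the rename-nodes helper are assumptions made up for illustration.)

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical mapping from grammar-internal names to standalone names.
(def rename-table {:var_quote :var_quote_form})

;; Rewrites the head keyword of every hiccup-style vector node,
;; leaving names that are not in the table untouched.
(defn rename-nodes [tree]
  (walk/postwalk
    (fn [x]
      (if (and (vector? x) (keyword? (first x)))
        (assoc x 0 (get rename-table (first x) (first x)))
        x))
    tree))

(def renamed (rename-nodes [:code [:var_quote [:symbol "foo"]]]))
(println renamed)
```

the extra walk over the whole tree is where the performance cost mentioned above would come from.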
unfortunately for the tree-sitter case, afaiu, once one decides on names within grammar.js (the typical tree-sitter grammar file), one is in a way stuck with those names -- those names show up in the parse tree which a consumer has access to (so one might think of the names as being part of a public api).
the reason this is relevant is that within parcera's Clojure.g4 file, one can choose names that make sense "in-context". for example, here: https://github.com/carocad/parcera/blob/1c513564fa6549fef7d39477b8bc359380f1942e/src/Clojure.g4#L110 one can see the name var_quote. that name is within the context of dispatch which is in turn in the context reader_macro. so if one has the grammar's structure in mind, the name is not too confusing.
however, for the tree-sitter case, if one were to choose that kind of name, all you get as an end-user from the parse tree is var_quote (which might suggest #'), which is less clear than var_quote_form outside of the context of a grammar file (which is the case when all you have is the parse tree).
the issue is somewhat compounded by the total number of names of things in the grammar (each pair of terms being the potential source of confusion) in addition to some terminological gaps and confusion (which afaict are partly inherited from lisp communities).
atm, for tree-sitter i am trying to choose names that can stand on their own (when compared to each of the other names) to ease learning and reduce confusion when reasoning for the user of the parse tree (editor-tooling folks being one prime target).
thus a name like "form" would likely invite confusion if used to describe something specific to tagged literals or metadata in a tree-sitter parse tree.
come to think of it, many of the other tree-sitter grammars i've looked at do tend to use rather long specific names...perhaps the aforementioned reasoning went into such decisions in some of those cases.
so if you made it this far, i hope that explanation made it a bit clearer where i'm coming from 😅
I found that interesting, thanks for sharing!
thanks for tolerating these emissions 🙂 btw, i am still trying to understand better your recent remark: > Unless working on the individual parts of speech, maybe only the phrase, or the result of interpreting the phrase are important? would you mind elaborating on that at some point? possibly with an example?
I guess I might have been getting a bit philosophical. 🙂 I simply meant that the Clojure core team didn’t really need to give a specific name to :: (a part of speech) when they were describing, for example, auto-resolved current namespace maps #::{:a 2} (a phrase).
ah, thanks for the explanation. that specific thing is interesting because it seems to be indirectly referred to on the reference page for the reader: > Symbols beginning or ending with ':' are reserved by Clojure. A symbol can contain one or more non-repeating ':'s. https://clojure.org/reference/reader#_reader_forms
i am guessing, but it seems to me that constraint might be partly due to trying to avoid a collision with the use of :: for auto-resolving.
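(for reference, a small sketch of what :: does at read time. demo.ns is a made-up namespace created only for this example, and the namespace map syntax assumes Clojure 1.9+.)

```clojure
;; `::` is resolved against whatever namespace *ns* is bound to while reading.
(def demo-ns (create-ns 'demo.ns))

;; Auto-resolved keyword: the current namespace name is filled in.
(def kw (binding [*ns* demo-ns] (read-string "::a")))

;; Auto-resolved namespace map: unqualified keys get the current namespace.
(def m (binding [*ns* demo-ns] (read-string "#::{:a 2}")))

(println kw) ; :demo.ns/a
(println m)  ; a map with the single key :demo.ns/a
```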
if one were physically speaking about this, it might be easier to discuss if :: had a name. perhaps "double colon" might be something that might get used if one were speaking, but if you type "double colon" it really doesn't seem to have a whole lot of benefit over ::. so i wonder if the medium of typical discussion about these things (and for that matter, the number of people who might actually converse about it) might affect whether something gets a name and further the type of name something might get.
in typical grammars (e.g. for antlr or tree-sitter), there are constraints about what you can name things (e.g. restricted characters) -- typically similar to those in programs so perhaps you tend to end up with words (i.e. :: wouldn't be a valid name).
Yep, good points. My first crack at supporting namespaced maps uses the illegal Clojure symbol ::. My next version won’t do that. 🙂
he he 🙂
btw, thanks to recent discussions here plus investigations, i realized i wasn't really clearly aware of this: https://clojure.org/reference/reader#_deftype_defrecord_and_constructor_calls_version_1_3_and_later
it looks like they can be recognized just like tagged-literals. indeed tagged literals and these things both seem to be handled by CtorReader in LispReader.java. most grammars i've seen refer to tagged literals (or tags) with no apparent mention of this other thing.
i have been using tagged_literal (or something involving tag) as a name for the construct, but wonder whether i should somehow sneak in ctor.
fwiw, there was some discussion about this here: https://github.com/carocad/parcera/issues/83
hmmm... thanks! I’ll have to check if/how rewrite-clj handles this.
btw, when i looked at the official docs, it wasn't clear to me where spaces, discards, comments and such could go. i did some experimentation and my current conclusion is that it's the same as tagged literals. here's a small repl session snippet:
user=> (defrecord Fun [a b])
user.Fun
user=> # user.Fun [1 2]
#user.Fun{:a 1, :b 2}
(note the space between # and user.Fun as well as the space between user.Fun and [1 2])
i don't know how much sense the following would make, but fwiw, here's how the current tree-sitter grammar handles it: https://github.com/sogaiu/tree-sitter-clojure/blob/master/grammar.js#L418-L430
re: names used in speech -- happened to be watching this rh talk and heard him refer to #= as "sharp equals" around here: https://youtu.be/I5iNUtrYQSM?t=3734
I've also heard the name "reader tag". I think it's important to realize they are processed at read-time, which is before macro-expansion time
Good point, and the term “reader tag” is in doc strings https://github.com/clojure/clojure/blob/38bafca9e76cd6625d8dce5fb6d16b87845c8b9d/src/clj/clojure/core.clj#L7781
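(a minimal sketch of the read-time point: the data reader has already run by the time read-string returns, before anything is evaluated or macro-expanded. my/upper is a made-up tag, registered here only by binding *data-readers* around the read.)

```clojure
(require '[clojure.string :as str])

;; Read a form containing a tagged literal; nothing is evaluated here.
(def form
  (binding [*data-readers* {'my/upper str/upper-case}]
    (read-string "(str #my/upper \"abc\")")))

;; The reader already replaced the tagged literal with the reader fn's result,
;; so the unevaluated list contains the upper-cased string.
(println (pr-str form))
```

evaluating the form afterwards never sees the tag at all, only the value the reader produced.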
thanks for sharing these bits. maybe i should consider collecting these sorts of things somewhere for general public reference.