trying out a recent version of a clojure grammar in atom
(fwiw, the rainbow parens are done by something else)
also am trying a grammar split into two pieces
there is a base grammar that provides things like symbol, keyword, number, and other "primitive" things.
there is a grammar which "inherits" from the base and it provides "var_definition"
so that would be things that look like (def a 1)
part of the file for tweaking highlighting in atom looks like:
comments:
  start: ';'

scopes:
  'source_file': 'source.clojure'
  'nil': 'constant.language.null'
  'boolean > "true"': 'constant.language.boolean.true'
  'boolean > "false"': 'constant.language.boolean.false'
  'comment': 'comment.line'
  'number': 'constant.numeric'
  'keyword': 'constant.keyword'
  'string': 'string.quoted'
  'regular_expression': 'string.regexp'
  'character': 'constant.character'
  'list > symbol': [
    {
      match: '^(&|%|<|=|>|\\+|\\-|/|\\*|->|->>|alias|and|assert|case|catch|cond|declare|defn|defn-|do|fn|if|if-let|let|loop|not|ns|or|recur|require|some->|throw|try|when|when-let)$'
      scopes: 'keyword.control'
    }
  ]
  # XXX: does def
  'var_definition > "def"': 'keyword.control'
(https://flight-manual.atom.io/hacking-atom/sections/creating-a-grammar/)
trying out another grammar idea that removes the necessity for externals -- have put ^, #^, #', @, backtick, and ' into a _sigils grouping and instead of trying to make them extras, have made zero or more of them be optional in front of literals. clojure.core seems to have parsed ok and back to 20ms 🙂
about to run the large sample tests.
(this was not motivated by performance, fwiw 😉 )
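a minimal sketch of the "zero or more sigils in front of a literal" idea, written in TypeScript rather than in tree-sitter's grammar DSL. the names SIGILS, Prefixed and splitSigils are hypothetical and are not taken from the actual grammar; the sketch only shows the lax prefix handling, not how the real parser treats what follows.

```typescript
// Hypothetical illustration only; this is not the tree-sitter grammar itself.
// Sigils listed in the message above: ^  #^  #'  @  backtick  '
// Two-character sigils come first so "#^" and "#'" are tried before "^" and "'".
const SIGILS: string[] = ["#^", "#'", "^", "@", "`", "'"];

interface Prefixed {
  sigils: string[]; // zero or more leading sigils, in source order
  rest: string;     // the remaining text (the "literal" in the loose sense)
}

// Peel off every leading sigil, then hand back whatever is left.
function splitSigils(src: string): Prefixed {
  const sigils: string[] = [];
  let i = 0;
  let matched = true;
  while (matched) {
    matched = false;
    for (const s of SIGILS) {
      if (src.slice(i, i + s.length) === s) {
        sigils.push(s);
        i += s.length;
        matched = true;
        break;
      }
    }
  }
  return { sigils, rest: src.slice(i) };
}
```

for example, splitSigils("@(atom [1 2])") yields one sigil, @, with the list as the rest, while splitSigils("42") yields no sigils at all. being this permissive accepts some input that Clojure itself would reject, which is the trade-off the lax approach makes.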
What would @ do in front of a literal?
I'm asking because I have the following note in my lexer (non-tree-sitter, as you know):
// prefixing patterns - TODO: revisit these and see if we can always use the same
// opens ((?<!\p{Ll})['`~#@?^]\s*)*
// id (['`~#^@]\s*)*
// lit (['`~#]\s*)*
// kw (['`~^]\s*)*
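to make the difference between those prefix classes concrete, here is a small TypeScript sketch. the three pattern strings are copied from the note above; the PREFIX table and the prefixOf helper are hypothetical and not part of the lexer. the opens pattern is left out because it needs a lookbehind and \p{Ll}, which require the u flag.

```typescript
// Pattern strings copied from the lexer note; the surrounding code is a hypothetical sketch.
const PREFIX: { [kind: string]: string } = {
  id: "(['`~#^@]\\s*)*",
  lit: "(['`~#]\\s*)*",
  kw: "(['`~^]\\s*)*",
};

// Return the leading run of prefix characters (plus any whitespace
// between them) that the given token class accepts.
function prefixOf(kind: string, src: string): string {
  const m = new RegExp("^" + PREFIX[kind]).exec(src);
  return m ? m[0] : "";
}
```

with these classes, prefixOf("id", "@foo") returns "@" but prefixOf("lit", "@42") returns an empty string, since @ is in the id class and not in the lit class.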
first, the definition of a literal here may well be different from yours.
second, in tree-sitter afaik we cannot express the idea of "don't apply this rule in situation x" directly. we have to somehow provide some other thing to parse as, via precedence or some other technique. consequently, to handle certain things, it sometimes appears to be better to be much more lax in what is allowed.
with that said, the places i have identified so far where @ can appear are in front of symbols, but also in front of something that returns an IRef:
@(atom [1 2])
in the grammar here, lists are identified as literals. does that make sense?
btw, the sigil idea was thought of by my better half, but we were inspired by studying your tokenizer (which i attempted but failed at porting yesterday to cljs 🙂 )
Oh, please let me know if you make any progress in porting it to cljs, or if I can help in some way. I'd love to move it into the cljs parts of Calva.
Regarding my question, it seems to be answered by "in the grammar here, lists are identified as literals" 😃
re: porting to cljs - i might wait to try again - what i found out yesterday is that doing regex stuff in cljs for working with a clojure grammar is pretty challenging.
e.g. some rules involve the hash, but i like using the (?x) free-spacing mode to express my regular expressions for readability (using the xregexp library, for example). however, i didn't find a way to include # in a free-spacing mode regex -- it gets interpreted as the beginning of a comment.
i also tried lambdaisland's new regal library, but it doesn't yet support lookahead or lookbehind assertions, at least one of which was needed to express at least one of the patterns. maybe you've tried it already?
i did finally manage to express the patterns, but only by using variables containing strings, and to make things readable i used odd spacing (which is very fragile).
so i'm thinking to wait for regal to develop a bit more before trying again.
my initial target is only lexer.ts and clojure-lexer.ts though...
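one way to keep a pattern readable without free-spacing mode is to build it from string fragments, one per line, with ordinary code comments beside them. the TypeScript sketch below illustrates that approach; the DISPATCH pattern is a made-up example (a hash followed by a dispatch character), not one of the lexer's actual rules.

```typescript
// Hypothetical example pattern; not taken from lexer.ts or clojure-lexer.ts.
// Each fragment gets its own line and comment, so "#" never has to appear
// inside a free-spacing regex where it would start a comment.
const DISPATCH_PARTS: string[] = [
  "#",        // the literal hash that starts a dispatch form
  "(?=",      // lookahead: peek at the next character without consuming it
  "[{(\"'_]", //   set, fn, regex, var-quote, or discard
  ")",        // end of lookahead
];

// Joining with the empty string means layout never leaks into the pattern.
const DISPATCH = new RegExp(DISPATCH_PARTS.join(""));
```

here DISPATCH.test("#{1 2}") is true and DISPATCH.test("# not dispatch") is false. because the fragments are joined with an empty string, the spacing of the source lines has no effect on the resulting regex, which avoids the fragility of aligning strings by hand.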
I wish I had time to help out some with regal. I haven't even tried it yet, but it is such a great initiative. Reminds me how much I miss Perl's readable regex variants.
thanks for asking about the literals, btw -- that added additional motivation to rename and reorganize. it should be less confusing now :)