experimenting with handling metadata as tokens and placing them in `extras`.
the docs say:
> ...an array of tokens that may appear anywhere in the language. This is often used for whitespace and comments.
(search for `extras` at: https://tree-sitter.github.io/tree-sitter/creating-parsers#the-grammar-dsl)
to make this work, though, it was necessary to use `externals`, and that meant writing some c++.
so far it seems to be working out, though performance on clojure.core has dropped a bit: it went from 20ms to 25ms or so.
So is this array added to the syntax tree tokens?
Nm, I read under extras, not externals.
But it is unclear to me why externals would be needed.
ah, thanks for taking a look into things.
one description might go something like this:
one path of investigation is handling metadata as items in `extras` [1].
after trying other things, making `^{...}` one of the `externals` was where i ended up. the other 3 metadata forms (`^symbol`, `^keyword`, `^string`) were doable within grammar.js (i.e. not external).
[1] the reason to do this is that baking metadata into the grammar is (currently) problematic for a few reasons:
- is it practical to enumerate all the places metadata can appear? here's some initial investigation: https://gist.github.com/sogaiu/219e36266dc4265ec2c482622e2c8589
- the gist shows multiple places metadata can appear, but it's not clear whether that's exhaustive, so it's unclear how long it might take to find them all.
- there are already quite a few places, and baking that info into the grammar at each site doesn't seem worth the added complexity, both to implement and to maintain.
for these reasons, i decided to try out the idea of making metadata a member of `extras`. (actually, it's split into two things, maps and everything else, but both now live in `extras` here.)
the main reason to make metadata maps one of the `externals` is that, iiuc, recognizing one requires balancing curly braces, and afaik that is not doable with js regular expressions, let alone the subset that tree-sitter supports. i guess it can be done with perl, ruby, and a few other flavors, though (`(?R)` iiuc).
so atm, the only way i've come up with to make metadata maps a token is the `externals` method.
i hope this explanation helps a bit, please feel free to poke holes in it, question it, etc. 🙂
interestingly, i rewrote part of the externals code for readability and now performance has improved: instead of around 25ms, it is consistently around 22ms. this is despite having added more functionality.
hadn't seen this: https://github.com/github/semantic/blob/master/docs/grammar-development-guide.md