tree-sitter

tree-sitter fun
2020-03-04T01:29:34.014200Z

experimenting with handling metadata as tokens and placing them in extras. the docs say:

> ...an array of tokens that may appear anywhere in the language. This is often used for whitespace and comments.

(search for extras at: https://tree-sitter.github.io/tree-sitter/creating-parsers#the-grammar-dsl)

to make this work though, it was necessary to use externals -- and that meant writing some c++. so far it seems to be working out -- though the time to parse clojure.core has gone from 20ms to 25ms or so.

pez 2020-03-04T07:00:44.015700Z

So is this array added to the syntax tree tokens?

pez 2020-03-04T07:08:57.016400Z

Nm, i read under extras, not externals.

pez 2020-03-04T07:09:24.017300Z

But it is unclear to me why externals would be needed.

2020-03-04T08:49:53.022Z

ah, thanks for taking a look into things. one description might go something like this: one path of investigation is handling metadata as items in extras [1]. after trying other things, making ^{...} one of externals was where i ended up. the other 3 metadata things (^symbol, ^keyword, ^string) were doable within grammar.js (i.e. not external).

2020-03-04T08:50:37.022700Z

[1] the reason to do this is because baking metadata into the grammar is (currently) problematic for a few reasons.

2020-03-04T08:51:23.023500Z

- is it practical to enumerate all the places metadata can appear? here's some initial investigation: https://gist.github.com/sogaiu/219e36266dc4265ec2c482622e2c8589

2020-03-04T08:52:16.024700Z

- the gist shows multiple places metadata can appear but it's not clear whether that's exhaustive, so it's unclear how long it might take to figure out such places.

2020-03-04T08:52:51.025500Z

- already there are quite a few places, and baking that info into the grammar at each site doesn't seem worth the increase in complexity to implement as well as to maintain

2020-03-04T08:56:01.027400Z

for these reasons, i decided to try out the idea of making metadata a member of extras. (actually, it's split into two things (map and other things), but they are now living as extras here.)

2020-03-04T09:00:49.028900Z

the main reason to make metadata maps one of externals is that, iiuc, recognizing one as a single token requires balancing curly braces, and afaik that is not doable with js regular expressions, let alone the subset that tree-sitter supports. though i guess it can be done with perl, ruby, and a few other flavors that support recursive patterns (e.g. `(?R)` iiuc).

2020-03-04T09:01:35.029600Z

so atm to make metadata maps a token, i've only come up with the externals method.
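
to illustrate the brace-balancing point, here is a rough standalone sketch in c++. it is not the actual externals code discussed here and it does not use the tree-sitter scanner api at all; `scan_metadata_map` is a made-up name, and it works on a plain `std::string` just to show the depth counting that a regular expression without recursion can't express. it assumes the `{` comes right after the `^` and it ignores `;` comments, so treat it as a simplification.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only (not the real scanner code).
// If src[start] begins a metadata map such as ^{:a {:b 2}}, returns the
// index one past its closing '}'. Otherwise returns std::string::npos.
// Assumptions: '{' directly follows '^'; ';' comments are not handled.
std::size_t scan_metadata_map(const std::string& src, std::size_t start) {
    std::size_t i = start;
    if (i >= src.size() || src[i] != '^') return std::string::npos;
    ++i;
    if (i >= src.size() || src[i] != '{') return std::string::npos;

    int depth = 0;          // number of currently open '{'
    bool in_string = false; // inside a "..." string literal?
    for (; i < src.size(); ++i) {
        const char c = src[i];
        if (in_string) {
            if (c == '\\') {
                ++i;                // skip the escaped character
            } else if (c == '"') {
                in_string = false;  // end of string literal
            }
            continue;
        }
        if (c == '"') {
            in_string = true;
        } else if (c == '\\') {
            ++i;                    // character literal such as \{ or \}
        } else if (c == '{') {
            ++depth;
        } else if (c == '}') {
            if (--depth == 0) return i + 1;  // matched the opening brace
        }
    }
    return std::string::npos;       // ran out of input while unbalanced
}
```

the counter is the part a plain regex can't do: nesting can be arbitrarily deep, so the matcher needs unbounded state. braces inside strings and character literals are skipped so they don't throw the count off.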

2020-03-04T09:02:44.030200Z

i hope this explanation helps a bit, please feel free to poke holes in it, question it, etc. 🙂

2020-03-04T11:43:45.031300Z

interestingly, i rewrote part of the externals code for readability purposes and now the performance has improved -- instead of around 25ms, it is consistently around 22ms. this is despite having added more functionality.
