tree-sitter

tree-sitter fun
2020-02-22T02:35:22.072800Z

i love squirrels, but this one seems to cause one of our grammars to hang: https://github.com/bartojs/leiningen/blob/master/test_projects/sample_failing/src/nom/nom/nom.clj

2020-02-22T03:56:46.073800Z

ah, interesting, upgrading tree-sitter-cli to 0.16.4 fixes the hang (other files were also causing hangs; will check how they are handled now) <- @chrisoakman

2020-02-22T05:40:23.079800Z

i think i have some evidence that how much functionality a grammar covers doesn't necessarily have a big impact on performance. two grammars were each fed roughly 100,000 files, and their total processing times differed by less than 5 minutes (both were a bit over 70 minutes; fwiw, each grammar's run was a single process invocation). i also compared how one grammar vs another processes clojure.core: one is reproducibly slower, but the slowness may have to do with the number of errors that show up. so if the number of errors is reduced or eliminated, performance may improve a lot. (i got this idea from watching the > 1 hour tree-sitter talk, the galois one.)
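One way to check the errors-vs-speed idea is to record, per file, how long the parse took next to how many ERROR and MISSING nodes ended up in the tree. A minimal sketch, assuming a `parse` callable that returns a root node with `type`, `children` and `is_missing` attributes (py-tree-sitter nodes have attributes with these names, but the `Node` class here is a hypothetical stand-in so the sketch runs without a grammar; `count_errors` and `measure` are illustrative names, not library functions):

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    # Hypothetical stand-in for a parsed syntax node.
    type: str
    children: list[Node] = field(default_factory=list)
    is_missing: bool = False


def count_errors(root: Node) -> tuple[int, int]:
    """Return (number of ERROR nodes, number of MISSING nodes) under root."""
    errors = missing = 0
    stack = [root]  # explicit stack avoids recursion limits on deep trees
    while stack:
        node = stack.pop()
        if node.type == "ERROR":
            errors += 1
        if node.is_missing:
            missing += 1
        stack.extend(node.children)
    return errors, missing


def measure(source: bytes, parse: Callable[[bytes], Node]) -> tuple[float, int, int]:
    """Time a single parse of source and report (seconds, errors, missing)."""
    start = time.perf_counter()
    root = parse(source)
    elapsed = time.perf_counter() - start
    errors, missing = count_errors(root)
    return elapsed, errors, missing
```

Sorting the per-file results by error count and comparing against the timings would show whether the slow files are also the error-heavy ones.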

2020-02-22T05:42:07.081300Z

the > 100,000 files were obtained by cloning a fair number of github repositories (the urls were obtained from clojars' feed.clj file) -- so there is some overlap within the data because of forks.
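Since forks duplicate files, one way to remove that overlap before timing is to deduplicate by content hash, keeping the first path seen for each distinct file. A minimal sketch, assuming forked copies are byte-for-byte identical (a fork that modified a file keeps both versions); `iter_sources`, `unique_by_content` and the suffix list are hypothetical choices for illustration:

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Iterable, Iterator

# Assumed set of Clojure-ish file extensions; adjust as needed.
CLOJURE_SUFFIXES = (".clj", ".cljs", ".cljc", ".edn")


def iter_sources(root: Path, suffixes=CLOJURE_SUFFIXES) -> Iterator[tuple[str, bytes]]:
    """Yield (path, contents) for matching files under root, in sorted order."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            yield str(path), path.read_bytes()


def unique_by_content(files: Iterable[tuple[str, bytes]]) -> list[str]:
    """Keep the first path for each distinct file content (SHA-256 digest)."""
    seen: dict[str, str] = {}
    for name, data in files:
        digest = hashlib.sha256(data).hexdigest()
        seen.setdefault(digest, name)  # later duplicates are ignored
    return list(seen.values())
```

Usage would be along the lines of `unique_by_content(iter_sources(Path("repos")))`, where `repos` is whatever directory the clones live in.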

2020-02-22T06:10:10.083600Z

@pez i noticed the following in tree-sitter's api.h: https://github.com/tree-sitter/tree-sitter/blob/master/lib/include/tree_sitter/api.h#L577_L587
> A tree cursor allows you to walk a syntax tree more efficiently than is possible using the TSNode functions. It is a mutable object that is always on a certain syntax node, and can be moved imperatively to different nodes.
i don't think i have been using the tree cursor api to navigate nodes (the navigation that took 200ms or more for clojure.core), so maybe i should try it out and take some measurements.
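For comparison with TSNode-style navigation, here is a sketch of the usual cursor-driven depth-first walk: descend with goto_first_child when possible, otherwise advance with goto_next_sibling, climbing with goto_parent until a sibling turns up or the root is reached. The method names mirror py-tree-sitter's `TreeCursor` (and the C functions `ts_tree_cursor_goto_first_child`, `ts_tree_cursor_goto_next_sibling` and `ts_tree_cursor_goto_parent` in api.h), but `Node` and `TreeCursor` below are hypothetical stand-ins so the sketch runs without a grammar; it shows the traversal pattern, not tree-sitter's performance characteristics.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a syntax node.
    type: str
    children: list[Node] = field(default_factory=list)


class TreeCursor:
    """Toy mutable cursor that always sits on exactly one node."""

    def __init__(self, root: Node):
        # Path from the root: (node, index of that node in its parent's children).
        self._path = [(root, 0)]

    @property
    def node(self) -> Node:
        return self._path[-1][0]

    def goto_first_child(self) -> bool:
        if not self.node.children:
            return False
        self._path.append((self.node.children[0], 0))
        return True

    def goto_next_sibling(self) -> bool:
        if len(self._path) < 2:
            return False  # the root has no siblings
        parent = self._path[-2][0]
        idx = self._path[-1][1] + 1
        if idx >= len(parent.children):
            return False
        self._path[-1] = (parent.children[idx], idx)
        return True

    def goto_parent(self) -> bool:
        if len(self._path) < 2:
            return False
        self._path.pop()
        return True


def walk(cursor: TreeCursor) -> list[str]:
    """Visit every node in pre-order using only cursor moves."""
    visited = []
    while True:
        visited.append(cursor.node.type)
        if cursor.goto_first_child():
            continue
        # No children: move to the next sibling, climbing up as needed.
        while not cursor.goto_next_sibling():
            if not cursor.goto_parent():
                return visited  # back at the root with nothing left
```

With the real library, the same loop would start from `tree.walk()` and read `cursor.node`; the cursor keeps its own path, so no parent lookups are needed while walking.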

pez 2020-02-22T07:47:55.085200Z

Look, a cursor! Somewhat similar to the token cursor I use.

2020-02-22T08:34:17.085800Z

indeed 🙂