tree-sitter

tree-sitter fun
2020-03-08T00:47:25.035Z

trying out a recent version of a clojure grammar in atom

2020-03-08T00:48:57.035700Z

(fwiw, the rainbow parens are done by something else)

2020-03-08T00:49:43.036100Z

also am trying a grammar split into two pieces

2020-03-08T00:50:05.036700Z

there is a base grammar that provides things like symbol, keyword, number, and other "primitive" things.

2020-03-08T00:50:27.037200Z

there is a grammar which "inherits" from the base and it provides "var_definition"

2020-03-08T00:50:55.037500Z

so that would be things that look like (def a 1)

2020-03-08T00:53:21.038800Z

part of the file for tweaking highlighting in atom looks like:

comments:
  start: ';'

scopes:
  'source_file': 'source.clojure'

  'nil': 'constant.language.null'
  'boolean > "true"': 'constant.language.boolean.true'
  'boolean > "false"': 'constant.language.boolean.false'
  'comment': 'comment.line'
  'number': 'constant.numeric'

  'keyword': 'constant.keyword'
  'string': 'string.quoted'
  'regular_expression': 'string.regexp'
  'character': 'constant.character'

  'list > symbol': [
    {
      match: '^(&|%|<|=|>|\\+|\\-|/|\\*|->|->>|alias|and|assert|case|catch|cond|declare|defn|defn-|do|fn|if|if-let|let|loop|not|ns|or|recur|require|some->|throw|try|when|when-let)$'
      scopes: 'keyword.control'
    }
  ]

  # XXX: does def
  'var_definition > "def"': 'keyword.control'
(https://flight-manual.atom.io/hacking-atom/sections/creating-a-grammar/)
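
A quick way to sanity-check the 'list > symbol' match outside of atom is to run the same alternation against a few symbols. This is only a sketch in TypeScript: it assumes the CSON string unescapes the way a JS string does (so `\\+` becomes `\+`), and `isControlSymbol` is a made-up helper name, not part of the atom package.

```typescript
// Same alternation as the 'list > symbol' match in the scopes file.
// The doubled backslashes mirror the escaping used in the CSON string.
const controlSymbol = new RegExp(
  "^(&|%|<|=|>|\\+|\\-|/|\\*|->|->>|alias|and|assert|case|catch|cond|declare|defn|defn-|do|fn|if|if-let|let|loop|not|ns|or|recur|require|some->|throw|try|when|when-let)$"
);

// Hypothetical helper, for illustration only.
function isControlSymbol(name: string): boolean {
  return controlSymbol.test(name);
}

console.log(isControlSymbol("defn-")); // anchored, so the longer alternative wins
console.log(isControlSymbol("def"));   // not in the list
```

Since `def` is not in the alternation, it only gets 'keyword.control' through the separate 'var_definition > "def"' entry.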

2020-03-08T10:23:18.041200Z

trying out another grammar idea that removes the necessity for externals -- have put ^, #^, #', @, backtick, and ' into a _sigils grouping and instead of trying to make them extras, have made zero or more of them be optional in front of literals. clojure.core seems to have parsed ok and back to 20ms 🙂 about to run the large sample tests.
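
To make the "zero or more sigils in front of a literal" idea concrete, here is a rough sketch in plain TypeScript. It is not the tree-sitter grammar itself; `SIGILS` and `splitSigils` are hypothetical names, and whitespace between a sigil and the form is not handled.

```typescript
// Sigils listed in the message above (backtick written out as "`").
const SIGILS: string[] = ["^", "#^", "#'", "@", "`", "'"];

// Hypothetical helper: peel zero or more leading sigils off a form and
// return them together with whatever is left over.
function splitSigils(src: string): { sigils: string[]; rest: string } {
  const sigils: string[] = [];
  let rest = src;
  let found = true;
  while (found) {
    found = false;
    for (const s of SIGILS) {
      if (rest.indexOf(s) === 0) {
        sigils.push(s);
        rest = rest.slice(s.length);
        found = true;
        break;
      }
    }
  }
  return { sigils, rest };
}

console.log(splitSigils("#'foo"));
console.log(splitSigils("@(atom [1 2])"));
```

The sketch is deliberately lax: it accepts any sigil in front of any form and leaves it to later stages to decide whether the combination is meaningful.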

2020-03-08T10:23:50.041500Z

(this was not motivated by performance, fwiw 😉 )

pez 2020-03-08T10:26:40.042Z

What would @ do in front of a literal?

pez 2020-03-08T10:28:05.042800Z

I'm asking because I have the following note in my lexer (non-tree-sitter, as you know):

// prefixing patterns - TODO: revisit these and see if we can always use the same
// opens ((?<!\p{Ll})['`~#@?^]\s*)*
// id    (['`~#^@]\s*)*
// lit   (['`~#]\s*)*
// kw    (['`~^]\s*)*
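
The difference between the four patterns is easier to see when they are run against a prefix. A small TypeScript sketch follows. It anchors each pattern so the whole prefix must match, which is an assumption made for the test and not how the lexer itself applies them; `kindsAccepting` is a made-up name.

```typescript
// The four prefix patterns from the note, anchored with ^ and $.
// Built with new RegExp so the "u" flag (needed for \p{Ll}) is explicit.
const prefixes: { kind: string; re: RegExp }[] = [
  { kind: "opens", re: new RegExp("^((?<!\\p{Ll})['`~#@?^]\\s*)*$", "u") },
  { kind: "id", re: new RegExp("^(['`~#^@]\\s*)*$", "u") },
  { kind: "lit", re: new RegExp("^(['`~#]\\s*)*$", "u") },
  { kind: "kw", re: new RegExp("^(['`~^]\\s*)*$", "u") },
];

// Hypothetical helper: which token kinds accept this run of prefix chars?
// Note: anchored at the start of the string, the lookbehind in "opens"
// has nothing to inspect, so it never rejects anything in this sketch.
function kindsAccepting(prefix: string): string[] {
  return prefixes.filter((p) => p.re.test(prefix)).map((p) => p.kind);
}

console.log(kindsAccepting("@"));
console.log(kindsAccepting("#"));
```

With these patterns `@` is accepted in front of opens and ids but not in front of lits or kws, which is where the question about `@` before a literal comes from.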

2020-03-08T10:42:25.047200Z

first, the definition of a literal here is likely different from yours. second, in tree-sitter afaik we cannot directly express the idea of "don't apply this rule in situation x". we have to somehow provide some other thing to parse as, via precedence or some other technique. consequently, to handle certain things, it sometimes appears to be better to be much more lax in what is allowed. with that said, the places i have identified so far where @ can appear are in front of symbols, but also in front of something that returns an IRef:

@(atom [1 2])
in the grammar here, lists are identified as literals.

2020-03-08T10:42:58.047500Z

does that make sense?

2020-03-08T10:43:38.048300Z

btw, the sigil idea was thought of by my better half, but we were inspired by studying your tokenizer (which i attempted but failed at porting yesterday to cljs 🙂 )

pez 2020-03-08T10:49:25.049800Z

Oh, please let me know if you make any progress in porting it to cljs, or if I can help in some way. I'd love to move it into the cljs parts of Calva.

pez 2020-03-08T10:50:41.050700Z

Regarding my question, it seems to be answered by: "in the grammar here, lists are identified as literals" 😃

2020-03-08T11:00:59.056Z

re: porting to cljs - i might wait to try again - what i found out yesterday is that doing regex stuff in cljs for working with a clojure grammar is pretty challenging. e.g. some rules involve the hash, but i like using the (?x) free-spacing mode to express my regular expressions for readability (using the xregexp library, for example). however, i didn't find a way to include # in a free-spacing mode regex -- it gets interpreted as the beginning of a comment. i also tried lambdaisland's new regal library, but it doesn't yet support look ahead or behind assertions, at least one of which was needed to express at least one of the patterns. maybe you've tried it already? i did finally manage to express the patterns, but only by using variables containing strings, and to make things readable i used odd spacing (which is very fragile). so i'm thinking to wait for regal to develop a bit more before trying again. my initial target is only lexer.ts and clojure-lexer.ts though...
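
One library-free way to keep such patterns readable without free-spacing mode is to build them from annotated fragments. The sketch below is in TypeScript, not cljs, and is not what was actually done in the porting attempt; `Part`, `compose` and the example pattern are all made up for illustration.

```typescript
// Each fragment carries its own note, so no (?x) comments are needed and
// a literal # cannot be mistaken for the start of a comment.
type Part = { re: string; note: string };

// Hypothetical helper: join the fragments into one regular expression.
function compose(parts: Part[], flags: string = ""): RegExp {
  return new RegExp(parts.map((p) => p.re).join(""), flags);
}

// Illustrative pattern only: a hash that is followed by an opening
// delimiter, using a lookahead so the delimiter itself is not consumed.
const dispatchOpen: Part[] = [
  { re: "#", note: "literal hash" },
  { re: "(?=[({\"])", note: "lookahead: next char is (, { or a double quote" },
];

const dispatchRe = compose(dispatchOpen);

console.log(dispatchRe.test("#{1 2}"));
console.log(dispatchRe.test("#foo"));
```

The notes live in data instead of inside the pattern text, so reformatting the source cannot change what the regex matches.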

pez 2020-03-08T11:04:51.057900Z

I wish I had time to help out some with regal. I haven't even tried it yet, but it is such a great initiative. Reminds me how much I miss Perl's readable regex variants.

👍 1
🐫 1
2020-03-08T11:37:22.059200Z

thanks for asking about the literals, btw -- that added additional motivation to rename and reorganize. it should be less confusing now :)

😀 1