
If you're not trampolining your parser, why bother getting up in the morning?

What's the status of cljs support?


Is it still living as a fork?

aengelberg 2016-07-04T01:57:56.000014Z

The only cljs support still lives in lbradstreet/instaparse-cljs

aengelberg 2016-07-04T01:59:57.000015Z

But I'm currently in the process of rewriting instaparse-cljs into a form that we'd be willing to accept back into upstream, now that cljsee exists

aengelberg 2016-07-04T07:46:31.000016Z

@seylerius: Here's a grammar that parses exponents like you were asking:

boot.user=> (def p (insta/parser "
<S> = ows (exponent ows)+
<exponent> = token <'^'> super
super = token | <'{'> token <'}'>
<token> = #'[^\\s\\^{}]+'
<ows> = <#'\\s*'>"))
boot.user=> (p "foo^2 x^{x+1}")
("foo" [:super "2"] "x" [:super "x+1"])
This parser is pretty naive about the range of possible inputs, since I'm not totally sure myself what that range of inputs is in your use case.

seylerius 2016-07-04T16:43:30.000018Z


seylerius 2016-07-04T16:47:13.000020Z

Another question: * / + = & ~ can appear in singles without being tokens. How would you represent that? Current parser:

seylerius 2016-07-04T16:54:58.000021Z

@aengelberg: What I have will do for the moment, but it's a part of the spec I'd like to meet eventually.


Hi, we recently switched from parsing user input with plain regex to instaparse. The code looks way better. However, there are two corner cases where I am not sure what the idiomatic way would be: 1) parsing a certain domain of inputs should result in a no-op. Our current solution is:

"sentence = define / explain / help / catchall
<<skipped definitions>>
 catchall = #'(.|[\n\r])*'"
with the intention of simply ignoring the last part during transformation: `:catchall (fn [_] nil)`. Now I wonder if there is another way to catch this case and ignore it without using exceptions. 2) `'(.|[\n\r])*'` uses `|`, which on the JVM leads to recursion and can result in a stack overflow. In fact, it happened to us once. Is there a better way to write catchall that accounts for anything, including \n and \r?

aengelberg 2016-07-04T17:10:05.000023Z

@happy.lisper for catchall you could do #'[\s\S]*'
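A minimal sketch of why the suggested character class works as a catchall, using only plain `clojure.core` regex calls rather than instaparse. The names `input`, `catchall-match`, and `dotall-match` are made up for illustration. Note that inside a double-quoted Clojure grammar string the backslashes would need doubling (`#'[\\s\\S]*'`), since the reader rejects `\s` as a string escape.

```clojure
;; Sample multi-line input (hypothetical data for this sketch).
(def input "first line\nsecond line\r\nthird")

;; By default `.` does not match line terminators, so a bare `.*`
;; cannot match the whole multi-line string.
(def dot-match (re-matches #".*" input))

;; "Whitespace or non-whitespace" covers every character, newlines included,
;; and there is no alternation inside the repeated part.
(def catchall-match (re-matches #"[\s\S]*" input))

;; The DOTALL flag (?s) is another way to let `.` match line terminators.
(def dotall-match (re-matches #"(?s).*" input))
```

Both `catchall-match` and `dotall-match` hold the full input string, while `dot-match` is nil.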



aengelberg 2016-07-04T17:11:16.000025Z

So your use case is: "Parse the entire string as a define, an explain, or a help, but if that doesn't work then return nil"?

aengelberg 2016-07-04T17:11:43.000026Z

Because you could just run the parse and a transform, then check (insta/failure? result)


yes, where nil is just a signal to ignore the input.

aengelberg 2016-07-04T17:13:54.000028Z

(def p (insta/parser ...))
(let [result (p input-string)
      ;; insta/transform takes the transform map first, then the parse result
      transformed (insta/transform {...} result)]
  (when-not (insta/failure? transformed)
    ...))

aengelberg 2016-07-04T17:14:12.000029Z

Note that insta/transform is specifically designed to pass through failures


Let me consider that 🙂.

aengelberg 2016-07-04T17:19:50.000031Z

@seylerius: Given an input ~a ~b, how do you know the a and b are to be parsed as individual ~'s, as opposed to a code string of "a " followed by "b"?

seylerius 2016-07-04T17:24:06.000034Z

@aengelberg: If I'm reading this correctly, the characters touching the inside of the tokens need to be alphanumeric, or at least non-whitespace.

aengelberg 2016-07-04T17:27:43.000035Z

so *a b c* shouldn't be allowed?

aengelberg 2016-07-04T17:28:24.000036Z

the current grammar that I suggested would allow that. Just trying to get a sense of the range of inputs so I can help design a parser accordingly

seylerius 2016-07-04T17:29:24.000037Z

*foo* *bar* ➡️ [:b "foo" "bar"]
foo* bar* ➡️ "foo* bar*"

seylerius 2016-07-04T17:33:36.000038Z

@aengelberg: that make sense?

aengelberg 2016-07-04T17:34:50.000039Z

for the first example do you mean [:b "foo"] [:b "bar"]?

aengelberg 2016-07-04T17:37:16.000040Z

is there a guarantee that *a**b* won't happen?

seylerius 2016-07-04T17:38:46.000041Z

@aengelberg: Yes. And guarantee? No. Ambiguity in the spec we can lock to an interpretation? Yes.

seylerius 2016-07-04T17:45:17.000042Z

We basically get to decide if that's a pair of bold characters or a flat string we'll leave be.

seylerius 2016-07-04T17:45:28.000043Z

It would only likely happen as a typo.

seylerius 2016-07-04T17:45:41.000044Z

(Or a stupid user)

seylerius 2016-07-04T17:48:07.000045Z

@aengelberg: I'm basically upgrading organum. Sample org file:

aengelberg 2016-07-04T17:51:01.000046Z

hmm, thinking through how to enforce alphanumeric chars on the insides of tokens.

aengelberg 2016-07-04T17:52:22.000047Z

doing a "lookbehind" on the last * is nontrivial.

seylerius 2016-07-04T18:01:16.000048Z

What if I stripped leading and trailing whitespace before parsing, and modified the base string rule to start and end alphanumeric? Would that be easier?

seylerius 2016-07-04T18:05:37.000049Z

But, no, that wouldn't quite work.

seylerius 2016-07-04T18:11:29.000050Z

@aengelberg: Will the parser ignore escaped tokens, like \*?

seylerius 2016-07-04T18:12:48.000051Z

Ach. Clojure doesn't like \* in a string

seylerius 2016-07-04T18:30:43.000052Z

@aengelberg: Is there any way to mark tokens to not be parsed?


would angle brackets <> to hide parsed elements work?

aengelberg 2016-07-04T18:35:29.000054Z

@seylerius you'd have to do \\* if inside a Clojure string

aengelberg 2016-07-04T18:36:54.000055Z

the goal is to avoid parsing *a * as [:b "a "]

seylerius 2016-07-04T18:37:34.000056Z

@aengelberg: Anything special I have to do to mark that? I just tried parsing \\*foo\\* and got ("\\" [:b "foo\\"])

aengelberg 2016-07-04T18:38:22.000057Z

instaparse doesn't automatically handle backslashes in any special way besides what has been defined in your grammar.

seylerius 2016-07-04T18:41:42.000059Z

Okay. How do you define a simple backslash replacement in this type of grammar, then?

aengelberg 2016-07-04T18:45:59.000060Z

Maybe replace <string> with:

<string> = '\\\\*' | #'[^*/_+=~^_\\\\]+'
user> (inline-markup "a\\* b")
("a" "\\*" " b")

aengelberg 2016-07-04T18:46:17.000061Z

Pretty messy, I know. (four backslashes :face_with_rolling_eyes:)
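A rough sketch of where the four backslashes come from, shown with plain Clojure strings and `re-pattern` instead of instaparse itself. The var names are hypothetical, and the character class is a shortened stand-in for the one in the grammar above. The Clojure reader halves the backslashes once, and the regex engine halves them again.

```clojure
;; Layer 1: the Clojure reader turns each \\ in source into one backslash.
(def escaped-star "\\*")        ; 2 characters: backslash, star
(def four-backslashes "\\\\")   ; 2 characters: backslash, backslash

;; Layer 2: in regex pattern text, two backslashes match one literal backslash.
;; The resulting pattern text is [^*\\]+ (anything except star or backslash).
(def no-star-or-backslash
  (re-pattern (str "[^*" four-backslashes "]+")))

;; The input here contains a single backslash between "foo" and "bar".
(def first-run (re-find no-star-or-backslash "foo\\bar"))
```

Here `first-run` is "foo", because matching stops at the single backslash in the input.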

aengelberg 2016-07-04T18:48:03.000062Z

I don't know if this solves your problem though; you don't want to escape *'s in every ** My Subsection text, do you?

aengelberg 2016-07-04T18:49:13.000063Z

sorry if I'm a bit unhelpful; phasing in and out of AFK

seylerius 2016-07-04T18:50:38.000064Z

I'm thinking I'm just going to tell users that if they want a plain * they have to escape it.

seylerius 2016-07-04T18:51:23.000065Z

Headlines are already handled by the time this stage of parsing is invoked, so those won't be an issue.

seylerius 2016-07-04T18:53:21.000066Z

And your special case of *a**b* is apparently already readily converted to ([:b "a"] [:b "b"])

seylerius 2016-07-04T20:11:06.000067Z

@aengelberg: Separate (earlier stage) parser: Is it possible (other than by having respective rules for #'^* ', #'^** ', #'^*** ', etc) to easily produce h1, h2, h3, etc?

seylerius 2016-07-04T20:20:25.000068Z

Actually, yeah. Just don't hide the token, and I can put that through a counter after the fact.
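The counting idea could look roughly like the following. This is a sketch with a plain regex standing in for the instaparse rule and transform, and `headline` is a hypothetical helper name, not part of organum or instaparse. It assumes a headline is one or more stars, a single space, then the title.

```clojure
(defn headline
  "Hypothetical helper: splits an org-style headline into a level tag and title.
  Returns nil when the line does not start with one or more stars and a space."
  [line]
  (when-let [[_ stars title] (re-matches #"(\*+) (.*)" line)]
    ;; Keep the star run unhidden, then derive the level from its length.
    [(keyword (str "h" (count stars))) title]))

(def h2-example (headline "** My Subsection"))
(def not-a-headline (headline "plain text"))
```

With this, `h2-example` is `[:h2 "My Subsection"]` and `not-a-headline` is nil. In the instaparse version the same `count` call would sit in the transform function for the headline rule.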