
If you're not trampolining your parser, why bother getting up in the morning?

instaparse requires keywords for the names of the whatchamacallits?


I think I might be using instaparse in a weird enough way for that to be a very mild problem


because I have to gensym the names and so it's a memory leak

seylerius 2016-12-31T04:47:01.000022Z

@gfredericks It outputs either hiccup or enlive notation, so yes it probably would want keywords in reverse.

aengelberg 2016-12-31T09:52:28.000023Z


;; keyword returns nil for numbers, so build each name from a string
(def all-keywords-ever (map (comp keyword str) (range)))

;; each time you dynamically create a parser
(let [my-syms ...
      kws (zipmap my-syms all-keywords-ever)]
  ...)

aengelberg 2016-12-31T09:52:40.000024Z

That might be a way to conserve on keywords

aengelberg 2016-12-31T09:55:21.000025Z

Or do a string replace in the grammar to substitute non terminals with reusable symbols, then postwalk the resulting tree to convert back
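A rough sketch of the "reuse a pool of names, then walk the tree to convert back" idea, using only `clojure.walk` and `clojure.set`. The names `keyword-pool`, `pool-mapping`, and `restore-names`, and the gensym-style rule names, are made up for illustration; none of this is instaparse API.

```clojure
(require '[clojure.walk :as walk]
         '[clojure.set :as set])

;; A fixed, reusable pool of keywords: :k0, :k1, :k2, ...
;; (keyword returns nil for numbers, so the name is built with str.)
(def keyword-pool (map #(keyword (str "k" %)) (range)))

(defn pool-mapping
  "Maps each of the caller's rule names to a keyword from the shared pool."
  [rule-names]
  (zipmap rule-names keyword-pool))

(defn restore-names
  "Walks a hiccup-style parse tree and swaps pool keywords back to the
  original rule names. Keywords not in the mapping are left alone."
  [mapping tree]
  (let [back (set/map-invert mapping)]
    (walk/postwalk #(if (keyword? %) (get back % %) %) tree)))

;; Hypothetical gensym'd rule names for one dynamically built parser.
(def mapping (pool-mapping ['expr-G__101 'term-G__102]))

(restore-names mapping [:k0 [:k1 "1"] "+" [:k1 "2"]])
;; => [expr-G__101 [term-G__102 "1"] "+" [term-G__102 "2"]]
```

Because every parser draws from the same pool, the number of interned keywords is bounded by the largest grammar rather than by the number of parsers created.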


I'm using the combinators, so it shouldn't be too hard to do something like that if I decide this matters

zmaril 2016-12-31T19:41:39.000027Z

@gfredericks @aengelberg if we can actually get generating from grammars going I'd still be really stoked

zmaril 2016-12-31T19:42:36.000028Z

I've been working on the past few weeks and am getting within spitting distance of doing some fun stuff.

zmaril 2016-12-31T19:43:09.000030Z

It can basically parse C at this point and I'm working on finishing the macro preprocessor now.

zmaril 2016-12-31T19:45:17.000032Z

The goal is to get the output into datascript and queryable. But a side product of this is that if you have something that can generate strings from grammars then we already have something that can produce c programs (sans macros).


@zmaril do you or anybody know if all instaparse grammars are implemented using the combinators?



zmaril 2016-12-31T19:58:55.000035Z

Yes they should be

zmaril 2016-12-31T19:59:37.000036Z

My understanding is that the ebnf notation that everybody uses is actually parsed by a parser expressed in the combinators that transforms the output into combinators


I just glanced at the combinator list -- I think only the lookaheads are problematic, but that's probably a big deal for sophisticated parsers

zmaril 2016-12-31T20:00:24.000039Z



so...oh well.

zmaril 2016-12-31T20:00:56.000041Z

how does one express negation in generators now?


you could implement them with gen/such-that but the generator would fail if the lookahead condition is unlikely to pass by chance


I have no idea how that would play out IRL

zmaril 2016-12-31T20:01:43.000044Z

That should be fine then. For the parsers I write lookahead is typically used to implement reserved keywords.
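For the reserved-keyword case, a negative lookahead can be approximated with `gen/such-that` from test.check. This is only a sketch: the `reserved` set and the generator names are invented for the example, and the retry cap of 100 is an arbitrary choice.

```clojure
(require '[clojure.test.check.generators :as gen])

;; Hypothetical reserved words for some small language.
(def reserved #{"if" "else" "while"})

;; Random non-empty alphabetic strings.
(def word-gen
  (gen/fmap #(apply str %) (gen/not-empty (gen/vector gen/char-alpha))))

;; Emulates the negative lookahead: keep only words that are not reserved.
;; The third argument caps retries; such-that throws once it is exhausted,
;; which is the failure mode to expect when the predicate rarely passes.
(def identifier-gen
  (gen/such-that (complement reserved) word-gen 100))

(gen/sample identifier-gen 5)
```

Here almost every random word passes the predicate, so the retry limit should rarely matter. A lookahead that rejects most candidates would behave much worse.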

zmaril 2016-12-31T20:02:03.000045Z

I've never used positive lookahead actually now that I think about it


when I made the regex→string generator I just decided not to support look[ahead|behind] for the same reason

zmaril 2016-12-31T20:02:38.000047Z

It's one of those things that is academic to me at this point

zmaril 2016-12-31T20:03:05.000048Z

I'm pretty sure that 99% of the time gen/such-that would be fine


it might not be too hard to throw together a PoC


in fact that would potentially be useful for what I'm working on right now

zmaril 2016-12-31T20:04:39.000052Z

yeah, I think that would fit really well and mirror what spec is doing

zmaril 2016-12-31T20:04:52.000053Z

I've been using spec/conform the same way I use instaparse and it works really well

zmaril 2016-12-31T20:05:12.000054Z

So I imagine we could use generators the same way spec does and it would work well (fingers crossed)


😂 I just realized that it would require using string-from-regex from test.chuck to support regexes in the grammars, and string-from-regex uses instaparse to parse the regex.

zmaril 2016-12-31T20:08:10.000058Z

that was the thing that was holding me up actually

zmaril 2016-12-31T20:08:14.000059Z

was that I didn't want to mess with regexes

aengelberg 2016-12-31T20:09:30.000060Z

just catching up

aengelberg 2016-12-31T20:10:30.000061Z

After I wrote "instagenerate" I realized going the generator route (as opposed to core.logic) would probably be easier, despite the lookahead such-that problem

aengelberg 2016-12-31T20:10:41.000062Z

But what do you want to do about hide-tags?

zmaril 2016-12-31T20:11:10.000063Z

I think I have an idea, h/o

zmaril 2016-12-31T20:11:44.000064Z

well, hmmm what is the problem you see with hide-tags?

aengelberg 2016-12-31T20:12:14.000065Z

It depends on what you expect the "input" to the generator to be

aengelberg 2016-12-31T20:12:24.000066Z

a parse tree still?


it'd be the combinator


it would generate totally random parsable things


not based on some partial input

aengelberg 2016-12-31T20:13:24.000070Z

ok, in that case I don't really have a problem with hide tags despite just waking up

zmaril 2016-12-31T20:13:40.000071Z

I think if we got something going that just took a grammar and gave back random strings, that would be a good first step
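A possible shape for that first step, sketched under several assumptions: that instaparse combinators are plain maps with a `:tag` plus `:parsers`, `:parser`, `:string`, or `:keyword` entries, and that a parser exposes `:grammar` and `:start-production`. The function names `combinator->gen` and `parser->gen` are invented here. Lookahead, regexes, ordered choice, and recursive grammars are deliberately left out.

```clojure
(require '[instaparse.core :as insta]
         '[clojure.test.check.generators :as gen])

(defn combinator->gen
  "Builds a string generator for combinator c, resolving non-terminals in
  grammar. Unsupported tags make case throw. Recursive grammars would loop
  forever because generators are built eagerly."
  [grammar c]
  (let [rec  #(combinator->gen grammar %)
        join #(gen/fmap (fn [parts] (apply str parts)) %)]
    (case (:tag c)
      :epsilon (gen/return "")
      :string  (gen/return (:string c))
      :nt      (rec (get grammar (:keyword c)))
      :cat     (join (apply gen/tuple (map rec (:parsers c))))
      :alt     (gen/one-of (mapv rec (:parsers c)))
      :opt     (gen/one-of [(gen/return "") (rec (:parser c))])
      :star    (join (gen/vector (rec (:parser c))))
      :plus    (join (gen/not-empty (gen/vector (rec (:parser c))))))))

(defn parser->gen
  "Generator of strings derived from the parser's start rule."
  [p]
  (combinator->gen (:grammar p) {:tag :nt :keyword (:start-production p)}))

;; A small non-recursive grammar with hidden tags and a hidden token.
(def p (insta/parser "S = A B A | B A B
                      <A> = ('a' <'c'> 'b')+
                      <B> = ('b' 'a')+"))

(gen/sample (parser->gen p) 3)
```

Hidden tokens and hidden tags only affect the parse tree, so the generator ignores them and emits the full string. Every sampled string should parse with `p`, which makes a convenient sanity check.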

aengelberg 2016-12-31T20:14:50.000072Z

part of why I did core.logic in instagenerate is @zmaril's initial request to go from partial input -> parseable strings, so I felt the need to put in the sophistication of logic programming as a general solver for all cases

zmaril 2016-12-31T20:15:15.000073Z

oh, if we want to do partial input, we can provide skeletons with places to start generating from

zmaril 2016-12-31T20:15:41.000074Z

then we just walk the skeleton and generate random strings at the indicated places

zmaril 2016-12-31T20:16:04.000075Z

still not fully general but better

zmaril 2016-12-31T20:17:19.000076Z

and then we could restrict the grammar inside the combinator somehow

aengelberg 2016-12-31T20:21:48.000078Z

(def p (insta/parser "
S = A B A | B A B
<A> = ('a' <'c'> 'b')+
<B> = ('b' 'a')+
"))

(generate p [:S "a" "b" "b" "a" "a" "b"])
=> ("acbbaacb")

aengelberg 2016-12-31T20:23:35.000079Z

seems hard to performantly solve generally

zmaril 2016-12-31T20:24:34.000080Z

who said anything about performance

aengelberg 2016-12-31T20:24:39.000081Z

🙂 fair enough

aengelberg 2016-12-31T20:25:00.000082Z

but a generator approach using such-that may never complete on a large enough grammar

zmaril 2016-12-31T20:25:31.000083Z

cross that bridge when we get there

zmaril 2016-12-31T20:25:48.000084Z

computers are like really fast

zmaril 2016-12-31T20:26:20.000085Z

this is more of a what's possible idea than a production thing

aengelberg 2016-12-31T20:28:00.000087Z

let me know if I can help out in whichever path you decide to try out

zmaril 2016-12-31T20:28:28.000088Z

for sure!


yeah generators aren't generally for production stuff


I want a combinator that doesn't match anything


I thought maybe (combo/alt) but that returns ε

zmaril 2016-12-31T20:44:11.000092Z

(gen/such-that (constantly false)) or something?


a combinator, not a generator

zmaril 2016-12-31T20:44:22.000094Z

oh right sorry


I guess I can do negative lookahead with epsilon?
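A minimal sketch of the negative-lookahead-on-epsilon idea using `instaparse.combinators`. The name `never-match` and the tiny grammar are invented for the example.

```clojure
(require '[instaparse.core :as insta]
         '[instaparse.combinators :as combo])

;; Epsilon always succeeds, so a negative lookahead on it can never succeed.
(def never-match (combo/neg combo/Epsilon))

;; Map grammars need an explicit :start rule.
(def p (insta/parser {:S (combo/alt (combo/string "a") never-match)}
                     :start :S))

(p "a")                  ; parses via the first alternative
(insta/failure? (p ""))  ; the never-match branch cannot rescue empty input
```

Unlike a random UUID string, this does not rely on the input being unlikely to contain a particular value.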

zmaril 2016-12-31T20:44:52.000096Z

or a really unlikely string?

zmaril 2016-12-31T20:46:30.000099Z

we're not fancy here


(string (str (java.util.UUID/randomUUID)))

zmaril 2016-12-31T20:46:56.000101Z

that works!


I have an alternate thing in my codebase that could be called a parser, but instaparse also has something by that name so I called it a parsifier instead


and it's hard to remember that word because it could also have been parsinator

zmaril 2016-12-31T20:50:17.000105Z

(defn enlive-output->datascript-datums [m]
  (if-not (map? m)
    {:type :value :value m}
    (as-> m $
      (assoc $ :meta (meta m))
      (assoc $ :db/id (d/tempid :mcc))
      (transform [:content ALL] enlive-output->datascript-datums $))))

This will take enlive output and make it so you can query it from datascript


does instaparse use its own regex engine?

zmaril 2016-12-31T20:53:37.000107Z



I just got a misparse where the thing matches the regex but instaparse disagrees

zmaril 2016-12-31T20:53:42.000109Z

depends on java if I recall


and reordering a disjunction in the regex fixes it

zmaril 2016-12-31T20:54:04.000111Z



this is the instaparse-cljs thing in particular, but still on the jvm

zmaril 2016-12-31T20:54:16.000113Z

check if instaparse passes any flags in


here's the failing version:

zmaril 2016-12-31T20:58:18.000116Z

"0/2" parses

zmaril 2016-12-31T20:58:44.000119Z

can you add in some parens to the second part to clarify your intent


"0/2" is not supposed to parse o_O


I see that's my fault though

zmaril 2016-12-31T21:03:49.000122Z


aengelberg 2016-12-31T21:59:41.000123Z

I second !epsilon as the "don't parse"

aengelberg 2016-12-31T22:00:29.000124Z

also instaparse fails on infinite loop grammars, so this might work

never-succeed = never-succeed
(then use never-succeed wherever)


@aengelberg do you think the current behavior of (combo/alt) is bad/weird?


my hunch is that According To Math it should either throw or not match anything

aengelberg 2016-12-31T22:03:01.000127Z

yeah I agree with your instinct. Not really sure what the thinking was in that design.


my argument is that because (combo/alt p) probably does not match ε, neither should (combo/alt)

aengelberg 2016-12-31T22:03:23.000129Z

Maybe since "don't parse anything" isn't really a common use case


you shouldn't parse more things by removing an arg from combo/alt

aengelberg 2016-12-31T22:03:46.000131Z



yeah I always end up finding the uncommon use cases


for a while every time I tried to use CLJS I ended up creating a jira ticket

aengelberg 2016-12-31T22:06:09.000135Z

I think I know why your parser is failing

aengelberg 2016-12-31T22:06:48.000136Z

The regex for the denominator, when given "25" as input, may arbitrarily decide to match either "2" or "25"

aengelberg 2016-12-31T22:07:04.000137Z

In instaparse, whatever the regex decides is the one and only possible parse

aengelberg 2016-12-31T22:07:53.000138Z

user=> (re-matches #"[2-9]|[1-9][0-9]+" "25")
"25"
user=> (re-seq #"[2-9]|[1-9][0-9]+" "25")
("2" "5")
user=> (re-find #"[2-9]|[1-9][0-9]+" "25")
"2"
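The same effect can be reproduced with a prefix match, which is roughly what a parser needs when it consumes a token from the front of the remaining input. That instaparse does exactly this internally is an assumption here; `match-at-front` is a helper written for this example using `java.util.regex.Matcher#lookingAt`.

```clojure
(defn match-at-front
  "Returns the prefix of text that regex matches, or nil. lookingAt anchors
  at the start of the input but does not require the match to reach the end."
  [regex text]
  (let [m (re-matcher regex text)]
    (when (.lookingAt m) (.group m))))

(match-at-front #"[2-9]|[1-9][0-9]+" "25") ;; => "2"
(match-at-front #"[1-9][0-9]+|[2-9]" "25") ;; => "25"
(re-matches     #"[2-9]|[1-9][0-9]+" "25") ;; => "25"
```

Java tries alternatives left to right, so the prefix match stops at the first branch that succeeds. Only a whole-string match like `re-matches` is forced to backtrack into the second branch, which is why reordering the disjunction changes what the parser sees.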


oh it's about re-matches vs re-find?


oh I think I see

aengelberg 2016-12-31T22:09:04.000143Z

you could instead do #"[2-9]" | #"[1-9][0-9]+"

aengelberg 2016-12-31T22:09:22.000144Z

If you move logic from regexes into instaparse, you get flexibility at the cost of speed


so the fact that I fixed it by rearranging the regex is sort of an implementation detail I guess?

aengelberg 2016-12-31T22:12:02.000147Z

Yes, so I would call rearranging the regex an improper solution

aengelberg 2016-12-31T22:12:25.000148Z

but #"[2-9]" | #"[1-9][0-9]+" is proper
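To make the difference concrete, here is a sketch with a made-up ratio grammar (the real grammar from this conversation is not shown, so `ratio`, `num`, and `denom` are stand-ins). The two parsers differ only in where the alternation lives.

```clojure
(require '[instaparse.core :as insta])

;; Alternation inside the regex: the regex picks one match and
;; instaparse has no other candidates to try.
(def regex-alt
  (insta/parser "ratio = num <'/'> denom
                 num   = #'[0-9]+'
                 denom = #'[2-9]|[1-9][0-9]+'"))

;; Alternation in the grammar: instaparse explores both branches.
(def grammar-alt
  (insta/parser "ratio = num <'/'> denom
                 num   = #'[0-9]+'
                 denom = #'[2-9]' | #'[1-9][0-9]+'"))

(insta/failure? (regex-alt "1/25")) ;; => true
(grammar-alt "1/25")                ;; => [:ratio [:num "1"] [:denom "25"]]
```

The second form costs a little speed but no longer depends on the order of branches inside the regex.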


okay fine I'll switch it 😛