instaparse requires keywords for the names of the whatchamacallits?
I think I might be using instaparse in a weird enough way for that to be a very mild problem
because I have to gensym the names and so it's a memory leak
@gfredericks It outputs either hiccup or enlive notation, so yes it probably would want keywords in reverse.
;; keyword returns nil for a number, so go through str first
(def all-keywords-ever (map (comp keyword str) (range)))
;; each time you dynamically create a parser
(let [my-syms ...
      kws (zipmap my-syms all-keywords-ever)]
  ...)
That might be a way to conserve on keywords
Or do a string replace in the grammar to substitute non terminals with reusable symbols, then postwalk the resulting tree to convert back
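A minimal sketch of the two suggestions above combined: draw names from a reusable keyword pool, then postwalk the parse tree to convert back. The helper names (`keyword-pool`, `pooled-names`, `restore-names`) and the gensym-style symbols are made up for illustration; nothing here is instaparse API.

```clojure
(require '[clojure.walk :as walk])

;; A fixed, reusable pool of keywords: :0, :1, :2, ...
;; (keyword needs a string here, hence the str.)
(def keyword-pool (map (comp keyword str) (range)))

(defn pooled-names
  "Map each generated name to a pooled keyword, and build the reverse map."
  [names]
  (let [fwd (zipmap names keyword-pool)]
    {:to-pool   fwd
     :from-pool (into {} (map (fn [[k v]] [v k]) fwd))}))

(defn restore-names
  "Walk a hiccup-style tree, replacing pooled keywords with the original names."
  [from-pool tree]
  (walk/postwalk #(if (keyword? %) (get from-pool % %) %) tree))

;; Hypothetical gensym'd names standing in for real non-terminals.
(let [{:keys [to-pool from-pool]} (pooled-names ['expr-G__1 'term-G__2])]
  (restore-names from-pool [(to-pool 'expr-G__1) [(to-pool 'term-G__2) "x"]]))
;; => [expr-G__1 [term-G__2 "x"]]
```

Only as many keywords get interned as the largest grammar needs, since every parser reuses the same prefix of the pool.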
I'm using the combinators, so it shouldn't be too hard to do something like that if I decide this matters
@gfredericks @aengelberg if we can actually get generating from grammars going I'd still be really stoked
I've been working on https://github.com/zmaril/instaparse-c the past few weeks and am getting within spitting distance of doing some fun stuff.
It can basically parse C at this point and I'm working on finishing the macro preprocessor now.
The goal is to get the output into datascript and queryable. But a side product of this is that if you have something that can generate strings from grammars then we already have something that can produce C programs (sans macros).
@zmaril do you or anybody know if all instaparse grammars are implemented using the combinators?
s/grammars/parser/
Yes they should be
My understanding is that the ebnf notation that everybody uses is actually parsed by a parser expressed in the combinators that transforms the output into combinators
I just glanced at the combinator list -- I think only the lookaheads are problematic, but that's probably a big deal for sophisticated parsers
yep
so...oh well.
how does one express negation in generators now?
you could implement them with gen/such-that
but the generator would fail if the lookahead condition is unlikely to pass by chance
I have no idea how that would play out IRL
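A rough sketch of the such-that idea, assuming test.check is on the classpath. The reserved-word set and the generator names are hypothetical; the point is only that a negative lookahead ("an identifier that is not a reserved word") becomes a filter on an unconstrained generator.

```clojure
(require '[clojure.test.check.generators :as gen])

;; Hypothetical reserved words for an imaginary grammar.
(def reserved #{"if" "else" "while"})

;; Unconstrained words: 1 to 8 lowercase letters.
(def gen-word
  (gen/fmap #(apply str %)
            (gen/vector (gen/elements (seq "abcdefghijklmnopqrstuvwxyz")) 1 8)))

;; Negative lookahead expressed as a filter. The 100 is the max number of
;; tries; such-that throws if the predicate keeps failing, which is the
;; failure mode mentioned above when the condition rarely passes by chance.
(def gen-identifier
  (gen/such-that #(not (contains? reserved %)) gen-word 100))

(gen/sample gen-identifier 5)
```

Here the filter rejects very little, so retries are rare. A lookahead that excludes most candidates would hit the retry limit quickly.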
That should be fine then. For the parsers I write lookahead is typically used to implement reserved keywords.
I've never used positive lookahead actually now that I think about it
when I made the regex→string generator I just decided not to support look[ahead|behind] for the same reason
It's one of those things that is academic to me at this point
I'm pretty sure that 99% of the time gen/such-that would be fine
it might not be too hard to throw together a PoC
in fact that would potentially be useful for what I'm working on right now
yeah, I think that would fit really well and mirror what spec is doing
I've been using spec/conform the same way I use instaparse and it works really well
So I imagine we could use generators the same way spec does and it would work well (fingers crossed)
😂 I just realized that it would require using string-from-regex from test.chuck to support regexes in the grammars, and string-from-regex uses instaparse to parse the regex.
turtles
indeed
that was the thing that was holding me up actually
was that I didn't want to mess with regexes
just catching up
After I wrote "instagenerate" I realized going the generator route (as opposed to core.logic) would probably be easier, despite the lookahead such-that problem
But what do you want to do about hide-tags?
I think I have an idea, h/o
well, hmmm what is the problem you see with hide-tags?
It depends on what you expect the "input" to the generator to be
a parse tree still?
it'd be the combinator
it would generate totally random parsable things
not based on some partial input
ok, in that case I don't really have a problem with hide tags despite just waking up
I think if we got something going that just took a grammar and gave back random strings, that would be a good first step
part of why I did core.logic in instagenerate is @zmaril's initial request to go from partial input -> parseable strings, so I felt the need to put in the sophistication of logic programming as a general solver for all cases
oh, if we want to do partial input, we can provide skeletons with places to start generating from
then we just walk the skeleton and generate random strings at the indicated places
still not fully general but better
and then we could restrict the grammar inside the combinator somehow
(def p (insta/parser "
  S = A B A | B A B
  <A> = ('a' <'c'> 'b')+
  <B> = ('b' 'a')+
"))
(generate p [:S "a" "b" "b" "a" "a" "b"])
=> ("acbbaacb")
seems hard to performantly solve generally
who said anything about performance
🙂 fair enough
but a generator approach using such-that may never complete on a large enough grammar
cross that bridge when we get there
computers are like really fast
this is more of a what's possible idea than a production thing
cool
let me know if I can help out in whichever path you decide to try out
for sure!
yeah generators aren't generally for production stuff
I want a combinator that doesn't match anything
I thought maybe (combo/alt) but that returns ε
(gen/such-that (constantly false)) or something?
a combinator, not a generator
oh right sorry
I guess I can do negative lookahead with epsilon?
or a really unlikely string?
like (string "THISWILLNEVERBEMATCHEDHOPEFULLY")
🙂
we're not fancy here
(string (str (java.util.UUID/randomUUID)))
that works!
I have an alternate thing in my codebase that could be called a parser, but instaparse also has something by that name so I called it a parsifier instead
and it's hard to remember that word because it could also have been parsinator
hahaha
;; assumes d is datascript.core, and transform / ALL come from specter
(defn enlive-output->datascript-datums [m]
  (if-not (map? m)
    {:type :value :value m}
    (as-> m $
      (assoc $ :meta (meta m))
      (assoc $ :db/id (d/tempid :mcc))
      (transform [:content ALL] enlive-output->datascript-datums $))))
This will take enlive output and make it so you can query it from datascript
does instaparse use its own regex engine?
no
I just got a misparse where the thing matches the regex but instaparse disagrees
depends on java if I recall
and reordering a disjunction in the regex fixes it
hmm
this is the instaparse-cljs thing in particular, but still on the jvm
check if instaparse passes any flags in
here's the failing version: https://www.refheap.com/124435
hmm
"0/2"
parses
can you add in some parens to the second part to clarify your intent
"0/2" is not supposed to parse o_O
I see that's my fault though
ha
I second !epsilon as the "don't parse"
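A small sketch of the `!epsilon` suggestion, assuming instaparse is on the classpath. The rule name `never` is made up; the idea is that a negative lookahead on epsilon can never succeed, because epsilon always matches.

```clojure
(require '[instaparse.core :as insta])

;; never = !epsilon should fail everywhere, so only the 'a' branch can match.
(def p
  (insta/parser
    "S = never | 'a'
     never = !epsilon"))

(insta/parse p "a")

;; A grammar consisting only of the never-matching rule should reject
;; every input, including the empty string.
(insta/failure? (insta/parse (insta/parser "S = !epsilon") ""))
```

Unlike the random-UUID string trick, this does not rely on the input being unlikely to contain a particular string.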
also instaparse fails on infinite loop grammars, so this might work
never-succeed = never-succeed
(then use never-succeed wherever)
@aengelberg do you think the current behavior of (combo/alt) is bad/weird?
my hunch is that According To Math it should either throw or not match anything
yeah I agree with your instinct. Not really sure what the thinking was in that design.
my argument is that because (combo/alt p) probably does not match ε, neither should (combo/alt)
Maybe since "don't parse anything" isn't really a common use case
you shouldn't parse more things by removing an arg from combo/alt
agreed
yeah I always end up finding the uncommon use cases
for a while every time I tried to use CLJS I ended up creating a jira ticket
#gobigorgohome
I think I know why your parser is failing
The regex for the denominator, when given "25" as input, may arbitrarily decide to match either "2" or "25"
In instaparse, whatever the regex decides is the one and only possible parse
user=> (re-matches #"[2-9]|[1-9][0-9]+" "25")
"25"
user=> (re-seq #"[2-9]|[1-9][0-9]+" "25")
("2" "5")
user=> (re-find #"[2-9]|[1-9][0-9]+" "25")
"2"
oh it's about re-matches vs re-find?
https://github.com/engelberg/instaparse#regular-expressions-a-word-of-warning
oh I think I see
you could instead do #"[2-9]" | #"[1-9][0-9]+"
If you move logic from regexes into instaparse, you get flexibility at the cost of speed
so the fact that I fixed it by rearranging the regex is sort of an implementation detail I guess?
Yes, so I would call rearranging the regex an improper solution
but #"[2-9]" | #"[1-9][0-9]+" is proper
okay fine I'll switch it 😛
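To see why reordering the disjunction changed the result, here is a sketch in plain Clojure with no instaparse involved. It assumes, per the README warning linked above, that the regex is anchored at the current position and that whatever it matches first is the only candidate. `match-at-front` is a hypothetical helper, not instaparse's own function.

```clojure
(defn match-at-front
  "Return the prefix of s that re matches when anchored at the start, or nil."
  [re s]
  (let [m (re-matcher re s)]
    (when (.lookingAt m)
      (.group m))))

;; Java tries alternatives left to right and stops at the first that matches.
(match-at-front #"[2-9]|[1-9][0-9]+" "25") ;; => "2"
(match-at-front #"[1-9][0-9]+|[2-9]" "25") ;; => "25"
```

With the first ordering the regex commits to "2", leaving "5" unconsumed, so the overall parse fails. Splitting the alternatives into two instaparse terminals lets the parser try both.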