Clojurians Log v2

instaparse

If you're not trampolining your parser, why bother getting up in the morning?

2016-12-31T03:00:27.000019Z

instaparse requires keywords for the names of the whatchamacallits?

2016-12-31T03:00:55.000020Z

I think I might be using instaparse in a weird enough way for that to be a very mild problem

2016-12-31T03:01:13.000021Z

because I have to gensym the names and so it's a memory leak

seylerius 2016-12-31T04:47:01.000022Z

@gfredericks It outputs either hiccup or enlive notation, so yes it probably would want keywords in reverse.

aengelberg 2016-12-31T09:52:28.000023Z

@gfredericks:

;; (keyword 0) returns nil, so stringify the numbers first
(def all-keywords-ever (map (comp keyword str) (range)))

;; each time you dynamically create a parser
(let [my-syms ...
      kws (zipmap my-syms all-keywords-ever)]
  ...)

aengelberg 2016-12-31T09:52:40.000024Z

That might be a way to conserve on keywords

aengelberg 2016-12-31T09:55:21.000025Z

Or do a string replace in the grammar to substitute non terminals with reusable symbols, then postwalk the resulting tree to convert back
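
The two suggestions above (a shared keyword pool, then walking the tree to convert back) can be sketched roughly as follows. This is only an illustration using `clojure.core`, `clojure.set`, and `clojure.walk`; the names `keyword-pool`, `pool-names`, and `restore-names` are hypothetical and not part of instaparse, and the tree is assumed to be in hiccup-style vector form.

```clojure
(require '[clojure.set :as set]
         '[clojure.walk :as walk])

;; Hypothetical shared pool of reusable keywords: :nt0, :nt1, ...
;; Built from strings because (keyword 0) returns nil.
(def keyword-pool (map #(keyword (str "nt" %)) (range)))

(defn pool-names
  "Maps each dynamically created rule name to a keyword from the shared pool."
  [rule-names]
  (zipmap rule-names keyword-pool))

(defn restore-names
  "Walks a hiccup-style parse tree and swaps pooled keywords back
  to the original rule names. Anything not in the pool is left alone."
  [name->kw tree]
  (let [kw->name (set/map-invert name->kw)]
    (walk/postwalk #(get kw->name % %) tree)))

;; Usage with two made-up rule names:
(let [name->kw (pool-names ['expr-g1 'term-g2])]
  (restore-names name->kw
                 [(name->kw 'expr-g1) [(name->kw 'term-g2) "x"]]))
;; => [expr-g1 [term-g2 "x"]]
```

Because every parser draws from the same pool, the number of interned keywords is bounded by the largest grammar rather than by the number of parsers created.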

2016-12-31T14:24:50.000026Z

I'm using the combinators, so it shouldn't be too hard to do something like that if I decide this matters

zmaril 2016-12-31T19:41:39.000027Z

@gfredericks @aengelberg if we can actually get generating from grammars going I'd still be really stoked

zmaril 2016-12-31T19:42:36.000028Z

I've been working on https://github.com/zmaril/instaparse-c the past few weeks and am getting within spitting distance of doing some fun stuff.

zmaril 2016-12-31T19:43:09.000030Z

It can basically parse C at this point and I'm working on finishing the macro preprocessor now.

zmaril 2016-12-31T19:45:17.000032Z

The goal is to get the output into datascript and queryable. But a side product of this is that if you have something that can generate strings from grammars then we already have something that can produce c programs (sans macros).

2016-12-31T19:58:30.000033Z

@zmaril do you or anybody know if all instaparse grammars are implemented using the combinators?

2016-12-31T19:58:39.000034Z

s/grammars/parser/

zmaril 2016-12-31T19:58:55.000035Z

Yes they should be

zmaril 2016-12-31T19:59:37.000036Z

My understanding is that the ebnf notation that everybody uses is actually parsed by a parser expressed in the combinators that transforms the output into combinators

2016-12-31T20:00:11.000038Z

I just glanced at the combinator list -- I think only the lookaheads are problematic, but that's probably a big deal for sophisticated parsers

zmaril 2016-12-31T20:00:24.000039Z

yep

2016-12-31T20:00:29.000040Z

so...oh well.

zmaril 2016-12-31T20:00:56.000041Z

how does one express negation in generators now?

2016-12-31T20:01:09.000042Z

you could implement them with gen/such-that but the generator would fail if the lookahead condition is unlikely to pass by chance

2016-12-31T20:01:37.000043Z

I have no idea how that would play out IRL

zmaril 2016-12-31T20:01:43.000044Z

That should be fine then. For the parsers I write lookahead is typically used to implement reserved keywords.

zmaril 2016-12-31T20:02:03.000045Z

I've never used positive lookahead actually now that I think about it

2016-12-31T20:02:18.000046Z

when I made the regex→string generator I just decided not to support look[ahead|behind] for the same reason

zmaril 2016-12-31T20:02:38.000047Z

It's one of those things that is academic to me at this point

zmaril 2016-12-31T20:03:05.000048Z

I'm pretty sure that 99% of the time gen/such-that would be fine

2016-12-31T20:03:29.000050Z

it might not be too hard to throw together a PoC

2016-12-31T20:03:42.000051Z

in fact that would potentially be useful for what I'm working on right now

zmaril 2016-12-31T20:04:39.000052Z

yeah, I think that would fit really well and mirror what spec is doing

zmaril 2016-12-31T20:04:52.000053Z

I've been using spec/conform the same way I use instaparse and it works really well

zmaril 2016-12-31T20:05:12.000054Z

So I imagine we could use generators the same way spec does and it would work well (fingers crossed)

2016-12-31T20:07:32.000055Z

😂 I just realized that it would require using string-from-regex from test.chuck to support regexes in the grammars, and string-from-regex uses instaparse to parse the regex.

zmaril 2016-12-31T20:07:49.000056Z

turtles

2016-12-31T20:08:00.000057Z

indeed

zmaril 2016-12-31T20:08:10.000058Z

that was the thing that was holding me up actually

zmaril 2016-12-31T20:08:14.000059Z

was that I didn't want to mess with regexes

aengelberg 2016-12-31T20:09:30.000060Z

just catching up

aengelberg 2016-12-31T20:10:30.000061Z

After I wrote "instagenerate" I realized going the generator route (as opposed to core.logic) would probably be easier, despite the lookahead such-that problem

aengelberg 2016-12-31T20:10:41.000062Z

But what do you want to do about hide-tags?

zmaril 2016-12-31T20:11:10.000063Z

I think I have an idea, h/o

zmaril 2016-12-31T20:11:44.000064Z

well, hmmm what is the problem you see with hide-tags?

aengelberg 2016-12-31T20:12:14.000065Z

It depends on what you expect the "input" to the generator to be

aengelberg 2016-12-31T20:12:24.000066Z

a parse tree still?

2016-12-31T20:12:34.000067Z

it'd be the combinator

2016-12-31T20:12:46.000068Z

it would generate totally random parsable things

2016-12-31T20:12:53.000069Z

not based on some partial input

aengelberg 2016-12-31T20:13:24.000070Z

ok, in that case I don't really have a problem with hide tags despite just waking up

zmaril 2016-12-31T20:13:40.000071Z

I think if we got something going that just took a grammar and gave back random strings, that would be a good first step

aengelberg 2016-12-31T20:14:50.000072Z

part of why I did core.logic in instagenerate is @zmaril's initial request to go from partial input -> parseable strings, so I felt the need to put in the sophistication of logic programming as a general solver for all cases

zmaril 2016-12-31T20:15:15.000073Z

oh, if we want to do partial input, we can provide skeletons with places to start generating from

zmaril 2016-12-31T20:15:41.000074Z

then we just walk the skeleton and generate random strings at the indicated places

zmaril 2016-12-31T20:16:04.000075Z

still not fully general but better

zmaril 2016-12-31T20:17:19.000076Z

and then we could restrict the grammar inside the combinator somehow

aengelberg 2016-12-31T20:21:48.000078Z

(def p (insta/parser "
S = A B A | B A B
<A> = ('a' <'c'> 'b')+
<B> = ('b' 'a')+
"))

(generate p [:S "a" "b" "b" "a" "a" "b"])
=> ("acbbaacb")

aengelberg 2016-12-31T20:23:35.000079Z

seems hard to performantly solve generally

zmaril 2016-12-31T20:24:34.000080Z

who said anything about performance

aengelberg 2016-12-31T20:24:39.000081Z

🙂 fair enough

aengelberg 2016-12-31T20:25:00.000082Z

but a generator approach using such-that may never complete on a large enough grammar

zmaril 2016-12-31T20:25:31.000083Z

cross that bridge when we get there

zmaril 2016-12-31T20:25:48.000084Z

computers are like really fast

zmaril 2016-12-31T20:26:20.000085Z

this is more of a what's possible idea than a production thing

aengelberg 2016-12-31T20:27:43.000086Z

cool

aengelberg 2016-12-31T20:28:00.000087Z

let me know if I can help out in whichever path you decide to try out

zmaril 2016-12-31T20:28:28.000088Z

for sure!

2016-12-31T20:38:51.000089Z

yeah generators aren't generally for production stuff

2016-12-31T20:43:20.000090Z

I want a combinator that doesn't match anything

2016-12-31T20:43:41.000091Z

I thought maybe (combo/alt) but that returns ε

zmaril 2016-12-31T20:44:11.000092Z

(gen/such-that (constantly false)) or something?

2016-12-31T20:44:18.000093Z

a combinator, not a generator

zmaril 2016-12-31T20:44:22.000094Z

oh right sorry

2016-12-31T20:44:38.000095Z

I guess I can do negative lookahead with epsilon?

zmaril 2016-12-31T20:44:52.000096Z

or a really unlikely string?

zmaril 2016-12-31T20:45:27.000097Z

like (string "THISWILLNEVERBEMATCHEDHOPEFULLY")

2016-12-31T20:46:12.000098Z

🙂

zmaril 2016-12-31T20:46:30.000099Z

we're not fancy here

2016-12-31T20:46:38.000100Z

(string (str (java.util.UUID/randomUUID)))

zmaril 2016-12-31T20:46:56.000101Z

that works!

2016-12-31T20:48:24.000102Z

I have an alternate thing in my codebase that could be called a parser, but instaparse also has something by that name so I called it a parsifier instead

2016-12-31T20:48:32.000103Z

and it's hard to remember that word because it could also have been parsinator

zmaril 2016-12-31T20:49:34.000104Z

hahaha

zmaril 2016-12-31T20:50:17.000105Z

(defn enlive-output->datascript-datums [m]
  (if-not (map? m)
    {:type :value :value m}
    (as-> m $
      (assoc $ :meta (meta m))
      (assoc $ :db/id (d/tempid :mcc))
      (transform [:content ALL] enlive-output->datascript-datums $))))

This will take enlive output and make it so you can query it from datascript

2016-12-31T20:53:24.000106Z

does instaparse use its own regex engine?

2016-12-31T20:53:40.000108Z

I just got a misparse where the thing matches the regex but instaparse disagrees

zmaril 2016-12-31T20:53:42.000109Z

depends on java if I recall

2016-12-31T20:53:52.000110Z

and reordering a disjunction in the regex fixes it

zmaril 2016-12-31T20:54:04.000111Z

hmm

2016-12-31T20:54:10.000112Z

this is the instaparse-cljs thing in particular, but still on the jvm

zmaril 2016-12-31T20:54:16.000113Z

check if instaparse passes any flags in

2016-12-31T20:55:53.000114Z

here's the failing version: https://www.refheap.com/124435

zmaril 2016-12-31T20:58:11.000115Z

hmm

zmaril 2016-12-31T20:58:18.000116Z

"0/2" parses

zmaril 2016-12-31T20:58:44.000119Z

can you add in some parens to the second part to clarify your intent

2016-12-31T20:59:59.000120Z

"0/2" is not supposed to parse o_O

2016-12-31T21:00:34.000121Z

I see that's my fault though

aengelberg 2016-12-31T21:59:41.000123Z

I second !epsilon as the "don't parse"

aengelberg 2016-12-31T22:00:29.000124Z

also instaparse fails on infinite loop grammars, so this might work

never-succeed = never-succeed

(then use never-succeed wherever)

2016-12-31T22:01:58.000125Z

@aengelberg do you think the current behavior of (combo/alt) is bad/weird?

2016-12-31T22:02:44.000126Z

my hunch is that According To Math it should either throw or not match anything

aengelberg 2016-12-31T22:03:01.000127Z

yeah I agree with your instinct. Not really sure what the thinking was in that design.

2016-12-31T22:03:17.000128Z

my argument is that because (combo/alt p) probably does not match ε, neither should (combo/alt)

aengelberg 2016-12-31T22:03:23.000129Z

Maybe since "don't parse anything" isn't really a common use case

2016-12-31T22:03:33.000130Z

you shouldn't parse more things by removing an arg from combo/alt

aengelberg 2016-12-31T22:03:46.000131Z

agreed

2016-12-31T22:04:03.000132Z

yeah I always end up finding the uncommon use cases

2016-12-31T22:04:25.000133Z

for a while every time I tried to use CLJS I ended up creating a jira ticket

aengelberg 2016-12-31T22:04:55.000134Z

#gobigorgohome

aengelberg 2016-12-31T22:06:09.000135Z

I think I know why your parser is failing

aengelberg 2016-12-31T22:06:48.000136Z

The regex for the denominator, when given "25" as input, may arbitrarily decide to match either "2" or "25"

aengelberg 2016-12-31T22:07:04.000137Z

In instaparse, whatever the regex decides is the one and only possible parse

aengelberg 2016-12-31T22:07:53.000138Z

user=> (re-matches #"[2-9]|[1-9][0-9]+" "25")
"25"
user=> (re-seq #"[2-9]|[1-9][0-9]+" "25")
("2" "5")
user=> (re-find #"[2-9]|[1-9][0-9]+" "25")
"2"

2016-12-31T22:08:33.000139Z

oh it's about re-matches vs re-find?

aengelberg 2016-12-31T22:08:39.000140Z

https://github.com/engelberg/instaparse#regular-expressions-a-word-of-warning

2016-12-31T22:08:47.000142Z

oh I think I see

aengelberg 2016-12-31T22:09:04.000143Z

you could instead do #"[2-9]" | #"[1-9][0-9]+"

aengelberg 2016-12-31T22:09:22.000144Z

If you move logic from regexes into instaparse, you get flexibility at the cost of speed

2016-12-31T22:11:10.000146Z

so the fact that I fixed it by rearranging the regex is sort of an implementation detail I guess?

aengelberg 2016-12-31T22:12:02.000147Z

Yes, so I would call rearranging the regex an improper solution

aengelberg 2016-12-31T22:12:25.000148Z

but #"[2-9]" | #"[1-9][0-9]+" is proper

2016-12-31T22:15:10.000149Z

okay fine I'll switch it 😛
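
For reference, the regex behavior discussed above can be reproduced with plain Java interop. This is a rough sketch that assumes instaparse matches a regex as a prefix at the current position, much like `java.util.regex.Matcher.lookingAt`; that is an assumption here, not something the log confirms. `match-at-front` is a hypothetical helper, not an instaparse function.

```clojure
(defn match-at-front
  "Returns the prefix of s matched by re, or nil if re does not
  match at the start of s. Hypothetical helper for illustration."
  [re s]
  (let [^java.util.regex.Matcher m (re-matcher re s)]
    (when (.lookingAt m)
      (.group m))))

;; Java alternation is ordered: the first alternative that matches wins,
;; so only "2" is consumed and the rest of the input is left over.
(match-at-front #"[2-9]|[1-9][0-9]+" "25")
;; => "2"

;; Reordering makes the longer alternative win, which is why swapping the
;; disjunction appeared to fix the parse.
(match-at-front #"[1-9][0-9]+|[2-9]" "25")
;; => "25"
```

Either way the regex commits to a single match. Splitting it into `#"[2-9]" | #"[1-9][0-9]+"` at the grammar level lets the parser try both alternatives instead of relying on the ordering inside the regex.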