Clojurians Log v2

Clojure programming

Channels

# 100-days-of-code # aatree # admin-announcements # adventofcode # ai # alda # aleph # all-the-channels # announcements # arachne # architecture # asami # atlanta-clojurians # atom-editor # autochrome-github # avi # aws # aws-lambda # babashka # babashka-sci-dev # bangalore-clj # beginners # berlin # biff # bigdata # bitcoin # boot # boot-dev # boulder-clojurians # braid-chat # braveandtrue # brevis # bristol-clojurians # business # calva # capetown # carry # cbus # cestmeetup # chestnut # chlorine-clover # cider # circleci # clara # clj-commons # cljdoc # cljfx # clj-http # clj-kondo # clj-on-windows # cljs-dev # cljs-experience # cljsfiddle # cljsjs # cljsrn # cljtogether # clojars # clojure # clojure-android # clojure-argentina # clojure-art # clojure-austin # clojure-australia # clojure-austria # clojure-bangladesh # clojure-bay-area # clojure-beijing # clojure-belgium # clojure-berlin # clojure-boston # clojure-brasil # clojurebridge # clojurebridge-ams # clojure-canada # clojure-chennai # clojure-chicago # clojure-china # clojure-colombia # clojure-conj # clojurecup # clojure-czech # clojured # clojure-denmark # clojure-denver # clojure-derby # clojuredesign-podcast # clojure-dev # clojure-dusseldorf # clojure-ecuador # clojure-egypt # clojure-estonia # clojure-europe # clojure-filipino # clojure-finland # clojure-france # clojure-gamedev # clojure-germany # clojure-greece # clojure-guangzhou # clojure-hamburg # clojure-hk # clojure-houston # clojure-hungary # clojure-india # clojureindia # clojure-indonesia # clojure-ireland # clojure-israel # clojure-italy # clojure-japan # clojure-kc # clojure-korea # clojure-losangeles # clojure-madison # clojure-mexico # clojure-miami # clojure-mk # clojure-mke # clojure-morsels # clojure-my # clojure-new-zealand # clojure-nl # clojure-nlp # clojure-norway # clojure-poland # clojure-portugal # clojure-provo # clojure-quebec # clojureremote # clojure-romania # clojure-russia # clojure-sanfrancisco # clojurescript # clojurescript-ios # clojure-sdn # clojure-seattle # clojure-serbia # clojure-sg # clojure-shanghai # clojure-spain # clojure-spec # clojuresque # clojure-survey # clojure-sweden # clojure-switzerland # clojure-taiwan # clojure-turkiye # clojure-uk # clojure-ukraine # clojureverse-ops # clojurewerkz # clojurewest # clojurex # clojure-za # clojurian-chat-app # clojutre # cloverage # cloxp # clr # code-art # code-reviews # community-development # component # conf-proposals # conjure # consulting # contributions-welcome # copenhagen-clojurians # core-async # core-logic # core-matrix # core-typed # cryogen # crypto # css # cursive # cz-clojure # d2q # datacrypt # datahike # datalevin # datalog # data-oriented-programming # data-science # datascript # datavis # dato # datomic # defnpodcast # deps-new # depstar # devcards # devops # dirac # docker # docs # domino-clj # duct # dunaj # eastwood # editors # emacs # error-message-catalog # etaoin # ethereum # euroclojure # events # exercism # expound # figwheel # figwheel-main # flambo # fulcro # funcool # functionalprogramming # funimage # garden # ghostwheel # girouette # gis # google-cloud # gorilla # graalvm # graalvm-mobile # graclj # graphql # gratitude # gsoc # hammock-driven-dev # helix # heroku # hispano # holy-lambda # honeysql # hoplon # hugsql # humor # hypercrud # hyperfiddle # immutant # improve-getting-started # incanter # indycljs # inf-clojure # instaparse # integrant # interceptors # interop # introduce-yourself # iot # iotivity # ipfs # jackdaw # jaunt # java # javascript # javelin # jobs # jobs-discuss # jobs-rus # joker # jukebox # juxt # jvm # kaocha # keechma # kekkonen # keyboards # klipse # kosmos # lambdaisland # ldnclj # ldnproclodo # lein-figwheel # leiningen # liberator # liquid # livestream # local-first-clojure # london-clojurians # lsp # luminus # lumo # mail # malli # mathematics # meander # melbourne # membrane # mental-health # microservices # mid-cities-meetup # midje # minecraft # minimallist # missionary # monads # mount # music # new-channels # new-clojure # nextjournal # nginx # nrepl # numerical-computing # nyc # observability # off-topic # om # om-next # onyx # other-languages # other-lisps # overtone # pamela # parinfer # pathom # pedestal # perun # philosophy # phzr # planck # plastic # play-clj # podcasts # polylith # portal # portkey # portland-or # powderkeg # practicalli # precept # prelude # programming-beginners # project-updates # proletarian # proton # protorepl # pulsar # pure-frame # qa # qlkit # quil # random # rdf # react # reactive # reading-clojure # reagent # reclojure # re-frame # reitit # releases # remote-jobs # respo # rethinkdb # reveal # rewrite-clj # ring # ring-swagger # robots # rum # schema # sci # sfcljs # shadow-cljs # _silence # sim-testing # sioux-falls # slack-help # sneer # sneer-br # spacemacs # specmonstah # specter # speculative # spirituality-ethics # sql # startup-in-a-month # sydney # test200 # test-check # testing # thejaloniki # timbre # tmp-json-parsing # tools-build # tools-deps # trading # tree-sitter # uncomplicate # unrepl # untangled # utah-clojurians # videos # vim # vrac # vscode # wasm # web-security # windows # xtdb # yada # yleinen

Apps

instaparse

If you're not trampolining your parser, why bother getting up in the morning?

anthony.naddeo 2017-02-23T21:26:29.000205Z

@aengelberg I have a parser that I'm not sure how else to improve: https://github.com/naddeoa/elm-clojurescript-hello-parser/blob/master/src/elm_toolkit/parser.cljs

anthony.naddeo 2017-02-23T21:27:02.000207Z

I've eliminated a ton of ambiguity but I'm still seeing that its performance scales poorly with the size of the input, even when there are only 1 or two possible parses

anthony.naddeo 2017-02-23T21:27:23.000208Z

but that same input when fed chunk by chunk adds up to a much more reasonable parse time

anthony.naddeo 2017-02-23T21:28:22.000209Z

I was reading about the :optimize :memory option, but that seems to make it worse

anthony.naddeo 2017-02-23T21:28:41.000210Z

I even tried it with Java instead of Node to see if it was just a JS thing and the results were still similar

anthony.naddeo 2017-02-23T21:29:27.000211Z

My next step was about to be to write a function that breaks the input up into chunks, but to do that I'll need to know where a chunk starts/stops or impose some sort of convention around the input

2017-02-23T22:37:57.000212Z

have you seen https://github.com/Engelberg/instaparse/blob/master/docs/Performance.md? I feel like I have seen splitting input in to chunks recommended for larger inputs to instaparse

anthony.naddeo 2017-02-23T22:40:30.000214Z

Yeah that's where I got it from

anthony.naddeo 2017-02-23T22:40:47.000215Z

It just seems like something I shouldn't have to do

anthony.naddeo 2017-02-23T22:41:03.000216Z

If only because I need my parser to determine when a readable chunk starts/stops

anthony.naddeo 2017-02-23T22:41:10.000217Z

it isn't a line by line thing

2017-02-23T22:42:32.000218Z

I might try and eliminate regexes in the grammar definition, if I recall correctly those can be (or maybe they were?) inefficient

anthony.naddeo 2017-02-23T22:45:59.000219Z

Are there any examples you know of large grammars that scale linearly with the size of the input?

anthony.naddeo 2017-02-23T22:46:10.000220Z

Before I go down a rabbit hole I want to make sure I know what to expect

2017-02-23T22:48:59.000221Z

no, I don't

2017-02-23T22:49:44.000222Z

I don't know of any large grammars, and I have only used smallish grammars on small inputs

anthony.naddeo 2017-02-23T22:50:48.000223Z

I'm just afraid that the performance actually won't get better as hard as I may try and I want to cut my losses if I can.

anthony.naddeo 2017-02-23T22:50:57.000224Z

Or just use this for things that only require parsing snippets

anthony.naddeo 2017-02-23T22:51:07.000225Z

Thanks for the advice though @hiredman

2017-02-23T22:53:53.000226Z

it would be neat if instaparse could emit warnings and suggest fixes if your grammar is not LL(1)

anthony.naddeo 2017-02-23T22:57:06.000227Z

At this point, it seems like what I really want is a mode where it parses in chunks itself. My grammar is really just a repetition of 4 possible blocks. It seems like it could just parse the first block independent of the second. That is to say, as soon as it matches just consider it a block and move on.

anthony.naddeo 2017-02-23T22:57:25.000228Z

Given the right level of ambiguity I would be ok with massaging the results

aengelberg 2017-02-23T23:00:21.000229Z

How large are your problematic inputs?

anthony.naddeo 2017-02-23T23:02:30.000230Z

Well, it scales poorly. There isn't a size in specific. It parses the Elm programming langauge. When testing, I start with a single function, then I just keep duplicating that function and observe the performance

anthony.naddeo 2017-02-23T23:02:47.000231Z

each additional function adds more than its fair share of time

anthony.naddeo 2017-02-23T23:02:55.000232Z

I'll paste a snippet of something that takes too long

anthony.naddeo 2017-02-23T23:06:55.000233Z

(def input "nextSource : Source a -&gt; Source a
nextSource (Source a next) =
    Source (next a) next


nextSource : Source a -&gt; Source a
nextSource (Source a next) =
    Source (next a) next

nextSource : Source a -&gt; Source a
nextSource (Source a next) =
    Source (next a) next

nextSource : Source a -&gt; Source a
nextSource (Source a next) =
    Source (next a) next

nextSource : Source a -&gt; Source a
nextSource (Source a next) =
    Source (next a) next

nextSource : Source a -&gt; Source a
nextSource (Source a next) =
    Source (next a) next

nextSource : Source a -&gt; Source a
nextSource (Source a next) =
  Source (next a) next

nextSource : Source a -&gt; Source a
nextSource (Source a next) =
  Source (next a) next

nextSource : Source a -&gt; Source a
nextSource (Source a next) =
  Source (next a) next

nextSource : Source a -&gt; Source a
nextSource (Source a next) =
  Source (next a) next

nextSource : Source a -&gt; Source a
nextSource (Source a next) =
  Source (next a) next

")

anthony.naddeo 2017-02-23T23:08:15.000236Z

That one I actually just had to kill

anthony.naddeo 2017-02-23T23:08:28.000237Z

but if you cut it in half it only takes about a second or so to parse

anthony.naddeo 2017-02-23T23:08:44.000238Z

And its a pretty reasonable size for an Elm source file

aengelberg 2017-02-23T23:10:46.000239Z

Ouch

aengelberg 2017-02-23T23:11:14.000240Z

I'll try to take a look soon

anthony.naddeo 2017-02-23T23:18:52.000241Z

Cool, thanks a lot

anthony.naddeo 2017-02-23T23:19:19.000242Z

Also, definitely pretty new to clojure still. If I can make the code more approachable/replable feel free to point that out

anthony.naddeo 2017-02-23T23:19:24.000243Z

I kind of just hacked everything together

aengelberg 2017-02-23T23:36:53.000244Z

@anthony.naddeo I fixed it before:

start = (&lt;ws&gt;* block &lt;ws&gt;*)+

after:

start = (&lt;ws&gt; block &lt;ws&gt;)+

anthony.naddeo 2017-02-23T23:37:14.000245Z

That? Let me try now

aengelberg 2017-02-23T23:37:34.000246Z

the large input now takes 100ms on my machine

anthony.naddeo 2017-02-23T23:39:46.000247Z

100ms? I just tried it and it is WAY better, but I wish I had those numbers

anthony.naddeo 2017-02-23T23:39:52.000248Z

are you just in a node repl?

anthony.naddeo 2017-02-23T23:39:55.000249Z

I'm down to 900ms now

anthony.naddeo 2017-02-23T23:39:58.000250Z

which is great

anthony.naddeo 2017-02-23T23:40:02.000251Z

compared to never stopping

anthony.naddeo 2017-02-23T23:40:14.000252Z

I wouldn't have thought to do that

aengelberg 2017-02-23T23:40:26.000253Z

sorry I'm in JVM, not JS

aengelberg 2017-02-23T23:40:46.000254Z

900ms makes sense for node, instaparse in cljs is known to be ~ 10x slower

anthony.naddeo 2017-02-23T23:41:50.000255Z

oh interesting. Did you use my parser.clj file?

anthony.naddeo 2017-02-23T23:42:10.000256Z

yeah this is way better

anthony.naddeo 2017-02-23T23:42:17.000257Z

Thanks a ton

aengelberg 2017-02-23T23:43:07.000258Z

I just copied the grammar string from parser.cljs into my clojure file since the EBNF syntax is the same on both platforms

aengelberg 2017-02-23T23:43:17.000259Z

anthony.naddeo 2017-02-23T23:45:43.000261Z

One odd thing though

anthony.naddeo 2017-02-23T23:45:58.000262Z

I would expect that the * impacts performance because it adds ambiguity right?

anthony.naddeo 2017-02-23T23:46:29.000263Z

If so, shouldn't there be more than a single parse for it? (time (count (insta/parses parser/parser input :unhide :all))) returned 1 unless I did something wrong

2017-02-23T23:50:11.000264Z

ambiguity can mean multiple results, but it can also mean a single result that required checking lots of different cases to arrive at

anthony.naddeo 2017-02-23T23:50:42.000266Z

What metrics could people use to determine ambiguity if not parse counts?

2017-02-23T23:52:22.000267Z

#6 in that peformance doc suggests looking at rules individually to see if you have rules that in isolation can result in multiple parses

aengelberg 2017-02-23T23:52:24.000268Z

Instaparse de-duplicates results, so some internal ambiguity can lurk around but still impact the perf

anthony.naddeo 2017-02-23T23:53:04.000269Z

interesting

anthony.naddeo 2017-02-23T23:53:14.000270Z

well, I'm glad that was an easy fix, thanks guys

anthony.naddeo 2017-02-23T23:53:24.000271Z

I don't think I can go back to other parsers. I would have been heart broken

aengelberg 2017-02-23T23:55:11.000274Z

btw @anthony.naddeo, I made the following optimization to your Name and name which halved the time for me:

Name = #'(?!\\b(if|then|else|in|let|case|of)\\b)[A-Z][a-zA-Z0-9]*'
    name = #'(?!\\b(if|then|else|in|let|case|of|type)\\b)[a-z][a-zA-Z0-9]*'

aengelberg 2017-02-23T23:56:39.000275Z

Actually the first \b doesn't really do anything, because the first character of this particular token is the first char in the string from the regex's perspective

aengelberg 2017-02-23T23:57:43.000276Z

But my optimization was to shift more responsibility into the regex; that is always ideal