Clojurians Log v2

instaparse

If you're not trampolining your parser, why bother getting up in the morning?

andrei 2016-08-30T20:25:59.000108Z

I am trying to write a simple grammar that parses comments: /* some text */, is there a way in instaparse to say any character? e.g.

"comment = '/*' .* '*/'"

aengelberg 2016-08-30T20:27:24.000109Z

@andrei Instaparse doesn't have a special character for that, but you can use regular expressions to cover any character

aengelberg 2016-08-30T20:28:04.000110Z

e.g. comment = '/*' #'[\\s\\S]'* '*/'

aengelberg 2016-08-30T20:29:07.000111Z

(`#"[\s\S]"` is my personal favorite way to match any character in a regex)

seylerius 2016-08-30T20:30:04.000112Z

@andrei: Yeah, you'll want something like this:

"comment = <'/*'> #'.*' <'*/'>"

My version hides the comment tokens, though @aengelberg's regexp might be more appropriate.

andrei 2016-08-30T20:30:43.000113Z

@aengelberg @seylerius thank you for the suggestions. I think I got a bit misled by the source code at https://github.com/Engelberg/instaparse/blob/master/src/instaparse/abnf.clj#L19-L40 and thought there were some defaults in instaparse

andrei 2016-08-30T20:31:23.000115Z

but now reading through the doc strings, these are only to parse the grammar itself https://github.com/Engelberg/instaparse/blob/master/src/instaparse/abnf.clj#L2

aengelberg 2016-08-30T20:31:36.000117Z

a couple things I see in @seylerius's solution: 1) . in a regex doesn't include newlines 2) .* will greedily match past the */ and won't be able to parse the end of a comment
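
Both points can be checked at a REPL with plain Clojure regexes, outside any grammar. A minimal sketch; the sample strings are made up for illustration:

```clojure
;; 1) `.` does not match a newline by default, so `.*` cannot span a
;;    multi-line comment body; `[\s\S]` matches any character at all.
(def multi-line "first line\nsecond line")

(re-matches #".*" multi-line)       ; nil: the match cannot cross the newline
(re-matches #"[\s\S]*" multi-line)  ; the whole string, newline included

;; 2) A greedy `.*` runs to the LAST `*/` on the line, not the first one.
(def two-comments "/* one */ \"k\" = \"v\"; /* two */")

(re-find #"/\*.*\*/" two-comments)  ; the entire line, both comments included
```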

aengelberg 2016-08-30T20:32:07.000118Z

@andrei Sorry for the misleading code. Those constants are available but only to the ABNF format.

aengelberg 2016-08-30T20:32:37.000119Z

EBNF is the default

andrei 2016-08-30T20:33:09.000120Z

are there constants for ebnf? looking at the code I think not

seylerius 2016-08-30T20:33:14.000121Z

@andrei A point to keep in mind with @aengelberg's solution is that you'll need to condense the individual characters of the output.

andrei 2016-08-30T20:34:27.000122Z

@seylerius @aengelberg is there a way for specifying in instaparse to group matches together, s.t. one doesn’t need to condense the matches?

aengelberg 2016-08-30T20:34:27.000123Z

yeah, thanks for clarifying that @seylerius

seylerius 2016-08-30T20:34:56.000124Z

You'll get output like [:comment "f" "o" "o" " " "b" "a" "r"] from input like /*foo bar*/

andrei 2016-08-30T20:35:04.000125Z

exactly

andrei 2016-08-30T20:35:23.000126Z

there are ways to use transform and apply str on it

seylerius 2016-08-30T20:35:29.000127Z

Yep.

aengelberg 2016-08-30T20:35:40.000128Z

@andrei The official specification for ABNF is more strict and specific than EBNF, and it dictates that those constants are available. EBNF is more of an ambiguous mashup of a variety of standards we were able to find on the internet

👍 1

andrei 2016-08-30T20:35:44.000129Z

it just feels that there should be a grammar direct way

aengelberg 2016-08-30T20:36:06.000130Z

So there are no constants in EBNF, since none of the EBNF resources we found seemed to indicate such

seylerius 2016-08-30T20:36:15.000131Z

And remember to wrap your comment tokens in <> like I did, so you don't save the markup itself.

aengelberg 2016-08-30T20:36:31.000132Z

Sadly there is no grammar direct way to concat the strings

seylerius 2016-08-30T20:36:49.000133Z

Transform works pretty well, though.

andrei 2016-08-30T20:37:09.000134Z

hmm, or a more elaborate regexp

andrei 2016-08-30T20:37:55.000135Z

I am using something like this for strings

<string> = dqoute #'([^"\\]|\\.)*' dqoute
<dqoute> = <'\"'>

seylerius 2016-08-30T20:38:13.000136Z

(insta/transform {:comment (partial apply str)} (comment-parser input-data))
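
For reference, a self-contained sketch of the transform approach. It assumes instaparse is on the classpath. The names `comment-parser` and `input-data` come from the message above; the grammar is an assumption that combines the two earlier suggestions (hidden tokens plus a per-character regex).

```clojure
(require '[instaparse.core :as insta])

;; Per-character regex with hidden delimiters: each character of the
;; comment body becomes its own string in the parse tree.
(def comment-parser
  (insta/parser "comment = <'/*'> #'[\\s\\S]'* <'*/'>"))

(def input-data "/*foo bar*/")

(def tree (comment-parser input-data))
;; tree is [:comment "f" "o" "o" " " "b" "a" "r"]

;; transform calls the function with the node's children as arguments and
;; replaces the whole node with the result, so the :comment tag is dropped.
(def joined
  (insta/transform {:comment (partial apply str)} tree))
;; joined is "foo bar"

;; To keep the tag, rebuild the node explicitly.
(def tagged
  (insta/transform {:comment (fn [& chars] [:comment (apply str chars)])} tree))
;; tagged is [:comment "foo bar"]
```

Whether to keep the `:comment` tag depends on what the rest of the tree looks like; in a larger grammar the tagged form is usually easier to pick out later.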

andrei 2016-08-30T20:39:05.000137Z

and probably the performance impact is small if one applies transforms

seylerius 2016-08-30T20:39:47.000138Z

Lolyep. Far as I can tell, instaparse does a good job with efficient transforms.

aengelberg 2016-08-30T20:40:00.000139Z

it depends on the size of the file. Probably actually creating all those individual strings is going to be the bottleneck rather than concatenating them later

andrei 2016-08-30T20:40:06.000140Z

I must admit I was led astray by wondering which is more efficient, regexps or transforms, although I think it's a very premature optimisation

aengelberg 2016-08-30T20:40:27.000141Z

A regex is a sensible solution if you can get it right 🙂

aengelberg 2016-08-30T20:41:00.000142Z

My first thought is to do a negative lookahead for */ as part of the regex

seylerius 2016-08-30T20:42:58.000143Z

Trouble is, from what I've found, that the */ will get eaten in the .*

seylerius 2016-08-30T20:43:19.000144Z

And the negative lookahead will pass because the end token was already eaten

andrei 2016-08-30T20:44:16.000145Z

so more reg exp magic for me to look into. to give a bit more context I am playing around with parsing localizable strings.

/* This is a comment */

"hello" = "Hello!";

/* This is another comment */
"click_button" = "Click";

/* Title bar, prints the number of selected products (The translation should be short due to the limit of 100 characters for the title of the mobile app) */
"bar_print_$_selected_products" = "You Selected %@ Products";

andrei 2016-08-30T20:44:32.000146Z

just an experiment, nothing production related.

andrei 2016-08-30T20:47:20.000147Z

@aengelberg @seylerius thank you for your help, so far I have enjoyed using instaparse. it's cool that I can use some things that I learned in college to do some useful things

andrei 2016-08-30T20:47:54.000148Z

although I must say that I need to re-learn things about parsers and defining grammars

aengelberg 2016-08-30T20:48:11.000149Z

@seylerius I meant a regex negative lookahead, i.e. #".*(?!=/\*)" or something

aengelberg 2016-08-30T20:49:40.000150Z

@andrei glad you're having fun! feel free to ask here if you have any more questions

👍 1

seylerius 2016-08-30T20:49:59.000151Z

@aengelberg: That's what I thought. It winds up eating the end-token in the .* and passes the negative lookahead anyway. I was fighting that with the headline parser in organum over the weekend.

seylerius 2016-08-30T20:50:17.000152Z

When I was trying to get it to parse tags.

aengelberg 2016-08-30T20:50:59.000153Z

oh, I guess the regex would pass, saying "here's a sequence of characters (including /*), and look, there is not a /* *after* these characters!"

seylerius 2016-08-30T20:51:06.000154Z

Bingo

aengelberg 2016-08-30T20:51:33.000155Z

so maybe #"((?!/\*).)*"

aengelberg 2016-08-30T20:51:44.000156Z

that would generate a bunch of match groups though due to the ()
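
A hedged sketch of that idea in plain Clojure. Two adjustments are assumptions rather than anything stated in the chat: the lookahead targets the closing token `*/` (the patterns above write `/*`), and the repeated group is non-capturing, `(?:...)`, so only one explicit capture group remains. `[\s\S]` replaces `.` so the body may span lines. The sample text is invented.

```clojure
;; Tempered pattern: consume one character at a time, but only while the
;; closing token `*/` does not start at the current position.
(def comment-re #"/\*((?:(?!\*/)[\s\S])*)\*/")

(def sample "/* first */\n\"hello\" = \"Hello!\";\n/* second,\n   two lines */")

;; re-seq returns [whole-match body] pairs; keep only the captured body.
(def bodies (map second (re-seq comment-re sample)))
;; bodies is (" first " " second,\n   two lines ")

;; A reluctant quantifier is a shorter way to get the same result here.
(def lazy-bodies (map second (re-seq #"/\*([\s\S]*?)\*/" sample)))
```

Inside an instaparse grammar, the body part alone, written as `#'(?:(?!\\*/)[\\s\\S])*'` in a Clojure string, could sit between hidden `<'/*'>` and `<'*/'>` tokens. That should yield the body as a single string and make the transform step unnecessary, though it has not been tested against the grammar discussed here.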

seylerius 2016-08-30T20:52:14.000157Z

Gah, lemme see what I did for that in the tags in organum.

seylerius 2016-08-30T20:53:54.000158Z

https://github.com/seylerius/organum/blob/master/src/organum/core.clj

seylerius 2016-08-30T20:54:18.000160Z

Yeah, ordered choice wound up featuring heavily.

seylerius 2016-08-30T20:55:56.000161Z

Maybe (<'*/'> / #'.')+?

seylerius 2016-08-30T20:56:21.000162Z

Always prefer to end a comment if possible, otherwise continue eating characters?

seylerius 2016-08-30T20:56:28.000163Z

Wait, not quite

seylerius 2016-08-30T20:56:37.000164Z

That'll continue past the end.

seylerius 2016-08-30T20:57:17.000165Z

Ach. I need to drive back to the store; I'm done with this client. Check in with y'all in about ten.

andrei 2016-08-30T21:03:07.000166Z

I will catch up with you guys a bit later too, or early tomorrow; it's getting a bit late here in Berlin.

seylerius 2016-08-30T21:35:45.000167Z

Have a good one.