Clojurians Log v2

instaparse

If you're not trampolining your parser, why bother getting up in the morning?

aengelberg 2016-08-28T00:21:09.000024Z

@seylerius this is a good place for that.

aengelberg 2016-08-28T00:21:35.000025Z

You could put further "insta/parse"s in the functions inside the "insta/transform" map

seylerius 2016-08-28T00:21:48.000026Z

Wat

seylerius 2016-08-28T00:21:53.000027Z

This is awesome.

aengelberg 2016-08-28T00:23:05.000028Z

(insta/transform {:x (fn [s] (insta/parse otherparser s))} (insta/parse firstparser s))

aengelberg 2016-08-28T00:23:12.000029Z

Hard to bang out a good example on mobile

seylerius 2016-08-28T00:23:38.000030Z

Lolyep.

seylerius 2016-08-28T00:23:46.000032Z

That looks fascinating.

aengelberg 2016-08-28T00:24:12.000033Z

It would get weird if the nested parser had an error though.

seylerius 2016-08-28T00:24:19.000034Z

Yeah.

seylerius 2016-08-28T00:25:29.000035Z

So how deep does it go looking for :x?

seylerius 2016-08-28T00:26:09.000036Z

And how do you make it check for loose strings?

aengelberg 2016-08-28T00:26:57.000037Z

It does a full traversal of the hiccup / enlive, as long as all structures around the :x are valid hiccup / enlive

seylerius 2016-08-28T00:27:05.000038Z

Nice

seylerius 2016-08-28T00:59:14.000039Z

@aengelberg: How do you get solo strings?

seylerius 2016-08-28T21:01:19.000002Z

Gah, what's wrong with this parser? doc-metadata works fine, but running headlines on the remaining content just returns flat content. https://github.com/seylerius/organum

seylerius 2016-08-28T21:02:36.000004Z

@aengelberg: Got any clues?

seylerius 2016-08-28T21:03:57.000005Z

Simple reproduction: (headlines (last (doc-metadata (slurp "sample.org"))))

seylerius 2016-08-28T21:05:08.000006Z

It's something in the h token, because that's the last thing I changed before it started failing.

ska 2016-08-28T21:10:45.000007Z

At a first glance, the #'.+' looks suspicious to me. Is greediness biting you here? (Did not try it out, though)

aengelberg 2016-08-28T21:25:00.000008Z

@seylerius the regex you put for :content is probably not what you want. Due to the (?s) flag, it seems to match everything, including newlines, as long as the first character is not a *.

aengelberg 2016-08-28T21:25:06.000009Z

I'm not sure what your desired behavior is though.

aengelberg 2016-08-28T21:26:16.000010Z

BTW, both the first ^ and the ? in your regex appear redundant, if I understand it correctly.

seylerius 2016-08-28T21:26:29.000011Z

The content regexp is fine. It's after I changed a few things to tidy up :h and added tag parsing that it started failing.

seylerius 2016-08-28T21:26:54.000012Z

Basically, a headline starts with some number of stars. Everything else isn't a headline.

aengelberg 2016-08-28T21:26:56.000013Z

I cloned your project and am looking at that parser. Is there a different version / branch I missed?

seylerius 2016-08-28T21:27:38.000014Z

Nope, I pushed the latest version just before I spoke up today.

aengelberg 2016-08-28T21:28:13.000015Z

Sorry I may have been unclear. When I said :content I meant the content inside the headlines parser.

aengelberg 2016-08-28T21:28:26.000016Z

Not the doc-metadata parser

aengelberg 2016-08-28T21:29:05.000017Z

As an experiment I removed all the hide-tags from the headlines parser, since I got that behavior you were talking about (flat content). That exposed the headlines' :content rule as being greedy.

aengelberg 2016-08-28T21:30:06.000018Z

organum.core> (headlines content)
[:S [:token [:content "This is an attempt...

seylerius 2016-08-28T21:30:20.000019Z

Yep. I've got an ordered choice making it prefer to define a section (headline then content) if possible, and just content if not. The defining difference between content and headline is whether it starts with stars.

seylerius 2016-08-28T21:31:11.000020Z

Although, Hmmm. You've got a point about the mode there.

aengelberg 2016-08-28T21:31:58.000021Z

I think this is what happened:
- The section rule failed at the start of the string
- It then fell back to the content rule due to ordered choice
- The content rule mistakenly parses the whole string (for the reason I mentioned above)
- Parse is done

seylerius 2016-08-28T21:34:22.000023Z

Yeah. You're right. Making the content rule less accepting (not (?s)) fixes that part, and now I'm seeing failures to parse the first headline. Joy.
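For reference, a minimal sketch of the (?s) effect discussed here, using only clojure.core. The pattern and the sample text are hypothetical stand-ins, not the actual organum grammar.

```clojure
;; Hypothetical sample: one line of intro text, then an Org headline.
(def sample "intro\n* Headline\nbody")

;; With (?s) (DOTALL), `.` also matches newlines, so a rule that only
;; checks "first character is not *" swallows the rest of the input.
(def greedy-match (re-find #"(?s)[^*].*" sample))

;; Without (?s), `.` stops at the newline, so the match ends with the line.
(def line-match (re-find #"[^*].*" sample))
```

Here greedy-match covers all three lines, while line-match is just "intro", which leaves the headline available for another rule.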

seylerius 2016-08-28T21:36:27.000024Z

How does instaparse play with non-capturing groups?

aengelberg 2016-08-28T21:38:09.000025Z

Not familiar with that term; are you referring to the groups returned by a Java regex match?

seylerius 2016-08-28T21:40:04.000026Z

Non-capturing groups are for saying, "this should be here, but don't return it in a group"

seylerius 2016-08-28T21:40:39.000027Z

Okay, new push. Can't manage to get tags out separate.

aengelberg 2016-08-28T21:41:20.000028Z

oh, you mean things like regex lookahead and lookbehind?

seylerius 2016-08-28T21:42:41.000029Z

They work if I make them mandatory, but get eaten by the headline body if they're optional. Would lookahead allow saying "if there's whitespace followed by a colon, stop here"?

aengelberg 2016-08-28T21:45:37.000030Z

This is the instaparse source code that applies regexes, may shed some light on whether certain constructs would work. https://github.com/Engelberg/instaparse/blob/master/src/instaparse/gll.clj#L670

aengelberg 2016-08-28T21:47:21.000032Z

I would expect regex non matching lookaheads to work, but non-matching lookbehinds to NOT work. Instaparse runs a regex match on the substring of the current index onward, so previous characters are invisible. EDIT: I misunderstood the term "non-matching"
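A rough illustration of why lookbehind cannot see earlier characters when the regex only gets the remaining input. This is a sketch under the assumption that matching is anchored at the front of a substring; match-at-front is a hypothetical helper, not instaparse's actual code.

```clojure
;; Illustration only: mimic "match at the front of the remaining input"
;; with java.util.regex.Matcher/lookingAt on a substring.
(defn match-at-front
  "Returns the text matched at the very start of s, or nil."
  [re s]
  (let [m (re-matcher re s)]
    (when (.lookingAt m)
      (.group m))))

(def text "ab")

;; Lookahead inspects characters that are still in the remaining input.
(def with-lookahead (match-at-front #"a(?=b)" text))

;; Lookbehind needs the preceding "a", which is gone once the string
;; has been cut at index 1.
(def with-lookbehind (match-at-front #"(?<=a)b" (subs text 1)))
```

with-lookahead is "a", while with-lookbehind is nil because nothing precedes the "b" in the substring.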

aengelberg 2016-08-28T21:49:11.000033Z

I see you're using (?:) now. I don't think "non capturing" is what you want

seylerius 2016-08-28T21:49:22.000034Z

I think you're right.

aengelberg 2016-08-28T21:49:47.000035Z

organum.core> (re-find #"a" "a")
"a"
organum.core> (re-find #"(?:a)" "a")
"a"

seylerius 2016-08-28T21:49:56.000036Z

What's weird is non-greedy options fail entirely.

aengelberg 2016-08-28T21:50:18.000037Z

(?:) basically means: group this block for the regex's own purposes, but DON'T return the block itself as an additional match group.

seylerius 2016-08-28T21:51:04.000038Z

Ah, it looks like negative lookahead is the trick.

aengelberg 2016-08-28T21:51:20.000039Z

(?!=)?

aengelberg 2016-08-28T21:51:43.000040Z

the ?: flag shouldn't affect Instaparse's usage of regexes at all. Instaparse throws away match groups
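A small sketch of the difference at the REPL, using only clojure.core. It shows that (?:) changes which groups are reported but not the overall match, which is the only part that matters if the groups are thrown away.

```clojure
;; Two capturing groups: re-find returns [whole-match group1 group2].
(def capturing (re-find #"(a)(b)" "ab"))

;; (?:a) still has to match, but it is not reported as a group.
(def non-capturing (re-find #"(?:a)(b)" "ab"))

;; The overall match is the same either way.
(def whole-matches [(first capturing) (first non-capturing)])
```

capturing is ["ab" "a" "b"], non-capturing is ["ab" "b"], and both whole matches are "ab".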

seylerius 2016-08-28T21:51:55.000041Z

(?!\\s+:)

aengelberg 2016-08-28T21:52:21.000042Z

seems legit
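One way a negative lookahead can express "stop at whitespace followed by a colon", sketched with plain re-find. The pattern headline-text is a hypothetical example, not the regex from the organum repo. In a Clojure regex literal the escape is written \s; the doubled \\s is only needed inside a grammar string.

```clojure
;; Hypothetical headline-text pattern: consume one character at a time,
;; but only while the upcoming text is NOT "whitespace then colon".
(def headline-text #"(?:(?!\s+:).)+")

(def plain (re-find headline-text "The First Section :foo:bar:"))

;; A colon inside the headline text trips the same lookahead early.
(def with-colon (re-find headline-text "The First : Section :foo:bar:"))
```

plain stops before the tags at "The First Section", but with-colon stops at "The First", so this approach alone cannot tell a tag block from a stray colon in the title.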

seylerius 2016-08-28T21:52:45.000043Z

Nope. Pushing. Still eats the tags.

aengelberg 2016-08-28T21:53:00.000045Z

hmm

seylerius 2016-08-28T21:53:16.000046Z

Pushed

aengelberg 2016-08-28T21:53:51.000047Z

need to run now, can probably help more in an hour or so. I'd say the next step is manually parsing the regexes on the strings.

aengelberg 2016-08-28T21:54:18.000048Z

and try gradually taking characters away from the regex to see what the problem is

seylerius 2016-08-28T21:54:23.000049Z

Okay, thanks for the help. Talk with ya when you've got time.

aengelberg 2016-08-28T21:54:36.000050Z

feel free to dump any further findings here

seylerius 2016-08-28T21:54:57.000051Z

Will do. Slack has persistence, which is pretty handy

seylerius 2016-08-28T23:34:43.000052Z

Okay, trying reluctance means I only get the first character of the headline, and the rest becomes part of the content.
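A plausible reading of the one-character result, sketched with clojure.core only. It assumes the regex is matched on its own, so a reluctant quantifier with nothing after it is satisfied by the minimum. The bounded pattern is a hypothetical alternative, not the fix that ended up in the repo.

```clojure
(def line "Headline text :tag:")

;; Nothing follows the reluctant quantifier, so one character is enough.
(def lone-reluctant (re-find #".+?" line))

;; Hypothetical alternative: give it an endpoint to reach
;; (whitespace then colon, or end of input).
(def bounded-reluctant (re-find #".+?(?=\s+:|$)" line))
```

lone-reluctant is "H", while bounded-reluctant is "Headline text".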

seylerius 2016-08-28T23:34:59.000053Z

Trying lookahead seems to just fail.

seylerius 2016-08-28T23:42:06.000054Z

Okay, tags are mostly fixed, but it's only grabbing the first one.

seylerius 2016-08-28T23:42:11.000055Z

Pushed.

seylerius 2016-08-28T23:42:29.000056Z

Would appreciate a look when you have time, @aengelberg

seylerius 2016-08-28T23:43:30.000057Z

Ach. It's also not getting second headlines. They're turning into content lines due to newline weirdness.

seylerius 2016-08-28T23:46:49.000058Z

Pushed again. Fixed newline weirdness

seylerius 2016-08-28T23:50:17.000059Z

Hah, fixed it. Required post-tag newline/whitespace.

seylerius 2016-08-28T23:50:39.000060Z

Gah. Org is a beautiful format, but it's a bitch to parse.

aengelberg 2016-08-28T23:56:37.000061Z

The parser breaks if I put into the file

* The First : Section :foo:bar:

aengelberg 2016-08-28T23:56:43.000062Z

Not sure if that's valid org-mode.