instaparse

If you're not trampolining your parser, why bother getting up in the morning?

micha 2015-08-07T00:59:59.000028Z

hello everyone, is there a grammar for clojure source that i can use in instaparse?

micha 2015-08-07T01:00:33.000029Z

or EDN would probably work in a pinch too

aengelberg 2015-08-07T01:04:43.000030Z

Not that I know of. Maybe this would help: you could port the ANTLR grammar into EBNF somehow. http://stackoverflow.com/questions/3902813/is-there-a-language-spec-for-clojure

aengelberg 2015-08-07T01:06:49.000032Z

Although an EDN parser wouldn't be too hard to make from scratch. It's mostly balanced parens with various types of leaves or delimiters. The hard part is strings: handling all the \" escapes

aengelberg 2015-08-07T01:07:22.000033Z

And comments

micha 2015-08-07T01:07:31.000034Z

yeah i was thinking that also, really i just need curly braces, because i only want to parse maps

micha 2015-08-07T01:07:49.000036Z

but curlies can appear in strings

aengelberg 2015-08-07T01:07:51.000037Z

Maps with no lists / sets as keys or values?

micha 2015-08-07T01:08:35.000038Z

yeah the map can contain anything, but i can presumably slurp that in and just look for the matching curly

aengelberg 2015-08-07T01:08:56.000039Z

If you get the string terminal right, Instaparse will be smart about closing parens even if parens appear within the string
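
A minimal sketch of that point, assuming a deliberately tiny grammar rather than real EDN: the rule names `map`, `string` and `other` are made up for illustration, and the string regex only handles double-quoted strings with backslash escapes. Because a brace inside a string is consumed by the `string` terminal, only the real braces take part in the balancing.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical grammar: it knows only curly braces, double-quoted strings,
;; and runs of "anything else". It is not a full EDN grammar.
(def curly-parser
  (insta/parser
    "map    = <'{'> (map | string | other)* <'}'>
     string = #'\"(?:[^\"\\\\]|\\\\.)*\"'
     other  = #'[^{}\"]+'"))

;; The "}" inside the string does not close the outer map.
(curly-parser "{:a \"}\" :b {:c 1}}")

;; An input whose only closing brace sits inside a string fails to parse.
(insta/failure? (curly-parser "{:a \"}\""))
```

Comments, character literals such as `\{`, and the other EDN delimiters would each need their own terminal before this could be trusted on real source.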

micha 2015-08-07T01:09:40.000040Z

how about indentation-based languages?

aengelberg 2015-08-07T01:09:47.000041Z

ehhhhhh

micha 2015-08-07T01:10:53.000042Z

haha yeah

aengelberg 2015-08-07T01:11:13.000043Z

"You're thinking in the wrong mindset" http://imgur.com/gallery/M5wl14r

micha 2015-08-07T01:12:12.000045Z

hahahaha excellent reference there

micha 2015-08-07T01:12:40.000046Z

the project i'm working on is a generalized, abstract sort of markdown

micha 2015-08-07T01:13:11.000047Z

it's designed to mix well with prose, so indentation based structure is a big win

aengelberg 2015-08-07T01:14:17.000048Z

So like, prose at the top level, important stuff indented?

micha 2015-08-07T01:17:02.000049Z

# This is line 1 of a certain type of block.
  The block continues here because of the indentation.

* This could be a list item

  p With a paragraph in it
    that continues on multiple lines...and
    has a strange #(inline something or other
    delimited by hash-parens)#...

  ~~~{:foo "bar", :baz 123} tags can also have
    attributes parsed as EDN...

* here is the next list item

micha 2015-08-07T01:18:12.000052Z

there we go

micha 2015-08-07T01:18:20.000053Z

the parser will be a macro really

micha 2015-08-07T01:18:27.000054Z

it will emit s-expressions

micha 2015-08-07T01:18:35.000055Z

calling multimethods

micha 2015-08-07T01:18:49.000056Z

so you can implement dispatches for any tags you like

micha 2015-08-07T01:19:10.000057Z

so what # foo means is up to you

micha 2015-08-07T01:19:57.000058Z

the indentation is crucial for making the thing general without special cases and hardcoded things

aengelberg 2015-08-07T01:20:33.000059Z

The reason "ehhhh" is the visceral response to indentation-based langs in instaparse is that in CFGs it's difficult, if not impossible, to remember how many spaces / tabs you're looking for on each line.

aengelberg 2015-08-07T01:20:55.000060Z

So it's really only a problem if you have chunks within chunks that are indented even more.

micha 2015-08-07T01:21:20.000061Z

yeah, and i want to support even more tricky things, like indentation plus extra whitespace at the front of the line

micha 2015-08-07T01:21:37.000062Z

i have a naive handmade parser now to parse the blocks

micha 2015-08-07T01:21:52.000063Z

it looks for tags that can start a block

micha 2015-08-07T01:22:00.000064Z

then it looks for the "outdent"

micha 2015-08-07T01:22:11.000065Z

so it doesn't look for a specific amount of indentation

aengelberg 2015-08-07T01:22:27.000066Z

Example?

micha 2015-08-07T01:22:34.000067Z

it looks for a minimum amount of indentation, but you can use more

micha 2015-08-07T01:24:00.000068Z

# This is all
  part of the
    same block
  and the next
  block
    doesn't
      start until
  an "outdent" is seen

This is an outdent, so
the above block will
have been ended.
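
A rough sketch of that outdent rule in plain Clojure (not instaparse). `indent-of` and `split-block` are hypothetical names; the sketch assumes indentation is spaces only and that blank lines stay inside the current block. The block keeps going for any line indented deeper than its opening line, however much deeper, and stops at the first line that is not.

```clojure
(require '[clojure.string :as str])

(defn indent-of
  "Number of leading spaces on a line."
  [line]
  (count (re-find #"^ *" line)))

(defn split-block
  "Given lines whose first line opens a block, returns [block-lines rest-lines].
  A line belongs to the block if it is blank or indented deeper than the
  opening line; the first other line is the outdent."
  [lines]
  (let [base        (indent-of (first lines))
        inside?     (fn [line]
                      (or (str/blank? line) (> (indent-of line) base)))
        [body more] (split-with inside? (rest lines))]
    [(into [(first lines)] body) (vec more)]))

(split-block ["# This is all"
              "  part of the"
              "    same block"
              "  and the next"
              "This is an outdent"])
```

Nothing here looks for a specific indentation width, which matches the "minimum amount, but you can use more" behaviour described above.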

micha 2015-08-07T01:24:34.000069Z

however,

micha 2015-08-07T01:25:27.000070Z

# This is not
  all part of the same
  # block because this
    tag creates a nested
    block

aengelberg 2015-08-07T01:26:55.000071Z

Hmm, what if you parse each block and then run the parser AGAIN on the text in the block to find subblocks?

micha 2015-08-07T01:27:12.000072Z

hm

micha 2015-08-07T01:27:46.000073Z

i think i could give up the leading extra spaces thing, too

micha 2015-08-07T01:28:04.000074Z

and set indentation to some configurable fixed size

aengelberg 2015-08-07T01:28:24.000075Z

(my-parser text) => ([:block "This is not" "all part of the same" "# block because this" "   tag ..."])
(insta/transform {:block (fn [& strs] (my-parser (str/join "\n" strs)))} *1)

aengelberg 2015-08-07T01:28:50.000077Z

Except recursively smarter
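
One way the "recursively smarter" version might look, as a hand-rolled stand-in in plain Clojure rather than instaparse plus insta/transform. Everything here is an assumption for illustration: the names `indent-of`, `dedent` and `parse-blocks`, the choice of "# " as the only block-opening tag, spaces-only indentation, and Clojure 1.8+ for `str/starts-with?`. The idea is the same re-parse trick: split off a block, strip its body's common indentation, and run the same function on the body.

```clojure
(require '[clojure.string :as str])

(defn indent-of
  "Number of leading spaces on a line."
  [line]
  (count (re-find #"^ *" line)))

(defn dedent
  "Strips n leading spaces from a line; blank lines become empty strings."
  [n line]
  (if (str/blank? line) "" (subs line n)))

(defn parse-blocks
  "Turns a seq of lines into a vector of nodes. A line starting with \"# \"
  at column 0 opens a [:block header & children] node; its body is every
  following blank or indented line, dedented and parsed again."
  [lines]
  (loop [lines (seq lines), out []]
    (if (nil? lines)
      out
      (let [[line & more] lines]
        (if (str/starts-with? line "# ")
          (let [inside?          #(or (str/blank? %) (pos? (indent-of %)))
                [body remaining] (split-with inside? more)
                depths           (map indent-of (remove str/blank? body))
                n                (if (seq depths) (apply min depths) 0)
                children         (parse-blocks (map #(dedent n %) body))]
            (recur (seq remaining)
                   (conj out (into [:block (subs line 2)] children))))
          (recur (seq more) (conj out line)))))))

(parse-blocks ["# This is not"
               "  all part of the same"
               "  # block because this"
               "    tag creates a nested"
               "    block"])
```

Lines indented further than the block's minimum keep their extra leading spaces after dedenting, so the "extra whitespace at the front of the line" case survives as plain text.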

micha 2015-08-07T01:28:59.000078Z

interesting

aengelberg 2015-08-07T01:29:11.000079Z

I just thought of this, it might end up being impractical.

aengelberg 2015-08-07T01:29:20.000080Z

But that might be the way to do it.

micha 2015-08-07T01:30:21.000081Z

i will play around with it and let you know how it works out

micha 2015-08-07T01:30:47.000082Z

i can at least instaparse the inline stuff, if not the blocks

aengelberg 2015-08-07T01:32:19.000083Z

True

aengelberg 2015-08-07T01:32:50.000084Z

Anyway, now that I think about it, this trick may be applicable to any indentation-based language

micha 2015-08-07T01:34:10.000085Z

it's also an interesting case because i need to parse "any character that isn't a tag"

micha 2015-08-07T01:34:21.000086Z

like the text in between tags

micha 2015-08-07T01:34:45.000087Z

i think i can use negative lookahead with regex like #"."
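
A small sketch of the negative-lookahead idea using a plain Clojure regex. It assumes the only inline tag is the hash-paren form `#( ... )#` from the earlier example; `inline-or-text` and `segments` are hypothetical names. The second alternative matches a run of characters, checking before each one that an opening `#(` does not start there.

```clojure
(def inline-or-text
  "Either an inline #( ... )# span (contents captured in group 1), or a run
  of characters none of which begins the opener \"#(\"."
  #"(?s)#\((.*?)\)#|(?:(?!#\().)+")

(defn segments
  "Splits s into [:text ...] and [:inline ...] pieces, in order."
  [s]
  (for [[whole inner] (re-seq inline-or-text s)]
    (if inner
      [:inline inner]
      [:text whole])))

(segments "has a strange #(inline something)# in it")
```

An unclosed `#(` matches neither alternative at that position, so a real parser would want an explicit error case there.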

micha 2015-08-07T01:35:47.000088Z

anyway thanks for the help! i'll let you know how it all works out

aengelberg 2015-08-07T01:39:36.000089Z

No problem, I'd love to hear how it goes