hello everyone, is there a grammar for clojure source that i can use in instaparse?
or EDN would probably work in a pinch too
Not that I know of. Maybe this would help, you could port the ANTLR grammar into EBNF somehow http://stackoverflow.com/questions/3902813/is-there-a-language-spec-for-clojure
Although an EDN parser wouldn't be too hard to make from scratch. It's mostly balanced parens with various types of leaves or delimiters. The hard part is strings, handling all the \" stuff
And comments
yeah i was thinking that also, really i just need curly braces, because i only want to parse maps
but curlies can appear in strings
Maps with no lists / sets as keys or values?
yeah the map can contain anything, but i can presumably slurp that in and just look for the matching curly
If you get the string terminal right, Instaparse will be smart about closing parens even if parens appear within the string
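A rough sketch of the "get the string terminal right, then match curlies" idea in plain Clojure, without instaparse. The names `token-re` and `first-map-text` are made up for illustration, and it assumes no EDN character literals like \{ and no ; comments in the input, so treat it as a starting point only:

```clojure
(def token-re
  ;; a string literal with backslash escapes | a single brace | a run of anything else
  #"\"(?:[^\"\\]|\\.)*\"|[{}]|[^{}\"]+")

(defn first-map-text
  "Returns the text of the first balanced {...} in s, or nil if there is none.
   Braces inside string literals are ignored because a whole string is one token."
  [s]
  (loop [tokens (drop-while #(not= "{" %) (re-seq token-re s))
         depth  0
         acc    []]
    (when-let [t (first tokens)]
      (let [depth (case t
                    "{" (inc depth)
                    "}" (dec depth)
                    depth)
            acc   (conj acc t)]
        (if (zero? depth)
          (apply str acc)
          (recur (rest tokens) depth acc))))))
```

The returned text can then be handed to `clojure.edn/read-string` to get the actual map.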
how about indentation-based languages?
ehhhhhh
haha yeah
"You're thinking in the wrong mindset" http://imgur.com/gallery/M5wl14r
hahahaha excellent reference there
the project i'm working on is a generalized, abstract sort of markdown
it's designed to mix well with prose, so indentation based structure is a big win
So like, prose at the top level, important stuff indented?
# This is line 1 of a certain type of block.
  The block continues here because of the indentation.
* This could be a list item
  p With a paragraph in it
    that continues on multiple lines...and
    has a strange #(inline something or other
    delimited by hash-parens)#...
  ~~~{:foo "bar", :baz 123} tags can also have
    attributes parsed as EDN...
* here is the next list item
there we go
the parser will be a macro really
it will emit s-expressions
calling multimethods
so you can implement dispatches for any tags you like
so what # foo
means is up to you
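A minimal sketch of what that dispatch could look like, assuming the parser emits a call such as (render-tag "#" {} "foo"). The name `render-tag`, the argument shape, and the hiccup-style return values are all made up here for illustration, not taken from the actual project:

```clojure
;; Dispatch on the tag string; attrs is the (possibly empty) EDN attribute map.
(defmulti render-tag (fn [tag attrs & body] tag))

;; One possible meaning for "#": a heading.
(defmethod render-tag "#" [_ attrs & body]
  (into [:h1 attrs] body))

;; One possible meaning for "*": a list item.
(defmethod render-tag "*" [_ attrs & body]
  (into [:li attrs] body))

;; Anything without its own method falls through to here.
(defmethod render-tag :default [tag attrs & body]
  (into [:unknown {:tag tag}] body))
```

Anyone who wants # to mean something else just supplies a different defmethod for "#".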
the indentation is crucial for making the thing general without special cases and hardcoded things
The reason "ehhhh" is the visceral response to indentation based langs in instaparse is because in CFGs it's difficult if not impossible to remember how many spaces / tabs you're looking for on each line.
So it's really only a problem if you have chunks within chunks that are indented even more.
yeah, and i want to support even more tricky things, like indentation plus extra whitespace at the front of the line
i have a naive handmade parser now to parse the blocks
it looks for tags that can start a block
then it looks for the "outdent"
so it doesn't look for a specific amount of indentation
Example?
it looks for a minimum amount of indentation, but you can use more
# This is all
  part of the
    same block
  and the next
      block
   doesn't
  start until
  an "outdent" is seen
This is an outdent, so
the above block will
have been ended.
however,
# This is not
  all part of the same
  # block because this
    tag creates a nested
    block
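For reference, a rough sketch of that outdent rule in plain Clojure, flat blocks only with no nesting. The names `indent-of` and `split-blocks` are made up for illustration, only spaces count as indentation, and a blank line is treated as an outdent, which a real parser would probably not want:

```clojure
(require '[clojure.string :as str])

(defn indent-of
  "Number of leading spaces on a line."
  [line]
  (count (re-find #"^ *" line)))

(defn split-blocks
  "Groups lines into blocks. A line opens a block, and every following line
   indented deeper than the opener belongs to it. The first line that is not
   deeper (the outdent) opens the next block."
  [text]
  (loop [lines (str/split-lines text), blocks []]
    (if-let [[opener & more] (seq lines)]
      (let [base             (indent-of opener)
            [body remaining] (split-with #(> (indent-of %) base) more)]
        (recur remaining (conj blocks (into [opener] body))))
      blocks)))
```

Because the test is "deeper than the opener" rather than "exactly N spaces", lines inside a block can use any amount of extra indentation.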
Hmm, what if you parse each block and then run the parser AGAIN on the text in the block to find subblocks?
hm
i think i could give up the leading extra spaces thing, too
and set indentation to some configurable fixed size
(my-parser text) => ([:block "This is not" "all part of the same" "# block because this" " tag ..."])
(insta/transform {:block (fn [& strs] (my-parser (str/join "\n" strs)))} *1)
Except recursively smarter
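A sketch of the "recursively smarter" version in plain Clojure, so it runs without a grammar. It does not literally re-parse joined text. Instead it calls the same function again on the lines inside each block, which amounts to the same idea. The names `tag-line?` and `parse-blocks` are made up, and the only tag it knows is a leading "# ", which is an assumption for the example:

```clojure
(require '[clojure.string :as str])

(defn indent-of [line]
  (count (re-find #"^ *" line)))

(defn tag-line?
  "Hypothetical tag test: optional spaces, then '# '."
  [line]
  (boolean (re-find #"^ *# " line)))

(defn parse-blocks
  "Takes a seq of lines. A tag line opens a [:block ...] node holding every
   following line indented deeper than the tag, and those lines are parsed
   again by this same function. Non-tag lines become trimmed strings."
  [lines]
  (loop [lines (seq lines), out []]
    (if-let [[line & more] lines]
      (if (tag-line? line)
        (let [base             (indent-of line)
              [body remaining] (split-with #(> (indent-of %) base) more)]
          (recur (seq remaining)
                 (conj out (into [:block (str/triml line)]
                                 (parse-blocks body)))))
        (recur (seq more) (conj out (str/triml line))))
      out)))
```

Since only tag lines open a block, ragged extra indentation on plain text lines inside a block does not create accidental nesting.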
interesting
I just thought of this, it might end up being impractical.
But that might be the way to do it.
i will play around with it and let you know how it works out
i can at least instaparse the inline stuff, if not the blocks
True
Anyway, now that I think about it, this trick may be applicable to any indentation-based language
it's also an interesting case because i need to parse "any character that isn't a tag"
like the text in between tags
i think i can use negative lookahead with regex like #"."
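A small sketch of that negative-lookahead idea as a plain Clojure regex, using the #( ... )# inline delimiters from the earlier example. The names `inline-re` and `inline-parts` are made up for illustration. The text branch consumes one character at a time, but only while the next two characters are not "#(":

```clojure
(def inline-re
  ;; (?s) lets . match newlines too.
  ;; Branch 1: an inline span #( ... )# with its contents captured.
  ;; Branch 2: a run of characters, none of which starts a "#(".
  #"(?s)#\((.*?)\)#|(?:(?!#\().)+")

(defn inline-parts
  "Splits s into plain-text strings and [:inline contents] vectors."
  [s]
  (mapv (fn [[whole inner]]
          (if inner [:inline inner] whole))
        (re-seq inline-re s)))
```

The text branch could probably be dropped into an instaparse regex terminal as well, and instaparse also has its own ! negative lookahead operator, so the same thing may be expressible directly in the grammar.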
anyway thanks for the help! i'll let you know how it all works out
No problem, I'd love to hear how it goes