That part of the abnf namespace doesn't seem to be designed for supplementary characters at all...
Unlike the clj version, it doesn't use any utility to turn a code point above 0xFFFF into a pair of 16-bit chars (a surrogate pair).
The `get-char-combinator` function needs a rework so that ABNF terminals like `%x5D-10FFFF` can work.
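Splitting an oversized code point into two chars is fixed UTF-16 arithmetic, so it can be done without any library. A minimal TypeScript sketch of that conversion, where `codePointToString` is a hypothetical name rather than an existing instaparse or Closure function:

```typescript
// Hypothetical helper (not an instaparse or goog function): encode one
// Unicode code point as a JS string of UTF-16 code units.
function codePointToString(cp: number): string {
  if (Math.floor(cp) !== cp || cp < 0 || cp > 0x10ffff) {
    throw new RangeError("Invalid code point: " + cp);
  }
  if (cp <= 0xffff) {
    // BMP code point: fits in a single 16-bit code unit.
    return String.fromCharCode(cp);
  }
  // Supplementary code point: subtract 0x10000 to get a 20-bit value,
  // then put the top 10 bits in a high surrogate (D800-DBFF) and the
  // bottom 10 bits in a low surrogate (DC00-DFFF).
  const offset = cp - 0x10000;
  const high = 0xd800 + (offset >> 10);
  const low = 0xdc00 + (offset & 0x3ff);
  return String.fromCharCode(high, low);
}

console.log(codePointToString(0x5d));                       // ]
console.log(codePointToString(0x1f600).length);             // 2
console.log(codePointToString(0x1f600) === "\uD83D\uDE00"); // true
```

In ES2015+ engines, `String.fromCodePoint` produces the same code units. A two-unit string like this could be handed to a string combinator, mirroring how the clj version represents a single supplementary character.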
JavaScript, unlike Java, does not seem to support the `\x{10FFFF}` regex syntax.
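For what it's worth, ES2015 added a `u` (unicode) flag to JavaScript regexes. With it, `\u{...}` code point escapes are accepted (the Java-style `\x{...}` is still not a code point escape), and a character class matches a surrogate pair as one code point. A minimal TypeScript sketch, assuming an engine with ES2015 regex support; `supportsUnicodeFlag` and `makeRangeRegex` are hypothetical names, not instaparse functions. Pre-ES2015 engines reject the flag with a `SyntaxError`, which is why the sketch probes for it first.

```typescript
// Probe for ES2015 "u" flag support: older engines throw a SyntaxError here.
function supportsUnicodeFlag(): boolean {
  try {
    new RegExp("\\u{61}", "u");
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical stand-in for the ABNF terminal %x5D-10FFFF, anchored so that
// it must match exactly one code point. Built via the RegExp constructor so
// the pattern string reaches the engine unchanged.
function makeRangeRegex(): RegExp {
  return new RegExp("^[\\u{5D}-\\u{10FFFF}]$", "u");
}

const emoji = "\uD83D\uDE00"; // U+1F600 as a surrogate pair

if (supportsUnicodeFlag()) {
  const range = makeRangeRegex();
  console.log(range.test("]"));   // true  (U+005D)
  console.log(range.test("A"));   // false (U+0041 is below the range)
  console.log(range.test(emoji)); // true  (the pair counts as one code point)
}

// Without the "u" flag the same string is two code units, so a single "."
// cannot match all of it.
console.log(/^.$/.test(emoji)); // false
```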
In instaparse for Clojure, a single character is represented as a string combinator containing the surrogate pair (two 16-bit chars side by side), and a character range uses the regex `\x{10FFFF}` syntax. ClojureScript/JavaScript don't appear to have much support for either of these. It may be impossible to support Unicode character ranges in ABNF without introducing third-party JS libraries.
OK, the former (single characters) is doable via `goog.i18n.uChar/fromCharCode`.
Nice! Yeah, this character support code is probably the weakest part of the port.
I'm glad you're finding these issues. I had a feeling some were lurking there.
goog has some utilities for working with surrogate pairs, but the regex (character range) case seems impractical without pulling in an external dependency like Regenerate: https://github.com/mathiasbynens/regenerate
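Recreating the range case internally is mostly bookkeeping: the BMP part of a range stays an ordinary character class, and the supplementary part becomes alternatives of a high surrogate followed by a low-surrogate class, which works without the `u` flag. This is the general idea behind Regenerate's output, but the code below is an independent, hypothetical sketch (`codePointRangeToRegexSource` and its helpers are made-up names), not Regenerate's API. As a simplification, it assumes lone surrogates (U+D800 to U+DFFF) should never match.

```typescript
// Format a 16-bit code unit as a \uXXXX regex escape.
function hex4(unit: number): string {
  return "\\u" + ("000" + unit.toString(16).toUpperCase()).slice(-4);
}

// A single escape, or a class covering code units from..to.
function classOf(from: number, to: number): string {
  return from === to ? hex4(from) : "[" + hex4(from) + "-" + hex4(to) + "]";
}

// Split a supplementary code point into [high, low] surrogates.
function splitSurrogates(cp: number): [number, number] {
  const offset = cp - 0x10000;
  return [0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff)];
}

// Hypothetical: regex source (for use WITHOUT the "u" flag) matching exactly
// one code point in [lo, hi]. Surrogate code points are deliberately skipped.
function codePointRangeToRegexSource(lo: number, hi: number): string {
  const alternatives: string[] = [];

  // BMP part, split around the surrogate block D800-DFFF.
  const bmpRanges: Array<[number, number]> = [[0x0000, 0xd7ff], [0xe000, 0xffff]];
  for (const [a, b] of bmpRanges) {
    const from = Math.max(lo, a);
    const to = Math.min(hi, b);
    if (from <= to) alternatives.push(classOf(from, to));
  }

  // Supplementary part, as high-surrogate / low-surrogate sequences.
  const from = Math.max(lo, 0x10000);
  const to = Math.min(hi, 0x10ffff);
  if (from <= to) {
    const [h1, l1] = splitSurrogates(from);
    const [h2, l2] = splitSurrogates(to);
    if (h1 === h2) {
      alternatives.push(hex4(h1) + classOf(l1, l2));
    } else {
      // First high surrogate: partial low range up to DFFF.
      alternatives.push(hex4(h1) + classOf(l1, 0xdfff));
      // High surrogates strictly in between: any low surrogate.
      if (h1 + 1 <= h2 - 1) {
        alternatives.push(classOf(h1 + 1, h2 - 1) + classOf(0xdc00, 0xdfff));
      }
      // Last high surrogate: partial low range from DC00.
      alternatives.push(hex4(h2) + classOf(0xdc00, l2));
    }
  }

  // An empty class never matches in JS, so an empty range matches nothing.
  if (alternatives.length === 0) return "[]";
  return "(?:" + alternatives.join("|") + ")";
}

const pattern = new RegExp("^" + codePointRangeToRegexSource(0x5d, 0x10ffff) + "$");
console.log(pattern.test("]"));            // true
console.log(pattern.test("\uD83D\uDE00")); // true
console.log(pattern.test("\uD83D"));       // false (lone surrogate)
```

For `%x5D-10FFFF` this yields `(?:[\u005D-\uD7FF]|[\uE000-\uFFFF]|\uD800[\uDC00-\uDFFF]|[\uD801-\uDBFE][\uDC00-\uDFFF]|\uDBFF[\uDC00-\uDFFF])`, which needs no ES2015 regex features.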
Ah, yeah, I think I’d rather recreate the functionality internally than pull in extra deps. Definitely a bit of a pain though.
Interesting: https://mathiasbynens.be/notes/javascript-unicode
I wonder whether this issue applies to all browsers.
Actually, if you could create a PR with a failing cljs test case, that would be a good place to start.
Hmm, now I'm mildly concerned because circleci is passing... ;)
Hmm, I think that's because the cljs tests don't really have a notion of "passing" or "failing" (no exit codes).