(import '(java.text BreakIterator)) ; JDK class; ICU4J's equivalent is com.ibm.icu.text.BreakIterator

(defn ->char-strings [^String s]
  (let [iter (doto (BreakIterator/getCharacterInstance)
               (.setText s))]
    (loop [start (.first iter)
           res   []]
      (let [end (.next iter)]
        (if (= end BreakIterator/DONE)
          res
          (recur
            end
            (conj res (subs s start end))))))))
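Wrote a quick function that splits a string containing complex Unicode into the characters a user actually sees (e.g. a letter plus its combining accent stays together), using a character BreakIterator (Java / ICU4J).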
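To see why a BreakIterator is needed at all: Clojure strings are plain java.lang.String, so `count` and `seq` work on UTF-16 `char` values, not on what a reader perceives as one character. A small illustration (the var names `decomposed` and `emoji` are just for this example):

```clojure
;; "é" written as two code points: LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
(def decomposed "e\u0301")

;; count and seq see UTF-16 chars, so the accent is counted and split off
(count decomposed)       ;; => 2
(mapv str decomposed)    ;; => ["e" "\u0301"]

;; A code point outside the BMP (U+1F600) is stored as a surrogate pair:
;; two chars, but only one code point
(def emoji (String. (Character/toChars 0x1F600)))
(count emoji)                                     ;; => 2
(.codePointCount ^String emoji 0 (count emoji))   ;; => 1
```

Splitting on chars (or even on code points) would break the first string apart, which is exactly what a character BreakIterator avoids.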
Not sure if the above is the most Clojure-y solution. If you have better ideas, let me know!
Some really minor stuff: see if you can use identical? instead of = in (= end BreakIterator/DONE), or make res transient inside the loop, but these are all optimizations.
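For comparison, one possible sequence-based shape for the same function. This is a sketch, not something posted in the thread: `char-strings` is a hypothetical name, and it assumes the JDK's java.text.BreakIterator (ICU4J's com.ibm.icu.text.BreakIterator exposes the same `first`/`next`/`DONE` members). It collects the boundary offsets lazily and then cuts the string between each adjacent pair:

```clojure
(import '(java.text BreakIterator))

(defn char-strings
  "Hypothetical seq-based variant of ->char-strings."
  [^String s]
  (let [^BreakIterator iter (doto (BreakIterator/getCharacterInstance)
                              (.setText s))
        ;; (.first iter) is always 0; each (.next iter) returns the next
        ;; boundary offset until the iterator reports DONE
        bounds (cons (.first iter)
                     (take-while #(not= % BreakIterator/DONE)
                                 (repeatedly #(.next iter))))]
    ;; e.g. bounds (0 2 3) -> pairs ((0 2) (2 3)) -> one substring per pair
    (mapv (fn [[start end]] (subs s start end))
          (partition 2 1 bounds))))

(char-strings "e\u0301a")   ;; => ["e\u0301" "a"]
(char-strings "")           ;; => []
```

The mutable iterator is hidden inside `repeatedly`, which is only safe because `mapv` realizes the whole lazy sequence before the function returns; the iterator never escapes. The `loop`/`recur` version is more explicit about that and probably a little faster.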
Thanks Ben. Noob question: what do you mean by making res transient inside the loop?
like so:
(defn ->char-strings [^String s]
  (let [iter (doto (BreakIterator/getCharacterInstance)
               (.setText s))]
    (loop [start (.first iter)
           res   (transient [])]
      (let [end (.next iter)]
        (if (= end BreakIterator/DONE)
          (persistent! res)
          (recur
            end
            (conj! res (subs s start end))))))))
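The general pattern behind that change: `transient` gives a local, mutable version of a persistent collection, you update it with the bang functions (`conj!`, `assoc!`, ...), always continuing with the value they return, and call `persistent!` once at the end. A minimal sketch with a hypothetical `squares` helper:

```clojure
(defn squares
  "Hypothetical example: vector of i*i for i in [0, n)."
  [n]
  (loop [i   0
         acc (transient [])]
    (if (< i n)
      ;; use the value conj! returns; don't rely on in-place mutation
      (recur (inc i) (conj! acc (* i i)))
      (persistent! acc))))

(squares 5)                          ;; => [0 1 4 9 16]

;; clojure.core/into already uses a transient internally when the target
;; collection supports it, so this builds the same vector:
(into [] (map #(* % %)) (range 5))   ;; => [0 1 4 9 16]
```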
Aah! I didn't know about transients! Thanks Ben.
Looks reasonable to me.