Is Meander appropriate for writing a tokenizer (start from a single string and break it into tokens), and if it is, what would a starting point for that look like?
Sorry, meant to reply to this. We haven't focused much on text, so I'm not sure there would be a lot of benefit in using Meander for this. Might be possible though.
Hard to say without looking at what you are doing, but I’d first look at Instaparse and maybe pair it with meander.
Yeah, that is a pretty good combination.
Thanks guys. I find Instaparse a bit too bulky for what I’m doing, because it’s not much of a syntax; it’s more like a text email that I’m trying to extract data from. I’ve been doing OK using just Meander and regex so far. One trick that made my life easier was to pre-parse the text and transform it into a “hiccup-like” structure, with one entry per line. This makes it easier to match on specific line numbers (when they make sense) and also avoids dealing with line breaks:
(defn text->hiccup [text]
  (into []
        (map-indexed
          #(vector (keyword (str "l" %)) %2))
        (-> text
            (str/split-lines))))
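A quick sketch of what this produces, assuming clojure.string is aliased as str; the two-line input is just the first two lines of the email text:

```clojure
(require '[clojure.string :as str])

;; Same function as above, repeated so the snippet runs on its own.
(defn text->hiccup [text]
  (into []
        (map-indexed #(vector (keyword (str "l" %)) %2))
        (str/split-lines text)))

;; Each line becomes a [:lN "line text"] pair, with N counting from 0.
(text->hiccup "Padaria Bella Riviera\nPedido número: #1619127436602")
;; => [[:l0 "Padaria Bella Riviera"] [:l1 "Pedido número: #1619127436602"]]
```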
Then I can match like this:
(-> (m/search hiccup
      (m/scan [:l0 ?store])
      {:riviera-delivery.order/store ?store}
      (m/scan [:l2 (m/re #"Pedido número: #(\d+)" [_ ?id])])
      {:riviera-delivery.order/id ?id}
      (m/scan [_ (m/re #".*Total Geral: R\$(\d+,\d+).*" [_ ?total])])
      {:riviera-delivery.order/total (u/parse-br-money ?total)}
      (m/scan [_ (m/re #".*(\d+) x \.\.\.\.\.\. (.+?) \.\.\.\.\.\. R\$(\d+,\d+).*" [_ ?q ?n ?p])])
      {:riviera-delivery.order/items
       [{:riviera-delivery.item/quantity ?q
         :riviera-delivery.item/price (u/parse-br-money ?p)
         :riviera-delivery.item/name ?n}]})
    (->> (apply merge-with into)))
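The u/parse-br-money helper isn't shown in this thread. A minimal sketch of what such a helper might look like, assuming it takes a Brazilian-formatted amount such as "1,10" and returns a BigDecimal (the name parse-br-money and its behavior here are guesses, not the actual implementation):

```clojure
(require '[clojure.string :as str])

(defn parse-br-money
  "Hypothetical stand-in for u/parse-br-money: parses a pt-BR amount
  like \"1,10\" or \"1.234,56\" into a BigDecimal."
  [s]
  (-> s
      (str/replace "." "")   ; drop thousands separators, if any
      (str/replace "," ".")  ; decimal comma -> decimal point
      bigdec))

(parse-br-money "1,10")
;; => 1.10M
```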
Example text that I’m matching against:
Padaria Bella Riviera
Pedido número: #1619127436602
Status: Pedido Entregue
Wilker, você será notificado (a) à cada nova alteração de status.
_______________________________________________________________________________________________________
Produtos:
3 x ...... Pão Francês (1unid) ...... R$1,10
1 x ...... Pão Ciabata (1unid) ...... R$5,50
I’m planning to add m/str eventually for the purpose of matching/yielding strings (along with m/bytes), but I’m focused on hitting the zeta compiler goals I mentioned previously.
thanks for the snippet, I can see a parser from it 🙂