wilkerlucio 2021-04-25T02:37:42.179800Z

Is Meander appropriate for writing a tokenizer (start from a single string and break it into tokens)? If it is, what would a base for that look like?

jimmy 2021-04-27T20:17:27.181700Z

Sorry, meant to reply to this. We haven't focused much on text, so I'm not sure there would be a lot of benefit in using Meander for this. Might be possible though.

JAtkins 2021-04-27T20:20:41.182100Z

Hard to say without looking at what you are doing, but I’d first look at Instaparse and maybe pair it with Meander.

jimmy 2021-04-27T20:26:41.182300Z

Yeah, that is a pretty good combination.

wilkerlucio 2021-04-27T21:47:57.182600Z

Thanks guys. I find Instaparse a bit too bulky for what I’m doing, because it’s not much of a syntax; it’s more like a plain-text email that I’m trying to extract data from. I’ve been doing OK using just Meander and regex so far. One trick that made my life easier was to pre-parse the text and transform it into a “hiccup-like” structure with one entry per line. This makes it easier to match on specific line numbers (when they make sense) and also avoids dealing with line breaks.

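;; assumes (:require [clojure.string :as str]) in the ns form
(defn text->hiccup [text]
  (into []
        ;; tag each line with its index: [:l0 "first line"] [:l1 "second line"] ...
        (map-indexed #(vector (keyword (str "l" %)) %2))
        ;; the paste is cut off after `text`; str/split-lines is an assumed completion
        (-> text
            (str/split-lines))))

Then I can match like this: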
(-> (m/search hiccup
      (m/scan [:l0 ?store])
      {:riviera-delivery.order/store ?store}
      (m/scan [:l2 (m/re #"Pedido número: #(\d+)" [_ ?id])])
      {:riviera-delivery.order/id ?id}
      (m/scan [_ (m/re #".*Total Geral: R\$(\d+,\d+).*" [_ ?total])])
      {:riviera-delivery.order/total (u/parse-br-money ?total)}

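      (m/scan [_ (m/re #".*(\d+) x \.\.\.\.\.\. (.+?) \.\.\.\.\.\. R\$(\d+,\d+).*" [_ ?q ?n ?p])])
      ;; the opening of this map is missing from the paste; the :items key name is assumed
      {:riviera-delivery.order/items
       [{:riviera-delivery.item/quantity ?q
         :riviera-delivery.item/price    (u/parse-br-money ?p)
         :riviera-delivery.item/name     ?n}]})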
    (->> (apply merge-with into)))
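
The `u/parse-br-money` helper used above is not shown anywhere in the thread. A minimal sketch of what such a helper might look like, assuming it takes a Brazilian-formatted amount such as "1,10" (comma as decimal separator, optional dot as thousands separator) and returns a BigDecimal; the namespace and the implementation are hypothetical:

```clojure
(ns example.money
  (:require [clojure.string :as str]))

;; Hypothetical stand-in for u/parse-br-money (the real helper is not in the thread).
;; Assumes input like "1,10" or "1.234,56": dot = thousands, comma = decimal.
(defn parse-br-money [s]
  (-> s
      (str/replace "." "")    ; drop thousands separators
      (str/replace "," ".")   ; decimal comma -> decimal point
      (bigdec)))              ; exact decimal, avoids float rounding on prices

(parse-br-money "5,50")
;; => 5.50M
```

Returning a BigDecimal rather than a double is a design choice of this sketch, picked because the values are prices.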

wilkerlucio 2021-04-27T21:50:59.182800Z

Example text that I’m matching against:

Padaria Bella Riviera

Pedido número: #1619127436602

Status: Pedido Entregue

Wilker, você será notificado (a) à cada nova alteração de status.



3 x ...... Pão Francês (1unid) ...... R$1,10

1 x ...... Pão Ciabata (1unid) ...... R$5,50
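
To see what the item pattern captures on these lines, it can be tried on its own with `re-matches`. This is a sketch that assumes `m/re` applies the pattern with `re-matches` semantics (the whole string must match); the two-digit input is a made-up line, not one from the email:

```clojure
;; Item-line pattern copied from the m/search example in the thread.
(def item-re #".*(\d+) x \.\.\.\.\.\. (.+?) \.\.\.\.\.\. R\$(\d+,\d+).*")

;; Groups come back as strings: quantity, name, price.
(re-matches item-re "3 x ...... Pão Francês (1unid) ...... R$1,10")
;; => ["3 x ...... Pão Francês (1unid) ...... R$1,10" "3" "Pão Francês (1unid)" "1,10"]

;; Hypothetical two-digit quantity: the greedy leading .* swallows the "1",
;; so only the last digit lands in the quantity group.
(second (re-matches item-re "12 x ...... Pão Francês (1unid) ...... R$1,10"))
;; => "2"
```

If multi-digit quantities can occur, making the leading part lazy (`.*?`) or anchoring the quantity at the start of the line would keep the whole number. The quantity is also still a string at this point, so it may need parsing in the same way the price does.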

noprompt 2021-04-28T22:21:28.185Z

I’m planning to add m/str eventually for the purpose of matching/yielding strings (along with m/bytes) but I’m focused on hitting the zeta compiler goals I mentioned previously.

wilkerlucio 2021-04-28T22:46:16.185600Z

Thanks for the snippet, I can see a parser coming out of it 🙂

