meander

All things about https://github.com/noprompt/meander
Need help and no one responded? Feel free to ping @U5K8NTHEZ
jlmr 2020-07-01T08:51:13.462400Z

Hi, I’m hoping to use meander to extract the relevant information from some XML. I’ve used clojure.data.xml to parse the following XML. I would like to extract some fields for each of the :tag :record records. I don’t expect a fully formed pattern but it would be great if someone could point me in the right direction.

noprompt 2020-07-01T08:57:49.462700Z

You could start with m/$ which is kind of like jQuery:

;; Assuming `data` is the data you provided.
(m/search data
  (m/$ {:tag :record :as ?data})
  ?data)
;; =>
({:attrs {},
  :content
  ({:attrs {:status "deleted"},
    :content
    ({:attrs {},
      :content ("l4l:oai:library.wur.nl:l4l/6091"),
      :tag :identifier}
     {:attrs {}, :content ("2019-03-30T00:07:07Z"), :tag :datestamp}
     {:attrs {}, :content ("l4l"), :tag :setSpec}),
    :tag :header}),
  :tag :record}
 ,,,)
This will find all the {:tag :record} maps. 👍

jlmr 2020-07-01T09:02:08.462900Z

@noprompt thanks! But then I would want to dive deeper into the records to extract only the relevant fields and put them into a flat Clojure map: {:title <extracted title> :description <extracted description> :more :fields :like :this}. How would I go about that?

jlmr 2020-07-01T09:03:11.463100Z

I’ve gotten as far as:

(m/search xml
    (m/$ {:tag :record
          :content (m/$ {:tag :title
                         :content (m/$ {:tag :langstring
                                        :content (?title)})})})
    {:title ?title})

jlmr 2020-07-01T09:03:39.463300Z

Is this the best way to continue?

noprompt 2020-07-01T09:06:16.463500Z

You could do that as well. However, if you know that the data you are interested in sits at a certain location in the :content, you can simply draw that as a pattern:

{:tag :record 
 :content (m/scan {:tag :title :content (?title)})}

noprompt 2020-07-01T09:08:47.463700Z

Judging from the data, I think I would recommend separating that out as a separate step rather than doing it all in the pattern match.
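
A minimal sketch of what such a two-step approach could look like, assuming meander.epsilon is on the classpath and aliased as m. The `data` sample and the names `records` and `record->map` are hypothetical, made up for illustration; the real feed nests its fields more deeply.

```clojure
(require '[meander.epsilon :as m])

;; Hypothetical sample shaped like clojure.data.xml output (not the real feed).
;; Leaf :content values are seqs because the (?title) pattern below matches seqs.
(def data
  {:tag :root
   :attrs {}
   :content (list {:tag :record
                   :attrs {}
                   :content (list {:tag :title :attrs {} :content (list "First")}
                                  {:tag :description :attrs {} :content (list "About first")})}
                  {:tag :record
                   :attrs {}
                   :content (list {:tag :title :attrs {} :content (list "Second")}
                                  {:tag :description :attrs {} :content (list "About second")})})})

;; Step 1: collect every {:tag :record} map, wherever it sits in the tree.
(defn records [xml]
  (m/search xml
    (m/$ {:tag :record :as ?record})
    ?record))

;; Step 2: pull the fields out of a single record into a flat map.
;; m/and applies both scans to the same :content sequence.
(defn record->map [record]
  (m/find record
    {:tag :record
     :content (m/and (m/scan {:tag :title :content (?title)})
                     (m/scan {:tag :description :content (?description)}))}
    {:title ?title
     :description ?description}))

(def extracted (mapv record->map (records data)))
```

Keeping the steps apart means the per-record pattern stays small and can be tested against one record at a time.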

jlmr 2020-07-01T09:10:47.463900Z

Ok, I will play with it some more, thanks for the tips!

jlmr 2020-07-01T11:35:50.464100Z

@noprompt one more question if you have time:

(defn record
  [record]
  (m/find record
    (m/separated {:tag :metadata
                  :content (m/scan {:tag :lom
                                    :content (m/separated {:tag :general
                                                           :content (m/separated {:tag :title
                                                                                  :content (m/scan {:content ?title})}
                                                                                 {:tag :description
                                                                                  :content ?description})})})})
    {:title ?title
     :description ?description}))
This is how far I’ve gotten. It mostly works, but I would expect (m/scan {:content ?title}) to return a sequence of all the titles. The data for that particular field looks like this:
({:attrs {:xml/lang "en"}
  :content ("English title")
  :tag :langstring}
 {:attrs {:xml/lang "nl"}
  :content ("Dutch title")
  :tag :langstring})

noprompt 2020-07-01T17:56:09.464300Z

I think you want to switch from find to search to yield all the results.

👍 1
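
The difference between find and search can be tried on the langstring data from the question. This is a small sketch, assuming meander.epsilon aliased as m; `title-content` is a hypothetical name for that sample. The last form uses a memory variable, an alternative not mentioned in the thread, to collect every title in a single match.

```clojure
(require '[meander.epsilon :as m])

;; The langstring sample from the question, bound to a hypothetical name.
(def title-content
  '({:attrs {:xml/lang "en"}
     :content ("English title")
     :tag :langstring}
    {:attrs {:xml/lang "nl"}
     :content ("Dutch title")
     :tag :langstring}))

;; m/find stops at the first solution, so only one :content seq comes back.
(def found
  (m/find title-content
    (m/scan {:content ?title})
    ?title))

;; m/search yields one result per element that the scan can match.
(def searched
  (m/search title-content
    (m/scan {:content ?title})
    ?title))

;; Alternative: a memory variable (!title) accumulates a value on every
;; repetition of the `...` pattern, so a single match returns all titles.
(def collected
  (m/find title-content
    ({:content (!title)} ...)
    !title))
```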
markaddleman 2020-07-01T23:13:19.465600Z

Is there a more idiomatic way to write

(m/rewrite query
    {:source {:scope {:type "apps" :apps (m/some ?apps) & ?scope-rest} & ?source-rest} & ?rest}
    {:source {:scope {:type          "apps"
                      :apps          ?apps
                      :segment-query {:where {:op   "in"
                                              :args [{:path ["event-attr" "appKey"]}
                                                     {:op "UNNEST" :args [{:op "ARRAY" :args ?apps}]}]}}
                      &              ?scope-rest}
              &      ?source-rest}
     &       ?rest})
In deeply nested maps like this, it's mildly cumbersome to manage all of the & xyz terms

jimmy 2020-07-01T23:20:19.466200Z

I don’t know of any different way to express that rewrite. But you can put the & ?rest terms at the beginning if that helps you keep track of them better.

(m/rewrite query
  {:source {:scope {:type "apps" :apps (m/some ?apps) & ?scope-rest} & ?source-rest} & ?rest}
  {& ?rest
   :source
   {& ?source-rest
    :scope
    {& ?scope-rest
     :type "apps"
     :apps ?apps
     :segment-query {:where {:op "in"
                             :args [{:path ["event-attr" "appKey"]}
                                    {:op "UNNEST" :args [{:op "ARRAY" :args ?apps}]}]}}}}})

jimmy 2020-07-01T23:21:43.466800Z

Other than syntactic things though, I can’t think of anything I would do differently.

markaddleman 2020-07-01T23:22:15.467500Z

Thanks. Yeah, reordering will help a bit.

markaddleman 2020-07-01T23:23:01.468400Z

My use case has a lot of this sort of thing. I was wondering if there is broad value for a special kind of rewrite - something like rewrite-merge where the syntax favors merging new information into the map
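
For comparison, the merge this particular rewrite performs can be sketched in plain Clojure with assoc-in, without meander. `add-segment-query` is a hypothetical helper, not part of any library, and returning the query unchanged when the scope does not match is a choice made for this sketch.

```clojure
;; Hypothetical helper: add :segment-query to an "apps" scope and leave
;; every other key in the nested maps untouched.
(defn add-segment-query [query]
  (let [{scope-type :type apps :apps} (get-in query [:source :scope])]
    (if (and (= "apps" scope-type) (some? apps))
      (assoc-in query [:source :scope :segment-query]
                {:where {:op   "in"
                         :args [{:path ["event-attr" "appKey"]}
                                {:op "UNNEST" :args [{:op "ARRAY" :args apps}]}]}})
      ;; Sketch-level choice: a non-matching query passes through as-is.
      query)))

;; Made-up input; :other, :name and :limit stand in for the "rest" keys.
(def example-result
  (add-segment-query
    {:source {:scope {:type "apps" :apps ["a" "b"] :other 1}
              :name  "s"}
     :limit  10}))
```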