Hi, I’m hoping to use meander to extract the relevant information from some XML. I’ve used clojure.data.xml to parse the following XML. I would like to extract some fields for each of the :tag :record records. I don’t expect a fully formed pattern, but it would be great if someone could point me in the right direction.
You could start with m/$, which is kind of like jQuery:
;; Assuming `data` is the data you provided.
(m/search data
  (m/$ {:tag :record :as ?data})
  ?data)
;; =>
({:attrs {},
  :content
  ({:attrs {:status "deleted"},
    :content
    ({:attrs {},
      :content ("l4l:oai:library.wur.nl:l4l/6091"),
      :tag :identifier}
     {:attrs {}, :content ("2019-03-30T00:07:07Z"), :tag :datestamp}
     {:attrs {}, :content ("l4l"), :tag :setSpec}),
    :tag :header}),
  :tag :record}
 ,,,)
This will find all the {:tag :record} maps.
👍
@noprompt thanks! But then I would want to dive deeper into the records to extract only the relevant fields and put them into a flat Clojure map: {:title <extracted title> :description <extracted description> :more :fields :like :this}. How would I go about that?
I’ve gotten as far as:
(m/search xml
  (m/$ {:tag :record
        :content (m/$ {:tag :title
                       :content (m/$ {:tag :langstring
                                      :content (?title)})})})
  {:title ?title})
Is this the best way to continue?
You could do that as well. However, if you know the data you are interested in exists in a certain location in the :content, you can simply draw that as a pattern:
{:tag :record
 :content (m/scan {:tag :title :content (?title)})}
Judging from the data, I think I would recommend separating that out as a separate step rather than doing it all in the pattern match.
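A minimal sketch of that two-step approach, for anyone following along. It assumes meander.epsilon is required as m, and that each :title element sits directly in a record's :content with a single string child, which is simpler than the real data above. The names records, record->title, and titles are made up for illustration.

```clojure
(require '[meander.epsilon :as m])

;; Step 1: locate every {:tag :record} map anywhere in the parsed XML.
(defn records
  [data]
  (m/search data
    (m/$ {:tag :record :as ?record})
    ?record))

;; Step 2: pull the fields out of one record at a time.
;; Assumption: the :title element is a direct child of the record
;; and holds exactly one string.
(defn record->title
  [record]
  (m/find record
    {:tag :record
     :content (m/scan {:tag :title :content (?title)})}
    {:title ?title}))

;; Combine the two steps with ordinary Clojure.
(defn titles
  [data]
  (map record->title (records data)))
```

Keeping the second step as a plain function makes it easy to try against a single record at the REPL before running it over the whole document.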
Ok, I will play with it some more, thanks for the tips!
@noprompt one more question if you have time:
(defn record
  [record]
  (m/find record
    (m/separated {:tag :metadata
                  :content (m/scan {:tag :lom
                                    :content (m/separated {:tag :general
                                                           :content (m/separated {:tag :title
                                                                                  :content (m/scan {:content ?title})}
                                                                                 {:tag :description
                                                                                  :content ?description})})})})
    {:title ?title
     :description ?description}))
This is how far I’ve gotten. It mostly works, but I would expect (m/scan {:content ?title}) to return a sequence of all the titles. The data for that particular element would look like this:
({:attrs {:xml/lang "en"}
  :content ("English title")
  :tag :langstring}
 {:attrs {:xml/lang "nl"}
  :content ("Dutch title")
  :tag :langstring})
I think you want to switch from find to search to yield all the results.
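A small sketch of the difference, assuming meander.epsilon is required as m and using the two langstring maps from above as input. The names langstrings, first-title, and all-titles are made up for illustration. m/find stops at the first solution, while m/search yields one result per solution.

```clojure
(require '[meander.epsilon :as m])

;; The two :langstring elements from the question, as seqs.
(def langstrings
  (list {:attrs {:xml/lang "en"}
         :content (list "English title")
         :tag :langstring}
        {:attrs {:xml/lang "nl"}
         :content (list "Dutch title")
         :tag :langstring}))

;; m/find: returns a single title, from whichever match is found first.
(defn first-title
  [xs]
  (m/find xs
    (m/scan {:tag :langstring :content (?title)})
    ?title))

;; m/search: returns a sequence with one title per matching element.
(defn all-titles
  [xs]
  (m/search xs
    (m/scan {:tag :langstring :content (?title)})
    ?title))
```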
Is there a more idiomatic way to write:
(m/rewrite query
  {:source {:scope {:type "apps" :apps (m/some ?apps) & ?scope-rest} & ?source-rest} & ?rest}
  {:source {:scope {:type "apps"
                    :apps ?apps
                    :segment-query {:where {:op "in"
                                            :args [{:path ["event-attr" "appKey"]}
                                                   {:op "UNNEST" :args [{:op "ARRAY" :args ?apps}]}]}}
                    & ?scope-rest}
            & ?source-rest}
   & ?rest})
In deeply nested maps like this, it's mildly cumbersome to manage all of the & xyz terms.
I don’t know of any different way to express that rewrite. But you can put the & ?rest terms at the beginning if that helps you keep track of them better.
(m/rewrite query
  {:source {:scope {:type "apps" :apps (m/some ?apps) & ?scope-rest} & ?source-rest} & ?rest}
  {& ?rest
   :source
   {& ?source-rest
    :scope
    {& ?scope-rest
     :type "apps"
     :apps ?apps
     :segment-query {:where {:op "in"
                             :args [{:path ["event-attr" "appKey"]}
                                    {:op "UNNEST" :args [{:op "ARRAY" :args ?apps}]}]}}}}})
Other than syntactic things though, I can’t think of anything I would do differently.
Thanks. Yeah, reordering will help a bit.
My use case has a lot of this sort of thing. I was wondering if there is broad value for a special kind of rewrite, something like rewrite-merge, where the syntax favors merging new information into the map.
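This is not a rewrite-merge, just a rough sketch of one way to avoid the & ?rest bookkeeping with what exists today: match only the keys you care about and let assoc-in do the merging. It assumes meander.epsilon is required as m. The function name add-segment-query is made up for illustration, and it returns nil when the pattern does not match, since that is what m/find does.

```clojure
(require '[meander.epsilon :as m])

(defn add-segment-query
  [query]
  (m/find query
    ;; Map patterns match submaps, so the other keys need no & ?rest here.
    {:source {:scope {:type "apps" :apps (m/some ?apps)}}}
    ;; The right-hand side is ordinary Clojure; the untouched keys of
    ;; query are carried along by assoc-in.
    (assoc-in query [:source :scope :segment-query]
              {:where {:op "in"
                       :args [{:path ["event-attr" "appKey"]}
                              {:op "UNNEST"
                               :args [{:op "ARRAY" :args ?apps}]}]}})))
```

The trade-off is that the output shape is no longer drawn as a pattern, so this only helps when the change really is "add one thing at a known path".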