@alexisvincent great to hear
New to Specter. I'm scraping http://docs.h2o.ai/h2o/latest-stable/h2o-docs/rest-api-reference.html to build a vector of maps where each map has keys for :http-verb, :rest-path, :inputs, and :outputs. One challenge is that the HTML appears to be in four conceptual sections: 1) a section of a href links with REST endpoints, 2) a section of h2 headings with the HTTP verb and REST endpoint, each followed by a table with Input and Output, 3) a section of a href links with schema nouns, and 4) a final section of h2 headings with the schema noun name, each followed by a table of keys and their descriptions. How might I keep the four sections separate before combining them? I'm also unclear whether I should use select, collect, codewalker, or continue-then-stay to collect and surface nested pieces of information. Thanks in advance.
@aaelony you're going to have to be more specific
you want to use specter to extract information out of html?
can you paste a sample of the html you're scraping, and what you want as output?
ok, let me take some time to formulate a better question.
hi @nathanmarz, here is the Clojure code that I'm wondering how to reproduce in Specter.
(ns testing
  (:require [net.cgrand.enlive-html :as html]
            [org.httpkit.client :as http]
            [clojure.string :as str]))

;; Fetch the page, walk html > body > div, and parse each h2 heading.
(->> (html/html-snippet
      (:body @(http/get "http://docs.h2o.ai/h2o/latest-stable/h2o-docs/rest-api-reference.html"
                        {:insecure false})))
     (filterv #(= (:tag %) :html))
     first
     :content
     (filterv #(= (:tag %) :body))
     first
     :content
     (filterv #(= (:tag %) :div))
     first
     :content
     (filterv #(= (:tag %) :h2))
     ;; The heading text looks like "VERB /path/{param}"; split it on the space.
     (mapv #(let [[verb endpoint] (-> % :content first (str/split #" "))
                  inputs (if endpoint
                           (re-seq #"\{(.*?)\}" endpoint))]
              {:verb verb :endpoint endpoint :inputs inputs}))
     ;; Keep only headings that begin with an HTTP verb.
     (filterv #(or (= (:verb %) "GET")
                   (= (:verb %) "POST")
                   (= (:verb %) "DELETE")
                   (= (:verb %) "HEAD"))))
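For comparison, here is a rough sketch of how the same html > body > div > h2 walk might look as a single Specter path. It is not a confirmed answer from the thread. It assumes Specter is on the classpath as com.rpl.specter, and it runs against a small hand-written stand-in for the parsed page (sample-nodes and h2->endpoint are hypothetical names) rather than the live URL. One difference to be aware of: ALL visits every matching node at each level, whereas the code above takes only the first match.

```clojure
(ns testing.specter-sketch
  (:require [clojure.string :as str]
            [com.rpl.specter :refer [select ALL]]))

;; Hypothetical stand-in for what html/html-snippet returns:
;; a seq of Enlive-style node maps with :tag and :content.
(def sample-nodes
  [{:tag :html
    :content [{:tag :head :content []}
              {:tag :body
               :content [{:tag :div
                          :content [{:tag :h2 :content ["GET /3/Frames/{frame_id}"]}
                                    {:tag :table :content []}
                                    {:tag :h2 :content ["FrameV3"]}]}]}]}])

(defn h2->endpoint
  "Parse an h2 node whose text looks like \"VERB /path/{param}\"."
  [h2]
  (let [text (first (:content h2))
        [verb endpoint] (when (string? text) (str/split text #" "))]
    {:verb verb
     :endpoint endpoint
     :inputs (when endpoint (re-seq #"\{(.*?)\}" endpoint))}))

(def endpoints
  (->> sample-nodes
       ;; Keywords navigate into a map key, ALL visits each element,
       ;; and plain functions act as filter predicates.
       (select [ALL #(= (:tag %) :html) :content
                ALL #(= (:tag %) :body) :content
                ALL #(= (:tag %) :div) :content
                ALL #(= (:tag %) :h2)])
       (mapv h2->endpoint)
       ;; Keep only headings that begin with an HTTP verb.
       (filterv #(contains? #{"GET" "POST" "DELETE" "HEAD"} (:verb %)))))
```

On the sample data only the "GET /3/Frames/{frame_id}" heading survives the final filter, since the schema heading "FrameV3" has no HTTP verb. To run this against the real page, sample-nodes would be replaced with the result of html/html-snippet from the code above.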