Hi everyone. I’ve created a little script that crawls a specific website and downloads all of the mp3 files there. It’s a Georgian site, and only Georgians would understand this 🙂 But I wanted to perhaps get some suggestions code-wise. It all works, but there’s always a room for improvement right?
(ns play-around.core
(:require [net.cgrand.enlive-html :as html]
[org.httpkit.client :as http]
[<http://clojure.java.io|clojure.java.io> :as io]
[clojure.string :as string])
(:gen-class))
(def urls #{"<http://radio105.wanex.net/arts_geo.htm>" "<http://radio105.wanex.net/songs_geo.htm>"})
(defn- get-dom
[url]
(html/html-snippet (:body @(http/get url))))
(defn- build-urls
[audio-links]
(for [audio-link audio-links]
{:download-url (format "<http://radio105.wanex.net/%s>" audio-link)
:filename audio-link}))
(defn- extract-links
[dom]
(map #(get-in % [:attrs :href]) (html/select dom [:td :a])))
(defn- download-file
[{:keys [download-url filename] :as file}]
(let [filename (string/replace filename #"\/" "_")
file (io/file "audio" filename)]
(if-not (.exists file)
(do
(println "Downloading" filename)
(with-open [in (io/input-stream download-url)
out (io/output-stream file)]
(io/copy in out)))
(println "File" filename "already exists, skipping"))))
(defn- download-files
[files]
(doall (map download-file files)))
(defn- is-mp3?
[link]
(re-matches #"(?i).*\.mp3" link))
(defn -main
[& args]
(when-not (.isDirectory (io/file "audio"))
(.mkdir (io/file "audio")))
(->> urls
(mapcat get-dom)
extract-links
(filter is-mp3?)
build-urls
download-files))
@bravilogy My 2 cents, build-urls should be a map
, no ?
May be you should rewrite the main like
(->>
urls
(mapcat get-dom)
(map ...)
(filter ...)
(map ...)
...
)
May be this will help you to see patterns, opportunity for transducers,...oh good point
the thing about the last map is that I need to run it through doall somehow
otherwise it’s not being ‘realized’
so should I chuck doall
in ->>
after everything?
or perhaps the thread’s output should then be passed to doall
separately like
(doall (->> urls
(mapcat ...)
....))
or even have it in a let block
(defn -main
[& args]
(when-not (.isDirectory (io/file "audio"))
(.mkdir (io/file "audio")))
(let [downloads (->> urls
(mapcat get-dom)
extract-links
(filter is-mp3?)
(map build-url)
(map download-file))]
(doall downloads)))
My Clojure experience is very limited, so take my advices with caution. 🙂
I think you should finish you (->>)
with a download-files
, but all previous are map and filter, and as reader, I would appreciate to identify it quickly.
be wary of functions that simply map over a collection like build-urls
@bravilogy
with this you are trying to do side effects (doall (map download-file files)))
but really doall/map is not a good combo for that. just use a doseq
Could you explain the point ?
sure, if you have a function that takes a collection but does nothing but call (map (fn something ]) coll), what you want instead is to (defn something ...)
and inline the map something
part at the usage site
OK, I prefer this kind of code style, too.
yeah it encourages reuse and idiomatic core library usage
(defn- download-files
[files]
(doseq [file files]
(download-file file)))
(defn -main
[& args]
(when-not (.isDirectory (io/file "audio"))
(.mkdir (io/file "audio")))
(->> urls
(mapcat get-dom)
extract-links
(filter is-mp3?)
(map build-url)
download-files))
thanks everyone 🙂
@bravilogy There's also run!
which is to map
as doseq
is to for
. In this case I'd prefer (->> ,,, ,,, (run! download-file))
.