data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions
2020-09-20T12:33:17.001100Z

anyone use flare?

2020-09-20T12:33:35.001800Z

iirc Clojure ML has way better perf than Python, so I'm looking into it

Aviv Kotek 2020-09-20T16:22:26.003900Z

exploring some basics of tech.ml.dataset, can't figure out how to do simple arithmetic on dataframes (same-shape dfs):

(let [cols (ds/column-names ds)
      units-total (ds/select-columns ds (filter #(s/starts-with? % "units_total") cols)) ;ds
      units-dwell (ds/select-columns ds (filter #(s/starts-with? % "units_dwell") cols))] ;ds
  (dfn/- units-total units-dwell))
=> #object[tech.v2.datatype.binary_op$fn$reify__10844 0xf2ea2e8 "tech.v2.datatype.binary_op$fn$reify__10844@f2ea2e8"]
I'd expect a new dataset of the same shape, with each cell containing the result of the calculation.

genmeblog 2020-09-20T18:22:51.007300Z

@aviv dfn operates on something called a reader and returns a reader. A column is a reader; a dataset is not. So you have to select the two columns, call the function on them, and insert the resulting column into the dataset (which gives you a new one).

genmeblog 2020-09-20T18:23:38.008600Z

You can treat columns as vectors and dfn operations as vectorized functions that return a new vector.
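
A minimal sketch of the "columns as vectors" idea above, using only clojure.core so that it runs without any library. The plain vectors, the map-of-columns `table`, and the numbers are all made up for illustration; real tech.ml.dataset columns are readers, and `dfn/-` is the library's own element-wise operation, which this only imitates with `mapv`.

```clojure
;; Hypothetical data: plain vectors standing in for two dataset columns.
(def units-total [120 95 40])
(def units-dwell [100 80 25])

;; Element-wise subtraction over the two "columns".
;; This mirrors the shape of (dfn/- col-a col-b): two sequences in, one out.
(def units-biz (mapv - units-total units-dwell))

;; A map of column name -> vector stands in for a dataset (an assumption,
;; not the library's representation). assoc returns a new map; the
;; original is left unchanged.
(def table {"units_total" units-total
            "units_dwell" units-dwell})
(def table+biz (assoc table "units_biz" units-biz))
```

The point is only that the arithmetic happens per column, and the result is attached to the table afterwards as a separate step.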

Santiago 2020-09-28T10:31:21.002Z

curious about this too

Aviv Kotek 2020-09-20T18:45:54.008700Z

Is there any way to operate on whole datasets? I'd rather not assoc a new column each time with reduce:

(reduce (fn [m name]
          (let [name (str name)]
            (assoc m (str "units_biz_" name)
                     (dfn/- (ds (str "units_total_" name))
                            (ds (str "units_dwelling_" name))))))
        ds (range 2010 2018))

Aviv Kotek 2020-09-20T18:50:50.008900Z

You mean something like this:

(let [cols (ds/column-names ds)
      units-total (ds/value-reader (ds/select-columns ds (filter #(s/starts-with? % "units_total") cols)))
      units-dwell (ds/value-reader (ds/select-columns ds (filter #(s/starts-with? % "units_dwell") cols)))]
  (ds/name-values-seq->dataset (into {} (map #(hash-map (str %3) (dfn/- %1 %2)) units-total units-dwell (range 2010 2018)))))
Isn't it common to operate on whole datasets? This is not neat at all.