
Data science, data analysis, and machine learning in Clojure for additional discussions

anyone use flare?


iirc clojure ml has way better perf than python so im looking into it

Aviv Kotek 2020-09-20T16:22:26.003900Z

exploring some basics of tech.ml.dataset, can't figure out how to do simple dataframe arithmetic (on same-shape dataframes):

(let [cols (ds/column-names ds)
      units-total (ds/select-columns ds (filter #(s/starts-with? % "units_total") cols)) ;ds
      units-dwell (ds/select-columns ds (filter #(s/starts-with? % "units_dwell") cols))] ;ds
  (dfn/- units-total units-dwell))
=> #object[tech.v2.datatype.binary_op$fn$reify__10844 0xf2ea2e8 "tech.v2.datatype.binary_op$fn$reify__10844@f2ea2e8"]
I'd expect a new dataset of the same shape, with each cell containing the result of the calculation.

genmeblog 2020-09-20T18:22:51.007300Z

@aviv dfn operates on something called a reader and returns a reader. A column is a reader. A dataset is not. So you have to select two columns, call the function on them, and insert the resulting column into a (new) dataset.

genmeblog 2020-09-20T18:23:38.008600Z

You can treat columns as vectors and dfn operations as vectorized functions that return a new vector.
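To make the "columns as vectors" analogy concrete, here is a minimal sketch in plain Clojure. It deliberately uses no tech.ml.dataset calls: a map of column name to vector stands in for a dataset, and `mapv` stands in for a vectorized dfn operation. The names `toy-ds`, `col-`, and `add-biz-columns` and the sample values are hypothetical, invented for illustration only.

```clojure
;; Plain-Clojure analogy only: vectors stand in for columns (readers),
;; and a map of column-name -> vector stands in for a dataset.
;; All names and values here are hypothetical.
(def toy-ds
  {"units_total_2010"    [10 20 30]
   "units_dwelling_2010" [4 5 6]
   "units_total_2011"    [11 21 31]
   "units_dwelling_2011" [5 6 7]})

(defn col-
  "Element-wise subtraction of two equal-length columns."
  [a b]
  (mapv - a b))

(defn add-biz-columns
  "Returns ds with one units_biz_<year> column added per year,
  computed as units_total_<year> minus units_dwelling_<year>."
  [ds years]
  (reduce (fn [m year]
            (assoc m (str "units_biz_" year)
                   (col- (get m (str "units_total_" year))
                         (get m (str "units_dwelling_" year)))))
          ds
          years))

(def result (add-biz-columns toy-ds [2010 2011]))
```

The arithmetic happens one column pair at a time, and the "dataset" is rebuilt by adding each result under a new name, which is the select, compute, insert pattern described above.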

Santiago 2020-09-28T10:31:21.002Z

curious about this too

Aviv Kotek 2020-09-20T18:45:54.008700Z

Is there any way to operate on whole datasets? I'd rather not assoc a new column each time with reduce:

(reduce (fn [m name]
          (let [name (str name)]
            (assoc m (str "units_biz_" name)
                     (dfn/- (ds (str "units_total_" name))
                            (ds (str "units_dwelling_" name))))))
        ds (range 2010 2018))

Aviv Kotek 2020-09-20T18:50:50.008900Z

You mean something like this?

(let [cols (ds/column-names ds)
      units-total (ds/value-reader (ds/select-columns ds (filter #(s/starts-with? % "units_total") cols)))
      units-dwell (ds/value-reader (ds/select-columns ds (filter #(s/starts-with? % "units_dwell") cols)))]
  (ds/name-values-seq->dataset
    (into {} (map #(hash-map (str %3) (dfn/- %1 %2))
                  units-total units-dwell (range 2010 2018)))))
Wouldn't it be common to operate on whole datasets? This is not neat at all.