data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions.
2020-07-03T19:50:58.384900Z

Has anyone looked at what the fastest PCA method available to us from Clojure is?

genmeblog 2020-07-04T12:38:28.386200Z

The SMILE author claims he wrote the fastest algorithms.

Aviv Kotek 2020-07-05T12:23:32.386600Z

clojure.core.matrix has an SVD method in it.
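For context on why an SVD routine is enough for a full (non-iterative) PCA, here is a short sketch. It assumes the data matrix $X$ holds $n$ observations as rows and has already been column-centered; the symbols are introduced here for illustration and do not come from any of the libraries mentioned.

```latex
X = U \Sigma V^{\top},
\qquad
C = \frac{1}{n-1} X^{\top} X
  = V \, \frac{\Sigma^{2}}{n-1} \, V^{\top}
```

Under those assumptions the columns of $V$ are the principal directions, the variance along the $i$-th direction is $\sigma_i^{2}/(n-1)$, and the projected data (scores) are $XV = U\Sigma$.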

chrisn 2020-07-05T22:00:50.387Z

We use (and expose) Smile in tech.ml.dataset; he is using netlib BLAS under the covers. It would be interesting to time that against Neanderthal, but I imagine if you install MKL as your system BLAS then those timings aren't interesting.

2020-07-06T01:30:50.388600Z

Thanks for the feedback, folks! I'm using tech.ml.dataset, and I didn't time it, but it took at least dozens of minutes on a thousands-by-thousands matrix.

2020-07-09T00:47:49.394700Z

@metasoarous From my book (1,000 x 100,000 on a 7-year-old i7-4790k CPU):

(with-release [a (rand-normal! (fge 1000 100000))]
  (time (pca (center! a))))
=> "Elapsed time: 355.167051 msecs"

2020-07-09T01:23:47.394900Z

@blueberry Epic 🙂 Thanks!

Aviv Kotek 2020-07-09T07:57:49.395100Z

from which lib @blueberry?

2020-07-09T08:45:46.395300Z

No lib. It's the handful-of-lines implementation of PCA explained in the book. It uses Neanderthal for linear algebra.

Aviv Kotek 2020-07-09T09:05:54.395600Z

ah it's you dragan, cool

chrisn 2020-07-09T14:19:35.395800Z

@metasoarous - Most likely netlib is falling back on its Java implementation and not picking up the system BLAS libraries. Regardless, you can transform your dataset to a tensor, and from there you can copy it into Neanderthal in a fairly straightforward manner and then get subsecond 🙂.

2020-07-09T16:09:28.396Z

@chrisn Thanks for pointing that out. I realized that I haven't set up BLAS on this computer yet, so that would explain it. Other than timing things, is there a good way to check whether it's finding the BLAS routines?

chrisn 2020-07-10T02:12:50.422300Z

Honestly, I do not know of any aside from timings. The netlib documentation may have more info; perhaps there is a verbose mode enabled by a Java system property.
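One possible check beyond timings, offered as a sketch rather than a confirmed answer: it assumes the netlib in question is netlib-java (package `com.github.fommil.netlib`), which the thread does not state. In that library `BLAS/getInstance` returns whichever backend was loaded, so its class name shows whether a native library was picked up. The helper names `blas-implementation` and `pure-java-fallback?` are hypothetical. Reflection is used so the snippet also runs when netlib-java is absent.

```clojure
;; Sketch only. Assumes the "netlib" mentioned above is netlib-java
;; (package com.github.fommil.netlib); the helper names are hypothetical.

(defn blas-implementation
  "Returns the class name of the BLAS backend netlib-java selected,
  or nil when netlib-java is not on the classpath."
  []
  (try
    (let [blas-class   (Class/forName "com.github.fommil.netlib.BLAS")
          ;; BLAS.getInstance() is a static method with no arguments.
          get-instance (.getMethod blas-class "getInstance" (make-array Class 0))]
      (.getName (class (.invoke get-instance nil (object-array 0)))))
    (catch ClassNotFoundException _ nil)))

(defn pure-java-fallback?
  "True when the class name is netlib-java's F2J (pure Java) backend."
  [class-name]
  (= class-name "com.github.fommil.netlib.F2jBLAS"))

(let [impl (blas-implementation)]
  (println "BLAS backend:" (or impl "netlib-java not found"))
  (when (pure-java-fallback? impl)
    (println "Running on the pure-Java fallback, not a native BLAS.")))
```

With netlib-java, a class name ending in `NativeSystemBLAS` or `NativeRefBLAS` would indicate a native backend, while `F2jBLAS` is the pure-Java translation. Whether the Smile version in use actually routes through this class is not verified here.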

chrisn 2020-07-10T02:14:30.422500Z

I would imagine Intel MKL is an option, and its installation process may have an option to set it as the system BLAS.

2020-07-03T19:51:32.385400Z

(Full decomposition; that is, not just power iteration, etc.)