Has anyone looked at what the fastest PCA method available to us from Clojure is?
The SMILE author claims he wrote the fastest algorithms.
clojure.core.matrix has an SVD method in it.
We use (and expose) Smile in tech.ml.dataset; he is using netlib BLAS under the covers. It would be interesting to time that against Neanderthal, but I imagine that if you install MKL as your system BLAS, those timings aren't interesting.
Thanks for the feedback, folks! I'm using tech.ml.dataset, and I didn't time it, but it took at least dozens of minutes on a thousands-by-thousands matrix.
@metasoarous From my book (1,000 x 100,000 on a 7-year-old i7-4790k CPU):
(with-release [a (rand-normal! (fge 1000 100000))]
  (time (pca (center! a))))
=> "Elapsed time: 355.167051 msecs"
@blueberry Epic 🙂 Thanks!
from which lib @blueberry?
No lib. It's the handful-of-lines implementation of PCA explained in the book. It uses Neanderthal for linear algebra.
ah it's you dragan, cool
@metasoarous - Most likely netlib is falling back on its Java implementation and not picking up the system BLAS libraries. Regardless, you can transform your dataset to a tensor, and from there you can copy it into Neanderthal in a fairly straightforward manner and then get subsecond timings 🙂.
@chrisn Thanks for pointing that out. I realized that I haven't set up BLAS on this computer yet, so that would explain it. Other than timing things, is there a good way to check whether it's finding the BLAS routines?
Honestly, I do not know of any aside from timings. The netlib documentation may have more info; perhaps there is a verbose mode enabled by a Java system property.
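One possible check, sketched under the assumption that the Smile version in use goes through netlib-java (the `com.github.fommil.netlib` package): ask netlib-java which BLAS class it actually instantiated. The helper names `netlib-blas-impl` and `describe-blas` are made up for this example. Reflection is used so the snippet still loads when netlib-java is not on the classpath.

```clojure
(defn netlib-blas-impl
  "Class name of the BLAS backend chosen by netlib-java,
  or nil if netlib-java is not on the classpath."
  []
  (try
    (let [blas-class   (Class/forName "com.github.fommil.netlib.BLAS")
          ;; BLAS.getInstance() is a static, zero-argument factory method
          get-instance (.getMethod blas-class "getInstance" (into-array Class []))
          blas         (.invoke get-instance nil (object-array 0))]
      (.getName (class blas)))
    (catch ClassNotFoundException _ nil)))

(defn describe-blas
  "Human-readable reading of the class name returned above."
  [impl-name]
  (cond
    (nil? impl-name)
    "netlib-java not found on the classpath"

    ;; F2jBLAS is netlib-java's pure-Java fallback
    (.endsWith ^String impl-name "F2jBLAS")
    "pure-Java fallback (no native BLAS loaded)"

    :else
    (str "non-fallback backend: " impl-name)))

(println (describe-blas (netlib-blas-impl)))
```

If this reports the F2J fallback, the native libraries were not picked up. As far as I know, netlib-java also reads the `com.github.fommil.netlib.BLAS` system property to force a particular implementation class, but check its README before relying on that.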
I would imagine Intel MKL is an option, and its installation process may have an option to set it as the system BLAS.
(Full decomposition; that is, not just power iteration, etc.)