@blueberry hello, I have a question about ClojureCUDA and Neanderthal. I checked mm with the CUDA and native engines, and CUDA is slower. Why? Code for this:
(time
 (let [x (dge 8000 6000 (repeatedly #(Math/random)))
       y (trans x)]
   (mm x y)))
=> "Elapsed time: 57824.432158 msecs"
(time
 (cuda/with-default-engine
   (with-release [gpu-x (cuda/cuge 8000 6000 (repeatedly #(Math/random)))
                  gpu-y (trans gpu-x)]
     (mm gpu-x gpu-y))))
=> "Elapsed time: 59080.466968 msecs"
@rustam.gilaztdinov Neither result is right, and neither is relevant to what you are trying to measure. In the first case, you are really measuring the time it takes Clojure to create 48,000,000 random numbers, copy them into a matrix, and then multiply it by its transpose. In the second case, you are doing the same CPU work, plus transferring the data to the GPU, plus the GPU mm, plus creating and destroying the CUDA context, which you should generally do only once or a few times during the lifetime of the application, not for every operation.
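To make the first point concrete, here is a minimal sketch of timing only the CPU multiplication, with the random data generated once, outside the timed region. The name host-x and the explicit 48000000 count (8000 x 6000) are illustrative, not from the original code:

(require '[uncomplicate.commons.core :refer [with-release]]
         '[uncomplicate.neanderthal.core :refer [mm trans]]
         '[uncomplicate.neanderthal.native :refer [dge]])

;; Generate the 48,000,000 random numbers once, outside any timing.
(def host-x (dge 8000 6000 (repeatedly 48000000 #(Math/random))))

;; Now the timing covers only the multiplication itself.
(time
 (with-release [result (mm host-x (trans host-x))]
   nil))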
OK, can you please provide an example of how to do this right, with this particular matrix?
It is literally the same, with a few minor differences, as shown in the GPU tutorials on the Neanderthal website: http://neanderthal.uncomplicate.org/articles/tutorial_opencl.html and http://dragan.rocks/articles/17/CUDA-and-cuBLAS-GPU-matrices-in-Clojure
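Condensed, the pattern from those tutorials looks roughly like the sketch below. It assumes transfer! from uncomplicate.neanderthal.core and synchronize! from uncomplicate.clojurecuda.core, and reuses the host-x matrix from the sketch above: the engine and context are set up once, the host-to-GPU copy happens outside the timed region, and the clock stops only after the asynchronous GPU work has been synchronized:

(require '[uncomplicate.commons.core :refer [with-release]]
         '[uncomplicate.neanderthal.core :refer [mm trans transfer!]]
         '[uncomplicate.neanderthal.cuda :as cuda]
         '[uncomplicate.clojurecuda.core :refer [synchronize!]])

;; Engine and context created once; keep the data on the GPU between operations.
(cuda/with-default-engine
  (with-release [gpu-x (cuda/cuge 8000 6000)]
    (transfer! host-x gpu-x)            ; host -> GPU copy, outside the timing
    (time
     (with-release [result (mm gpu-x (trans gpu-x))]
       (synchronize!)))))               ; GPU calls are asynchronous; wait before stopping the clock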
@blueberry: do you have any advice on learning CUDA? I just converted http://www.jcuda.org/samples/JCublas2Sample.java and http://www.jcuda.org/samples/JCudaVectorAdd.java to Clojure. Most existing books spend a lot of time talking about memory architecture and GPU architecture -- which I agree is important -- but I'm now looking for something like: here are 100 common numerical algorithms, and here is how to implement them in CUDA, justifying the design choices.
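For reference, a ClojureCUDA take on the JCudaVectorAdd sample might look like the sketch below. It assumes the uncomplicate.clojurecuda.core API as shown in its tutorials (init, device, context, in-context, program, compile!, module, function, mem-alloc, memcpy-host!, launch!, grid-1d, parameters); the kernel name add and the size n are illustrative:

(require '[uncomplicate.commons.core :refer [with-release]]
         '[uncomplicate.clojurecuda.core :refer :all])

(def vector-add-src
  "extern \"C\" __global__ void add (int n, float *a, float *b, float *c) {
     int i = blockIdx.x * blockDim.x + threadIdx.x;
     if (i < n) {
       c[i] = a[i] + b[i];
     }
   }")

(init)                              ; initialize the CUDA driver once
(def ctx (context (device 0)))      ; one context for the application's lifetime

(in-context ctx
  (let [n 100000
        add (function (module (compile! (program vector-add-src))) "add")]
    (with-release [gpu-a (mem-alloc (* Float/BYTES n))
                   gpu-b (mem-alloc (* Float/BYTES n))
                   gpu-c (mem-alloc (* Float/BYTES n))]
      ;; copy the inputs to the GPU
      (memcpy-host! (float-array (range n)) gpu-a)
      (memcpy-host! (float-array (repeat n 2)) gpu-b)
      ;; launch one thread per element
      (launch! add (grid-1d n) (parameters n gpu-a gpu-b gpu-c))
      ;; copy the result back and peek at the first few entries
      (take 5 (seq (memcpy-host! gpu-c (float-array n)))))))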
I am not aware of any book that takes the approach you are looking for.