uncomplicate

whilo 2017-10-08T13:09:47.000038Z

Ok. I couldn't get the OpenCL namespace to load for [uncomplicate.neanderthal core opencl] before adding MKL.

whilo 2017-10-08T13:10:43.000040Z

I think a pure OpenCL version might be useful in the long run, as one could then also port neanderthal to ClojureScript with WebCL.

whilo 2017-10-08T13:34:40.000010Z

This is not a feature request, just a thought I had. 🙂

2017-10-08T13:37:27.000042Z

Ah, I see. The CLBlast and cuBLAS engines load the MKL engine, but that can (potentially) be easily decoupled to load any default native engine. However, porting to ClojureScript would require much more than just WebCL. WebCL enables access to CL, but what about the actual linear algebra operations? Someone would have to implement those, which requires a practically full-time commitment, given the knowledge and time needed. Even when implemented, it would require the user to have those resources in the browser, and to have hardware with appropriate drivers. In a practical sense, I do not see how porting computation engines to ClojureScript would be useful over using them on the JVM. What would be useful in ClojureScript is to support transfer of data and enable poking around matrix structures for display and plotting, which is what some other libraries that have a ClojureScript port are commonly used for. But that is available today: turn any Neanderthal matrix into a Clojure sequence by calling seq, or move it to an array or a vector by calling transfer!, and voila.
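
For example, a minimal sketch, assuming the standard core and native namespaces:

(require '[uncomplicate.neanderthal.core :refer [transfer!]]
         '[uncomplicate.neanderthal.native :refer [dv]])

(def x (dv 1 2 3))               ;; native double vector
(seq x)                          ;; => (1.0 2.0 3.0), plain Clojure data
(transfer! x (double-array 3))   ;; => a Java double array holding 1.0 2.0 3.0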

2017-10-08T13:38:00.000046Z

Sure, I didn't take it as a request. Feel free to openly discuss any idea that you might find interesting.

whilo 2017-10-08T13:39:48.000012Z

I am playing around with autograd on top of neanderthal atm.

whilo 2017-10-08T13:40:49.000054Z

The research group that I am in mostly uses PyTorch, as do I, and I really like it. I will try to follow it and have a look at Stalingrad (a Scheme compiler with automatic differentiation).

whilo 2017-10-08T13:41:47.000052Z

I am at the very beginning though, not sure yet how to handle in-place ops in an immutable compute graph. I will postpone that until later.

whilo 2017-10-08T13:42:48.000007Z

I think I have to wrap some common linear algebra operations and make them polymorphic for convenience, similar to PyTorch, which follows NumPy. I am not sure whether this aligns well with the performance focus of neanderthal.

2017-10-08T13:43:56.000060Z

Hm, aren't they polymorphic already?

whilo 2017-10-08T13:44:06.000031Z

I also thought a bit about the tradeoffs between core.matrix (which has a cljs port that I used for my Bayesian inference playground: http://replikativ.io/sghmc/) and neanderthal.

whilo 2017-10-08T13:44:16.000067Z

I mean also for scalars including broadcasting.

whilo 2017-10-08T13:44:55.000044Z

But I try to keep it simple for now. This is something that PyTorch provides.

whilo 2017-10-08T13:45:20.000007Z

(independently of NumPy; they implement their own tensor library, as they originally did for Lua)

whilo 2017-10-08T13:49:51.000065Z

I think the problem with core.matrix, as far as I understood you as well, is that it is polymorphic on all the internal low-level operations used to implement linear algebra routines, instead of just exposing a convenient polymorphic high-level API for users who do not tamper with the linear algebra internals.

whilo 2017-10-08T13:50:58.000085Z

Is it possible in neanderthal to globally switch the implementation that libraries are using, so everybody uses the same backend?

2017-10-08T13:53:58.000066Z

I do not understand the question. The whole API is backend-agnostic, if that is what you mean. However, the catch with switching backends blindly (which you CAN do) is that the CPU and GPU have their own strengths and weaknesses, so they should be used together. Of course, if the machine has only a CPU, then it will be used for everything, but I have yet to encounter a machine that has a GPU but no CPU and runs a JVM.

2017-10-08T13:55:43.000068Z

Use the factory that you want in the core namespace, per call, or set it as a global binding...
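
A minimal sketch of the per-call style; with an OpenCL or CUDA factory in place of native-double, the same calls would allocate on the device (the factory names here are the usual ones, but check your version):

(require '[uncomplicate.neanderthal.core :refer [vctr ge]]
         '[uncomplicate.neanderthal.native :refer [native-double]])

;; the same core functions work with any factory
(def x (vctr native-double [1 2 3]))   ;; vector backed by the native engine
(def a (ge native-double 2 3))         ;; 2x3 matrix from the same factory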

whilo 2017-10-08T13:58:02.000034Z

Ok, good. So when I develop and release against the CPU version, a GPU user will automatically create all tensors on the GPU?

2017-10-08T13:59:19.000034Z

The thing is, of course, that you have to move data somehow from main memory to GPU memory. Currently the MKL engine serves that purpose, but in the extreme case, you could provide your own "dummy" native engine when you construct the OpenCL engine... Probably even nil would work (it did last year), but I'm not sure now.

2017-10-08T14:01:00.000020Z

That depends on how you set everything up. Everything in neanderthal is pluggable and configurable - it is up to you how you assemble it. There is a default configuration for convenience (the native, opencl and cuda namespaces), but you do not have to use it.

whilo 2017-10-08T14:01:41.000055Z

Ok, I think I need to have a closer look.

whilo 2017-10-08T14:01:55.000011Z

What is your take on automatic broadcasting?

2017-10-08T14:02:28.000068Z

Automatic in what sense?

whilo 2017-10-08T14:05:41.000019Z

In the sense that (+ scalar vector) will automatically broadcast the scalar to the size of the vector.

2017-10-08T14:12:42.000040Z

If (when) I really need such an operation, I am more inclined to create a separate broadcast method with well-defined semantics. As for that particular example, it is already supported in Neanderthal 0.17.0-SNAPSHOT by the linear-frac method, which can do the shift by a scalar and works for both vectors and matrices, like this: (linear-frac a 3.333). I don't plan to pollute +, -, and other scalar operations in Neanderthal, but will probably add general broadcasting for tensors, when I add them. For vectors and matrices, I think it does much more harm than good, and it can easily be achieved with existing functions.
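
A short sketch of that call, assuming linear-frac lives in the vect-math namespace:

(require '[uncomplicate.neanderthal.native :refer [dv]]
         '[uncomplicate.neanderthal.vect-math :refer [linear-frac]])

(linear-frac (dv 1 2 3) 3.333)   ;; shifts every entry: 4.333, 5.333, 6.333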

whilo 2017-10-08T14:14:36.000024Z

What happens fairly regularly when implementing deep learning architectures is that you need to add a row vector row-wise over a matrix. But I will not do automatic broadcasting for now, and will just focus on autograd.

2017-10-08T14:19:10.000016Z

I know that, but IMHO it is an NN-specific requirement that 1) can be achieved with existing operations with some performance penalty, 2) can be easily implemented in a simple OpenCL/CUDA kernel with no penalty using clojurecl/clojurecuda, and 3) is not that well-defined in linear algebra textbooks (so it is easy to misuse). See the sketch of 1) below.
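
For 1), one hedged sketch is a rank-1 update: adding a row vector v to every row of a is a := 1.0 * ones * v' + a, which rk! expresses directly (rk! and entry! are assumed to be in the core namespace):

(require '[uncomplicate.neanderthal.core :refer [rk! entry! mrows]]
         '[uncomplicate.neanderthal.native :refer [dv dge]])

(let [a    (dge 3 4)                     ;; 3x4 matrix of zeros
      v    (dv 1 2 3 4)                  ;; the row vector to broadcast
      ones (entry! (dv (mrows a)) 1.0)]  ;; column of ones
  (rk! 1.0 ones v a))                    ;; every row of a is now v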

whilo 2017-10-08T14:21:06.000040Z

Yes, I agree.

whilo 2017-10-08T14:26:16.000028Z

How are your Bayesian Inference things going?

whilo 2017-10-08T14:43:51.000053Z

I am mentioning it because it is my goal as well, hopefully in the form of an Anglican inference method for Bayesian neural networks.

whilo 2017-10-08T14:49:39.000019Z

like SGHMC

2017-10-08T15:05:51.000050Z

I haven't had time to work on it due to other things, but I plan to implement the CUDA engine in the mid-term.

2017-10-08T15:06:47.000079Z

As for usability, it has been rather functional since 2015. I just didn't put it on Clojars...

whilo 2017-10-08T15:08:15.000032Z

Ok, cool.

whilo 2017-10-08T15:49:30.000104Z

Is there a reason why Vectors and Matrices do not print into a readable format?

2017-10-08T15:53:34.000009Z

You mean readable by clojure reader?

2017-10-08T15:53:50.000099Z

Since I think they are definitely readable by a human user in the REPL:

#RealGEMatrix[double, mxn:3x4, layout:column, offset:0]
   ▥       ↓       ↓       ↓       ↓       ┓    
   →    1.0     9.9     2.56E+2 1.18E+4         
   →    1.8     27.     8.70E+2 4.67E+4         
   →    4.0     80.     3.13E+3 1.92E+5         
   ┗                                       ┛

whilo 2017-10-08T15:56:00.000020Z

Yes, I meant literal printing.

whilo 2017-10-08T15:56:16.000031Z

I think they already print all information to reconstruct them.

2017-10-08T15:56:17.000047Z

As for Clojure, I think transfer! is much more appropriate. Transfer the data into whatever you want without the string conversion and parsing.

whilo 2017-10-08T15:56:58.000088Z

Well, I often inline expressions from my REPL into my buffer, e.g. in tests. It is really handy if you have readable expressions then.

2017-10-08T15:57:17.000049Z

use seq in such cases

2017-10-08T15:57:35.000066Z

or (seq (view-vctr a))

whilo 2017-10-08T15:57:48.000053Z

Then I have to walk my whole nested data structure to transform the matrices into a different type.

2017-10-08T15:58:18.000074Z

Can you give me an example?

whilo 2017-10-08T15:58:41.000017Z

Right now I have a graph of an inner product:

2017-10-08T16:00:24.000117Z

How would Neanderthal matrices ideally be printed to help you here?

whilo 2017-10-08T16:01:43.000114Z

The atoms won't work here anyway, since they are also printed in a non-readable fashion.

whilo 2017-10-08T16:03:01.000062Z

I think you can stick to your current way of printing; I would just avoid "n:1" and "offset:0", use either a map or "n 1 offset 0 stride 1", and put the whole matrix into the one expression after RealBlockVector.

whilo 2017-10-08T16:03:26.000082Z

Ah, for the matrices the special characters won't work, of course.

whilo 2017-10-08T16:03:42.000041Z

But you can still print newlines.

whilo 2017-10-08T16:05:02.000028Z

Well, if the matrices get large, do you only print the edges then?

whilo 2017-10-08T16:05:17.000123Z

NumPy does this to avoid screwing up your environment.

2017-10-08T16:07:03.000081Z

But why? This is the printing format optimized for human consumption (not perfect, perhaps, and I am open to suggestions). What's intended for computers is transfer!. I understand your challenge here, but the actual issue is in printing your data structure, not the nested elements. In your case, I would implement the print-method for your defrecord that prints the nested elements in whatever way you find useful for your workflow. You can actually do this even for Neanderthal; just redefine print-method in your program.
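
A hedged sketch of that override; dispatching on the class of a sample matrix keeps the internal class name an implementation detail:

(require '[uncomplicate.neanderthal.native :refer [dge]])

(defmethod print-method (class (dge 1 1))
  [a ^java.io.Writer w]
  (.write w (pr-str (map seq a))))   ;; print the matrix as nested seqs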

2017-10-08T16:07:26.000019Z

Yes.

whilo 2017-10-08T16:08:18.000002Z

Yes, I am just wondering why people in general do not print readable things. Even if you print a map that just describes what was there, you can at least still read your buffer and access all the other parts.

whilo 2017-10-08T16:08:41.000040Z

But "n:1" is not parseable.

2017-10-08T16:08:57.000010Z

But how would you print and read a 3000 x 1000 matrix?

2017-10-08T16:09:11.000058Z

How would you print (and read) a symmetric matrix?

2017-10-08T16:09:17.000033Z

Or a symmetric packed?

2017-10-08T16:09:32.000028Z

How to handle wildly different magnitudes of data?

whilo 2017-10-08T16:09:57.000024Z

If you don't want to print the data itself, you can still print something that is edn and machine-readable.

2017-10-08T16:10:10.000085Z

?

2017-10-08T16:10:19.000045Z

I don't understand.

whilo 2017-10-08T16:10:27.000065Z

I mean a placeholder.

2017-10-08T16:11:48.000042Z

look, (map seq a) gives you something like this: ((1 2 3) (4 5 5) (7 8 9))

whilo 2017-10-08T16:12:08.000119Z

Something like #RealBlockVector{:stride 1, :n 1, :offset 0, :value [[1 2 :... 999 1000] :... [1000 :... 2000]]}

whilo 2017-10-08T16:12:28.000089Z

If I have no read handler for it, the edn parser will just leave the map in its place.

whilo 2017-10-08T16:13:29.000060Z

You can still use newline formatting to make it human readable.

2017-10-08T16:13:56.000026Z

It was like this before, but I never need to read it with the edn parser, and I always need to read it in the REPL (where the actual data is displayed better than you suggest).

2017-10-08T16:14:17.000031Z

But, anyway, why doesn't the seq approach work for you?

whilo 2017-10-08T16:16:00.000090Z

Well, it is a non-critical issue; I can of course work around it and fix it myself. I think the data-driven approach of Clojure is very often sacrificed here, and it makes copying and pasting values a lot harder, like in Python, where almost nothing is automatically readable and people pickle everything.

2017-10-08T16:17:35.000066Z

That use case is handled by seq.

whilo 2017-10-08T16:18:15.000071Z

Ok

whilo 2017-10-08T16:18:42.000041Z

How close is neanderthal to the low-level BLAS operations in general?

whilo 2017-10-08T16:21:11.000069Z

It is as close as possible, right?

2017-10-08T16:21:54.000006Z

When it makes sense, yes.

whilo 2017-10-08T16:22:38.000012Z

Good.

whilo 2017-10-08T16:24:29.000012Z

Do you have any requirements or desirables for autograd?

2017-10-08T16:27:39.000017Z

?

2017-10-08T16:27:50.000034Z

Something I'd like to see?

whilo 2017-10-08T16:30:57.000014Z

Yes. You mentioned that you thought about adding something in this direction as well.

2017-10-08T16:33:30.000048Z

Sure. I'll first tackle (human-coded only) gradients and things related to them. Only then will I know enough to form any strong opinion about auto-gradients. Until then, my primary concern is that they actually work, and are competitive with whatever is happening in other environments...

whilo 2017-10-08T16:36:24.000015Z

Reverse-mode autograd, in the form of Theano, TensorFlow, and PyTorch, is very close to manually backpropagated gradients nowadays. Doing it by hand is error-prone and often obscures the code. Working in PyTorch is really cool, on the other hand, as the gradient follows Python's control flow, so you can calculate loops with a variable iteration length, e.g. for LSTMs.

whilo 2017-10-08T16:37:08.000004Z

In fact, the way I am doing it right now is similar to both PyTorch and a manual NN implementation that I have in NumPy.

whilo 2017-10-08T16:37:21.000001Z

linear regression works now 🙂

2017-10-08T16:37:41.000068Z

Well, cool!

whilo 2017-10-08T16:37:42.000074Z

A problem is that I have to box all things.

2017-10-08T16:38:05.000034Z

I'd love to see a comparison with those libraries.

whilo 2017-10-08T16:39:18.000012Z

Sure, but first of all I need to get some reasonable example code 🙂

whilo 2017-10-08T16:40:10.000109Z

This is how it looks right now.

2017-10-08T16:40:52.000061Z

It looks simple enough.

2017-10-08T16:41:08.000074Z

Is there a reason you use loop and not map/reduce?

whilo 2017-10-08T16:41:39.000022Z

It is destructive; gd! applies the gradients in place.

2017-10-08T16:41:47.000045Z

But I didn't primarily mean a comparison of the code; I meant a comparison of performance.

2017-10-08T16:42:32.000025Z

Fluokitten's fmap! offers a destructive map, and it works on all types of vectors and matrices (if that can help here).
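
A small sketch of that, assuming Fluokitten's core namespace; fmap! mutates its argument in place:

(require '[uncomplicate.fluokitten.core :refer [fmap!]]
         '[uncomplicate.neanderthal.native :refer [dv]])

(let [x (dv 1 2 3)]
  (fmap! (fn ^double [^double e] (* 2.0 e)) x))   ;; x is now (dv 2.0 4.0 6.0)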

whilo 2017-10-08T16:43:26.000104Z

Right. Well, I have to think about it. Usually I only use map/reduce in functional code; otherwise I use doseq or loop, to make it clear (for me).

whilo 2017-10-08T16:43:44.000069Z

I just did it because this is how it would look in Python.

whilo 2017-10-08T16:44:09.000056Z

Not exactly like PyTorch; I have made the compute graph lazy, so you have to call forward first.

whilo 2017-10-08T16:44:41.000033Z

This allows transforming the graph before applying it, but PyTorch does not normally do this, as it is more intuitive not to mix in lazy computations.

whilo 2017-10-08T16:46:15.000017Z

So maybe I drop that.

whilo 2017-10-08T16:46:52.000014Z

grads is a full graph decorated with the gradients, which are only applied by gd!.

whilo 2017-10-08T16:53:35.000071Z

I think I will drop laziness for now. This will allow the control flow to depend on the calculated values, which is much better.

whilo 2017-10-08T18:46:49.000096Z

@blueberry What will the tensor support look like?

whilo 2017-10-08T18:47:45.000013Z

I miss something like shape, which gives me all the dimensions in a vector, e.g. [2 3] for a matrix with 2 rows and 3 columns.

whilo 2017-10-08T18:48:00.000088Z

dim just yields the product of the dimensions

2017-10-08T18:55:37.000002Z

dim is exactly how dimension is defined in mathematics. You can get what you want with mrows and ncols for matrices. [2 3] would have a huge performance penalty (more than 10x). As for tensors, I am not sure yet. shape as in your example will probably be supported, but I'll try to find an option that is also performant...

2017-10-08T18:57:18.000058Z

Of course, presently you can implement your own shape protocol that calls dim for vectors, and [(.mrows a) (.ncols a)] for matrices...
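
A minimal sketch of such a helper; matrix? is assumed here, and any predicate that distinguishes matrices from vectors would do:

(require '[uncomplicate.neanderthal.core :as nc])

(defn shape
  "Dimensions as a vector: [m n] for matrices, [n] for vectors."
  [a]
  (if (nc/matrix? a)
    [(nc/mrows a) (nc/ncols a)]
    [(nc/dim a)]))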

whilo 2017-10-08T19:01:41.000065Z

Right.

whilo 2017-10-08T19:01:51.000046Z

Good to know that it is expensive.

2017-10-08T19:02:25.000051Z

It is obvious and inevitable. You have the construction of a vector and two boxings.

whilo 2017-10-08T19:02:25.000092Z

I need broadcasting now.

2017-10-08T19:02:51.000103Z

Not to mention the penalty when you actually read those numbers back.

2017-10-08T19:03:09.000036Z

In itself it is not that much, but if you call it in a loop it adds up

whilo 2017-10-08T19:03:25.000035Z

I mean I need broadcasting for scalars, so the gradient is projected onto them correctly.

2017-10-08T19:03:26.000012Z

Sometimes it is important, sometimes it is not

whilo 2017-10-08T19:03:39.000028Z

I see, yes.

2017-10-08T19:05:09.000107Z

If you call it once per matrix, then of course it is negligible

whilo 2017-10-08T19:06:54.000037Z

Yes, but it could slip in somewhere. I try to vectorize as much as possible, though.

whilo 2017-10-08T19:07:05.000003Z

Logistic regression also works.

whilo 2017-10-08T19:08:36.000023Z

Have you done random initialization of matrices yet? For GPU ones, doing it on the device is better than copying from main memory (e.g. from Java).

whilo 2017-10-08T19:09:00.000035Z

I only need standard normals.

whilo 2017-10-08T19:09:06.000023Z

Or maybe uniform.

2017-10-08T19:10:14.000039Z

I do that in Bayadera, and generate random samples of various distributions directly in GPU memory.

2017-10-08T19:11:38.000030Z

The challenge regarding random numbers is to use a quality random generator that is also parallelizable.

2017-10-08T19:12:06.000095Z

This is solved in Bayadera, but it would be overkill to introduce it in Neanderthal just to support testing.

2017-10-08T19:12:56.000026Z

My general idea is that Neanderthal is a general-purpose vectorization and linear algebra library, while Bayadera is for statistics and random stuff

2017-10-08T19:13:06.000058Z

randomized stuff

whilo 2017-10-08T19:13:55.000016Z

hehe

2017-10-08T19:14:22.000008Z

I might add random matrix data generation on the CPU in neanderthal, though (feel free to open an issue, and I will think about that)

2017-10-08T19:15:00.000083Z

With testing-data generation quality, i.e. intended only for generating random testing data
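
Until then, a stopgap sketch: generate standard normals on the host with java.util.Random (testing quality only, not MCMC quality) and transfer! the result to wherever the matrix should live:

(require '[uncomplicate.neanderthal.native :refer [dge]])

(let [rng (java.util.Random.)]
  ;; a 3x4 native matrix filled with standard normals;
  ;; transfer! could then move it to a GPU matrix
  (dge 3 4 (repeatedly 12 #(.nextGaussian rng))))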

whilo 2017-10-08T19:17:48.000039Z

How bad are the random generators you are worried about?

2017-10-08T19:18:36.000027Z

For MCMC, probably all you'd encounter in Java are poor, including the Mersenne Twister.

2017-10-08T19:19:26.000069Z

There are exceptions, but they are not that widely used in common Java libraries.

2017-10-08T19:20:00.000128Z

Not to mention that the default random() is unusable 🙂

whilo 2017-10-08T19:20:40.000126Z

Ok, I am a bit green here. Do you have any pointers?

2017-10-08T19:20:52.000074Z

I do. Use Bayadera 🙂

whilo 2017-10-08T19:20:57.000049Z

I mean links.

whilo 2017-10-08T19:21:01.000026Z

Ok 🙂