uncomplicate

jcf 2017-10-17T15:16:59.000510Z

Hello all. Anyone come across this issue loading libcublas?

Stack trace from the attempt to load the library as a resource:
java.lang.UnsatisfiedLinkError: /tmp/libJCublas2-0.8.0-linux-x86_64.so: libcublas.so.8.0: cannot open shared object file: No such file or directory
	at java.lang.ClassLoader$NativeLibrary.load(Native Method)
	at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1941)
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1824)
	at java.lang.Runtime.load0(Runtime.java:809)
	at java.lang.System.load(System.java:1086)
	at jcuda.LibUtils.loadLibraryResource(LibUtils.java:260)
	at jcuda.LibUtils.loadLibrary(LibUtils.java:158)
	at jcuda.jcublas.JCublas2.initialize(JCublas2.java:81)
	at jcuda.jcublas.JCublas2.<clinit>(JCublas2.java:66)
	at uncomplicate.neanderthal.internal.device.cublas$eval51203.invokeStatic(cublas.clj:764)

jcf 2017-10-17T15:17:22.000229Z

I've got CUDA installed in /opt/cuda, and can see libcublas.so on my LD path.

jcf 2017-10-17T15:17:52.000385Z

Oh, wait. I have version 9 and this is looking for version 8!

jcf 2017-10-17T15:18:05.000488Z

ldconfig -p | rg libcublas
	libcublas.so.9.0 (libc6,x86-64) => /opt/cuda/lib64/libcublas.so.9.0
	libcublas.so (libc6,x86-64) => /opt/cuda/lib64/libcublas.so

2017-10-17T17:50:49.000092Z

@jcf You are right. It currently requires CUDA 8 (downgrade helps if you've already updated to 9).

2017-10-17T17:51:08.000264Z

It'll soon be upgraded to CUDA 9.

whilo 2017-10-17T19:51:33.000241Z

@blueberry I don't know if you have seen it, but I am discussing autograd and neanderthal with the cortex guys: https://groups.google.com/forum/#!topic/clojure-cortex/ba4eVXT8DMM

whilo 2017-10-17T19:54:20.000295Z

I would be interested in whether you think that a generalized buffer-management DSL like the one Cortex builds has advantages over neanderthal's direct API mapping. One can do ahead-of-time optimization and compilation with a data description (AST) of tensor operations. I don't see this as competing with neanderthal, because it provides the low-level primitives. But I might be missing some subtleties.

whilo 2017-10-17T20:03:16.000011Z

And I am curious about your opinion on buffer management in the special-purpose low-level matrix formats that are actually used by most higher-level frameworks.

whilo 2017-10-17T20:03:48.000571Z

I think the plurality of deep learning frameworks obscures the fact that they all use these low-level APIs.

whilo 2017-10-17T20:04:07.000374Z

BLAS in the form of MKL, OpenCL, or CUDA, and additional ones like cuDNN.

2017-10-17T20:04:38.000173Z

One interesting thing with that is that it is (was for me) impossible to find any comparison with any other implementation, either performance-wise or by ease of use. I am interested in Cortex and follow what's happening from time to time. I never saw any comparison of Cortex with any of the leading libs (TF, Caffe, PyTorch etc.), or even with dl4j, that says "in this and this example, Cortex achieved this speed compared to library X", or "we showed that Cortex requires less configuration" or whatever. There are some blog posts, but these are more like "you can build this example model with Cortex" and nothing more. So, the only thing interesting to me now is that it is a Clojure-centric library. That is awesome, but only if I can leverage it to build custom, or not-strictly-NN, things with it. If it is aimed at users who need a packaged, ready-to-use solution where you provide some configuration map, the lib figures out some of the built-in stuff, and that's it, then why would I use it instead of TF? I would like to know, but until now I didn't find a compelling answer. Of course, there's nothing wrong with that; ThinkTopic earns money using Cortex, and they are happy with it...

2017-10-17T20:05:39.000655Z

The thing with that is that they use something akin to NDArray. This is basically an N-dimensional dense cube.

2017-10-17T20:09:03.000532Z

This is fine if you only need it for "standard" NNs. But I am not sure what kind of "generalized buffer management" that refers to. They reshape a hypercube from one dimension to another, but the structure is always dense, without any other automatic property. How do you specify an (optimized) symmetric 2-D array there? What is the performance of linear algebra operations?

whilo 2017-10-17T20:14:04.000521Z

Yes, I see this problem as well.

2017-10-17T20:14:18.000150Z

In your particular case, since you need to experiment with a novel NN method, how can you (re)use Cortex there? I have no idea.

whilo 2017-10-17T20:15:44.000267Z

A question I have is whether it is possible to transform (reshape) these arrays through the low-level APIs, and how OpenCL access to the buffers works in that context. Is it possible, for example, to directly transfer a triangular matrix into an equivalent dense one without unnecessary copying?

2017-10-17T20:16:24.000510Z

In neanderthal, of course, provided that you want a dense triangular matrix (TR).

2017-10-17T20:17:40.000551Z

(def a (fge 5 5 (range 1 26)))  ; 5x5 dense general (GE) matrix
(view-tr a)                     ; triangular (TR) view of the same memory

2017-10-17T20:18:13.000256Z

that is, for tr->ge

2017-10-17T20:18:33.000427Z

(def a (ftr 5 5 (range 1 26)))  ; dense triangular (TR) matrix
(view-ge a)                     ; general (GE) view of the same memory
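
A fuller REPL sketch of the same round trip (assuming the view functions live in uncomplicate.neanderthal.core and the constructors in uncomplicate.neanderthal.native; names are illustrative):

(require '[uncomplicate.neanderthal.core :refer [view-tr entry entry!]]
         '[uncomplicate.neanderthal.native :refer [fge]])

(def a (fge 5 5 (range 1 26)))   ; dense 5x5 GE matrix
(def t (view-tr a))              ; TR view over the same buffer
(entry! a 1 0 100.0)             ; mutate through the GE handle
(entry t 1 0)                    ; => 100.0, the view shares memory, no copy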

2017-10-17T20:19:29.000017Z

The question with the NDarray approach is: can it describe anything other than the general dense matrix (in the case of dim=2)?

2017-10-17T20:20:54.000329Z

As for your own OpenCL or CUDA kernels: you get the raw buffer. Accessing the structure inside the kernel is completely up to you.

2017-10-17T20:28:26.000605Z

Of course, with either of the libraries, you can take the raw buffer structure and do whatever you please with its contents...

whilo 2017-10-17T21:09:15.000486Z

Yes. But the question is whether I am doing stupid memory copying or accesses from the low-level perspective. I can describe all kinds of high-level tensor operations, like reshaping, for example. But if they result in memory copying or inefficient element access in tensor operations, then this is a clear problem that cannot be abstracted away.

whilo 2017-10-17T21:09:44.000375Z

I understand neanderthal as focusing on avoiding exactly this problem of all generalizing tensor APIs, e.g. an ndarray lib.

whilo 2017-10-17T21:11:19.000058Z

I still think these operations should be supported, but it has to be possible to opt out. This is only possible if the higher-level abstractions are built out of the lower ones. That is why I think a stack built on neanderthal might be a much better toolbox for optimization pipelines than defining some high-level API which leaves the rest of the mapping to low-level primitives to external, opaque AOT compilation pipelines.

whilo 2017-10-17T21:12:05.000297Z

I understand that these pipelines represent significant engineering effort.

whilo 2017-10-17T21:12:34.000164Z

Yet my current experience tells me that this is not really that relevant for deep learning in large scale environments.

whilo 2017-10-17T21:13:50.000350Z

I have problems with the Python runtime as a deployment and data processing environment (slow, hacky and a lot of accumulated debt + lack of enterprise support). But not with pytorch.

whilo 2017-10-17T21:14:38.000244Z

This is a very important insight, I think. Up until pytorch I bought the common wisdom that TensorFlow, Theano, or MXNet is necessary as a high-level API interface to a middleware doing all the stuff for you.

whilo 2017-10-17T21:15:36.000193Z

Pytorch only provides autograd and the necessary low-level ops to execute tensor operations efficiently (i.e. without unnecessary copying), yet it proves competitive for almost all the people I currently talk to, who train models basically all day.

whilo 2017-10-17T21:16:17.000419Z

The argument that industry needs larger-scale deployments involves the data-processing pipeline, deployment, and parallelization.

whilo 2017-10-17T21:16:44.000143Z

The first two are a problem of Python, but the last has been successfully tackled with pytorch.

2017-10-17T21:17:31.000226Z

I agree, and this is roughly the idea that I pursue with neanderthal: provide different layers that you can use directly by hand, or/and build higher-level magic on top of.

2017-10-17T21:19:19.000193Z

And each layer adds as little complexity as possible, while being as automatic as desired (but not more)

2017-10-17T21:22:32.000009Z

And I have yet to find a clean presentation of a simple tensor operation in these libraries. More often than not, the examples revolve exclusively around computation graphs. I'd like to see a nice description of how I can create some tensors and do tensor contraction without much fuss...

whilo 2017-10-17T21:24:53.000421Z

What do you mean with tensor contraction?

2017-10-17T21:26:27.000420Z

a tensor equivalent of matrix multiplication https://en.wikipedia.org/wiki/Tensor_contraction
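
Concretely (just restating the standard definition): matrix multiplication contracts the one shared index, and a general contraction does the same over any paired mode of higher-order tensors:

	C[i,j] = sum_k A[i,k] * B[k,j]         (matrix multiply)
	D[i,j,l] = sum_k T[i,j,k] * M[k,l]     (3d tensor contracted with a matrix)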

whilo 2017-10-17T21:26:31.000513Z

Something needed for deep learning is efficient broadcasting, so that (mini-)batches of data can be quickly sent through matrix multiplication. I am not sure how to do this best from a low-level perspective.
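
One common low-level trick (a sketch with illustrative dimensions, not a claim about how any particular framework does it): lay the minibatch out as the rows of a matrix, and the whole batch goes through a single GEMM.

(require '[uncomplicate.neanderthal.core :refer [mm]]
         '[uncomplicate.neanderthal.native :refer [fge]])

(def batch   (fge 64 784 (repeatedly (* 64 784) rand)))   ; 64 samples x 784 features
(def weights (fge 784 128 (repeatedly (* 784 128) rand))) ; one layer's weights
(def out     (mm batch weights))                          ; 64 x 128, one GEMM call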

whilo 2017-10-17T21:26:50.000204Z

I see.

2017-10-17T21:27:26.000119Z

That's the thing. All the talk is about tensor this, tensor that, but underneath the surface, everyone is working with graphs and matrices.

2017-10-17T21:28:16.000536Z

I do not claim (or even say) that they should do this with tensor contractions.

2017-10-17T21:28:36.000545Z

They probably do this quite optimally, especially with cuDNN.

2017-10-17T21:28:54.000129Z

I'm just not sure that it has that much to do with tensors proper.

2017-10-17T21:30:24.000342Z

Or "tensor" there is just a convenient way to describe 4-dimensional layers of matrices, because images happen to be conveniently described by m x n x 3 cubes, stacked into a 4-dim cube in memory.

2017-10-17T21:31:56.000277Z

Which I am sure on some level is equivalent to tensors, but I am not sure it helps in a general way. What if you need a 6-dim tensor, for whatever reason? Would any of those libraries help you? (A genuine question. I don't know the answer.)

whilo 2017-10-17T21:32:24.000302Z

Me neither.

whilo 2017-10-17T21:32:42.000139Z

I know that they have to use the low-level BLAS primitives for optimal performance.

2017-10-17T21:32:55.000172Z

I don't even claim that a 6-dim tensor is a particularly useful thing...

whilo 2017-10-17T21:32:55.000198Z

I don't think they reimplement the actual matrix multiplications.

whilo 2017-10-17T21:33:08.000480Z

Well, it probably can be.

2017-10-17T21:33:24.000131Z

BLAS is vector/matrix all the way down. Nothing to do with tensors or ND-arrays

whilo 2017-10-17T21:33:35.000221Z

This kind of stacking can be helpful, e.g. to represent embeddings of matrices, say for binary relations.

whilo 2017-10-17T21:33:44.000073Z

Yes.

whilo 2017-10-17T21:34:04.000098Z

So tensor contraction is best implemented with these primitives, rather than apart from them.

2017-10-17T21:34:15.000400Z

But I think that cuDNN is not BLAS, but a bunch of specific 4-dim ND operations optimized for images

whilo 2017-10-17T21:34:27.000099Z

Yes, I think so, too.

whilo 2017-10-17T21:34:49.000363Z

My OpenCL read-up on convolutions basically concluded that they are far superior on their own hardware.

whilo 2017-10-17T21:34:56.000278Z

I mean cuDNN.

whilo 2017-10-17T21:35:05.000220Z

This is the reality.

2017-10-17T21:35:23.000349Z

The trouble is that general n-dim tensor contraction suffers from the curse of dimensionality - it's O(n^d)
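
For example (simple arithmetic): even a modest n = 100 at d = 6 means 100^6 = 10^12 entries, i.e. about 4 TB in single precision, before a single flop is spent on the contraction itself.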

whilo 2017-10-17T21:35:29.000016Z

So you have to use these kinds of supplied native bindings, which represent your interface to the hardware.

2017-10-17T21:35:58.000297Z

That's correct

whilo 2017-10-17T21:36:29.000098Z

Ok, I am not sure whether actual contraction is what people do in minibatch optimization.

2017-10-17T21:38:01.000049Z

For any kind of vectorized operation you have to use hardware-optimized primitives, and that's it. We can daydream about the niceties of Clojure, but one look at the benchmarks shows otherwise...

2017-10-17T21:39:07.000156Z

Maybe it is a contraction (or not), but the point is that it is not any kind of general contraction operation, nor does anyone use it in a general way.

2017-10-17T21:40:01.000532Z

It is a specialized thing for a specialized purpose (NNs optimized for images and similar signals).

2017-10-17T21:40:51.000495Z

Where Clojure comes in handy is composing those well-defined lower-level operations dynamically.

2017-10-17T21:41:09.000317Z

And interactively!

whilo 2017-10-17T21:43:32.000471Z

I agree. One of the objections is valid though: whether Clojure can actually be attractive for researchers.

whilo 2017-10-17T21:43:44.000309Z

Julia is a very strong contender, I would say.

whilo 2017-10-17T21:44:36.000436Z

It still blows everything else out of the water when it comes to data processing, in my experience, but this is not so important for researchers.

whilo 2017-10-17T21:45:58.000505Z

My current pytorch script has one nested loop (3 levels deep) and a bit of model description, written down without too many higher-level abstractions, because it basically runs from top to bottom like a shell script with global state.

2017-10-17T21:46:08.000258Z

I think Clojure fills a nice spot between being good enough for experimenting, while being easy enough to integrate in production.

whilo 2017-10-17T21:46:33.000338Z

I agree, but we also have to convince researchers, because they will fill in the gaps in the tooling over time.

whilo 2017-10-17T21:46:46.000509Z

They might not be good at the core engineering, but a community is important.

2017-10-17T21:47:12.000106Z

Sure, Julia might be great for algorithmic tinkering (but I don't see it as much different or better than Clojure there for my needs), but then -> how do you use it in production?

whilo 2017-10-17T21:47:41.000114Z

Point taken, I agree absolutely. But for researchers this point is very unimportant.

whilo 2017-10-17T21:47:47.000368Z

At least superficially.

whilo 2017-10-17T21:48:09.000479Z

If the production environment provides them with better tools to organize their experiments, then it is important.

2017-10-17T21:48:23.000134Z

That's fine. But what can I do about that? Probably nothing much...

whilo 2017-10-17T21:48:40.000036Z

I still think that plotting in Clojure, for example, is not as easy as matplotlib or ggplot in R.

whilo 2017-10-17T21:48:53.000464Z

I use plotly, which is ok, but it requires a web view.

2017-10-17T21:49:01.000466Z

Yeah. Currently a huge empty spot.

2017-10-17T21:49:43.000503Z

I mean, there are a million options for basic plotting on the JVM

whilo 2017-10-17T21:49:48.000015Z

No, you can't. I am not proposing to implement anything yet. I just think that we need to give them an excuse to do Clojure. Some will like it, but it will need to have a plausible horizon for research as well.

2017-10-17T21:49:58.000082Z

But nothing automatic like ggplot or matplotlib

whilo 2017-10-17T21:50:10.000352Z

Yes, but I have found none so far that is good enough for high-quality paper plots.

whilo 2017-10-17T21:50:17.000300Z

Maybe plotly is.

whilo 2017-10-17T21:50:33.000128Z

I haven't tried hard enough yet; it is fairly well done, and a long-term project with commercial backing.

whilo 2017-10-17T21:50:53.000162Z

ClojureScript is also an asset, I would say.

2017-10-17T21:51:01.000455Z

Yep.

whilo 2017-10-17T21:51:13.000247Z

But this is not obvious to people wanting to do optimization experiments.

whilo 2017-10-17T21:51:33.000053Z

They expect something matlab like to start with.

whilo 2017-10-17T21:52:29.000392Z

incanter tried it for R users, but I think it was way too large a chunk of work to swallow at once, and it has not yielded composable libraries.

2017-10-17T21:53:23.000327Z

I think ClojureScript is great for building a "presentation server" for these plots (possibly using plotly). Provide a good generic interface from the Clojure REPL and that's it. I think that's the way @hswick’s library works, and I think it is a good approach.

👍 1
whilo 2017-10-17T21:53:46.000156Z

I think at least it would be important that there is a set of libraries that play well together in general. Friction is a show-stopper in my experience. numpy, for example, standardized basic linear algebra in Python, so that it became reasonable to use for numerical optimization.

2017-10-17T21:54:42.000333Z

I still think that trying to provide an X-like experience to win over the users of X is something that will not work.

whilo 2017-10-17T21:54:51.000196Z

I agree.

whilo 2017-10-17T21:55:06.000050Z

But doing right the things that X did might be crucial.

2017-10-17T21:55:13.000076Z

Because people who prefer the X experience will use X

whilo 2017-10-17T21:55:28.000432Z

I think composability is a key experience in Clojure.

whilo 2017-10-17T21:55:45.000115Z

This cannot be done if X is copied.

2017-10-17T21:56:21.000196Z

Maybe there is a better Clojure way. Provide the best Clojure experience, so people who prefer Clojure (us) can do things they need, not some imaginary users that we are trying to convert. That's at least how I look at it.

2017-10-17T21:56:52.000073Z

I create this for myself and for people who find this approach useful, not for someone else who might prefer something else.

whilo 2017-10-17T21:57:13.000268Z

Well, I am not imaginary 😂

whilo 2017-10-17T21:57:31.000169Z

I agree.

2017-10-17T21:57:35.000184Z

You do not need to be converted 🙂

whilo 2017-10-17T21:58:07.000245Z

But I think the people in my environment are not stupid; they have reasons for their choices, and they are flexible. If you can show them something better, they will prefer it.

whilo 2017-10-17T21:58:42.000330Z

A friend of mine, actually a pretty smart mathematician and Bayesian machine learning guy, already read SICP and is curious about Clojure in general.

2017-10-17T21:58:44.000299Z

Then show them something better! I agree that's the best strategy.

whilo 2017-10-17T21:58:58.000166Z

But I couldn't recommend doing the kind of things he does in Clojure.

2017-10-17T22:00:10.000390Z

However, in such cases I always remember this (alleged) quote of Henry Ford: "If I'd asked people what they need, they'd say a better horse cart."

whilo 2017-10-17T22:00:17.000196Z

The production kind of arguments against the researcher attitude also do not necessarily help, I think. Clojure might be better at deployment, but this often sounds as if researchers are not real programmers.

whilo 2017-10-17T22:01:03.000195Z

They might not be that good as systems engineers, but these kinds of real-world arguments are really not the way to win people over.

whilo 2017-10-17T22:01:18.000358Z

Providing low-level libraries and compositions on top of them is.

2017-10-17T22:01:44.000162Z

I understand, and agree, but as I've said, I don't see how I could do anything about that.

2017-10-17T22:01:56.000092Z

Yes

whilo 2017-10-17T22:02:04.000090Z

Sure, I just needed something experienced to reason with 🙂

2017-10-17T22:02:28.000301Z

someone

2017-10-17T22:02:35.000051Z

🙂

whilo 2017-10-17T22:02:41.000033Z

Oh, it is late 😂

whilo 2017-10-17T22:02:47.000106Z

Sorry

2017-10-17T22:04:42.000362Z

Thankfully, my main goal is not to win over new users for Clojure, but to create tools that I need and like. If some other people find them useful too, that's great, but I won't beat my head over it too much.

whilo 2017-10-17T22:05:00.000081Z

For the one 3d tensor network I used, I basically need to first do matrix multiplication along one axis and then along the other. So one has to be able to shift this view of the axes, I think, without doing stupid things.

whilo 2017-10-17T22:05:31.000156Z

I agree, this is a good approach.

whilo 2017-10-17T22:06:04.000098Z

In the longer run, though, I think it is helpful if a few people work together. So sharing some common problems to solve seems important to me.

2017-10-17T22:06:18.000083Z

I agree completely

whilo 2017-10-17T22:07:26.000220Z

If I pursue the autograd stuff in Clojure, which is strictly necessary for anything more I will do, I need to get these low-level memory ops right.

whilo 2017-10-17T22:07:52.000218Z

You do not plan to wrap cuDNN, do you?

2017-10-17T22:07:57.000236Z

Yes.

2017-10-17T22:08:27.000281Z

However, not a priority, since I do not need it for any Bayesian stuff, which is my main interest.

2017-10-17T22:08:55.000227Z

So it might be some time before I do it.

2017-10-17T22:09:26.000251Z

So, I do plan to connect to cudnn

2017-10-17T22:09:57.000415Z

And also provide a fast CPU-oriented implementation for tensors

2017-10-17T22:10:48.000046Z

But, realistically, that won't happen in 2017

chrjs 2017-10-17T22:17:26.000157Z

Hey both, don’t mean to butt in, but I thought I’d say that I’m following this conversation with interest and nodding often. I long for tape-based autograd in Clojure! I’m glad for the work that people are doing on Cortex, but since it’s essentially NN abstractions only, it’s not that interesting to me personally. To my shame, I haven’t made time to give Neanderthal a proper try yet, though I’m soon rewriting a system where I intend to, and I expect the speed gains to be substantial. That thread you pointed out, @whilo, was very interesting, though I’ll have to re-read it in the morning. I’ll definitely take a look at your clj-autograd library out of interest too.

2017-10-17T22:20:07.000051Z

@chrjs Welcome, Chris 🙂

chrjs 2017-10-17T22:20:57.000372Z

Also, the progress of Neanderthal recently has been staggering (even if I am not using it, I’ve seen the change logs). Thanks for your output Dragan.

whilo 2017-10-17T22:22:29.000181Z

@chrjs Hi 🙂

chrjs 2017-10-17T22:22:50.000227Z

👋

whilo 2017-10-17T22:23:36.000184Z

I don't know exactly how to take apart all the parts of the thread. I wanted to reply this evening, but there are many arguments somehow interleaved, and it is challenging to separate the reasonable ideas from the false assumptions.

whilo 2017-10-17T22:24:29.000264Z

I just read the pro-TensorFlow blog post and it is not much better: https://hackernoon.com/ml-framework-design-reliability-composition-flexibility-9314f72d2c73

whilo 2017-10-17T22:25:30.000054Z

@blueberry Is it possible to take a 3d tensor, first apply matrix multiplications in parallel along one axis, then rotate (transpose) the result and multiply matrices along another axis?

whilo 2017-10-17T22:25:38.000389Z

How problematic is the transpose?

2017-10-17T22:25:55.000273Z

In neanderthal?

whilo 2017-10-17T22:26:08.000051Z

Yes.

2017-10-17T22:26:09.000232Z

Neanderthal currently does not support tensors.

whilo 2017-10-17T22:26:42.000188Z

I know, but I mean: with the primitives that BLAS (neanderthal) provides, is this doable efficiently?

whilo 2017-10-17T22:26:57.000195Z

I guess that this changes row-major to column-major mode, at least.

2017-10-17T22:27:05.000025Z

You'd have to "simulate" it with a large matrix and its submatrices, and I think it is possible, but you'd have to investigate it in detail.

2017-10-17T22:27:22.000002Z

If it is possible to do, it will be efficient.

whilo 2017-10-17T22:27:44.000381Z

Ok.

whilo 2017-10-17T22:28:10.000393Z

The tensorflow post basically just emphasizes Python deployment problems, which are non-problems on the JVM.

whilo 2017-10-17T22:28:42.000121Z

A jar-file you built today against native interfaces will work in 10 years if you can still get a compatible low-level library.

2017-10-17T22:29:06.000178Z

Transpose in neanderthal is O(1), so if you just change col/row without actually reordering elements in memory, it would not require any copying.

whilo 2017-10-17T22:29:14.000059Z

The latter is independent of the framework. The framework might be able to abstract it away, but so can neanderthal, or a compute-graph description on top of it.

2017-10-17T22:29:54.000386Z

If you actually have to physically rearrange elements, you'd use (trans!), which is fast but not O(1)
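
Roughly, the difference looks like this (a sketch; a square matrix, since in-place transposition of a non-square matrix is more involved):

(require '[uncomplicate.neanderthal.core :refer [trans trans!]]
         '[uncomplicate.neanderthal.native :refer [fge]])

(def a  (fge 1000 1000 (repeatedly (* 1000 1000) rand)))
(def at (trans a))  ; O(1): a view with row/column roles flipped, same buffer
(trans! a)          ; physically reorders all 10^6 elements in place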

whilo 2017-10-17T22:30:41.000162Z

I guess if you multiply once, transposing beforehand is slower(?)

whilo 2017-10-17T22:31:08.000117Z

Or is it efficient for both row and column-major modes?

2017-10-17T22:31:38.000279Z

It transparently supports both major modes, even mixing them, without penalty or rearrangements.

2017-10-17T22:32:09.000023Z

Which you can test by simply creating a column-major matrix and a row-major matrix and multiplying them transparently.
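
Something like this, if I read the constructor options correctly (the :layout key is my assumption about the API):

(require '[uncomplicate.neanderthal.core :refer [mm]]
         '[uncomplicate.neanderthal.native :refer [fge]])

(def col-major (fge 2 3 [1 2 3 4 5 6]))                 ; column-major is the default
(def row-major (fge 3 2 [1 2 3 4 5 6] {:layout :row}))  ; explicitly row-major
(mm col-major row-major)                                ; 2x2 result, no conversion pass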

whilo 2017-10-17T22:32:10.000066Z

Python is crazy with respect to serialization. You have to pickle everything, which is subject to the exact versioning of things in your current runtime.

whilo 2017-10-17T22:32:36.000194Z

I see, ok.

whilo 2017-10-17T22:32:38.000071Z

Thanks.

chrjs 2017-10-17T22:35:10.000039Z

Having been there, Python deployment can really be a pain, especially as it compares to Clojure.

chrjs 2017-10-17T22:36:08.000210Z

I’m no great fan of the JVM in general, but not having to set up virtual envs everywhere and ensure consistency is a genuine win, as far as I’m concerned.

chrjs 2017-10-17T22:36:51.000018Z

I agree that the Hackernoon article complects Python deployment problems with library problems.

chrjs 2017-10-17T22:38:07.000239Z

> If it is possible to do, it will be efficient.
That’s a very powerful feature in itself.

whilo 2017-10-17T22:41:32.000311Z

So I could do a matrix A, tensor B, matrix C multiply, with A: k x l, B: l x m x n, C: n x o, by first taking the m x n block l times and multiplying each by C, and then taking the resulting l x m x o block and multiplying it o times with A to get a k x m x o result. The crucial thing is that the intermediate result needs to be "transposed".
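
Spelled out with the same letters, the two contractions are:

	D[l,m,o] = sum_n B[l,m,n] * C[n,o]    (l independent (m x n)·(n x o) products)
	E[k,m,o] = sum_l A[k,l] * D[l,m,o]    (o independent (k x l)·(l x m) products)

and the regrouping of D between the two steps is exactly that "transpose".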

whilo 2017-10-17T22:42:46.000066Z

I know that I could unroll the tensor, but this will yield a block-diagonal matrix, which is very sparse.

2017-10-17T22:44:44.000297Z

Not directly (yet), since there is no such thing as tensor B in neanderthal. You'd have to decide how you simulate that tensor B. I suppose as a big l x n matrix that you take m submatrices from (without copying). The submatrix function allows strides, so I think you'd also be able to take the right kind of submatrices, but I'd have to work out the details to see whether this simulates that tensor B.

whilo 2017-10-17T22:45:22.000150Z

Will this big matrix take l x n amount of memory?

whilo 2017-10-17T22:45:59.000192Z

Well I am confused, it has to be larger than l x n, at least l x m x n

2017-10-17T22:46:15.000091Z

If it is dense - yes. But you also have other matrix types in neanderthal. Whether they can simulate that tensor B is something that you have to work out and see.

whilo 2017-10-17T22:46:18.000235Z

You probably mean l x no

whilo 2017-10-17T22:46:27.000157Z

I see.

2017-10-17T22:46:40.000059Z

Yes, l x m x n. My typo.

2017-10-17T22:47:18.000038Z

You'd decide whether it's l x (m n) or (l m) x n
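
A sketch of the (l m) x n variant with toy dimensions (l=4, m=3, n=5, o=2): step one becomes a single GEMM, and submatrix then gives copy-free views into the result. Whether step two can reuse those views directly is exactly the stride question above.

(require '[uncomplicate.neanderthal.core :refer [mm submatrix]]
         '[uncomplicate.neanderthal.native :refer [fge]])

(def B (fge (* 4 3) 5 (repeatedly (* 4 3 5) rand)))  ; l stacked m x n slices
(def C (fge 5 2 (repeatedly (* 5 2) rand)))          ; n x o
(def D (mm B C))                                     ; (l*m) x o: all l slice-products in one GEMM
(def slice0 (submatrix D 0 0 3 2))                   ; m x o result of slice 0, no copying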

whilo 2017-10-17T22:47:21.000253Z

@chrjs what have you done in python?

whilo 2017-10-17T22:48:07.000227Z

@blueberry right, that should work out for one side fine at least.

2017-10-17T22:48:10.000318Z

But other (sparser) combinations may be possible

whilo 2017-10-17T22:48:34.000351Z

Sparse multiply with dense blocks is probably not as efficient as the dense blocks themselves, right?

2017-10-17T22:48:43.000113Z

But it might also work for the other side, since submatrices can be sparse!

whilo 2017-10-17T22:48:44.000174Z

Sorry to ask you these n00b questions.

2017-10-17T22:49:36.000386Z

The sparser the matrix, the fewer cache hits, of course. But it might not be that much slower. The best way is to test with the dimensions you have.

whilo 2017-10-17T22:49:52.000096Z

Ok

2017-10-17T22:51:36.000126Z

For matrices that fit into cache, the hit might be negligible...

chrjs 2017-10-17T22:51:52.000139Z

(@whilo, I was writing machine learning systems in Python for a startup for a couple of years)

whilo 2017-10-17T22:51:58.000200Z

Ok, I have to digest that a bit.

whilo 2017-10-17T22:53:02.000119Z

@chrjs cool, what kind of systems?

chrjs 2017-10-17T22:56:52.000211Z

I work for a startup that predicts box office returns for films. We have now moved to Clojure and a simulation-based methodology, but my personal interests still lie in ML (mostly Bayesian/generative models).

chrjs 2017-10-17T22:57:41.000272Z

Lots of things changed with the move to Clojure. I do miss the general data ecosystem from Python, but not much else.

chrjs 2017-10-17T22:58:29.000211Z

Clojure is a much better platform for writing software systems in general I think.

whilo 2017-10-17T22:58:38.000225Z

You mean the scientific computing ecosystem?

chrjs 2017-10-17T22:58:50.000054Z

Yeah.

whilo 2017-10-17T22:59:22.000170Z

Absolutely, Clojure is really difficult to beat atm. I tried other things a few times over the last few years, but they are all seriously a step backward.

whilo 2017-10-17T22:59:38.000007Z

I mean Julia for example.

chrjs 2017-10-17T22:59:39.000173Z

I know, it’s ruined me for other languages.

chrjs 2017-10-17T22:59:47.000088Z

:p

whilo 2017-10-17T23:00:00.000177Z

It is cool for running high-perf tight loops in optimization code, but not much else.

whilo 2017-10-17T23:00:13.000104Z

Hehe, yes, that is my impression as well.

whilo 2017-10-17T23:00:25.000065Z

My bar for frustration with them is very low.

chrjs 2017-10-17T23:00:53.000231Z

Heh.

whilo 2017-10-17T23:01:26.000078Z

I have not always made friends with this attitude, though 😂

chrjs 2017-10-17T23:01:48.000109Z

I am going to sleep, but I will definitely be around. It seems like the Clojure ML/scientific computing in general scene is approaching a critical mass.

chrjs 2017-10-17T23:02:02.000126Z

Soon we will maybe even be able to call it a community.

whilo 2017-10-17T23:02:46.000071Z

I think it is important to get a few key concepts usable enough for practical purposes and simple enough that they stay composable; then it could actually be interesting.

chrjs 2017-10-17T23:03:10.000264Z

That is the dream!

whilo 2017-10-17T23:03:32.000213Z

Just solving some top-level problem is way too much effort and does not yield reusability.

chrjs 2017-10-17T23:04:28.000321Z

To my mind, I’d rather have many composable libraries that each do one thing.

whilo 2017-10-17T23:04:55.000002Z

The biggest thing missing for autograd, besides the perf optimizations to avoid copying memory, is convolutions, for me. So if cuDNN were available, I should be able to hack some conv2d layer together.

whilo 2017-10-17T23:05:08.000066Z

Yes, me, too.

whilo 2017-10-17T23:05:53.000091Z

But for optimization the interfaces need to be efficient and performance needs to be considered upfront. Every bit that you lose will be painful for many potential users in the long run.

chrjs 2017-10-17T23:06:50.000351Z

So, for instance, autograd would not have to be part of a general tensor library (or just a vectors-and-matrices one, sticking closer to the hardware). But I agree, the performance trade-offs of composability need to be handled carefully.

whilo 2017-10-17T23:07:00.000073Z

That is the problem: you can establish something that is usable for 50% of people, but that will never work out for 90% of your audience. Usually the latter part is the one that would also contribute and make your library more attractive to more users, because they use it so heavily.

chrjs 2017-10-17T23:08:11.000133Z

I mean, if we really want to attract people from ML to the Clojure ecosystem, we need to win a Kaggle competition with Neanderthal. That should do it.

chrjs 2017-10-17T23:08:41.000127Z

But I think there are already some people who use Clojure but then reach for Python to do scientific computing.

chrjs 2017-10-17T23:09:03.000113Z

That is the real goal. Provide good tools. Those who find them useful will use them.

whilo 2017-10-17T23:09:33.000211Z

Yes, keep the people with Clojure who currently go elsewhere because they must.

2017-10-17T23:10:51.000010Z

Just to point out that Neanderthal is only a vector/matrix/linear algebra library. We might win a Kaggle competition (or fail miserably) with an ML library built on top of neanderthal 😉 I agree that showing actual results is the right way to get attention, not general talk about how Clojure is a super nice language.

chrjs 2017-10-17T23:11:59.000128Z

I know that, but since we are in the uncomplicate channel, I thought I should mention the library 😉

whilo 2017-10-17T23:12:00.000170Z

Agreed. But even easier is to keep people who now have to move elsewhere, but would like to stay.

2017-10-17T23:12:11.000051Z

🙂

whilo 2017-10-17T23:12:29.000208Z

The question is who are they and why are they leaving.

whilo 2017-10-17T23:13:01.000312Z

deep learning is obviously one field

chrjs 2017-10-17T23:14:23.000031Z

I have some thoughts on that, but I gotta sleep. Talk to y’all soon.

whilo 2017-10-17T23:14:29.000238Z

I think they could build the deployment stuff themselves actually.

whilo 2017-10-17T23:14:42.000183Z

Ok, gn8 @chrjs

whilo 2017-10-17T23:14:46.000222Z

I also should go to bed.

2017-10-17T23:15:00.000157Z

Good night 🙂

whilo 2017-10-17T23:16:53.000134Z

Good night 🙂