uncomplicate

2017-10-21T08:45:28.000063Z

@whilo this question opens a lot of general questions, so I'll try to be brief 🙂 Maybe I've mentioned this earlier; I've seen a lot of cases (in Clojure and elsewhere, but often in Clojure) where people talk about vectors, matrices, and even tensors, and they mostly mention these in the sense of data structures. You can create this structure or that, you can take this element or that, you can (eventually) copy this part or that, or create a loop that changes or rearranges the elements in a particular (often quite arbitrary) way. And that is OK, often needed (or at least convenient) in many (or all) applications. However, I am baffled by how rarely I see attention given to the operations on such arbitrary structures. And when there is some discussion about (potentially arbitrary) operations, I rarely see the performance of such operations even mentioned as a topic, and I have virtually never seen (in Clojure land) a discussion of more subtle things such as the numerical stability of what is being talked about. I am baffled because shuffling stuff around is, although not always easy, much, much easier than providing meaningful, performant, correct, and numerically stable operations. And it is especially difficult to make such operations competitive with what is available in more mature environments. That brings us to integer and byte matrices. OK, I create a byte matrix, and then what do I do? Even for such a basic operation as addition, I'd have to take care that all elements are small, or it would overflow. Multiplication is almost guaranteed to overflow. It is going to overflow easily even if most elements are as small as 1, if the dimensions are in the range of hundreds. Even the most general functions that you can map over (such as those in the vect-math namespace) usually do not support integers. You might take the exponent of 10, but will the result be an integer, and does it fit into one byte? Of course not.
There is a reason that BLAS, LAPACK, and other similar APIs do not support integers. Integer matrices are a separate topic with quite limited use, and cannot be treated in the same way as floats. Unless, of course, we only care about vectors in the fashion of Clojure data structures that can hold elements and let you set, get, copy, and print them.
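To make the overflow point concrete, here is a minimal sketch in Python (not Clojure). It assumes signed 8-bit elements that wrap around on overflow, as a JVM byte does, and simulates that by hand; `wrap_int8` and `matmul_int8` are hypothetical helper names made up for illustration, not part of any library.

```python
def wrap_int8(x):
    """Wrap an integer into the signed 8-bit range [-128, 127] (two's complement)."""
    return (x + 128) % 256 - 128


def matmul_int8(a, b):
    """Naive matrix product in which every intermediate result wraps like a signed byte."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0
            for p in range(k):
                acc = wrap_int8(acc + wrap_int8(a[i][p] * b[p][j]))
            c[i][j] = acc
    return c


# Assumption for illustration: a 2x200 and a 200x2 matrix whose elements are all 1.
inner = 200
a = [[1] * inner for _ in range(2)]
b = [[1] * 2 for _ in range(inner)]
product = matmul_int8(a, b)

# The exact value of every entry is 200, which does not fit into a signed byte,
# so the stored result is not the mathematically correct one.
print(product[0][0])

# Plain addition has the same problem once the elements are not tiny.
print(wrap_int8(100 + 100))
```

Every input element is as small as 1, yet the result is already wrong with an inner dimension of 200. A real implementation would have to either widen the accumulator or define what overflow means, and neither choice fits the float-oriented BLAS model.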

2017-10-21T08:45:31.000004Z

Which gets us to the integer backend in Neanderthal. As you've already noticed, there is an implementation of an integer vector (you can create it using the usual vctr function and providing an integer or long factory as the first argument, as the convenience iv and lv functions do with the mkl-int and mkl-long implementations). I've added those because an integer vector is useful in operations that do column or row permutations. I haven't added integer matrices because I didn't have a use for them, but as you can see in the implementation, it is quite a straightforward thing to do. Generally, you do not even have to create a separate implementation for each of those. I had to do this because I provide a lot of primitive operations, and Clojure is a bit inflexible in that part of Java interop, but the infrastructure is already there to literally just create one `ObjectGEMatrix` and give it the existing LongDataAccessor and IntDataAccessor, or a (straightforward to write) ByteDataAccessor, and possibly any MySpecialFormatDataAccessor, to have complete "matrix as a data structure" functionality at your disposal. And you wouldn't even have to type too much, since it would amount to literally copy/pasting the primitive double RealGEMatrix and doing a search/replace of double -> object with a few adjustments. So this argument is IMHO a strawman, partly since it is an easy thing to provide, and also since it is not provided in a very featureful way in the other libraries that I have seen in Clojure/Java land. If I'm wrong, I would like to see how they compute the most basic operations on those data structures, such as norm and dot product, something a bit more involved, such as matrix-vector or matrix-matrix multiplication, or even something specific to linear algebra, such as solving linear systems. The answer might be that they don't, because they do not need those, and that's totally OK, but then it is not a particularly impressive feature and can be easily replicated in Neanderthal if needed.
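A minimal sketch of the "matrix as a data structure" idea, in Python rather than Clojure. This is not Neanderthal's implementation or API: the `GEMatrix` class and its methods are hypothetical, column-major storage is an assumption, and the standard library `array` typecode merely stands in for a pluggable data accessor.

```python
from array import array


class GEMatrix:
    """Hypothetical dense matrix that is only a container (assumed column-major)."""

    def __init__(self, typecode, rows, cols):
        # The typecode plays the role of the data accessor:
        # 'b' = signed byte, 'i' = C int, 'q' = 64-bit integer, 'd' = double.
        self.rows, self.cols = rows, cols
        self.data = array(typecode, [0] * (rows * cols))

    def entry(self, i, j):
        """Read the element in row i, column j."""
        return self.data[i + j * self.rows]

    def set_entry(self, i, j, value):
        """Write the element in row i, column j."""
        self.data[i + j * self.rows] = value

    def col(self, j):
        """Copy column j out as a plain list."""
        return list(self.data[j * self.rows:(j + 1) * self.rows])


# The same class serves bytes and longs; only the element type changes.
byte_matrix = GEMatrix('b', 2, 3)
byte_matrix.set_entry(1, 2, 42)

long_matrix = GEMatrix('q', 2, 3)
long_matrix.set_entry(0, 1, 2 ** 40)

print(byte_matrix.col(2), long_matrix.entry(0, 1))
```

Creating, setting, getting, and copying is everything this provides. There is no norm, dot product, or multiplication here, and those are exactly the parts that are hard to get right for integer element types.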
However, there is the important issue of images. I agree that DL applications are quite important now due to the hype, and that byte, integer, or even binary structures could be exploited there and may be needed. If they are only needed as data structures, a better approach might be to call them data structures and provide specialized image data structures that are optimized for the required applications. That's exactly what ClojureCL and ClojureCUDA are for. If you need bytes/integers/whatever, and only need a data structure to conveniently shuffle them around, you do not need the heavy machinery of linear algebra to do that. And, especially, calling those structures tensors is misleading if basic tensor operations such as contractions and unfoldings are not provided.

2017-10-21T08:46:15.000009Z

Now, this was one long wall of text. I don't even know why I wrote this on Slack, where it is going to disappear in a few days...