uncomplicate

sophiago 2017-08-28T14:09:38.000271Z

@whilo I work on automatic differentiation. I would not recommend using matrices for it, as when working with vector functions they grow exponentially despite being very sparse. I use nested tuples and have made the math parts quite fast, although I still have a day or two's worth of optimization left that I haven't had time for in a while. But that's not really the important part, since it all happens at compile time and macros + the HotSpot JIT make that easy. For the most part, the quality of an AD library is in the extent to which it can transform source from the host language. Not sure when I'll get to that, but I was lucky to stumble upon some of Timothy Baldridge's code from core.async that transforms ASTs to finite state machines, so it looks like I'll be modifying that once I can wrap my head around how it works (it borrows some concepts from LLVM that were a bit over my head on first glance).
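
A minimal forward-mode sketch in Clojure may help make this concrete. It assumes plain dual numbers carried at runtime, not sophiago's nested-tuple, compile-time representation; the names `Dual`, `d+`, `d*`, and `deriv` are made up for illustration:

```clojure
;; Forward-mode AD with dual numbers: each value carries its own
;; derivative, and the arithmetic rules propagate both together.
(defrecord Dual [val der])

(defn d+ [{x :val dx :der} {y :val dy :der}]
  (->Dual (+ x y) (+ dx dy)))

(defn d* [{x :val dx :der} {y :val dy :der}]
  (->Dual (* x y) (+ (* x dy) (* y dx))))

(defn deriv
  "Derivative of a single-variable function f at x, seeded with der = 1."
  [f x]
  (:der (f (->Dual x 1.0))))

(deriv (fn [x] (d* x (d+ x x))) 3.0) ;; f(x) = 2x^2, so f'(3) = 12.0
```

A source-transforming approach propagates derivatives the same way, but rewrites the code at compile time instead of boxing every value at runtime.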

whilo 2017-08-28T14:12:21.000274Z

@sophiago What do you mean by "grow exponentially"? I am interested in having AD work with typical machine learning optimization problems, e.g. deep neural networks.

whilo 2017-08-28T14:12:30.000401Z

Or matrix factorization problems etc.

whilo 2017-08-28T14:13:08.000398Z

Typically the functions are scalar-valued and have a gradient. Backpropagation in neural networks allows the gradient to be calculated efficiently. To my understanding, reverse-mode autodiff is very similar.
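
For a scalar-valued function of many inputs, one reverse pass recovers all the partial derivatives at once, which is exactly the backprop pattern. A hand-unrolled sketch (the example function is hypothetical, not taken from any library):

```clojure
;; Reverse-mode by hand for f(x, y) = x*y + sin(x):
;; one forward pass for the value, one backward pass that pushes the
;; output sensitivity (df/df = 1) back through each intermediate.
(defn grad-f [x y]
  (let [a  (* x y)
        b  (Math/sin x)
        f  (+ a b)
        df 1.0
        da df                          ; f = a + b
        db df
        dx (+ (* da y)                 ; a = x * y
              (* db (Math/cos x)))     ; b = sin x
        dy (* da x)]
    {:value f :dx dx :dy dy}))

(grad-f 2.0 3.0)
;; => {:value ~6.909, :dx ~2.584, :dy 2.0}
```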

whilo 2017-08-28T14:14:05.000177Z

I have played around with clj-auto-diff

sophiago 2017-08-28T14:14:16.000603Z

If you have a function from a vector space R^m to R^n, then you'll end up with a Jacobian with m*n entries. Repeat and it quickly becomes untenable.
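
One way to read the size claim (my interpretation, not a quote): for f : R^m -> R^n the Jacobian already has n*m entries, and each further derivative multiplies the count by m again, so the k-th derivative tensor has n*m^k entries, exponential in the order even when most of them are zero:

```clojure
;; Entry counts of successive derivative tensors of f : R^m -> R^n.
;; The k-th derivative has n * m^k entries.
(defn derivative-entries [m n k]
  (* n (long (Math/pow m k))))

(map #(derivative-entries 10 10 %) (range 1 5))
;; => (100 1000 10000 100000)
```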

whilo 2017-08-28T14:15:44.000168Z

Yes

whilo 2017-08-28T14:16:02.000526Z

It is exponential in the dimensions of the output.

whilo 2017-08-28T14:16:35.000234Z

Cool that you work on it.

whilo 2017-08-28T14:16:49.000294Z

What are you implementing it for?

sophiago 2017-08-28T14:17:33.000036Z

I haven't used clj-auto-diff, but the library it was ported from is top-notch. That said, it's in Scheme, so the syntax is much simpler and there's no way it handles a lot of Clojure (although you're probably not interested in weird stuff like taking the derivative of non-local control flow). More significantly, since it doesn't actually use macros to do source transformation, it's just fundamentally going to be maybe two orders of magnitude slower than libraries that take that approach.

whilo 2017-08-28T14:19:16.000420Z

Ok, I haven't studied it that closely yet. In the benchmarks I have seen, the Stalin Scheme compiler seemed to be the fastest.

whilo 2017-08-28T14:19:37.000225Z

I use PyTorch at the moment, which works really nicely on GPUs.

sophiago 2017-08-28T14:22:24.000376Z

It's actually quite confusing... you're probably thinking of Stalingrad, which compiles a Scheme-like language called VLAD with AD primitives. That is currently the best out there and matches the top FORTRAN libraries while being much more comprehensive. Jeffrey Siskind also wrote the AD package clj-auto-diff is based on, in regular R4RS Scheme, as well as the Stalin compiler. I'm pretty certain Stalin is no longer considered a particularly fast compiler now that Chez is open source.

whilo 2017-08-28T14:24:18.000161Z

Ok, cool. Do you have a strong Scheme background? I have done a bit of SICP in it, but am not that familiar with it. I contacted Jeffrey Siskind about relicensing r6rs-ad so that clj-auto-diff no longer violates the GPL.

whilo 2017-08-28T14:25:08.000429Z

I have talked to @spinningtopsofdoom about a better AD library in Clojure, as Python is not my favourite environment for numeric computation (though at the moment I have no real alternative to it).

sophiago 2017-08-28T14:28:03.000099Z

Yeah, I was really into Scheme before coming to Clojure. I'm not aware of anything better than the port of Siskind's library at the moment, so I would see how it compares to Autograd and possibly hack on it yourself if you need extra functionality and/or performance. I'll post on here when what I'm working on is ready for use, but I wouldn't expect it until around the end of the year.

sophiago 2017-08-28T14:29:29.000114Z

Also, since you're really just interested in backprop, I would look into Cortex and ask those folks how they do it. They really know their stuff, and I would bet that whatever method they use is by far your best choice.

whilo 2017-08-28T19:25:30.000199Z

@sophiago I am not just interested in backprop. I actually work with Bayesian statistics, where I often need to create custom probability distributions to sample from; this is fairly different from just using a neural network. I have read some of the Cortex code, and it is mostly about high-level model definition in the form of layers and then efficient large-scale training. It is not about providing a modelling tool for scientific computing with auto-differentiation. Python has Theano, TensorFlow and PyTorch for these tasks.

whilo 2017-08-28T19:26:44.000038Z

It is more like Keras or Deeplearning4j, targeted at deep-learning users, not researchers.

whilo 2017-08-28T19:44:08.000379Z

For instance, part of backprop is hardcoded here: https://github.com/thinktopic/cortex/blob/master/src/cortex/compute/cpu/tensor_math.clj#L1076

whilo 2017-08-28T19:44:49.000060Z

I cannot find any way to build a simple computational graph without using the high-level NN API.
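
To make the ask concrete, here is a hypothetical sketch of the kind of low-level API being described, where the graph is plain data that a backend could execute and differentiate. This is not Cortex's actual API; the node format and the `forward` function are invented for illustration:

```clojure
;; A computational graph as data: inputs, ops, outputs.
(def graph
  {:inputs  [:x :w]
   :nodes   [{:id :z   :op :mul     :args [:x :w]}
             {:id :out :op :sigmoid :args [:z]}]
   :outputs [:out]})

;; Evaluate the graph by folding over its nodes in order.
(defn forward [graph env]
  (reduce (fn [env {:keys [id op args]}]
            (let [vs (map env args)]
              (assoc env id
                     (case op
                       :mul     (apply * vs)
                       :sigmoid (/ 1.0 (+ 1.0 (Math/exp (- (first vs)))))))))
          env
          (:nodes graph)))

(forward graph {:x 2.0 :w 0.5})
;; => {:x 2.0, :w 0.5, :z 1.0, :out ~0.731}
```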