@whilo I'd need to see at least a hello-world-ish code of those ideas to be able to form an opinion.
@blueberry http://www.robots.ox.ac.uk/~fwood/anglican/examples/viewer/?worksheet=gaussian-posteriors
what high dimensional problems can you tackle?
i would like to embed vision problems, but this seems fairly ambitious
I can see that the example takes 20,000 samples of a one-dimensional distribution. That's a fairly trivial problem. See the Doing Bayesian Data Analysis book - it shows many practical hierarchical models, and many examples go into 50+ dimensions. All the examples compute in a fraction of a second with bayadera (the same examples run for minutes with Stan, which does all the fancy variational + hamiltonian mcmc etc. in C++ called from R). Anyway, for vision problems, which is perception, I do not see how anything can beat deep neural nets...
Right. GANs should be embeddable in generative models.
You asked for hello world, there are a lot more worksheets.
🙂
But you are right that bayadera is much more focused on performance. For vision this might be very helpful.
I just want to point out that the work around anglican is fairly broad and innovative in regard to language embedding and composability.
I can totally imagine having a highly optimized bayadera model be part of it. As far as I understand, the models are composable at the boundaries in general.
the problem is that those computations are so demanding that performance is THE metric. It doesn't matter if anglican can create more complex models if those models run for days. BTW, can you please do a quick microbenchmark and tell me how long it takes anglican to take those 20,000 samples (do not forget to unroll the lazy sequence with doall)
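e.g. a minimal timing sketch like this (gaussian-model is a hypothetical query name, :lmh just one of the available algorithms):

```clojure
;; a minimal timing sketch; gaussian-model is a hypothetical Anglican query,
;; :lmh is one of Anglican's inference algorithms
(require '[anglican.core :refer [doquery]])

(time
 (doall                                    ; force the lazy sample sequence
  (take 20000
        (doquery :lmh gaussian-model []))))
```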
another thing I do not like with anglican's approach is that it does not support regular clojure functions, but does some sort of model compilation that requires the model code to be somehow special (i.e. not clojure)
when I talked about hello world I meant the demonstration of those fancy ideas (variational inference, NN, etc.) and the comparison with some baseline wrt. some metrics
It does allow passing in of normal clojure functions through with-primitive-procedures
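A minimal sketch of how that looks (the square function and the query name are made up):

```clojure
;; a minimal sketch (names made up): `square` is a plain Clojure function
;; exposed to the Anglican query via with-primitive-procedures
(ns wpp-sketch
  (:use [anglican core runtime emit]))

(defn square [x] (* x x))

(with-primitive-procedures [square]
  (defquery squared-mean [y]
    (let [mu (sample (normal 0 1))]
      (observe (normal (square mu) 1) y)   ; regular fn called inside the model
      mu)))

;; e.g. (take 5 (doquery :lmh squared-mean [4.0]))
```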
Right, I get your performance point.
Btw. what do you think of edward?
I'm not too much into probabilistic programming, I am more interested in probabilistic data analysis, and probabilistic machine learning. these things are based on the same theory, but are not the same.
moreover, I look at practicality.
not so much after pure research of interesting things any more
The problem is not so much whether to do it in anglican or in bayadera, but whether to build on Clojure at all.
I see.
I agree about the programming aspect, although I think it is possible to have fast probabilistic machine learning in such a language and optimize inference with it through "compilation".
I mean, I am after interesting things, but I set a higher bar 🙂
Practically speaking, machine learning has not entered the programmer's toolbox yet.
it also has to solve real problems.
most of those research projects demonstrate toy problems
From that direction anglican might be much more approachable to embed in some small problems than going full machine learning.
I agree.
But I am in a group who does heavy vision problems and there it is the opposite.
Probabilistic programming doesn't cut it for these problems.
yep, and I want to create such toolbox
Or more precisely a bayesian approach.
of course, because vision is about perception
and not about logic
I meant the cost to compute uncertainties.
probabilistic techniques might be an interesting next layer that could do some reasoning on the output of the vision layer
Yes, that is what most people do nowadays.
They use some CNN and just use it as a feature extractor.
yep.
(and it works very very well) 🙂
btw. next on my reading list: https://arxiv.org/pdf/1703.04977.pdf
For the GMM the 20,000 samples took 100 secs on my laptop (unsurprisingly).
MC methods are inefficient in general and not respected much in machine learning (at least in my environment)
This is their newest take: https://arxiv.org/abs/1705.10306
And, that is the Gaussian distribution, which is the easiest distribution to sample from (after the Uniform)
I know. But I do think bayadera could be described with the anglican syntax and framework. I think the big success of NNs, besides the initial breakthroughs, is mostly due to the fact that modern toolkits allow easy composition and modelling.
Now, Bayadera can take much, much, much more (I forgot how many) samples from a Gaussian in a few dozen milliseconds, and most of this time is the communication with the GPU.
But consider this: 100 secs for the simplest hello world that you could find
how useful is that?
no matter what features are there?
It was the GMM on iris, not the worksheet I have sent you. But it is still simple.
Bayadera gives you ordinary clojure functions. Why wouldn't you be able to compose them?
I see. The problem is that I can barely convince anybody to use Clojure. For Bayadera to be attractive it would help if it were part of a bigger community. Clojure in machine learning is still a very hard sell. I can probably use something on my own, but it will be difficult to attract colleagues. Anglican is not much better in that regard, but the few really nice projects that are out there feel very isolated and fragmented. My colleague would like to go with edward, I guess, since it is sponsored by OpenAI and built on top of tensorflow (although he doesn't like tensorflow in particular).
What I like about Anglican is that I can see people using it for small data problems and inference. If this is possible with Bayadera as well, I am totally fine with it.
With people I mean everyday Clojure developers without a GPU and a background in data science.
This would help a community grow.
That's why I don't like to push people to use my (or other) tools. I'm OK with the competition using inferior tools 🙂 OTOH, the best way to convince people to use some technology is to build useful stuff with it. When they see the result of your work, they'll ask you to tell them what you did (provided that you did something awesome, of course).
Now, it is difficult to convince people to use Clojure for ML, when there is lots of pie-in-the-sky talk, but Clojure tools like Incanter are a joke.
I think that the GPU is essential for ML
For most methods, at least
And the theory has to be learned to some degree
Hmm, yes. You are right about proving with results and the GPU. Anglican can leverage the GPU by embedding LSTMs or VAEs for its proposal distribution, just to point that out.
Whoever hopes that they will be able to do ML with the level of knowledge required for Web apps and no maths will spend years circling around, punching in other people's tutorials
I'd like to see some benchmarks
BTW Bayadera IS useful for small data problems, and I doubt it is useful for big data problems. That goes to Bayesian methods in general.
But, small data usually means big computation
and Bayadera is all about that 🙂
https://arxiv.org/pdf/1705.10306.pdf Section 5.4 has a 4096 dimensional sampling space.
But no statement about training time or inference cost.
Just model quality.
that's the problem with most papers. they count the number of steps, without regard to how much one step costs wrt. computation, memory access, and parallelization
I agree that you need maths. But doing maths before seeing what you can do with machine learning can turn many people off. Esp. when you do probabilistic modelling, you need to do a lot of math, much more than for NNs.
Yes, right.
But I think combining MC samplers with NNs that way might be a very good idea.
Inference will become a lot cheaper once the NN is trained (they call it "compiled").
I don't think so - NNs also require maths to understand what you do, it's only that there are popular frameworks with ready-made recipes that work well for a handful of problems (mostly vision and nlp). however, what if you do not work with vision and nlp?
What do you plan to show off with Bayadera? 😛
Yes, I like Bayesian approaches and generative models.
I think they generalize a lot better.
NNs just need you to understand gradients.
No statistics.
Well, I already have (for more than a year) a large speedup over Stan, but I am not even eager to publish that. I plan to work on some commercial applications, so I do not even want to show off the technology itself, but the product.
And you are right, that is also my problem. As long as the toolboxes for black-box bayesian inference are complicated, they will never be as popular.
I understand. Might be a good plan.
The biggest "doh" is that bayesian stuff isn't even that complicated, especially when you use the tools that someone else made.
Why have you presented at bobkonf then?
Stan is used by social scientists and biologists
Yes, that is why I like the blackbox approaches where you can specify a generative process and then have reasonable inference per default.
I think machine learning should stop being a niche thing for highly skilled specialists.
The problem with programmers is that sometimes they... well, almost like physicists. But, I do not even care. I open-sourced the technology because at least some people can "get it" (and some really do get it and use it) and I might get valuable feedback and even some contributions. However, I do not want to beg anyone to use it and why would I?
I have been motivated by LDA 4 years ago, it was a very intuitive generative model and easy to understand even without heavy math knowledge back then.
Did it solve practical problems?
You are right. I am thankful that you take the time to argue with me.
LDA?
Definitely
You probably know it, it is known as topic modelling to the industry.
I don't use that. It is used in NLP?
Yes
I don't do NLP, that's the thing 🙂
You can get the topics that generate a corpus of texts, and the distribution of topics per document, in an unsupervised fashion from it.
It is really handy.
(even if just to browse your local paper collection 🙂 )
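Roughly, the standard generative process (as in the Blei et al. 2003 paper) is just:

```latex
% standard LDA generative process (Blei et al. 2003):
% \alpha is the Dirichlet prior, \beta_k the word distribution of topic k
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha)            && \text{topic mixture of document } d\\
z_{d,n}  &\sim \mathrm{Categorical}(\theta_d)        && \text{topic of the } n\text{-th word}\\
w_{d,n}  &\sim \mathrm{Categorical}(\beta_{z_{d,n}}) && \text{word drawn from that topic}
\end{align*}
```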
Did you have any application/business ideas around that or more like research curiosity?
It was at university.
It definitely has business value and is heavily marketed already.
The original paper is from Blei et al. 2003
Something like topic modelling for vision would be nice.
Are you still looking for the thesis topic, or are you set with something?
I am trying to be set. It is frustrating. I would like to at least work probabilistically and not just throw NNs at something.
So, no.
I was with my supervisor today.
They are very focused on vision.
I don't want to just do some topic they throw at me. I like to be motivated by myself, but this is not working well this time.
Did you see the (old-ish) book by Bishop called Pattern recognition and machine learning?
It's from the pre-DL era (2006)
Yes, I worked through parts of it.
Mostly first 100 pages, EM and variational inference.
but the book is probabilistically-oriented, and he discusses the probabilistic perspective of NNs
and similarity to bayesian nets
but the problem is that DL is so successful with vision and perception in general that there is a slim chance that you'll get something with bayesian methods
Yes.
It is also not clear what the uncertainty buys you.
they are simply the hammer for another kind of nails
It definitely costs computation.
it does not buy you anything if you have enough data
but in vision you usually have lots of data 🙂
Well, the paper I first posted is from a guy (Yarin Gal) who showed that dropout regularization turns the neural net into a bayesian neural net (or a GP).
maybe bayesian techniques could be useful when you have extremely scarce visual information
but is there such domain?
So you can use the uncertainty on the output by running the inference several times in a meaningful way.
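Schematically, something like this (a minimal sketch; predict is a hypothetical forward pass with dropout left on at test time):

```clojure
;; a minimal sketch of MC-dropout uncertainty: `predict` is a hypothetical
;; stochastic forward pass that keeps dropout active at test time and
;; returns a single number for input x
(defn mc-dropout [predict x n]
  (let [ys   (repeatedly n #(predict x))
        mean (/ (reduce + ys) n)
        var  (/ (reduce + (map #(let [d (- % mean)] (* d d)) ys)) n)]
    {:mean mean :std (Math/sqrt var)}))    ; predictive mean and its spread
```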
what is your prior there?
An example from Anglican was to have generative model for captchas and theirs was state of the art, cracking all of them.
Your prior is a GP prior. That corresponds to a normal distribution over the weight matrices.
wait, wait. I meant, you, as a human, set some prior that describe your current (pre-data) knowledge. How do you decide on that?
That article looks interesting
I put it in my bookmarks
It depends on whether you parametrize the normal distribution again.
Hope I'll have time to go through it more attentively in several months 🙂
Hehe
Ah, but why does it have to be the Normal? 🙂
There are many kinds of random processes, and many kinds of distributions 🙂
I understand that in vision, Gaussian might be the thing.
But generally, let's say I am trying to estimate when people call the call center, or something like that
Or the risk of giving out loans
You pick a kernel function to calculate the covariance matrix K of the Gaussian
and then the prior is GP(·|0, K)
Or some general risk -> Bayesian methods are a really good match for measuring risk
Hmm, no that makes no sense
?
The prior knowledge flows into the kernel function.
for the covariance matrix
from that you generate a function that has some prior set on the uncertainty of the measurement.
All distributions are Gaussian, hence the whole thing is Gaussian again.
But the kernel function can do arbitrarily complicated things, it can be a NN.
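As a small sketch of that prior (the RBF kernel here is just one common choice on my part, it could be any kernel or a NN):

```latex
% GP prior: zero mean, covariance given by a kernel k; the RBF kernel is
% just one common example, not necessarily what the paper uses
\begin{align*}
f &\sim \mathcal{GP}(0, k), \qquad K_{ij} = k(x_i, x_j)\\
k(x, x') &= \sigma^2 \exp\!\Big(-\tfrac{\lVert x - x'\rVert^2}{2\ell^2}\Big)\\
(f(x_1),\dots,f(x_n)) &\sim \mathcal{N}(\mathbf{0},\, K)
\end{align*}
```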
Note that the model is pretty sure about the parameters, and they are not very probabilistic 🙂
anyway
I get that, but, given an unknown problem,
I know. I don't like kernel methods very much.
how do you decide it is a good fit to be described by Gaussian likelihood,
and how do you transfer your prior knowledge to the parameters?
Yep, many "bayesian" methods are not that much bayesian
which doesn't mean they are not useful
There are deep GPs now btw. Where the joint probability is not Gaussian any more.
That is what the blog post talks about as well.
Anyway, this doesn't help you in choosing the topic 🙂
What outcome is expected of you?
A number of published papers, or something else?
Does it have to be a part of a narrow EU/DE/industry-backed project or you are more free in choosing the area as long as it is vision?
Solving a good vision problem, something practical. Not just doing maths or playing around.
Although I want to improve my math skill still.
I think targeting a good paper as a result would be reasonable.
But not required.
Solving to be the best in the world or just good enough?
Really? The paper is not required? You have it easy 🙂
I think good enough would be ok.
I am not sure. I want the paper anyway.
Good enough is ok? Even better 🙂
Why worry then?
It would be good if we can show that bayesian modelling can work for vision problems, atm. weakly supervised problems are interesting.
(to the group)
Do you have some problem where data is really, really poor and DL does not work well?
I would like to do something good and know that I can work well with the methods.
That's a Herculean task...
I am also thinking about doing a PhD.
The DL (and most ML) fields seem to me like many soothsayers throwing bones around and reading star constellations
Hehe
You are not already on that path?
Are you in MSc?
Yes.
This is my master thesis.
Ah, in that case, why are you taking on such a heavy task?
Isn't it more appropriate for that level to take something that has been researched well and make a good implementation?
Good question. Maybe I have stayed too long in the math department and have complexes now.
That's why I asked you about the paper.
I have done this in my bachelor thesis already.
What you are talking about here is too much for a master's thesis.
Especially wrt time.
Right.
Well, I mostly see papers and theses from state-of-the-art people in the field, and they are really good.
But these people are doing post-doc research full time.
And often have worked on those problems for years or even decades.
DL people have been doing that thing since the 80s
when it was not cool
and not very useful 🙂
Right.
So you can try to do the same for master's if you have 10 years for that 🙂
I don't. I need to break the problem down and do something focused.
but it is better to do some cool exploratory implementation as an introduction to phd
or that is what I'd advise my students 🙂
although they are all working programmers, since in Serbia there is little opportunity to fund students to work on research, so they are all working full-time in industry cranking code and doing their studies part-time
That is reasonable advice.
That is tough.
yep
I have worked part-time with Clojure over the last half year, but I was lucky to do so with friends.
It didn't pay very well, but it was ok.
cool. what are you making?
it was an app prototype for a client who wants something like yelp.
although i think he doesn't know exactly what he wants.
but we could do datomic with fulltext-search and google places integration in the backend and two single page apps and an android and ios app.
well, at least you polished your Clojure and were paid for that 🙂
it is not finished yet, but clojure was reasonable.
yes.
we also stayed within our deadlines, which was good to see.
i need to take the dog for a walk
enjoy 🙂
really nice talking to you, hopefully we can continue this soon
can bayadera run without a GPU atm.?
i just have this laptop right now
no.
ok
no problem
i can get a gpu
atm it requires AMD GPU that supports OpenCL 2.0, but I'm also probably giving it CUDA backend soon-ish
and MAYBE a CPU backend
But that depends on inspiration 🙂
no pressure. does nvidia have reasonable opencl support?
no. 1.2
pfff
i follow linus' comments... 😉
although my card, R9 290X, which is quite a beast, probably costs something like 100 EUR in Germany now (second-hand)
Because it's a couple of generations old, but was top of the line 3 years ago.
So it might be a modest investment
While my Nvidia 1080 cost around 1000 EUR here less than a year ago
and is only 30% faster
Hmm, interesting.
Yes, nvidia was always fairly expensive.
At least when I last had a look at perf 5 years ago or so.
Nvidia's main strength is its suite of hand-tuned libraries, cuBLAS, cuDNN etc.
Have you integrated NNs (e.g. pretrained) in your pipeline yet?
Yes, I think so too.
no. I'm not that into nns
Everybody's doing that, and they have pretty much optimized the field
I see. There are tons of pretrained models out there though, which could be helpful for industry applications.
E.g. pretrained CNNs.
I want to work in an uncluttered niche
I understand that very well.
Well, this should be mostly plumbing, nothing critical. pretrained models can be applied to data separately anyway.
What do you use to store tensors?
I used hdf5 in my bachelor thesis.
Clojure support is a bit weak.
I had to tweak the library for multiple dims.
Nothing now. I don't need them for bayesian stuff (yet).
When I need them, I'll add them to neanderthal...
With an Intel-based native backend, and a cuDNN GPU backend probably
Ok
as for storing, I leave that to the users for now. They can implement their own transfer methods to/from storage
Yes, I wouldn't pack that into the libraries.
your dog's getting impatient 🙂
It really sucks that all people in python use numpy pickling.
That is right 🙂
Cu soon
bye