data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions.
Frederik 2020-02-11T14:46:30.054800Z

With deep-diamond not released yet, what's the easiest neural net framework to get started with in Clojure? I just need something lightweight to train a reasonably shallow CNN with. Cortex seemed to fit the bill, but it hasn't been maintained in two years, and unfortunately I can't get one of its basic examples to work: https://github.com/originrose/cortex/blob/master/examples/xor-mlp/src/xor_mlp/core.clj Trying to run

(train-xor)
I just get:
ExceptionInfo Network does not appear to contain a graph; keys should contain :compute-graph  
Any better options? Or some links to more complete cortex documentation so that I can get it to work? Thanks!

2020-02-11T14:48:05.055400Z

Depends on how "lightweight" you need

2020-02-11T14:48:27.055900Z

There are Clojure bindings to MXNet - I'll get you an example of a CNN

2020-02-11T14:50:17.058Z

There are other examples in that repo too

Frederik 2020-02-11T14:50:17.058100Z

Great, thanks! The easier it is to get started with, the better; I'm willing to sacrifice bells and whistles for it. 🙂 E.g. in Python I'd choose a basic Keras layers interface over a full TensorFlow Estimator API.

Frederik 2020-02-11T14:50:35.058600Z

Thanks! 🙂 I'll have a look.

2020-02-11T14:50:38.058900Z

You can also do Keras through python interop

2020-02-11T14:51:28.059200Z

Here is an MXNet example of a CNN with Python interop: https://github.com/gigasquid/libpython-clj-examples/blob/master/src/gigasquid/mxnet.clj

2020-02-11T14:53:44.060500Z

I don't have a straight up Keras example yet - but I'm sure you could follow the examples and figure it out 🙂

Frederik 2020-02-11T14:54:10.061Z

I was planning to stay away from python interop as I'm using this project to learn Clojure, but will keep it as a backup option 🙂

Frederik 2020-02-11T14:54:25.061300Z

Thanks for all the help!

Frederik 2020-02-11T16:03:43.062200Z

Probably quite a basic Scala question: I can't get the MXNet installation to work. I get the following warnings and error when starting a Leiningen REPL:

INFO  MXNetJVM: Try loading mxnet-scala from native path.
WARN  MXNetJVM: MXNet Scala native library not found in path. Copying native library from the archive. Consider installing the library somewhere in the path (for Windows: PATH, for Linux: LD_LIBRARY_PATH), or specifying by Java cmd option -Djava.library.path=[lib path].
WARN  MXNetJVM: LD_LIBRARY_PATH=/opt/intel/mkl/lib/intel64_lin
WARN  MXNetJVM: java.library.path=/opt/intel/mkl/lib/intel64_lin:/usr/java/packages/lib:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
ERROR: Unhandled REPL handler exception processing message {:op stacktrace, :id 352, :session cf4e20e7-57b8-43ce-8980-05728d1df8a3}
I changed the LD_LIBRARY_PATH to get Neanderthal to work, which I assume broke the standard MXNet installation (?). I want to add the install location of the Scala library to my library path, but have no idea where to find it.

2020-02-11T16:26:39.062900Z

you don't need to set the LD_LIBRARY_PATH - the native libs are extracted from the jar and put into a temp directory to load

2020-02-11T16:27:33.063400Z

There are a few options to get going - I would read through this https://github.com/apache/incubator-mxnet/tree/master/contrib/clojure-package

Frederik 2020-02-12T11:43:24.079100Z

Thanks for all the links and your patience! I played around with it more, starting with the simple XOR problem, and made a small gist for it: https://gist.github.com/Toekan/0d180f129c3bd3036a041149f87ac85e Tbh, being reasonably new to Clojure and completely new to MXNet, a trimmed-down example like this would have helped me get started. The majority of the examples start with a specialized iterator like the mnist-iterator, which hides the exact shape needed for your training data and labels a bit. E.g. it took me several errors to figure out that I had to wrap my ndarray in a vector before passing it to ndarray-iter. In any case, I'm up and running to attack my CNN problem now. 🙂 Very happy this MXNet port to Clojure exists!
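
A minimal sketch of the wrapping described above, assuming the clojure-mxnet namespaces `org.apache.clojure-mxnet.io` and `org.apache.clojure-mxnet.ndarray` are on the classpath. The var names, the label name "softmax_label", and the batch size are illustrative assumptions and are not taken from the gist:

```clojure
(ns xor-sketch.core
  (:require [org.apache.clojure-mxnet.io :as mx-io]
            [org.apache.clojure-mxnet.ndarray :as ndarray]))

;; Four XOR samples with two features each, given as a flat
;; collection plus an explicit shape of [4 2].
(def xor-data
  (ndarray/array [0 0, 0 1, 1 0, 1 1] [4 2]))

;; One label per sample, shape [4].
(def xor-labels
  (ndarray/array [0 1 1 0] [4]))

;; ndarray-iter expects a *vector* of NDArrays, both for the data
;; argument and for the :label option, even when there is only one.
;; The label name is an assumption: it has to match the label
;; variable name used by the network's output layer.
(def train-iter
  (mx-io/ndarray-iter [xor-data]
                      {:label [xor-labels]
                       :label-name "softmax_label"
                       :data-batch-size 4}))
```

Calling `(mx-io/provide-data train-iter)` and `(mx-io/provide-label train-iter)` at the REPL should show the names and shapes the iterator reports, which is a quick way to check them against what the model expects.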

2020-02-12T14:21:50.079700Z

nice example @frederikdieleman! If you are interested in contributing as an example to the repo - it would be welcome 🙂

Frederik 2020-02-12T14:28:36.079900Z

That would be great! 🙂 Anything I need to change? Things that aren't done in the best way, code or formatting that isn't very clojury, more documentation needed, etc.? Let me know and I'll put in a PR in the coming days

2020-02-12T14:35:45.080300Z

It would be nice to add it as a page to the tutorial example here https://github.com/apache/incubator-mxnet/tree/master/contrib/clojure-package/examples/tutorial/src/tutorial

2020-02-12T14:36:52.080500Z

Any comments that you want to add or clarify to help beginners would be great. Other than that, it just needs the Apache license at the top in a comment, like this: https://github.com/apache/incubator-mxnet/blob/master/contrib/clojure-package/examples/tutorial/src/tutorial/module.clj

2020-02-12T14:37:07.080800Z

Just tag me on the PR - thanks!

2020-02-12T14:38:05.081Z

oh there is a test for the tutorial that just loads the namespace - if you could add yours there too, would be great https://github.com/apache/incubator-mxnet/blob/master/contrib/clojure-package/examples/tutorial/test/tutorial/core_test.clj

Frederik 2020-02-12T14:38:57.081300Z

Ok, will tag you when I make a PR 🙂 Thanks!

Frederik 2020-02-13T15:03:47.084300Z

Hi, I hope you don't mind me posting another question here, but Google doesn't solve my problem. I'm trying to run m/predict on a different batch size than the one I trained on. I've been playing around with rebinding the model, but nothing seems to work:

(def X-test (mx-io/ndarray-iter [(ndarray/->ndarray [[0.0 1.0]])]
                                {:label-name "output_label"
                                 :data-batch-size 1}))

; model was trained like in the gist linked earlier.
(def binded-model (m/bind trained-model
                          {:for-training false 
                           :data-shapes (mx-io/provide-data X-test)}))

; Trying to predict
(m/predict binded-model {:eval-data X-test :num-batch -1})
But whatever I try, I always get something along the lines of:
[14:59:12] src/operator/tensor/./matrix_op-inl.h:697: Check failed: e <= len (4 vs. 1) : slicing with end[0]=4 exceeds limit of input dimension[0]=1
How do I get rid of the expected batch size of 4?

Frederik 2020-02-11T17:04:33.064200Z

I started from a clean Leiningen project and now it works, so it must have been an interaction with some other dependency in my project.clj. 🙂 Sorry for all the questions, but I'm trying to get a basic example to work and can't find any example that doesn't load data from disk, while I want to train on data I already have in memory. mx-io/ndarray-iter seems the right approach (?), but I can't find any documentation or examples on how to use it. Is there any general documentation on setting up small end-to-end examples?

2020-02-11T17:23:55.064500Z

glad it's working!

2020-02-11T17:24:22.064700Z

NDArrayIter is the right way to go - here is an example with text classification with CNN https://github.com/apache/incubator-mxnet/blob/master/contrib/clojure-package/examples/cnn-text-classification/src/cnn_text_classification/classifier.clj

2020-02-11T17:28:27.065300Z

There are some other examples in the bert sentence classification using it too

jsa-aerial 2020-02-11T19:36:21.066300Z

Major new release of Saite
* lib 0.19.15 on clojars
* uberjar 0.5.0: wget http://bioinformatics.bc.edu/~jsa/aerial.aerosaite-0.5.0-standalone.jar

Release summary
* Added full editor panel support to picture frame elements
  - Add any number / combination to :left, :right, :up, and/or :down
  - May be either live or static. Latter is for typical code markdown
  - Live editors are fully functional with code execution and may also explicitly update any associated frame visualization
  - Static can be neither focused nor editable
  - Theming works for all these editors
  - Add per tab capable user defined defaults for editor options (sizing, etc)
* Added automated code 'starter' inserts for
  - Text only (empty) frames (for straight markdown and/or editor panels)
  - CodeMirror editor elements for picture frames
  - Visualization frames with starting default template and data source
  - These also include automatic and automated frame (fid), visualization (vid) and editor (eid) ids.
* Added bulk static image saves
  - Will automatically save all images in a document
  - Saved per tab as session>docname>tabname>vid(s).png
  - Supports bulk creation by server or client.
  - Simple fast implementation - no 'headless browsers' or other extras required
  - New default 'chart' option in config.edn for where to save
* Added new example documents:
  - cm-example-picframes.clj, showing editor support in picture frames
  - bulk-vis-save.clj, showing bulk visualization creation and saving
* Added (fwd) slurping and barfing to strings
* Fix several issues with strings in paredit.
* Added main editor panel default sizing to config.edn
* Added main doc (scroller) area default max size for width and height

WRT the 'bulk saving' change: It was always possible to create visualization frames (actually frames of any sort) in bulk - either from the server or from the client. What wasn't available was saving all the visualizations in all the frames in all the tabs as one single simple operation. That is what is now available. As noted when previously discussed, this required very little code and no additional addons or extras like node.js, graalvm, headless browsers or any other such stuff that seems to have come up when discussing saving VG/VGL in bulk. Very clean, simple and fast.

jsa-aerial 2020-02-11T19:58:02.066700Z

Here's a screen shot of an example of the new CodeMirror editor in picture frames capability:

2020-02-11T20:11:29.067100Z

Awesome

chrisn 2020-02-11T20:54:30.067600Z

Very very impressive.

chrisn 2020-02-12T16:54:08.081800Z

You deserve it; Hanami and Saite are, I think, one of a kind. I can't think of anything like them; sort of like a science/math exploration IDE.

2020-02-11T22:37:34.071400Z

Does anyone know if there is a way to connect to Nextjournal via SSH? Google Colab just released a Pro version (10 USD/month) that gives you a P100/T4 or TPU to play around with. There are some ways to SSH into these machines and use them as VMs. For that use case I think the price is fairly good. I wondered if Nextjournal has the same deal. My goal is to explore what RL can do and also to strengthen my skills in the process.

2020-02-11T22:39:20.072200Z

@jsa-aerial is Saite/Hanami the topic of your PhD? :)

jsa-aerial 2020-02-11T22:54:42.075800Z

@neo2551 🙂 First, while they have proven quite nice and very useful for various work in the labs, I would not consider either or both in combination worthy of a PhD level thesis. There isn't enough true innovation in them for that. Second, I'm too old for that at this point 😔. Third, when I wasn't, that was in mathematics