data-science

Data science, data analysis, and machine learning in Clojure. See https://scicloj.github.io/pages/chat_streams/ for additional discussions.
sakalli 2019-10-21T06:34:03.030300Z

Hi there! Here is a blog post about some of the Clojure and data science topics that @daslu, @konrad.kuehne and I discussed in Helsinki after ClojuTRE, kindly summarised by Daniel. We would love to hear your thoughts about this! https://scicloj.github.io/posts/2019-10-18-data-wishes/

πŸ‘ 1
chrisn 2019-10-21T13:21:58.031700Z

@neo2551 I have not tried Pyodide. I would like to see the JVM implement WASM so that it performs as it should.

chrisn 2019-10-21T13:30:01.032300Z

Maybe we could leverage one of the runtimes listed at https://github.com/appcypher/awesome-wasm-runtimes, or Lucet: https://github.com/fastly/lucet.

chrisn 2019-10-21T13:30:59.033600Z

With Rust it is relatively easy to add a C layer that we can call via JNA. In fact, Lucet may already have that layer, as it is designed to be embedded.

chrisn 2019-10-21T17:21:43.034500Z

Two things: 1. New version of https://github.com/techascent/tvm-clj/

chrisn 2019-10-21T17:22:04.035Z

2. TVM now has an IR layer that supports autodifferentiation: https://docs.tvm.ai/dev/relay_intro.html

chrisn 2019-10-21T17:22:29.035400Z

The new version of TVM works with the latest tech.datatype, tech.ml.dataset, etc.

💯 1
➕ 1
2019-10-21T17:22:51.035600Z

What is TVM?

2019-10-21T17:23:32.035900Z

Never mind :)

chrisn 2019-10-21T17:27:34.036400Z

It is my favorite toy 🙂.

2019-10-21T17:31:49.036800Z

I just want to have pandas in CLJS

2019-10-21T17:32:08.037600Z

I can’t find any library that provides the data alignment functionality

chrisn 2019-10-21T17:32:46.039200Z

Is that the align function you want?

2019-10-21T17:32:51.039400Z

For time series: you want to concatenate two series, but they might not have the same timestamps, so you need an outer join.
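A minimal sketch of that outer join in Java (Clojure's host language), assuming each series is given as a sorted array of unique timestamps plus a parallel array of values. The class and method names (`OuterAlign`, `outerAlign`) are hypothetical; this is not the API of pandas, Tablesaw, or tech.ml.dataset. Because both inputs are sorted, a single two-pointer pass over them is enough, and a timestamp missing on one side is filled with NaN, roughly like pandas' alignment.

```java
import java.util.Arrays;

class OuterAlign {
    // Two series aligned onto the union of their timestamps.
    static final class Aligned {
        final long[] times;
        final double[] left;
        final double[] right;

        Aligned(long[] times, double[] left, double[] right) {
            this.times = times;
            this.left = left;
            this.right = right;
        }
    }

    // Outer join of series A (ta, va) and series B (tb, vb).
    // Assumes ta and tb are each sorted ascending with no duplicates.
    // Runs in O(n + m); a timestamp missing on one side gets NaN there.
    static Aligned outerAlign(long[] ta, double[] va, long[] tb, double[] vb) {
        int cap = ta.length + tb.length;
        long[] t = new long[cap];
        double[] l = new double[cap];
        double[] r = new double[cap];
        int i = 0, j = 0, k = 0;
        while (i < ta.length || j < tb.length) {
            if (j == tb.length || (i < ta.length && ta[i] < tb[j])) {
                // Timestamp only present in the left series.
                t[k] = ta[i];
                l[k] = va[i];
                r[k] = Double.NaN;
                i++;
            } else if (i == ta.length || tb[j] < ta[i]) {
                // Timestamp only present in the right series.
                t[k] = tb[j];
                l[k] = Double.NaN;
                r[k] = vb[j];
                j++;
            } else {
                // Same timestamp in both series.
                t[k] = ta[i];
                l[k] = va[i];
                r[k] = vb[j];
                i++;
                j++;
            }
            k++;
        }
        // Trim the buffers to the number of distinct timestamps emitted.
        return new Aligned(Arrays.copyOf(t, k), Arrays.copyOf(l, k), Arrays.copyOf(r, k));
    }
}
```

For example, aligning timestamps 1, 3, 5 with timestamps 2, 3, 6 yields 1, 2, 3, 5, 6, with NaN in the left column at 2 and 6 and in the right column at 1 and 5.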

2019-10-21T17:33:08.039700Z

Yeah that one.

2019-10-21T17:33:56.040900Z

When you have a sorted index, I can’t solve the problem with an efficient algorithm.

chrisn 2019-10-21T17:34:39.041300Z

Do you mean you 'can' solve the problem with a sorted index?

2019-10-21T17:35:32.041800Z

These are the features I am missing: https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html

2019-10-21T17:36:00.042600Z

Well, what I am doing now is representing my time series as a sorted-map

2019-10-21T17:36:17.043200Z

with a date->vector mapping.

2019-10-21T17:36:29.043600Z

I get super fast slicing thanks to subseq
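For comparison, a rough JVM analogue of the sorted-map plus subseq approach, using `java.util.TreeMap` keyed by date with one row of values per date. The class name, method names, and sample rows are hypothetical illustrations. `subMap(from, true, to, true)` returns a live view of the inclusive range after a logarithmic seek, much like `(subseq ts >= from <= to)` in Clojure returns a lazy seq starting at `from`.

```java
import java.time.LocalDate;
import java.util.NavigableMap;
import java.util.TreeMap;

class SortedSeriesSlice {
    // Hypothetical sample: date -> row of values, kept sorted by date.
    static NavigableMap<LocalDate, double[]> sample() {
        NavigableMap<LocalDate, double[]> ts = new TreeMap<>();
        ts.put(LocalDate.of(2019, 10, 1), new double[] {1.0, 10.0});
        ts.put(LocalDate.of(2019, 10, 2), new double[] {2.0, 20.0});
        ts.put(LocalDate.of(2019, 10, 4), new double[] {4.0, 40.0});
        ts.put(LocalDate.of(2019, 10, 7), new double[] {7.0, 70.0});
        return ts;
    }

    // Inclusive slice [from, to]; returns a view backed by ts, nothing is copied.
    // Throws IllegalArgumentException if from is after to.
    static NavigableMap<LocalDate, double[]> slice(
            NavigableMap<LocalDate, double[]> ts, LocalDate from, LocalDate to) {
        return ts.subMap(from, true, to, true);
    }
}
```

Slicing the sample from 2019-10-02 to 2019-10-05 gives the rows for October 2 and 4. The drawback raised next in the thread still applies: each row is a separate boxed entry, which is convenient for slicing but not for column-wise arithmetic.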

2019-10-21T17:37:18.045100Z

But this representation is inefficient for computation (so I would need to switch to a date->index map, manage my slicing accordingly, and keep the data as a matrix).
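One way to sketch the date->index layout, assuming unique timestamps stored in a sorted `long[]` (for example epoch days) and the data kept column-major in a `double[][]`. The names (`IndexedSeries`, `lowerBound`, `upperBound`, `meanBetween`) are hypothetical. Slicing becomes two binary searches that return row bounds, and any computation then runs over a contiguous array range instead of over map entries.

```java
import java.util.Arrays;

class IndexedSeries {
    final long[] times;    // sorted, unique timestamps (e.g. epoch days)
    final double[][] cols; // column-major data: cols[c][row]

    IndexedSeries(long[] times, double[][] cols) {
        this.times = times;
        this.cols = cols;
    }

    // Index of the first row whose timestamp is >= t.
    int lowerBound(long t) {
        int idx = Arrays.binarySearch(times, t);
        // binarySearch returns -(insertionPoint) - 1 when t is absent.
        return idx >= 0 ? idx : -idx - 1;
    }

    // One past the index of the last row whose timestamp is <= t.
    int upperBound(long t) {
        int idx = Arrays.binarySearch(times, t);
        return idx >= 0 ? idx + 1 : -idx - 1;
    }

    // Mean of column c over the inclusive time range [from, to].
    // Returns NaN when no rows fall in the range.
    double meanBetween(int c, long from, long to) {
        int start = lowerBound(from);
        int end = upperBound(to);
        if (start >= end) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (int r = start; r < end; r++) {
            sum += cols[c][r];
        }
        return sum / (end - start);
    }
}
```

With timestamps 1, 2, 4, 7 and a first column of 1, 2, 4, 7, `meanBetween(0, 2, 5)` averages the rows at 2 and 4. The trade-off is that inserting a new timestamp now means shifting arrays, which is why this layout suits mostly read-only analysis.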

2019-10-21T17:38:09.046500Z

The biggest challenge is when I have two time series that may have different time indices

chrisn 2019-10-21T17:38:29.047400Z

But some of the times align and some don't

2019-10-21T17:38:31.047500Z

Then I need to merge them (the issue with merge is that I can’t do an outer join)

2019-10-21T17:38:36.047700Z

Exactly

2019-10-21T17:39:23.048800Z

IMO, the biggest feature of pandas is that it solves this problem extremely efficiently.

2019-10-21T17:39:50.049400Z

And all the handling of time as well.

2019-10-21T17:40:10.049900Z

tablesaw does not even care about that.

chrisn 2019-10-21T17:43:20.051500Z

Tablesaw (like tech.ml.dataset) does not have the concept of an index across the table. You could create brand-new tables that look the same as the two tables in the documentation, though they would not share backing store data.

chrisn 2019-10-21T17:44:46.052300Z

Views are doable in tech.ml.dataset, but I would first get the functionality and tests working correctly for the 'align' function and then worry about views when someone runs out of RAM.

2019-10-21T17:45:23.052700Z

Agree.

2019-10-21T17:45:49.053500Z

But then I would need it in CLJS xD

2019-10-21T17:46:30.054700Z

I am at the point of considering tensorflow-js for my linear algebra operations.

2019-10-21T17:46:52.055300Z

You could rely on WebGL whenever available xD

chrisn 2019-10-21T17:47:40.056200Z

lol, you are better off with tensorflow-js than waiting for me to port the tech platform to js. Why CLJS? Just for kicks?

2019-10-21T17:48:28.057400Z

My company forbids me from using Clojure (more precisely, from importing JAR files), whereas they have no restrictions on JS files

chrisn 2019-10-21T17:48:47.058200Z

haha 🙂

2019-10-21T17:48:48.058300Z

So I code my tools outside the company network and download the JS files from GitHub

2019-10-21T17:49:35.059500Z

Plus my velocity in developing UIs has been amazing over the last year (I am a UI newbie)

2019-10-21T17:49:46.059900Z

So they let me play with it xD

2019-10-21T17:50:45.061400Z

Actually, they have been so amazed that we are probably going to stick with ClojureScript for any official web interface

2019-10-21T17:51:15.062300Z

I hope I can hack Clojure in the backend soon. I want to play with Neanderthal xD

2019-10-21T17:52:50.063700Z

I also suspect that tfjs on WebGL is faster than most of our internal data science tools (we use MATLAB and R)

chrisn 2019-10-21T17:53:52.064400Z

Maybe. The internal tools should use the system BLAS libraries, so it depends on which BLAS they have installed. They could have MKL installed or something like that.

2019-10-21T17:55:44.065300Z

Yeah, that would be tricky to beat

2019-10-21T17:56:15.065800Z

That being said, I could mimic tech.ml.dataset with tfjs

πŸ‘ 1
2019-10-21T21:53:15.068200Z

You have my attention now :) I guess I have to check out Saite now

2019-10-21T21:54:23.069800Z

One of my best shots would still be to force them to use GraalVM for the R<->Java interop and hope I can sneak in all my Clojure dependencies as well