powderkeg

cgrand 2017-03-07T11:08:18.000006Z

If I can help with Spark 2 support, ask!

cgore 2017-03-07T15:52:20.000002Z

Cool, thanks 🙂

cgrand 2017-03-07T16:24:46.000003Z

You’re welcome

viesti 2017-03-07T19:58:49.000002Z

hello 🙂

viesti 2017-03-07T20:02:19.000003Z

great idea for having a channel 👍

viesti 2017-03-07T20:08:09.000004Z

what actually was the second example that didn’t work, was it the one with (range 100) followed by (filter odd?) and (map inc) @cgore ?

cgore 2017-03-07T20:08:30.000005Z

Yup, that one.
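
For reference, a minimal sketch of the pipeline being discussed, run as plain Clojure transducers with no Spark involved. The names `xform` and `local-result` are hypothetical and used only for illustration. On a cluster the same two transducers would be handed to powderkeg rather than to `into`; that part is not shown here.

```clojure
;; Same transducers as in the example: keep the odd numbers, then increment each.
(def xform (comp (filter odd?) (map inc)))

;; Apply them locally to (range 100); no Spark context is needed for this.
(def local-result (into [] xform (range 100)))

(take 5 local-result) ; => (2 4 6 8 10)
```

If the local result looks right but the Spark run does not, the problem is more likely in the REPL or cluster setup than in the transducers themselves.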

viesti 2017-03-07T20:08:52.000006Z

hum, seems to work on my laptop from the repl running against a local spark

cgore 2017-03-07T20:08:58.000007Z

huh

viesti 2017-03-07T20:09:07.000008Z

how are you running the example?

cgore 2017-03-07T20:09:21.000009Z

It was just in a lein repl

viesti 2017-03-07T20:09:28.000010Z

is there any error you could copy & paste? 🙂

viesti 2017-03-07T20:09:32.000011Z

hum

viesti 2017-03-07T20:10:22.000012Z

I downloaded spark-2.1.0-bin-hadoop2.7 and ran sbin/start-all.sh (using a Mac here)

cgore 2017-03-07T20:10:22.000013Z

I’ll have to retry it, maybe it was just some weirdness.

viesti 2017-03-07T20:12:23.000014Z

a stale repl could be it, I was passing the :reload flag to require while hacking on powderkeg.core: (require '[powderkeg.core :as keg] :reload)

viesti 2017-03-07T20:13:16.000015Z

although that reloads only powderkeg.core, not other namespaces that might have changed

cgrand 2017-03-07T20:17:29.000016Z

there’s :reload-all for that

cgrand 2017-03-07T20:18:19.000017Z

although if you have a running context while you reload, reloaded fns may not work against the ctx and rdd defined before the reload
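
A short sketch of the two `require` flags mentioned above. `clojure.set` stands in for `powderkeg.core` here only so the snippet runs without extra dependencies; that substitution is an assumption made for illustration.

```clojure
;; :reload re-reads only the named namespace from its source.
(require '[clojure.set :as set] :reload)

;; :reload-all also reloads every namespace the named one requires,
;; directly or indirectly.
(require '[clojure.set :as set] :reload-all)

;; Vars are re-bound after a reload; values created before it
;; (such as a running context or an rdd) are not recreated.
(set/union #{1 2} #{3})
```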

viesti 2017-03-07T20:20:14.000018Z

tried submitting uberjar, seems to work too on my machine

viesti 2017-03-07T20:20:40.000019Z

yep, was thinking on the reloadability aspect, since *sc* is dynamic

viesti 2017-03-07T20:22:08.000021Z

although client code is the one that gets reloaded

cgrand 2017-03-07T20:22:13.000022Z

*sc* being dynamic is wishful thinking

viesti 2017-03-07T20:22:31.000023Z

not powderkeg itself

viesti 2017-03-07T20:22:36.000024Z

🙂

cgrand 2017-03-07T20:22:59.000025Z

multiple Spark contexts in a single VM don’t work too well

viesti 2017-03-07T20:32:37.000027Z

yup

viesti 2017-03-07T20:33:34.000028Z

apropos, was just at a local meetup and met a colleague who is porting a Scala app that uses MLlib, ALS specifically

cgrand 2017-03-07T20:35:14.000029Z

Haven’t looked into MLlib. What would be the challenges?

viesti 2017-03-07T20:48:17.000031Z

DataFrame support, though I haven't looked at the code that much, have been on parental leave for the past two months :)

cgrand 2017-03-07T20:49:44.000032Z

How old is s/he?

cgrand 2017-03-07T20:51:13.000033Z

I started sketching stuff for data frames but it relies on spec

viesti 2017-03-07T20:55:36.000034Z

6 year old and a 1 year 4 month old boys :)

viesti 2017-03-07T20:56:46.000035Z

saw that on the GitHub issue and pointed him to it

viesti 2017-03-07T20:58:58.000036Z

we're actually using the RDD-based ALS API, reading from Redshift and writing back and to another db

viesti 2017-03-07T20:59:44.000037Z

but MLlib itself is moving away from RDDs to DataFrames, which I guess have RDDs behind the scenes somewhere

viesti 2017-03-07T21:00:49.000038Z

don't actually know how big of an issue that is but I guess the sweet spot would be to feed DataFrames from Clojure data structures

viesti 2017-03-07T21:19:47.000039Z

http://spark.apache.org/docs/latest/ml-guide.html, "The MLlib RDD-based API is now in maintenance mode."

viesti 2017-03-07T21:38:28.000040Z

had another look at https://github.com/HCADatalab/powderkeg/issues/2 and the spec + conform idea feels like the right direction

viesti 2017-03-07T21:38:49.000041Z

have to go to sleep now though :)