If I can help with Spark 2 support, ask!
Cool, thanks 🙂
You’re welcome
hello 🙂
great idea for having a channel 👍
what actually was the second example that didn’t work, was it the one with (range 100) followed by (filter odd?) and (map inc)
@cgore ?
Yup, that one.
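For reference, a minimal plain-Clojure sketch of the pipeline being discussed, with no Spark involved. It only shows what the two transducers should produce over (range 100), so a result from the cluster can be compared against it. The names xf and result are made up for this sketch, and the powderkeg call itself is deliberately left out.

```clojure
;; Plain-Clojure sketch: the same transducers applied locally, without Spark.
;; `xf` and `result` are illustrative names, not part of powderkeg.
(def xf
  (comp (filter odd?)   ; keep 1, 3, 5, ... 99
        (map inc)))     ; shift each to 2, 4, 6, ... 100

(def result (into [] xf (range 100)))

;; Print the size and the two ends of the result for a quick sanity check.
(println (count result) (first result) (last result))
```

If the Spark run returns anything other than the 50 even numbers from 2 to 100, the problem is more likely in the REPL or cluster setup than in the transducers themselves.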
hum, seems to work on my laptop from the repl running against a local spark
huh
how are you running the example?
It was just in a lein repl
is there any error you could copy&paste? 🙂
hum
I downloaded spark-2.1.0-bin-hadoop2.7 and ran sbin/start-all.sh (using a Mac here)
I’ll have to retry it, maybe it was just some weirdness.
stale repl could be it, I was passing the :reload flag to require
while hacking on powderkeg.core: (require '[powderkeg.core :as keg] :reload)
although that reloads only powderkeg.core, not other namespaces that might have changed
there’s :reload-all for that
although if you have a running context while you reload, reloaded fns may not work against the ctx and rdd defined before the reload
tried submitting uberjar, seems to work too on my machine
yep, was thinking on the reloadability aspect, since *sc* is dynamic
although client code is the one that gets reloaded
*sc* being dynamic is wishful thinking
not powderkeg itself
🙂
multiple spark contexts from a single vm don’t work too well
yup
apropos, was just in a local meetup and met a colleague that is porting a Scala app that uses MLlib, ALS specifically
Haven’t looked into MLlib. What would be the challenges?
DataFrame support, though haven't looked at the code that much, have been on parental leave past two months :)
How old is s/he?
I started sketching stuff for data frames but it relies on spec
6 year old and a 1 year 4 month old boys :)
saw that on the Github issue and pointed him to it
we're actually using the ALS RDD-based API, reading from Redshift and writing back and to another db
but MLlib itself is moving away from RDD to DataFrame, which I guess has RDDs behind the scenes somewhere
don't actually know how big of an issue that is but I guess the sweet spot would be to feed DataFrames from Clojure data structures
http://spark.apache.org/docs/latest/ml-guide.html, "The MLlib RDD-based API is now in maintenance mode."
had another look at https://github.com/HCADatalab/powderkeg/issues/2 and the spec + conform idea feels like the right direction
have to go to sleep now though :)