@viesti can’t imagine how this Iterable/Iterator regression was unavoidable. Breaking stuff for fun
(and employment)
resisting the urge to say something about Scala in general :)
another thing that I haven't made clear to myself is Spark runtime version vs version linked into the app
?
can an app linked against 2.1.0 run on a cluster running 1.5.0, for example?
depends on when classes are resolved
Working on that right now
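(defmacro ^:private compile-cond [& choices]
  (let [x (Object.) ; sentinel meaning "no test matched"
        expr (reduce (fn [acc [test expr]]
                       ;; tests are evaluated at macroexpansion time; first truthy one wins
                       (if (eval test) (reduced expr) acc))
                     x (partition 2 choices))]
    (when (= x expr)
      (throw (ex-info "No valid choice." {:form &form})))
    expr))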
with that you could ship an app oblivious to Spark version as long as powderkeg is not aot compiled
else keg would hardcode the spark used during aot
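A minimal sketch of the kind of test such a macro could evaluate at expansion time: check whether a marker class is loadable and emit only the matching branch. `class-present?` and `if-class` are hypothetical names, not powderkeg's API, and `org.apache.spark.sql.SparkSession` (present from Spark 2.0 on) is used only as an illustrative marker.

```clojure
(defn class-present?
  "True when class-name can be loaded (without initializing it)."
  [class-name]
  (try
    (Class/forName class-name false (clojure.lang.RT/baseLoader))
    true
    (catch ClassNotFoundException _ false)))

(defmacro if-class
  "Expands to `then` when class-name is loadable at macroexpansion time,
  otherwise to `else`. class-name must be a literal string."
  [class-name then else]
  (if (class-present? class-name) then else))

;; The branch is chosen when this form is compiled. Loaded from source, that
;; means the classpath of the running JVM; under AOT it is frozen to whatever
;; was on the build classpath.
(def spark-generation
  (if-class "org.apache.spark.sql.SparkSession"
    :spark-2-or-later
    :pre-spark-2-or-absent))
```

Because the untaken branch never reaches the compiler, it can mention classes that only exist in the other Spark line.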
https://github.com/HCADatalab/powderkeg/commit/8c3d7f27423f1649d14dada96978b949d64506d3
Having to ship powderkeg 2.10 and 2.11 is no fun, neither is asking users to add chill
any idea?
Scala binary compatibility :picard-facepalm:
flambo seems to support only 2.10
http://spark.apache.org/downloads.html says: Note: Starting version 2.0, Spark is built with Scala 2.11 by default. Scala 2.10 users should download the Spark source package and build with Scala 2.10 support.
raaaaah
can we detect scala version at runtime?
flambo seems to have 0.8.0 for spark 2.x and 0.7.2 for spark 1.x
https://github.com/yieldbot/flambo/commit/8edda47f85cbb84f4c798d57c9918ab59235b98b
😄
guessing that they just dropped with 0.7.2 🙂
hmm
(I’m thinking about shading chill twice and using the right one)
hmm, is it even possible to load chill conditionally?
user=> (import 'scala.util.Properties)
scala.util.Properties
user=> (scala.util.Properties/versionString)
"version 2.11.8"
found from http://www.scala-lang.org/old/node/7532
@cgrand it seems to be the way to detect Scala runtime version: http://stackoverflow.com/a/6968014
on current powderkeg:
user=> (scala.util.Properties/versionString)
"version 2.10.4"
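To branch on that value, one option is to reduce the version string to its binary version (2.10 vs 2.11). A small sketch, with `scala-binary-version` as a hypothetical helper name; it only parses a string, so it runs without Scala on the classpath:

```clojure
(defn scala-binary-version
  "Extracts the major.minor part of a string such as \"version 2.11.8\".
  Returns nil when the input is nil or contains no such pattern."
  [version-string]
  (when version-string
    (second (re-find #"(\d+\.\d+)" version-string))))

(scala-binary-version "version 2.11.8") ;=> "2.11"
(scala-binary-version "version 2.10.4") ;=> "2.10"

;; With scala-library on the classpath, the input would come from:
;; (scala-binary-version (scala.util.Properties/versionString))
```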
in fact chill is a dep of spark itself so I can remove it
ah, neat, was already thinking of classloader magic http://stackoverflow.com/questions/11759414/java-how-to-load-different-versions-of-the-same-class
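For reference, the classloader approach from that link boils down to something like the sketch below. `isolated-loader` and `load-from` are hypothetical names, and this is plain JDK `URLClassLoader` usage, not anything powderkeg does:

```clojure
(import '(java.net URL URLClassLoader))

(defn isolated-loader
  "Returns a URLClassLoader that looks in jar-urls (a seq of java.net.URL)
  after delegating to parent."
  [jar-urls parent]
  (URLClassLoader. (into-array URL jar-urls) parent))

(defn load-from
  "Loads class-name through loader without initializing it."
  [^ClassLoader loader class-name]
  (Class/forName class-name false loader))

;; With no jars of its own, the loader just delegates to its parent:
(let [loader (isolated-loader [] (ClassLoader/getSystemClassLoader))]
  (load-from loader "java.lang.String"))
```

Note that standard parent-first delegation means a class already visible to the parent wins, so picking between two chill builds this way would need a parent that sees neither.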
this would be quite neat actually, to as a user be able to select spark version
going to make a snack for the kids now
I’m 1h ahead so lunch is long due 🙂
apropos, saw this related to DataSet/DataFrame http://spark.apache.org/docs/latest/sql-programming-guide.html#creating-datasets
yeah that’s what I used
have to learn to read more carefully 🙂
I’m not happy with the fact that I lose schema metadata too often and can’t always reconstruct a spec for the resulting dataset
yup, but seems promising for taking over DataFrames/Dataset/MLlib 🙂
I have quickly looked at Travis CI documentation and it can spawn containers
haven't used Travis myself, but have heard good things about it
https://docs.travis-ci.com/user/docker/ and https://circleci.com/docs/1.0/docker/ look similar at start :) (enabling docker service)
Circleci autodetects clojure projects and runs lein test, travis might do the same
is it better to run tests against a container than in local mode?
answering myself, could test 1.x and 2.x clusters that way
yes it’s definitely better because local mode shares the VM and most classloaders so it hides bugs
PoCing transducers on spark took me one day (never touched spark before) in local mode
Everything else was figuring out how to have it run on a cluster.
yup
hmm so we could use this https://github.com/gettyimages/docker-spark
@powderkeg I merged spark2 and spark1.5 code in https://github.com/HCADatalab/powderkeg/tree/spark2, I had local networking issues today which prevented me from testing. Please try on your own
I’m getting the following on the spark2 branch, with Spark 2.1.0 running:
CompilerException java.lang.ClassNotFoundException: com.twitter.chill.java.RegexSerializer, compiling:(carbonite/serializer.clj:1:1)
And a bit worse after a lein clean
lein with-profile +spark2 repl
oops
yeah, that helps 😄
Now I get this error, further along:
Ok it looks like I botched the macro…
@cgore indeed https://github.com/HCADatalab/powderkeg/commit/ae0998755abd25a2362a5e08709d2d209beaee58
That looks like it’s working for me now.