powderkeg

cgrand 2017-03-30T10:21:00.849031Z

I just returned from the Spark codebase and I’m more hopeful than before about Encoders

viesti 2017-03-30T10:21:24.852845Z

so if I don't have time for my sql branch you can take over the relevant code

viesti 2017-03-30T10:21:47.856331Z

that is neat :)

viesti 2017-03-30T10:22:04.858906Z

encoder relief :)

cgrand 2017-03-30T10:25:54.895441Z

But now we need to provide two arguments `serializer: Seq[Expression], deserializer: Expression,`

cgrand 2017-03-30T10:27:50.913693Z

Expressions will be built using https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala, which provides `StaticInvoke`, `Invoke`, and `NewInstance`.

cgrand 2017-03-30T10:33:11.963245Z

You have to understand that these expressions are just builders for Java source code.

cgrand 2017-03-30T10:33:21.964875Z

(that will be compiled on the fly)

cgrand 2017-03-30T10:33:56.970261Z

So one has to think the way one does when doing Clojure interop from Java.
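To make the "builders for Java source code" idea concrete, here is a toy sketch in Python. It is not Spark's API: the class names are borrowed from the message above, but the fields, the `gen` method, and the `Lit` and `Ref` helpers are assumptions made up for illustration (the real Catalyst classes take more arguments, such as data types). It only shows how a tree of expression objects can render to the Java text one would write by hand when calling Clojure from Java.

```python
from dataclasses import dataclass
from typing import Sequence


class Expr:
    """Hypothetical base class: every node can render itself as Java source."""

    def gen(self) -> str:
        raise NotImplementedError


@dataclass
class Lit(Expr):
    """A Java string literal (toy version, no escaping)."""
    value: str

    def gen(self) -> str:
        return '"' + self.value + '"'


@dataclass
class Ref(Expr):
    """A reference to a Java variable that is already in scope."""
    name: str

    def gen(self) -> str:
        return self.name


@dataclass
class StaticInvoke(Expr):
    """Renders a static method call: Class.method(args)."""
    cls: str
    method: str
    args: Sequence[Expr] = ()

    def gen(self) -> str:
        return f"{self.cls}.{self.method}({', '.join(a.gen() for a in self.args)})"


@dataclass
class Invoke(Expr):
    """Renders an instance method call: target.method(args)."""
    target: Expr
    method: str
    args: Sequence[Expr] = ()

    def gen(self) -> str:
        return f"{self.target.gen()}.{self.method}({', '.join(a.gen() for a in self.args)})"


@dataclass
class NewInstance(Expr):
    """Renders a constructor call: new Class(args)."""
    cls: str
    args: Sequence[Expr] = ()

    def gen(self) -> str:
        return f"new {self.cls}({', '.join(a.gen() for a in self.args)})"


# Look up clojure.core/inc through Clojure's Java API, then invoke it on `value`.
lookup = StaticInvoke("clojure.java.api.Clojure", "var",
                      [Lit("clojure.core"), Lit("inc")])
java_src = Invoke(lookup, "invoke", [Ref("value")]).gen()
print(java_src)
print(NewInstance("java.util.ArrayList").gen())
```

Nothing is evaluated when the tree is built; the objects only describe the Java code that would be compiled later.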

cgrand 2017-03-30T12:05:19.781746Z

I think we can achieve an API like `(df src spec & xforms-then-options)`, very close to what we have for `rdd`

viesti 2017-03-30T12:06:07.789239Z

that would be awesome

cgrand 2017-03-30T12:06:43.795115Z

the snippet above is totally untested (well it compiles and produces an expression)

viesti 2017-03-30T19:05:27.599463Z

while learning about CollReduce, what about parallel fold?

viesti 2017-03-30T19:06:42.619636Z

on todo list for rdd I guess: https://github.com/HCADatalab/powderkeg/blob/master/src/main/clojure/powderkeg/core.clj#L662

viesti 2017-03-30T19:09:30.664255Z

not managing a PR tonight, maybe later

cgrand 2017-03-30T19:32:21.020684Z

This code is there from the early times. I don't remember why it got commented out.

viesti 2017-03-30T19:33:02.031351Z

there is a related todo comment

viesti 2017-03-30T19:33:49.043468Z

on another note, where is the book on Clojure collection protocols, would need one :)

cgrand 2017-03-30T19:34:35.055133Z

I do remember. Transducers assume linear traversal. So you have to solve transducers+fold first.
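A minimal sketch of why linear traversal matters, written in Python rather than Clojure. Everything here is an assumption for illustration: `partition_all` and `transduce` only mimic the shape of their Clojure namesakes, and `chunked_fold` is a sequential stand-in for a parallel fold, not `clojure.core.reducers/fold`. The stateful transducer buffers items across steps, so reducing chunks independently changes the result.

```python
from functools import reduce


def partition_all(n):
    """Transducer-like function: groups inputs into lists of n items.

    The buffer is private state, which is what ties it to a single
    left-to-right traversal.
    """
    def xform(rf):
        buf = []

        def step(acc, x):
            buf.append(x)
            if len(buf) == n:
                group = list(buf)
                buf.clear()
                return rf(acc, group)
            return acc

        def complete(acc):
            # Flush whatever is left in the buffer at the end of the input.
            if buf:
                group = list(buf)
                buf.clear()
                acc = rf(acc, group)
            return acc

        return step, complete
    return xform


def append(acc, x):
    acc.append(x)
    return acc


def transduce(xform, rf, init, coll):
    """Single linear pass: fresh state, reduce, then completion."""
    step, complete = xform(rf)
    return complete(reduce(step, coll, init))


def chunked_fold(xform, rf, coll, chunk_size):
    """Reduce each chunk independently (fresh state per chunk), then concatenate."""
    out = []
    for i in range(0, len(coll), chunk_size):
        out += transduce(xform, rf, [], coll[i:i + chunk_size])
    return out


data = list(range(6))
linear = transduce(partition_all(2), append, [], data)
folded = chunked_fold(partition_all(2), append, data, 3)
print(linear)  # [[0, 1], [2, 3], [4, 5]]
print(folded)  # [[0, 1], [2], [3, 4], [5]]
```

With a chunk size of 2 the two results happen to agree, so the output depends on where the input is split, which is exactly what a fold does not promise.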

viesti 2017-03-30T19:35:03.062365Z

saw some Stack Overflow question about that

viesti 2017-03-30T19:36:00.076700Z

Rich's Strange Loop talk on transducers mentioned parallelism on one slide, wonder where it is at now :)

cgrand 2017-03-30T19:45:21.213441Z

If I understand Alex's mention of kv, I believe I solved it my way in xforms.

viesti 2017-03-30T19:49:40.278123Z

have to grok that too, but now have to try sleeping :)

cgrand 2017-03-30T19:51:11.301328Z

With by-key I deal with deterministic partitioning. Fold is non-deterministic.
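One way to read that distinction, sketched in Python. This is an interpretation, not the xforms implementation: `by_key`, `make_dedupe`, and `merge` are hypothetical helpers. When items are routed by key, each key's stateful step sees all of that key's items in input order, so the result is fixed by the data. When the input is split by position, as a fold may do, the state is reset wherever the split happens to fall.

```python
def make_dedupe():
    """Stateful step: drops an item equal to the previous one this instance saw."""
    last = [object()]  # sentinel that equals nothing

    def step(acc, x):
        if x != last[0]:
            acc.append(x)
        last[0] = x
        return acc

    return step


def by_key(key_fn, make_step):
    """Route every item to a per-key step instance, created on first use."""
    def run(coll):
        steps, out = {}, {}
        for item in coll:
            k = key_fn(item)
            if k not in steps:
                steps[k] = make_step()
                out[k] = []
            steps[k](out[k], item)
        return out
    return run


def merge(parts):
    """Combine per-chunk results by concatenating each key's values."""
    merged = {}
    for part in parts:
        for k, vs in part.items():
            merged.setdefault(k, []).extend(vs)
    return merged


events = [("a", 1), ("b", 1), ("a", 1), ("b", 2), ("a", 3)]
run = by_key(lambda e: e[0], make_dedupe)

# Partitioning by key only: the duplicate ("a", 1) is always dropped.
whole = run(events)

# Positional split first (here after two items): the duplicate survives,
# because the second chunk starts with fresh state for key "a".
split = merge([run(events[:2]), run(events[2:])])

print(whole)
print(split)
```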