google-cloud

Google Cloud Platform: Clojure + {GAE, GCE, anything else on Google Platform}
qqq 2017-01-30T01:30:05.000697Z

OR is easy, you just union the two sets, AND ... I guess you could do intersection

qqq 2017-01-30T01:30:52.000698Z

@mobileink: ^ but I think, either way, there needs to be a short way to formulate "I want all objects that satisfy <COMPLICATED CRITERION>", without describing how to search, and that's where datalog/query languages come into play
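
A minimal sketch of the set-based view of OR and AND, assuming each criterion has already been resolved to a set of matching ids. The names `red-ids`, `large-ids`, and `objects` are hypothetical, chosen only for illustration:

```clojure
(require '[clojure.set :as set])

;; Hypothetical data: ids of the objects matching each criterion.
(def red-ids   #{1 2 3})
(def large-ids #{2 3 4})

;; OR: an object qualifies if it is in either set.
(def red-or-large (set/union red-ids large-ids))

;; AND: an object qualifies only if it is in both sets.
(def red-and-large (set/intersection red-ids large-ids))

;; The declarative alternative: state the criterion as a predicate and let
;; the engine (here just `filter`) decide how to search.
(def objects [{:id 1 :red? true  :large? false}
              {:id 2 :red? true  :large? true}
              {:id 4 :red? false :large? true}])

(def matching-ids
  (into #{} (comp (filter (every-pred :red? :large?)) (map :id)) objects))
```

A query language such as Datalog plays the role of the predicate here: the caller states the criterion, and the engine chooses between lookups, unions, and intersections.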

domparry 2017-01-30T12:52:16.000699Z

Found this, and thought I would share… If you’re dabbling with Dataflow: https://github.com/ngrunwald/datasplash

2👌
domparry 2017-01-30T12:52:21.000701Z

lovely stuff...

2017-01-30T18:31:58.000702Z

we actually use dataflow extensively with clojure

2017-01-30T18:33:06.000703Z

but we have our own fairly minimal wrapper. we're hoping one day to have time to open source it

1✅
domparry 2017-01-30T19:24:21.000704Z

that would be great.

domparry 2017-01-30T19:24:58.000705Z

@bfabry If you had an example of a simple pubsub source to bigquery sink, I would be eternally grateful. 😄

2017-01-30T19:27:31.000706Z

lol, well in our repo that'd be something like (->> (io/read-resource "pubsub-url") (io/write-to-bigquery table-definition)) but that probably doesn't help you much

2017-01-30T20:37:06.000707Z

I will say there's some tension with how dataflow works and how clojure works, so you either end up having to AOT the world or have a few pieces of code that look like this

(defn clj-call-invoke
  [{:keys [full-name params ns-name fn-validation]} & args]
  ;; First attempt: assumes the namespace is already loaded on this worker.
  (try
    (apply (var-get (find-var full-name)) (into (vec args) params))
    (catch Exception _
      ;; Most likely the code is not loaded yet: require it, then retry once.
      (CljDoFnWithContext/synchronizedRequire ns-name)
      (try
        (apply (var-get (find-var full-name)) (into (vec args) params))
        (catch Exception e
          ;; Wrap failures with call context, except for clojure-dataflow.pardo.
          (if (= 'clojure-dataflow.pardo ns-name)
            (throw e)
            (throw (ex-info (str "Exception in " full-name) {:params params :data args} e))))))))

2017-01-30T20:52:55.000708Z

@bfabry out of curiosity, why do you need to aot the world?

2017-01-30T20:54:24.000710Z

because dataflow serializes dofn's and sends them over the wire to be executed. so when the dofn is executed either all of your functions need to be compiled and ready in the jar, or you need a piece of code like this that catches the exception the first time it happens and requires in all your clojure code

2017-01-30T20:54:42.000711Z

we do the latter, because I don't like AOT
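
The catch-and-load idea can be sketched without any Dataflow or Java dependency. This is a simplified variant, not the code from the wrapper discussed above: it checks whether the var resolves up front instead of catching the failed call, and `load-lock`, `lookup`, and `call-by-name` are hypothetical names. The lock stands in for the `synchronizedRequire` helper, on the assumption that several worker threads may hit the unloaded namespace at once:

```clojure
;; Hypothetical lock so that concurrent callers load a namespace one at a time.
(def load-lock (Object.))

(defn- lookup
  "Return the function named by the qualified symbol, or nil if its
  namespace or var is not loaded in this JVM."
  [full-name]
  (some-> (try
            (find-var full-name)
            ;; find-var throws when the namespace itself does not exist yet.
            (catch IllegalArgumentException _ nil))
          var-get))

(defn call-by-name
  "Apply the function named by `full-name` (a namespace-qualified symbol)
  to `args`, requiring its namespace first if it has not been loaded."
  [full-name & args]
  (let [f (or (lookup full-name)
              (do (locking load-lock
                    (require (symbol (namespace full-name))))
                  (lookup full-name)))]
    (when-not f
      (throw (ex-info (str "Could not resolve " full-name) {:fn full-name})))
    (apply f args)))

(call-by-name 'clojure.string/upper-case "abc")
```

Since Clojure 1.10, `requiring-resolve` in `clojure.core` covers much of the resolve-or-require step on its own.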

2017-01-30T20:58:00.000712Z

sorry, dofn? the reason i ask is because of my experience dealing with servlets. lots of clj servlet stuff aots everything, but you don't need to do that, you can just aot a dinky little clj file and leave the rest to clojure, by using :impl-ns appropriately.
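
The servlet-style approach might look roughly like the sketch below. To keep it free of external jars it implements `java.util.concurrent.Callable` rather than extending `HttpServlet`, and the namespaces `example.stub` and `example.stub-impl` are hypothetical. In a real project these would be two files, shown together here. Only the first would need AOT compilation; the generated class delegates its methods to ordinary functions in the second:

```clojure
;; File 1: the stub. Under AOT compilation this emits the class example.Stub;
;; when loaded without compiling, the :gen-class clause does nothing.
(ns example.stub
  (:gen-class
   :name example.Stub
   :implements [java.util.concurrent.Callable]
   :impl-ns example.stub-impl
   :main false))

;; File 2: the implementation, plain Clojure that is never AOT compiled.
(ns example.stub-impl)

(defn handle
  "Ordinary application code, reloadable at the REPL."
  []
  :handled)

(defn -call
  "Backs Callable.call() on example.Stub; `this` is the stub instance."
  [this]
  (handle))
```

With the default method prefix `-`, a call to `call()` on an `example.Stub` instance ends up in `-call`, so everything behind the stub stays regular, dynamically loaded Clojure.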

2017-01-30T20:59:03.000713Z

like I said, you don't have to aot everything, you have to aot everything or have some points in your code that detect when the clojure code hasn't been loaded and load it

2017-01-30T20:59:11.000714Z

sorry i haven't used dataflow but it sounds like a similar situation. wherever there is a "container" that calls one of your methods.

2017-01-30T21:06:03.000716Z

so i guess my question is, what specific classes do you need to aot compile to make it work, if you do aot compile? e.g. with servlets you must aot compile a gen-class bit, but that's all you need to aot compile.

2017-01-30T21:06:49.000717Z

it's not a particularly similar situation

2017-01-30T21:07:03.000718Z

i mean a gen-class bit that extends HttpServlet. is there some class/interface like that in dataflow?

2017-01-30T21:07:46.000719Z

hmm, ok, i'll do some homework. 😉

2017-01-30T21:08:53.000720Z

here's the line in datasplash where they end up having to do what I was talking about https://github.com/ngrunwald/datasplash/blob/master/src/datasplash/core.clj#L87

2017-01-30T21:16:09.000722Z

yikes. i'll have to actually think a little bit to figure that out.

2017-01-30T21:17:30.000723Z

i took a quick look at the dataflow docs and datasplash. it looks to me like the only reason to aot is because the container is looking for -main, no?

2017-01-30T21:17:45.000724Z

no

2017-01-30T21:20:44.000727Z

heh. ok, more thinking. point being only that there is usually a midway point between aot everything, and complicated dynamic stuff. you can just aot compile a stub, and delegate everything to clojure.

2017-01-30T21:21:40.000728Z

sure, and that's what we do, although our "stub" is just written in java rather than aot'd clojure

2017-01-30T21:22:07.000729Z

doesn't get around the catch-and-load need though

2017-01-30T21:24:56.000730Z

interesting. time to learn dataflow.

2017-01-30T22:08:25.000731Z

@bfabry sorry to bother you, but looking at the docs, it seems that a pipeline must have a "main", no? the pipeline runner, as far as i can see, is effectively a container: when you go to run a pipeline, it looks for "main". the docs are somewhat less than clear on this.

2017-01-30T22:09:47.000732Z

a pipeline is an abstract description of the graph of operations you want performed upon a set of pcollections, it does not have a main function

2017-01-30T22:10:59.000734Z

ok, but then how does it get started? there must be an entry point, no?

2017-01-30T22:11:22.000735Z

we execute it by submitting it to the dataflow service via api

2017-01-30T22:11:33.000736Z

you can also run it locally using DirectPipelineRunner

2017-01-30T22:12:33.000737Z

ok, but i still do not see how this can work if we do not have a way of saying, in effect, "start here".

2017-01-30T22:13:23.000738Z

the runner must know what to do with the submitted pipeline.

2017-01-30T22:14:58.000739Z

i don't see how that can happen without a convention. if not "main", then there must be something else so the runner can decide where to start.

2017-01-30T22:16:16.000740Z

the runner must call some method, obviously.

2017-01-30T22:20:32.000741Z

it looks to me like a runner is a "container", just like a servlet container, or an AWS Lambda container, or whatever. am i missing sth?

2017-01-30T22:24:12.000742Z

our project certainly has a main function. it just doesn't really have anything to do with dataflow. it's the thing that constructs the execution pipeline and then submits it to the dataflow api. if you would like to learn dataflow I'd recommend you read the dataflow whitepaper, streaming 101 and 102, and dive in to the examples repository

2017-01-30T22:32:57.000743Z

will take a look, but only if you will take a look at JCL. 😉 Honestly, all this dataflow stuff has been around since about 1968.

2017-01-30T22:33:26.000744Z

we used to do it in cobol!

qqq 2017-01-30T22:34:11.000745Z

@mobileink: did you go dinosaur hunting too? 🙂

2017-01-30T22:34:28.000746Z

anyway, just a guess, the only thing you need to aot is the main fn.

2017-01-30T22:35:18.000747Z

@qqq i was raised on a dinosaur farm.

2017-01-30T22:36:26.000748Z

if you need a trexx, lemme know, i'll talk to some people who know some people.

1🙂
2017-01-30T22:38:32.000749Z

there were no distributed data processing frameworks in 1968, let alone a unified batch and streaming one delivered as a managed service. like I said, you don't need to aot anything if you catch the exception and deal with code load manually, but you need to deal with code load somehow, because dataflow has a very different execution model

2017-01-30T22:44:54.000750Z

@bfabry: did i say 68? ok, 78. ;). my 1st programming job (i was a liberal arts major) was with EDS, and they took us all to cobol camp where we learned to do a "master file update", pure batch processing using JCL to control the "pipeline", and looking at dataflow, it's essentially the same thing, just a little fancier.

2017-01-30T22:45:38.000751Z

what goes 'round comes 'round!

2017-01-30T22:47:06.000753Z

the really cool thing is that boot is essentially JCL, done right.

2017-01-30T23:03:40.000757Z

@bfabry serious question: if you've written your code in 100% clojure, and you do not aot anything, then how can the pipeline runner start your pipeline? it would have to know how to talk clojure, which seems unlikely, but even if it did know that it would still have to know where to start. can you explain how to run a pipeline with no aot at all?

2017-01-30T23:05:29.000762Z

which is not related to how it is started. our start function, the main function, is defined in a namespace which we pass to clojure.main using the -m option

2017-01-30T23:05:47.000763Z

because clojure.main is aot'd and comes bundled in the clojure jar
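
A sketch of that kind of entry point, with the Dataflow parts replaced by placeholders. The namespace `example.pipeline-main` and the functions `build-pipeline` and `submit!` are hypothetical stand-ins; a real `submit!` would call the Dataflow API:

```clojure
(ns example.pipeline-main)

(defn build-pipeline
  "Placeholder: describe the pipeline as plain data."
  [args]
  {:steps [:read :transform :write]
   :args  (vec args)})

(defn submit!
  "Placeholder for submitting the pipeline to a runner or service."
  [pipeline]
  (println "submitting" (count (:steps pipeline)) "steps")
  :submitted)

(defn -main
  "Entry point that clojure.main calls when started with -m."
  [& args]
  (submit! (build-pipeline args)))
```

Started with something like `java -cp <classpath> clojure.main -m example.pipeline-main`, the project's own namespaces are loaded from source; the only compiled entry point involved is `clojure.main` itself.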

2017-01-30T23:08:55.000764Z

so, you always have sth aot compiled, even if you do not aot compile everything. is that correct?

2017-01-30T23:09:11.000765Z

sth?

2017-01-30T23:09:39.000766Z

something. dictionary geek here, sorry.

2017-01-30T23:10:49.000767Z

the java classes which call into clojure are compiled yes, and clojure.main (which comes compiled with clojure)

2017-01-30T23:11:12.000768Z

we could translate those java classes into clojure gen-class trickery and aot those if we wanted to as well

2017-01-30T23:12:19.000769Z

fwiw, not beating you up. i have what i think is a nice technique for dealing with this sort of situation, where a "container" kicks things off, thus requiring byte-code-on-disk.

2017-01-30T23:12:40.000770Z

which may apply here.

2017-01-30T23:14:04.000771Z

see the docs at https://github.com/migae/boot-ask/blob/master/README.adoc. still a WIP, but you'll get the idea.

2017-01-30T23:16:12.000775Z

in short: you can get the aot stuff but completely hide it, since it is always the same.

2017-01-30T23:21:48.000776Z

i.e. you can get rid of the java stuff, and in fact you can get rid of the gen-class stuff too.

2017-01-30T23:22:04.000777Z

I really doubt it

2017-01-30T23:22:22.000778Z

if you use boot. 😉

2017-01-30T23:23:48.000779Z

heh, ok now i have to prove it. might take a few weeks, other stuff to do too.

2017-01-30T23:24:36.000781Z

please don't go out of your way. I'm perfectly happy having two 10 line long java classes. it doesn't concern me at all

2017-01-30T23:25:24.000782Z

oh, it's the challenge, makes it fun!

2017-01-30T23:25:48.000783Z

it's on my list anyway.

2017-01-30T23:32:51.000784Z

i don't suppose you could make the java bit publicly inspectable?

2017-01-30T23:40:34.000785Z

eh, sure I can give you the two most important ones

2017-01-30T23:43:59.000788Z

thanks! will be in touch in a couple weeks.