ldnclj

Find us on #clojure-uk
agile_geek 2015-11-27T07:32:09.000238Z

Anyone used Cascalog on a very recent version of Hadoop? I’m using Hadoop 2.7.1 and my Map-Reduce jobs run fine on a local Hadoop instance for that version but not on the cluster. I get a ClassNotFoundException for cascading.tap.hadoop.io.MultiInputSplit. See https://www.refheap.com/112124

agile_geek 2015-11-27T07:36:07.000239Z

Oh yeah, Good Morning everyone

pupeno 2015-11-27T10:01:47.000240Z

Good morning.

xlevus 2015-11-27T10:04:09.000241Z

morning

mccraigmccraig 2015-11-27T10:27:23.000242Z

@agile_geek: does your uberjar contain the cascading classes ?

agile_geek 2015-11-27T10:29:31.000243Z

@mccraigmccraig: I’ve not specifically included them, and I’m not sure whether they are a transitive dependency of Cascalog. I’m not including hadoop-client either (it’s referenced as :provided to allow compilation). Should I be including them?

mccraigmccraig 2015-11-27T10:30:56.000244Z

i'm guessing - i don't have specific experience, but either they will need to be in the uberjar, or you will need to install them on the cluster nodes

mccraigmccraig 2015-11-27T10:31:34.000245Z

if they aren't being transitively included, then you could try adding them to your lein project

mccraigmccraig 2015-11-27T10:32:27.000246Z

that's assuming that they aren't already installed on your cluster, which it looks like they aren't

mccraigmccraig 2015-11-27T10:32:49.000247Z

or perhaps you have some terrible jar-hell problem

mccraigmccraig 2015-11-27T10:33:19.000248Z

i think i like the node approach to dependencies better than the jvm approach

pupeno 2015-11-27T10:40:13.000249Z

What are cascading classes?

mccraigmccraig 2015-11-27T10:46:55.000250Z

@pupeno: cascading is the high-level hadoop interface on which cascalog builds

pupeno 2015-11-27T10:47:06.000251Z

Ah, ok.

pupeno 2015-11-27T10:47:08.000252Z

Thanks.

mccraigmccraig 2015-11-27T10:47:36.000253Z

it lets you model your map-reduce ops as more familiar joins, aggregations etc

agile_geek 2015-11-27T11:00:21.000254Z

@mccraigmccraig: I thought similar myself, but reading around, the Cascalog docs are clear about not including any Hadoop jars, and the examples don’t include anything other than Cascalog. I have a feeling it’s something to do with the Hortonworks distribution not having Cascading on it.

mccraigmccraig 2015-11-27T11:03:43.000256Z

that would make sense... i have no idea whether it's even possible to run with cascading in the uberjar - i presume there are some gnarly ClassLoader hierarchies inside hadoop, and components like cascading might have to be installed at a higher-level than the app

agile_geek 2015-11-27T11:13:11.000257Z

I assumed that too

agile_geek 2015-11-27T11:13:35.000258Z

I need to read around how cascading gets installed

agile_geek 2015-11-27T11:14:56.000259Z

I think I’ll start by looking at the transitive dependencies for Cascalog and unpacking my uberjar

mccraigmccraig 2015-11-27T11:31:17.000260Z

agile_geek: lein deps :tree is your friend :simple_smile:

agile_geek 2015-11-27T11:31:27.000261Z

Yep!

agile_geek 2015-11-27T11:32:01.000262Z

Along with piping its output to a file so I can search.
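
A minimal sketch of the approach just described: write the dependency tree to a file once, then search that file. The function names and the deps-tree.txt filename are made up for illustration; it assumes Leiningen is on the PATH and that it is run from the project root.

```shell
# Save the output of `lein deps :tree` (stdout and any warnings on stderr)
# to a file. The default filename deps-tree.txt is an arbitrary choice.
save_deps_tree() {
  lein deps :tree > "${1:-deps-tree.txt}" 2>&1
}

# Print every line (with its line number) that mentions the given artifact,
# ignoring case.
find_dep() {
  grep -n -i -- "$1" "${2:-deps-tree.txt}"
}
```

Typical use would be `save_deps_tree && find_dep cascading`, which should show where Cascading enters the tree and at which version.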

agile_geek 2015-11-27T13:34:06.000263Z

@mccraigmccraig: hmm, that class is in the uberjar…as it’s a transitive dep of Cascalog. It’s an older version of it (2.5.3 instead of 3.0.2) but it’s there. I wonder if the version is causing an issue.
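
One way to run this kind of check without fully unpacking the uberjar is to list the jar's entries and search for the class file. This is only a sketch: the helper names are hypothetical, and it assumes the Info-ZIP `unzip` tool is installed.

```shell
# Convert a fully qualified class name into the path it has inside a jar,
# e.g. a.b.C becomes a/b/C.class.
class_to_path() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# List the jar's contents and print any entry matching the class.
# Exit status is 0 when the class is present, non-zero otherwise.
jar_has_class() {
  unzip -l "$1" | grep -F -- "$(class_to_path "$2")"
}
```

For example, `jar_has_class target/myjob-standalone.jar cascading.tap.hadoop.io.MultiInputSplit`, where the jar path is a placeholder for whatever `lein uberjar` produced.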

mccraigmccraig 2015-11-27T13:54:48.000264Z

@agile_geek: so u have a newer version of cascading on hadoop, and an older version in your uberjar ? exclude cascading from your uberjar and pray 🙏

agile_geek 2015-11-27T14:08:19.000265Z

@mccraigmccraig: I’ll try it but I’m a bit confused. The stack trace is that this class is missing which suggests it’s not on the cluster OR in my uberjar. In all the examples of Hadoop-Cascading-Cascalog the Cascading jar needed to be jar’ed up and deployed, which it is - I’ve unpacked my uberjar and it’s there. Admittedly, it’s a slightly older version. I’ve tried excluding the older version of cascading and building on a newer one but I get the same error. I’ll try excluding altogether but can’t see how that can work as the class is definitely missing then!
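
To test the "is it on the cluster at all?" half of the question, one option is to inspect the classpath that Hadoop itself reports on a cluster node. The sketch below assumes shell access to a node and that the `hadoop classpath` command is available there; the function name is hypothetical. Note it only shows what is on the node's default classpath, not what a running task ends up loading.

```shell
# Read a colon-separated classpath on stdin and print the entries (or the
# jars inside wildcard directories) whose path mentions the given pattern.
find_on_classpath() {
  pattern=$1
  tr ':' '\n' | while IFS= read -r entry || [ -n "$entry" ]; do
    case "$entry" in
      # An entry ending in /* is a directory wildcard: list the jars in it.
      */\*) ls "${entry%?}"*.jar 2>/dev/null ;;
      *)    printf '%s\n' "$entry" ;;
    esac
  done | grep -i -- "$pattern"
}
```

On a node this would be run as `hadoop classpath | find_on_classpath cascading`. No output would suggest Cascading is not installed on the node classpath, so it would have to come from the job jar.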

agile_geek 2015-11-27T14:16:44.000266Z

As suspected excluding the cascading lib altogether means the job fails to even compile (eval) when the cascalog functions try to resolve any references to cascading. Previously it failed when it hit the cluster.

mccraigmccraig 2015-11-27T14:36:54.000267Z

@agile_geek: can you pre-compile your sources, then exclude cascading from the uberjar ? then, if you are lucky and they are api compatible, your .classes will perhaps link to the cascading classes on the hadoop cluster

mccraigmccraig 2015-11-27T14:37:25.000268Z

if that fails, then can i suggest spark on mesos 😉

agile_geek 2015-11-27T14:37:54.000269Z

That’s what I did. AOT all on uberjar but it fails as soon as I submit to hadoop

agile_geek 2015-11-27T14:38:50.000270Z

@mccraigmccraig: Unfortunately it took 5 years for the client to get Hortonworks distro of Hadoop approved! Not sure Spark and Mesos will take less than 10!

mccraigmccraig 2015-11-27T14:39:15.000272Z

so you can't run against an EMR cluster instead of the one you are using ?

agile_geek 2015-11-27T14:39:22.000273Z

Nope

mccraigmccraig 2015-11-27T14:40:14.000274Z

and presumably the hadoop distro you have is deeply frozen and there's no chance of getting anything on to or off of the node classpaths ?

agile_geek 2015-11-27T14:40:34.000275Z

You guessed it!

mccraigmccraig 2015-11-27T14:40:45.000276Z

bugger

agile_geek 2015-11-27T14:40:50.000277Z

This job runs ok locally on same version!

agile_geek 2015-11-27T14:41:18.000278Z

I’m going to give up and write it in Java! Ouch!

mccraigmccraig 2015-11-27T14:41:20.000279Z

you mean same version of hadoop or same version of same hortonworks distro ?

agile_geek 2015-11-27T14:41:32.000280Z

version of Hadoop

mccraigmccraig 2015-11-27T14:42:49.000281Z

@agile_geek: you might have a look at : https://github.com/damballa/parkour/

mccraigmccraig 2015-11-27T14:44:26.000283Z

i've not used it, but it looks interesting as a nice interface to vanilla hadoop

agile_geek 2015-11-27T14:45:15.000284Z

Unfortunately the only reason I got to do this bit in Clojure is I said it would be faster but as I’ve lost 2 days to this problem I think I’ve burnt my ‘goodwill’ and I will be forced back to Java.

mccraigmccraig 2015-11-27T14:47:26.000285Z

ha, i guess the argument that "jar-hell is not peculiar to clojure and can burn any attempt to use just about anything on a fixed platform" won't melt much ice, huh ?

agile_geek 2015-11-27T14:49:31.000286Z

Nope. The ppl I talk to would hear Charlie Brown’s teacher “whah, whah, whah Clojure whah, whah, whah, doesn’t work whah whah…”

mccraigmccraig 2015-11-27T14:51:29.000287Z

i shall not complain. this is the mechanism through which large organisations get their lunch eaten by smaller organisations. without it the world would still be dominated by feudal organisations which have been around for thousands of years. oh, wait...

agile_geek 2015-11-27T14:52:35.000288Z

:simple_smile:

malcolmsparks 2015-11-27T16:16:18.000289Z

:simple_smile:

malcolmsparks 2015-11-27T16:17:11.000290Z

@mccraigmccraig: So that's why Windows exists? I'd never thought about it that way!

thattommyhall 2015-11-27T22:13:57.000291Z

Hello you lovelies

thattommyhall 2015-11-27T22:14:14.000292Z

I've not been hanging here much, but should be less busy in 2016

thattommyhall 2015-11-27T22:14:35.000293Z

anyone going (or submitting) to http://www.clojured.de/ ?