ldnclj

Find us on #clojure-uk
thomas 2016-02-26T09:30:53.000175Z

morning

thomas 2016-02-26T09:31:26.000176Z

@mccraigmccraig: have you recovered from your side-effects yet? Or are you still working on it?

mccraigmccraig 2016-02-26T09:36:20.000177Z

@thomas recovered now, thankfully. a badly validated mesos/kafka config was causing brokers to lose all their state occasionally

otfrom 2016-02-26T11:21:51.000178Z

mccraigmccraig: as we're going to be doing similar I'd love to know what broke/how you fixed it/how you knew it was wrong

mccraigmccraig 2016-02-26T11:49:58.000179Z

@otfrom: in the end it came down to "don't assume that kafka is configured correctly just because it is working now"

😄 1
mccraigmccraig 2016-02-26T11:50:00.000180Z

the kafka mesos framework seems to always mount the mesos persistent volume you give it with the name "kafka", ignoring the mount-point which can be specified when configuring the volume... and since i set the kafka log directory to be under where i thought the volume mount point was, rather than where it actually was, the broker logs were on ephemeral storage, and the state was lost when the whole cluster was restarted
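
The failure mode above can be caught with a quick sanity check: make sure the broker's configured log directory really sits under the volume's actual mount point (the name the framework used), not the mount point you asked for. This is only a sketch with illustrative paths; `/mnt/persistent/kafka` stands in for wherever the framework actually mounted the volume.

```shell
#!/bin/sh
# Sanity check (sketch): is the Kafka log dir on the persistent volume,
# or silently on ephemeral storage? Paths here are illustrative; in real
# life compare against `findmnt` / `df` output on the agent.
LOG_DIR="/mnt/persistent/kafka/logs"   # what log.dirs in server.properties says
MOUNT_POINT="/mnt/persistent/kafka"    # where the volume is *actually* mounted

case "$LOG_DIR" in
  "$MOUNT_POINT"/*|"$MOUNT_POINT")
    echo "ok: log dir is under the persistent mount" ;;
  *)
    echo "WARNING: log dir is on ephemeral storage" ;;
esac
# prints "ok: log dir is under the persistent mount"
```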

mccraigmccraig 2016-02-26T11:52:53.000181Z

i knew something was wrong because my onyx tasks became solidly wedged, requiring new kafka consumer-ids... it took me a while to trace it to source, since kafka was restarting and was working fine when i came to it - i thought it was an onyx problem at first

otfrom 2016-02-26T14:11:45.000182Z

mccraigmccraig: how are you finding onyx on mesos with kafka? I've been looking at it but told my team I'm not allowed to use it yet

mccraigmccraig 2016-02-26T14:29:52.000184Z

@otfrom: onyx has been great - straightforward to get going with thanks to the templates, generally very solid, and michael and lucas are very responsive to questions and problems. running it on mesos is a breeze, since everything is coordinated through ZK

mccraigmccraig 2016-02-26T14:31:19.000185Z

... mesos is an old friend, and gives me very little trouble... this is the first time i've done kafka on it, and there's been some learning about mesos persistent volumes - but aside from my misunderstandings the mesos/kafka framework seems solid and mostly just works

otfrom 2016-02-26T14:35:00.000186Z

mccraigmccraig: didn't you have persistent volumes before when you were doing c* or es?

mccraigmccraig 2016-02-26T14:36:10.000187Z

no - i didn't deploy c* and es on mesos then, just my app server and webserver components - i don't think persistent volumes were even around then

otfrom 2016-02-26T14:43:21.000188Z

I vaguely remembered something about using marathon for pinning things to instances and ha proxy to get everything talking

mccraigmccraig 2016-02-26T14:49:15.000189Z

yeah, i've added 'slaveN' attributes to each slave instance, and used marathon's attribute constraints to pin brokers to particular instances
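
The pinning scheme described above can be sketched like this. The attribute name `node_id` and value `slave3` are hypothetical stand-ins for the 'slaveN' attributes mentioned; the `CLUSTER` constraint operator is Marathon's.

```shell
#!/bin/sh
# Sketch: pin an app to one agent via a Mesos agent attribute plus a
# Marathon attribute constraint. Names and values are illustrative.

# 1. Each mesos-slave is started with a distinguishing attribute, e.g.:
#      mesos-slave --attributes='node_id:slave3' ...

# 2. The Marathon app definition constrains the app to that attribute:
cat > broker-3.json <<'EOF'
{
  "id": "/kafka/broker-3",
  "constraints": [["node_id", "CLUSTER", "slave3"]]
}
EOF
grep -q '"CLUSTER"' broker-3.json && echo "constraint present"
```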

mccraigmccraig 2016-02-26T14:49:50.000190Z

and the new shiny for getting everything talking with haproxy is https://github.com/mesosphere/marathon-lb though i'm still using the older simpler config script

mccraigmccraig 2016-02-26T14:51:34.000192Z

either way, those things read app details from the marathon API and configure haproxy forwarding so you can generally use localhost:APP_PORT to get to an application, which makes app configuration really easy
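
The localhost:APP_PORT pattern above amounts to config like the following, which the bridge scripts regenerate from the Marathon API. App name, servicePort and task addresses are all illustrative.

```shell
#!/bin/sh
# Sketch of the haproxy config generated from Marathon app details:
# the app's servicePort becomes a local listen port on every node,
# forwarding to wherever Marathon has actually placed the tasks.
cat > haproxy.cfg <<'EOF'
listen my-app-10001
  bind 0.0.0.0:10001             # the app's Marathon servicePort
  mode tcp
  balance leastconn
  server task-1 10.0.1.17:31842  # real host:port of a running task
  server task-2 10.0.1.23:31007
EOF
# any component can now reach the app at localhost:10001 on any node
grep -q 'bind 0.0.0.0:10001' haproxy.cfg && echo "forwarding configured"
```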

otfrom 2016-02-26T14:53:53.000193Z

v cool

mccraigmccraig 2016-02-26T14:57:37.000194Z

the persistent-volume stuff on mesos is still a bit edgy ... it works fine, and the underlying mechanism is dead simple and unixy (and thus trustworthy)... but if you have to configure your persistent reservations and tie your processes to instances then you aren't getting a lot of the benefit of marathon/mesos - that processes can move seamlessly around the cluster when necessary

mccraigmccraig 2016-02-26T14:58:36.000195Z

the answer to this is that frameworks should reserve their own persistent resources, and this is implemented in mesos now but it's brand new afaics, and frameworks aren't supporting it yet - https://issues.apache.org/jira/browse/MESOS-1554

mccraigmccraig 2016-02-26T14:59:06.000196Z

when the frameworks support it, then the story will be a lot slicker

otfrom 2016-02-26T15:00:51.000197Z

are you using mesosphere or straight from apache?

mccraigmccraig 2016-02-26T15:01:35.000198Z

debs from http://repos.mesosphere.io/ubuntu

otfrom 2016-02-26T15:01:53.000199Z

we're mostly avoiding the persistent volume issue by getting things off kafka and into s3 as quickly as possible and just seeing kafka as a buffer

otfrom 2016-02-26T15:02:29.000200Z

don't have any use cases (yet) where dropping some messages is catastrophic

otfrom 2016-02-26T15:02:51.000201Z

in the case of a cluster wide outage anyway

mccraigmccraig 2016-02-26T15:05:02.000202Z

i'm kinda the other way round - kafka is one of our systems of record (c* is another)... we have another pubsub system downstream of onyx which is unreliable, but super fast and much better than kafka at dealing with many topics

mccraigmccraig 2016-02-26T15:06:55.000203Z

also @otfrom , talking of shiny, you should be using #C0702A7SB - async FTW ! 😉

otfrom 2016-02-26T15:08:32.000204Z

I've been keeping an eye on the 3rd clojure REST framework malcolmsparks has been working on (as I remember plugboard and am using liberator atm)

otfrom 2016-02-26T15:08:47.000205Z

we might have a go with some of the microservices we are doing

malcolmsparks 2016-02-26T15:22:38.000206Z

@otfrom I take that to mean you have confidence that I'm past my 'second system' phase? (second system effect)

malcolmsparks 2016-02-26T15:22:44.000207Z

😉

malcolmsparks 2016-02-26T15:23:28.000208Z

I'll state here and now that it's highly unlikely there'll be a 4th

otfrom 2016-02-26T15:24:57.000209Z

malcolmsparks: those are bold words. 😄

otfrom 2016-02-26T15:25:07.000210Z

yada does look cool

malcolmsparks 2016-02-26T15:25:54.000211Z

not too bold, yada has been a daily obsession for over a year now, I need some life back

mccraigmccraig 2016-02-26T15:53:30.000212Z

it's rational @otfrom - liberator wasn't async, and all i/o should be async - somebody had to do yada. i'm not sure if there's anything else which needs to be done to it now

👍 2
otfrom 2016-02-26T17:25:32.000213Z

I'll agree with you on that