morning
@mccraigmccraig: have you recovered from your side-effects yet? Or are you still working on it?
@thomas recovered now, thankfully. a badly validated mesos/kafka config was causing brokers to lose all their state occasionally
mccraigmccraig: as we're going to be doing similar I'd love to know what broke/how you fixed it/how you knew it was wrong
@otfrom: in the end it came down to "don't assume that kafka is configured correctly just because it is working now"
the kafka mesos framework seems to always mount the mesos persistent volume you give it with the name "kafka", ignoring the mount-point which can be specified when configuring the volume... and since i set the kafka log directory to be under where i thought the volume mount point was, rather than where it actually was, the broker logs were on ephemeral storage, and the state was lost when the whole cluster was restarted
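(a quick way to catch that class of misconfiguration is to check which filesystem the broker's log directory actually resolves to — a sketch, with hypothetical paths; substitute your `log.dirs` value and the mount point mesos actually used:)

```shell
# Sketch: verify the kafka log directory really lives on the persistent
# volume rather than ephemeral storage. KAFKA_LOG_DIR is hypothetical --
# point it at whatever log.dirs is set to in your broker config.
KAFKA_LOG_DIR=${KAFKA_LOG_DIR:-/tmp}

# df -P prints the mount point a path resolves to in the last column;
# if this isn't the persistent volume's mount point, state will be lost
# on restart.
mount_point=$(df -P "$KAFKA_LOG_DIR" | awk 'NR==2 {print $NF}')
echo "log dir $KAFKA_LOG_DIR is on filesystem mounted at $mount_point"
```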
i knew something was wrong because my onyx tasks became solidly wedged, requiring new kafka consumer-ids... it took me a while to trace it to source, since kafka was restarting and was working fine when i came to it - i thought it was an onyx problem at first
mccraigmccraig: how are you finding onyx on mesos with kafka? I've been looking at it but told my team I'm not allowed to use it yet
@otfrom: onyx has been great - straightforward to get going with thanks to the templates, generally very solid, and michael and lucas are very responsive to questions and problems. running it on mesos is a breeze, since everything is coordinated through ZK
... mesos is an old friend, and gives me very little trouble... this is the first time i've done kafka on it, and there's been some learning about mesos persistent volumes - but aside from my misunderstandings the mesos/kafka framework seems solid and mostly just works
mccraigmccraig: didn't you have persistent volumes before when you were doing c* or es?
no - i didn't deploy c* and es on mesos then, just my app server and webserver components - i don't think persistent volumes were even around then
I vaguely remembered something about using marathon for pinning things to instances and haproxy to get everything talking
yeah, i've added 'slaveN' attributes to each slave instance, and used marathon's attribute constraints to pin brokers to particular instances
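(for anyone following along, the pinning looks roughly like this — the attribute name, value, and app id are all hypothetical:)

```shell
# Sketch of pinning a broker to a particular instance (names hypothetical).
# 1) on each mesos slave, set an identifying attribute -- with the
#    mesosphere debs that can be a file under /etc/mesos-slave/attributes,
#    where the filename is the attribute name and the contents the value:
#      echo 'slave3' > /etc/mesos-slave/attributes/slave_id
# 2) in the marathon app definition, use a CLUSTER constraint on that
#    attribute so the task only ever runs where slave_id=slave3:
cat <<'EOF'
{
  "id": "/kafka/broker-3",
  "constraints": [["slave_id", "CLUSTER", "slave3"]]
}
EOF
```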
and the new shiny for getting everything talking with haproxy is https://github.com/mesosphere/marathon-lb though i'm still using the older simpler config script
either way, those things read app details from the marathon API and configure haproxy forwarding so you can generally use localhost:APP_PORT to get to an application, which makes app configuration really easy
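(concretely, the generated haproxy config ends up looking something like this — ports, addresses, and names are made up for illustration:)

```shell
# Sketch of what gets generated from the marathon API (everything here
# is hypothetical): haproxy binds the app's servicePort on every node
# and forwards to the host:port where the task actually landed, so any
# client on the cluster can just use localhost:10000.
cat <<'EOF'
listen kafka-broker-10000
  bind 0.0.0.0:10000
  mode tcp
  server task_1 10.0.1.17:31002
EOF
```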
v cool
the persistent-volume stuff on mesos is still a bit edgy ... it works fine, and the underlying mechanism is dead simple and unixy (and thus trustworthy)... but if you have to configure your persistent reservations and tie your processes to instances then you aren't getting a lot of the benefit of marathon/mesos - that processes can move seamlessly around the cluster when necessary
the answer to this is that frameworks should reserve their own persistent resources, and this is implemented in mesos now but it's brand new afaics, and frameworks aren't supporting it yet - https://issues.apache.org/jira/browse/MESOS-1554
when the frameworks support it, then the story will be a lot slicker
are you using mesosphere or straight from apache?
debs from http://repos.mesosphere.io/ubuntu
we're mostly avoiding the persistent volume issue by getting things off kafka and into s3 as quickly as possible and just seeing kafka as a buffer
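(the buffer pattern is roughly this — a sketch with a hypothetical topic and bucket; the real upload line is commented out so the batching logic stands alone:)

```shell
# Sketch of "kafka as a buffer": drain a topic into timestamped batches
# and push each batch to s3. Topic, bucket, and batch size hypothetical.
drain_batch() {
  # read up to $1 lines from stdin into a temp file, then upload it
  batch=$(mktemp)
  head -n "$1" > "$batch"
  # aws s3 cp "$batch" "s3://my-bucket/kafka/$(date +%s).log"  # real upload
  n=$(wc -l < "$batch" | tr -d ' ')
  echo "would upload $n messages"
  rm -f "$batch"
}

# in production the input would come from the console consumer, e.g.
#   kafka-console-consumer --bootstrap-server localhost:9092 \
#     --topic events | drain_batch 10000
printf 'a\nb\nc\n' | drain_batch 10000   # -> would upload 3 messages
```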
don't have any use cases (yet) where dropping some messages is catastrophic
in the case of a cluster wide outage anyway
i'm kinda the other way round - kafka is one of our systems of record (c* is another)... we have another pubsub system downstream of onyx which is unreliable, but super fast and much better than kafka at dealing with many topics
also @otfrom , talking of shiny, you should be using #C0702A7SB - async FTW ! 😉
I've been keeping an eye on the 3rd clojure REST framework malcolmsparks has been working on (as I remember plugboard and am using liberator atm)
we might have a go with some of the microservices we are doing
@otfrom I take that to mean you have confidence that I'm past my 'second system' phase? (second system effect)
😉
I'll state here and now that it's highly unlikely there'll be a 4th
malcolmsparks: those are bold words. 😄
yada does look cool
not too bold, yada has been a daily obsession for over a year now, I need some life back
it's rational @otfrom - liberator wasn't async, and all i/o should be async - somebody had to do yada. i'm not sure if there's anything else which needs to be done to it now
I'll agree with you on that