observability

o11y, monitoring, logging, tracing, alerting and higher level discussions
2020-10-21T14:43:51.016600Z

Where do people typically place metrics code? Using something like https://github.com/metrics-clojure/metrics-clojure, do you just define a global metrics registry and litter usage of it throughout your code? That seems wrong, but I don't know how else it could be done.

Ben Sless 2020-10-23T06:23:11.029700Z

I can tell you where and speculate why we do so at my workplace:

We time parts of our pipeline. That way it's easy to see if a careless commit messed performance up or if some change managed to improve it. We work at a very large scale so even a few % can amount to up to tens of thousands of $ a month.

We have meters for business cases and rules. There is a lot of product and business knowledge in our code, a lot of legacy and not always the best design. It can sometimes be hard to diagnose if a change in one place might break some feature in another. Metrics covering various cases can let you see drops or increases in going through certain features which might indicate something was broken even before clients complain. They can also help you verify you have successfully disabled or sunset a feature before completely removing it. That's a result of lack of understanding, proper structure and proper tests in our code, IMO.

We also monitor things which could indicate pathological conditions in certain cases, like http return codes, network errors, slowest partition consumer (Kafka), etc.
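
To make the timer and meter idea concrete, here is a minimal sketch in Python (chosen only for brevity; the shape is the same with metrics-clojure). Everything in it is hypothetical: the `Registry` class, the `enrich` stage, and the metric names are invented for illustration and are not taken from anyone's codebase.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Registry:
    """Hypothetical in-memory registry; a real one would report to a backend."""

    def __init__(self):
        self.counts = defaultdict(int)    # meter name -> number of events
        self.timings = defaultdict(list)  # timer name -> durations in seconds

    def mark(self, name, n=1):
        # Record that a business case or rule was hit.
        self.counts[name] += n

    @contextmanager
    def timed(self, name):
        # Record how long the enclosed block took, even if it raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name].append(time.perf_counter() - start)


def enrich(registry, event):
    """Hypothetical pipeline stage: timed as a whole, one meter per rule."""
    with registry.timed("pipeline.enrich"):
        if event.get("premium"):
            registry.mark("rule.premium_discount")
            return {**event, "discount": 0.1}
        registry.mark("rule.no_discount")
        return event
```

A sudden drop in `rule.premium_discount` after a deploy would be the kind of signal described above, and a meter that stays at zero is some evidence that a feature really is disabled before its code is removed.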

lukasz 2020-10-21T14:48:03.018300Z

Not a metrics-clojure user, but all our applications are pretty heavily instrumented and report metrics via the statsd protocol. In 99% of cases we have middleware for HTTP and RabbitMQ producers/consumers, so the code is not even aware of any metrics being produced. In the remaining 1% of cases we instrument code directly, when we're tracking a performance issue or trying to figure something out.

lukasz 2020-10-21T14:48:49.019100Z

Basically, the boilerplate code we need to handle requests, publish jobs, and consume jobs reports all the metrics, and the business logic is not even aware of them.
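
A rough sketch of that middleware shape, written in Python for brevity (Ring middleware in Clojure has the same structure: a function that takes a handler and returns a handler). The names `wrap_metrics`, `emit`, `hello`, and the metric names are assumptions made for this example. The lines passed to `emit` use the statsd `name:value|type` text format; actually sending them over UDP is left out.

```python
import time


def wrap_metrics(handler, emit, clock=time.perf_counter):
    """Return a handler that reports a counter and a timer around `handler`.

    `emit` is any function accepting one statsd-formatted line (hypothetical;
    in a real service it would write to a statsd client).
    """
    def wrapped(request):
        start = clock()
        status = 500  # reported if the handler raises before responding
        try:
            response = handler(request)
            status = response["status"]
            return response
        finally:
            elapsed_ms = int((clock() - start) * 1000)
            emit(f"http.response.{status}:1|c")          # counter
            emit(f"http.request_time:{elapsed_ms}|ms")   # timer
    return wrapped


def hello(request):
    # Business logic: knows nothing about metrics.
    return {"status": 200, "body": f"hello {request['name']}"}


# Wiring happens once, at the edge of the application.
sent = []
app = wrap_metrics(hello, sent.append)
```

Because the handler itself never touches `emit`, no global registry is needed; the only place that knows about metrics is the wiring code.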

2020-10-21T14:49:57.019300Z

gotcha, thanks!

2020-10-21T17:18:49.021100Z

to expand speculatively on @lukaszkorecki’s answer, in the rare case where you need metrics on something outside a middleware-wrapped handling pipeline, it would make sense to make a component for the thing being measured, and wrap it via another component dedicated to metrics (this could use e.g. stuartsierra/component or integrant)
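
One way to picture that wrapping, as a language-agnostic sketch in Python rather than actual stuartsierra/component or integrant code. All names here (`InMemoryStore`, `MeteredStore`, `CountingMetrics`, `build_system`, and the metric names) are hypothetical. The point is only that the metrics dependency is injected at wiring time instead of being reached through a global.

```python
class InMemoryStore:
    """Hypothetical component being measured."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class CountingMetrics:
    """Hypothetical metrics component; stands in for a statsd client."""

    def __init__(self):
        self.counts = {}

    def increment(self, name):
        self.counts[name] = self.counts.get(name, 0) + 1


class MeteredStore:
    """Same interface as the wrapped store; reports through an injected dependency."""

    def __init__(self, store, metrics):
        self._store = store
        self._metrics = metrics

    def get(self, key):
        value = self._store.get(key)
        hit = value is not None
        self._metrics.increment("store.get.hit" if hit else "store.get.miss")
        return value

    def put(self, key, value):
        self._metrics.increment("store.put")
        self._store.put(key, value)


def build_system():
    """Wire dependencies explicitly in one place; nothing is global."""
    metrics = CountingMetrics()
    store = MeteredStore(InMemoryStore(), metrics)
    return {"metrics": metrics, "store": store}
```

Callers receive the `store` entry from the system and use it like the plain store, so swapping the metered wrapper in or out is a change to `build_system` only.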

lukasz 2020-10-21T17:24:38.021600Z

indeed, that's exactly what we do - our statsd client is a component

2020-10-21T17:27:59.022800Z

on one hand I'm tempted to say "it's too bad a design like component is so ubiquitous, it's like our spring framework", on the other hand, with component on one side and spring on the other, we come out looking pretty good

lukasz 2020-10-21T17:30:06.024900Z

Without going too deep - I never understood the arguments against Component. Having seriously messed up the structure of our early codebases, Component was the best thing we could have adopted to bring sanity to our code. We had shared mutable state all over the place and Component fixed just that. Especially now that you don't necessarily need records, things can be very lightweight in terms of setup. Maybe I just appreciate the verbosity and no-magic approach.

2020-10-21T17:34:11.025600Z

it's painful to bring into a mature codebase, and it makes bad design painful to implement (to me that's a plus, but not every architect recognizes bad design)

2020-10-21T17:34:40.025900Z

I too appreciate the verbosity and lack of magic

2020-10-21T17:35:48.026900Z

@lukaszkorecki it's similar to immutable data structures - they are amazing if you stick to the constructs that play well with them (and being pushed toward those constructs is one of their benefits), but coming from standard algorithms 101 they are just painful

2020-10-21T17:38:34.027900Z

and if you'd had components from the beginning, those deep call chains might have led you to a cleaner design as you went (speculating of course)

lukasz 2020-10-21T17:40:42.028600Z

https://clojurians.slack.com/archives/C010TGGL02X/p1603301651025600 that is exactly what happened to us - once we started using Component, it became apparent how badly we had implemented things

lukasz 2020-10-21T17:41:44.029500Z

and yeah, we have one service which is not using components and it's on everyone's :poop: list, so we avoid touching it with the plan of just replacing it outright with a new implementation