onyx

FYI: an alternative Onyx :onyx: chat is at <https://gitter.im/onyx-platform/onyx>; the log can be found at <https://clojurians-log.clojureverse.org/onyx/index.html>
asolovyov 2017-12-19T07:55:53.000010Z

I've made an appender for timbre that sends logs directly to Elasticsearch (for Kibana), and now I'm thinking it would be nice to add various metadata to those logs (for example, which Kafka topic and which offset is being processed when a particular log entry is written). The thing is that timbre handles that kind of context through a dynamic binding, and I won't be able to wrap all my tasks in that binding. I wonder if you have any ideas on how to proceed. 🙂

eelke 2017-12-19T08:33:00.000303Z

Thanks for the response a while back about joins, @lucasbradstreet and @michaeldrogalis. On another note, I was wondering about checkpointing. Right now the window state is stored every second. If I understand correctly, this is done so that the window state is not lost after recovery. However, I believe it may not be needed in the case of the Kafka plugin, given that you keep the offsets correctly in the checkpoints. I am asking because the amount of data stored on S3 is quite large. Of course we can increase the interval between checkpoints to reduce this, but if the window state does not need to be stored, that might be a cleaner way. I guess you would need to be able to track the offsets in the windows to do this? I'll stop typing now 😉

michaeldrogalis 2017-12-19T16:05:46.000260Z

@eelke The offsets constitute part of the checkpoint -- the window contents constitute the other part. They can't be separated. The checkpoint needs to happen atomically. There are other ways of handling big window sizes - iterative snapshotting, longer snapshot intervals, etc.