onyx

FYI: alternative Onyx :onyx: chat is at <https://gitter.im/onyx-platform/onyx> ; log can be found at <https://clojurians-log.clojureverse.org/onyx/index.html>
eoliphant 2018-05-16T13:04:00.000353Z

Hi, I have a kind of general architecture question. I’m looking to use onyx for a commander’ish pattern implementation. One of the things I’m working through is the best way to manage/maintain the state that a command processor needs in order to do its thing and issue the appropriate event(s). I’ve done stuff with ES/CQRS frameworks like axon, which have things like an explicit ‘event sourcing repository’, such that my, say, order command processor would ask the repo for order 27, and it would return the state as a function of all its stored events. I’d been considering using datomic as this ‘aggregate repo’ or whatever, but I initially had some heartburn, as it could possibly violate the principle that the events are the source of record for everything. Now I’m thinking that as long as the datomic state is a function of applying events and can be rebuilt as needed, it actually is ok, and potentially makes for a better ‘repository’ implementation, as it’s essentially an ongoing snapshot, as opposed to other libs/approaches where you maintain the snapshot and still read N events to get to the current state. Sorry for the ramble lol, but just wanted to see what you folks thought about this
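A minimal Clojure sketch of the “state as a function of its stored events” idea, rebuilding an aggregate as a left fold over its event seq; `apply-event`, `load-aggregate`, and the event shapes are hypothetical:

```clojure
;; Rebuilding an aggregate = left fold of events onto empty state.
(defmulti apply-event (fn [_state event] (:event/type event)))

(defmethod apply-event :order/placed
  [state {:keys [order/id order/items]}]
  (assoc state :order/id id :order/items items :order/status :placed))

(defmethod apply-event :order/shipped
  [state _event]
  (assoc state :order/status :shipped))

(defn load-aggregate
  "Rebuild current state from stored events, oldest first."
  [events]
  (reduce apply-event {} events))
```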

Travis 2018-05-16T13:15:32.000161Z

@eoliphant Been doing a little thinking about this myself. I am currently implementing this with Kafka streams, since I don't have onyx available to me in this case. If using onyx and datomic, you could probably use datomic as the state store and have an onyx job read the datomic log for the events
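A minimal sketch of that last suggestion, using Datomic’s peer log API (`d/log` / `d/tx-range`); the wrapping fn is hypothetical:

```clojure
(require '[datomic.api :as d])

;; Read events back out of the Datomic transaction log.
;; d/tx-range streams transactions in t order; nil start = beginning.
(defn log-events
  [conn start-t]
  (d/tx-range (d/log conn) start-t nil))
```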

eoliphant 2018-05-16T14:23:29.000104Z

yeah @camechis I’ve been looking at Kafka streams also, and trying to decide how that might fit in, pros/cons, etc. It’s the Tyranny of Good Choices lol

eoliphant 2018-05-16T14:27:15.000517Z

And yeah, pulling stuff from the datomic log is yet another dilemma lol. Because in that case, strictly speaking, the datomic log(s) are the SOR and the event stream/store is derived, so it’s not event ‘sourcing’ per se. I know the NuBank guys did that (microservice datomic log -> kafka) but I saw a talk by their CTO recently in which he indicated that if he had to do it again, he’d have flipped it around

lmergen 2018-05-16T14:27:17.000670Z

i think datomic could be fine for this, but i don't think it's that good of an event store

lmergen 2018-05-16T14:27:27.000883Z

it's more of an aggregate store than an event store imho

eoliphant 2018-05-16T14:27:33.000598Z

yeah @lmergen, that’s my take also

eoliphant 2018-05-16T14:28:02.000821Z

for smaller projects I’ve just done reified transactions and tagged them
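For context, the reified-transaction trick is asserting domain facts on the transaction entity itself via the reserved "datomic.tx" tempid; a sketch with hypothetical :event/* attributes:

```clojure
;; Tag the transaction entity alongside the domain assertion.
(defn place-order! [conn order-id]
  @(d/transact conn
     [{:db/id "datomic.tx"            ; the transaction entity itself
       :event/type         :order/placed
       :event/aggregate-id order-id}
      {:order/id     order-id
       :order/status :placed}]))
```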

eoliphant 2018-05-16T14:28:07.000272Z

but this is a bigger deal

eoliphant 2018-05-16T14:28:14.000532Z

so i need to break stuff out

lmergen 2018-05-16T14:28:14.000599Z

@eoliphant have you seen the latest project by the onyx guys ? http://pyrostore.io/

eoliphant 2018-05-16T14:28:18.000307Z

yeah

eoliphant 2018-05-16T14:28:25.000418Z

looks pretty cool

lmergen 2018-05-16T14:28:31.000407Z

it's perfect as an event store

lmergen 2018-05-16T14:28:51.000350Z

but it depends a bit upon your use case / requirements

lmergen 2018-05-16T14:29:24.000966Z

you could also just stream things to S3

lmergen 2018-05-16T14:29:42.000632Z

so then you have s3 next to datomic

lmergen 2018-05-16T14:29:58.000948Z

at least then you can always easily go back to the raw data
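A sketch of the stream-to-S3 side, assuming Cognitect’s aws-api and one object per event, keyed so lexicographic key order matches partition/offset order (the bucket/key layout is hypothetical):

```clojure
(require '[cognitect.aws.client.api :as aws])

(def s3 (aws/client {:api :s3}))

;; Zero-padded offsets keep S3 listing order = event order.
(defn put-event! [bucket topic partition offset event-bytes]
  (aws/invoke s3
    {:op :PutObject
     :request {:Bucket bucket
               :Key    (format "%s/%03d/%020d.edn" topic partition offset)
               :Body   event-bytes}}))
```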

eoliphant 2018-05-16T14:30:05.000558Z

yep, especially with athena, etc

eoliphant 2018-05-16T14:30:09.001209Z

again, too many choices lol

lmergen 2018-05-16T14:30:13.000100Z

yes

lmergen 2018-05-16T14:30:29.000920Z

so whenever i face too many choices, i usually opt to keep things really simple

lmergen 2018-05-16T14:30:38.000324Z

which would be s3 in this case

eoliphant 2018-05-16T14:30:39.000466Z

yeah that’s what I’m trying to get to

eoliphant 2018-05-16T14:30:51.000023Z

for me the main decision point

eoliphant 2018-05-16T14:30:55.000693Z

is what’s authoritative

lmergen 2018-05-16T14:31:06.000463Z

event store is always authoritative

eoliphant 2018-05-16T14:31:07.000434Z

and I’m trying to make that the events

eoliphant 2018-05-16T14:31:39.000205Z

yeah, but in some of these scenarios like datomic log -> event store

eoliphant 2018-05-16T14:32:04.000190Z

it doesn’t, well, ‘feel right’ lol, as strictly speaking it would be datomic

lmergen 2018-05-16T14:32:05.000153Z

i would think that's overcomplicating things

eoliphant 2018-05-16T14:32:09.000190Z

yeah exactly

lmergen 2018-05-16T14:32:15.000445Z

i would do kafka -> s3 and in parallel, kafka -> datomic

lmergen 2018-05-16T14:32:24.000615Z

s3 for events

lmergen 2018-05-16T14:32:27.000633Z

datomic for aggregates

lmergen 2018-05-16T14:32:41.000239Z

you could even use kafka as the event store
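In Onyx terms, the kafka -> s3 / kafka -> datomic fan-out above is just two edges from one input task in the workflow; task names here are hypothetical:

```clojure
(def workflow
  [[:read-events :write-s3]        ; raw events -> S3 (event store)
   [:read-events :write-datomic]]) ; same stream -> Datomic (aggregates)
```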

eoliphant 2018-05-16T14:32:44.000671Z

yeah I’m going to take another look at s3

eoliphant 2018-05-16T14:32:45.000394Z

yeah

eoliphant 2018-05-16T14:32:53.000946Z

that’s what I’d been planning to do

lmergen 2018-05-16T14:32:56.000781Z

but it's a terrible event store in my experience

eoliphant 2018-05-16T14:32:59.000258Z

just store forever in kafka

lmergen 2018-05-16T14:33:12.000220Z

again, it depends upon what you want to do with it

lmergen 2018-05-16T14:33:26.000104Z

if you only want to use it as backup, it's fine

eoliphant 2018-05-16T14:33:37.000713Z

so what issues have you had with it from the event storage perspective? I’ve seen some rumblings along these lines lol

lmergen 2018-05-16T14:33:39.000047Z

if you want to allow your data scientists to query the event store directly, it sucks

lmergen 2018-05-16T14:34:10.000554Z

if you use kafka as the event store, imho it's not a great tool for ad-hoc querying and data exploration

eoliphant 2018-05-16T14:34:15.000885Z

well right, but I thought typically, those guys would build their own views etc

eoliphant 2018-05-16T14:34:22.000021Z

ah but I see what you’re saying

lmergen 2018-05-16T14:34:22.000733Z

if you put it on s3, you get a ton of extra tools like athena for free

eoliphant 2018-05-16T14:34:29.000827Z

querying ‘into’ the store itself

lmergen 2018-05-16T14:34:33.000096Z

yes

eoliphant 2018-05-16T14:34:39.000117Z

as opposed to just in-order reading into something more suitable

lmergen 2018-05-16T14:34:48.000795Z

also, more tools integrate with s3 than kafka

eoliphant 2018-05-16T14:34:57.000670Z

yeah interestingly

eoliphant 2018-05-16T14:35:03.000561Z

this opens up some other possibilities

eoliphant 2018-05-16T14:35:14.000697Z

i’d been trying to push as much mgmt overhead as possible to aws

eoliphant 2018-05-16T14:35:22.000802Z

so i’d been looking at kinesis etc

lmergen 2018-05-16T14:35:38.001021Z

i've used kinesis firehose for years, it's solid

eoliphant 2018-05-16T14:35:39.000824Z

but the fact that there’s no ‘store’

eoliphant 2018-05-16T14:35:49.000387Z

was pushing me back to kafka

lmergen 2018-05-16T14:36:03.000343Z

kinesis firehose can easily stream everything to s3 as well

eoliphant 2018-05-16T14:36:08.000500Z

yeah

lmergen 2018-05-16T14:36:11.000791Z

so then s3 becomes your store, again

eoliphant 2018-05-16T14:36:36.000637Z

yeah and now i’m having some ideas, which is always dangerous lol

lmergen 2018-05-16T14:37:00.000590Z

as long as the ideas are good... 🙂

eoliphant 2018-05-16T14:37:04.000919Z

lol

eoliphant 2018-05-16T14:37:19.000882Z

because I can get my in-order semantics, i guess, out of athena

eoliphant 2018-05-16T14:37:39.000549Z

and again, datomic would actually make for an awesome aggregate store

lmergen 2018-05-16T14:37:52.000617Z

you can make order semantics explicit

eoliphant 2018-05-16T14:38:09.000163Z

none of that ‘look at the snapshot, then grab the last few events’ stuff
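i.e. with Datomic as the aggregate store, current state is a single lookup against the latest db value; a sketch assuming :order/id is declared :db.unique/identity:

```clojure
;; Current order state in one pull; no snapshot + event replay needed.
(d/pull (d/db conn) '[*] [:order/id 27])
```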

lmergen 2018-05-16T14:38:44.000799Z

did you see this ? https://yuppiechef.github.io/cqrs-server/

lmergen 2018-05-16T14:38:56.000866Z

the guy never actually implemented it

eoliphant 2018-05-16T14:38:57.000294Z

yeah I’ve played with it actually

eoliphant 2018-05-16T14:39:04.000464Z

the stuff that was there lol

lmergen 2018-05-16T14:39:05.000376Z

as in, never ran in production

eoliphant 2018-05-16T14:39:14.000743Z

and to your point

eoliphant 2018-05-16T14:39:19.000275Z

we’ve got some more options now

lmergen 2018-05-16T14:39:29.000866Z

🙂

eoliphant 2018-05-16T14:40:00.000431Z

so to your point kinesis/kafka could give us the required serialization/ordering

eoliphant 2018-05-16T14:40:17.000105Z

so that stuff shows up in s3 correctly

eoliphant 2018-05-16T14:40:24.000383Z

ah that’s another thing

lmergen 2018-05-16T14:40:47.000549Z

who decides what the correct ordering is ?

eoliphant 2018-05-16T14:40:50.000078Z

are you in your case just using the ‘put time’ for order?

lmergen 2018-05-16T14:40:55.000194Z

when you have multiple kafka partitions / brokers

lmergen 2018-05-16T14:41:06.000617Z

how would you manage ordering ?

eoliphant 2018-05-16T14:41:12.000106Z

right, that’s another thing i was working through lol

eoliphant 2018-05-16T14:41:32.000537Z

that makes it less than suitable in some scenarios

eoliphant 2018-05-16T14:41:57.000178Z

since this is business stuff as opposed to just streams of data from IoT or something

lmergen 2018-05-16T14:42:02.000821Z

so what i do is not depend upon ordering

lmergen 2018-05-16T14:42:20.000290Z

i use the onyx epoch id

lmergen 2018-05-16T14:42:23.000383Z

i tag that

lmergen 2018-05-16T14:42:27.000397Z

so i can deduplicate
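The general shape of that dedup, leaving aside how the epoch/id gets onto each segment (wiring it to Onyx’s epoch is setup-specific): tag writes with an id that is stable across retries and drop anything already seen. A hypothetical sketch:

```clojure
;; `seen?` might be backed by window state, Datomic, or a KV store;
;; :event/id stands in for the retry-stable tag. All names hypothetical.
(defn dedupe-segments
  [seen? segments]
  (remove (comp seen? :event/id) segments))
```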

eoliphant 2018-05-16T14:42:28.000876Z

ok yeah

eoliphant 2018-05-16T14:42:37.000798Z

because I have to have some notion of it

lmergen 2018-05-16T14:42:42.000470Z

but, for explicit ordering of multiple commands / retries that would conflict

eoliphant 2018-05-16T14:42:54.000002Z

because i’ve got financial transactions, etc going on

lmergen 2018-05-16T14:43:00.000552Z

i came to the conclusion that the only reliable way to deal with it is eventual consistency

eoliphant 2018-05-16T14:43:28.000287Z

yeah and de-duping/idempotency have to be in the mix as well

lmergen 2018-05-16T14:43:30.000898Z

if your aggregate processors detect a conflicting operation (e.g. deleting the same user twice), they do conflict resolution at that point
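A sketch of that kind of resolution inside an aggregate processor, where a second delete of the same user resolves to a no-op rather than an error (names hypothetical):

```clojure
(defn apply-user-deleted
  [state {:keys [user/id]}]
  (if (get-in state [:users id :deleted?])
    state                                    ; duplicate delete: ignore
    (assoc-in state [:users id :deleted?] true)))
```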

eoliphant 2018-05-16T14:43:40.000151Z

right

eoliphant 2018-05-16T14:44:02.000239Z

well that’s where datomic could be super useful

lmergen 2018-05-16T14:44:03.000078Z

imho the cqrs / event sourcing pattern demands those kinds of conflict resolutions. you cannot achieve strong consistency like an rdbms anymore.

lmergen 2018-05-16T14:44:09.001046Z

yes that is true

eoliphant 2018-05-16T14:44:19.000361Z

tag the transaction

lmergen 2018-05-16T14:44:33.000134Z

yes, but that's actually similar to onyx' epoch

eoliphant 2018-05-16T14:44:36.000349Z

right

eoliphant 2018-05-16T14:44:43.000693Z

right so it could be done there as well

lmergen 2018-05-16T14:45:07.000318Z

i would explore it, because if it's possible, you give yourself more freedom in choice of database

eoliphant 2018-05-16T14:45:15.000177Z

ok

eoliphant 2018-05-16T14:45:22.000532Z

yeah i’m going to give that a whirl

lmergen 2018-05-16T14:45:30.000537Z

anyway this is all my opinionated advice, take it with a grain of salt 🙂

eoliphant 2018-05-16T14:45:46.000992Z

Yeah I’m dealing with 97 things on this project lol

eoliphant 2018-05-16T14:46:00.000898Z

nah man this is super helpful

eoliphant 2018-05-16T14:46:08.000282Z

so I’m getting this worked out

lmergen 2018-05-16T14:46:16.000661Z

i learned one thing: one does not simply implement cqrs

eoliphant 2018-05-16T14:46:27.000436Z

but I’m also pushing for a ‘clojure all the way down’ approach

eoliphant 2018-05-16T14:48:49.000062Z

Clojurescript in the browser, this backend stuff we’re discussing, EDN/Transit all over, but basically where, throughout the system, :application/id means the same thing, can be validated the same way, etc. Then some translators, automated to the extent possible, for typical REST/GraphQL for clients who aren’t fortunate enough to be using this cool stuff lol
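The “:application/id means the same thing, can be validated the same way” part is where one shared .cljc spec pays off; a minimal sketch:

```clojure
(require '[clojure.spec.alpha :as s])

;; In a .cljc namespace, the same spec serves the ClojureScript
;; front end and the Clojure back end alike.
(s/def :application/id uuid?)

(s/valid? :application/id #uuid "4b4f2f3e-aaaa-4bbb-8ccc-000000000000")
;;=> true
```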

eoliphant 2018-05-16T14:49:17.000415Z

yeah I’ve been down this road a few times myself lol

eoliphant 2018-05-16T14:49:23.000612Z

I used Axon on a couple projects

eoliphant 2018-05-16T14:49:36.000419Z

as well as Lightbend’s Lagom

eoliphant 2018-05-16T14:50:00.000588Z

both have a lot of nice batteries included stuff

eoliphant 2018-05-16T14:50:11.000565Z

but they bring simple vs easy to mind lol

lmergen 2018-05-16T14:51:07.000781Z

yes, so now you need to make a lot of choices yourself

👍 1
dbernal 2018-05-16T15:14:25.000153Z

@lmergen I'm using the SQL plugin to get some initial values, and then a downstream task uses those to call out to SQL. I'm thinking now that the downstream task is actually the issue here. I'm still struggling to get it to consistently call out to SQL from within a function task. I'm not sure how the SQL plugin is able to do it so consistently with the PooledDataSource, but for me, even with an input sequence, it's not able to get consistent results back from a SQL call