Hi, I have a kind of general architecture question. I’m looking to use onyx for a commander-ish pattern implementation. One of the things I’m working through is the best way to manage/maintain the state that a command processor needs in order to do its thing and issue the appropriate event(s). I’ve done stuff with ES/CQRS frameworks like axon that have an explicit ‘event sourcing repository’, such that my, say, order command processor would ask the repo for order 27, and it would return the state as a function of all its stored events. I’d been considering using datomic as this ‘aggregate repo’, but initially I had some heartburn, as it could violate the principle that the events are the source of record for everything. Now I’m thinking that as long as the datomic state is a function of applying the events, and can be rebuilt as needed, then it actually is ok, and potentially makes for a better ‘repository’ implementation, as it’s essentially an ongoing snapshot, as opposed to other libs/approaches where you maintain a snapshot and still read N events to get to the current state. Sorry for the ramble lol, but just wanted to see what you folks thought about this
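To make that ‘repository’ idea concrete, here’s a minimal sketch of rebuilding an order aggregate as a fold over its stored events. The event types and keys (`:order/placed` etc.) are made up for illustration — they’re not from axon or any particular framework:

```clojure
;; Event-sourcing sketch: an aggregate's current state is just a fold
;; over its event history. Event shapes here are hypothetical.
(defmulti apply-event (fn [_state event] (:event/type event)))

(defmethod apply-event :order/placed
  [state {:keys [order/id order/items]}]
  (assoc state :id id :items items :status :placed))

(defmethod apply-event :order/item-added
  [state {:keys [order/item]}]
  (update state :items conj item))

(defmethod apply-event :order/shipped
  [state _event]
  (assoc state :status :shipped))

(defn rehydrate
  "Rebuild an aggregate's current state from its stored events."
  [events]
  (reduce apply-event {} events))

;; (rehydrate [{:event/type :order/placed :order/id 27 :order/items [:widget]}
;;             {:event/type :order/shipped}])
;; => {:id 27, :items [:widget], :status :shipped}
```

A datomic-backed ‘repository’ would just be this fold materialized continuously, which is the ‘ongoing snapshot’ point above.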
@eoliphant Been doing a little thinking about this myself. I am currently implementing this with Kafka streams since I don't have onyx available to me in this case. If using onyx and datomic you probably can use it as the state store and have an onyx job read the datomic log for the events
yeah @camechis I’ve been looking at Kafka streams also, and trying to decide how that might fit in, pros/cons, etc. It’s the Tyranny of Good Choices lol
And yeah, pulling stuff from the datomic log is yet another dilemma lol. Because, in that case strictly speaking, the datomic log(s) are the SOR and the event stream/store is derived, so it’s not event ‘sourcing’ per se. I know the NuBank guys did that (microservice datomic log -> kafka) but I saw a talk by their CTO recently in which he indicated that if he had to do it again, he’d have flipped it around
i think datomic could be fine for this, but i don't think it's that good of an event store
it's more of an aggregate store than an event store imho
yeah @lmergen, that’s my take also
for smaller projects I’ve just done reified transactions and tagged them
but this is a bigger deal
so i need to break stuff out
@eoinhurrell have you seen the latest project by the onyx guys ? http://pyrostore.io/
yeah
looks pretty cool
it's perfect as an event store
but it depends a bit upon your use case / requirements
you could also just stream things to S3
so then you have s3 next to datomic
at least then you can always easily go back to the raw data
yep, especially with athena, etc
again, too many choices lol
yes
so whenever i face too many choices, i usually opt to keep things really simple
which would be s3 in this case
yeah that’s what I’m trying to get to
for me the main decision point
is what’s authoritative
event store is always authoritative
and I’m trying to make that the events
yeah, but in some of these scenarios like datomic log -> event store
it doesn’t, well, ‘feel right’ lol, as strictly speaking it would be datomic
i would think that's overcomplicating things
yeah exactly
i would do kafka -> s3 and in parallel, kafka -> datomic
s3 for events
datomic for aggregates
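fwiw that fan-out is easy to state as a plain Onyx workflow. This is just a sketch — the task names are hypothetical, and the catalog entries that would bind them to the actual kafka/s3/datomic plugins are omitted:

```clojure
;; Sketch of the fan-out as an Onyx workflow: one Kafka input task
;; feeding two sinks in parallel. Task names are hypothetical; the
;; catalog/plugin wiring that makes them real is omitted.
(def workflow
  [[:read-kafka :process]
   [:process :write-events-s3]      ;; s3 = durable event log
   [:process :write-aggregates]])   ;; datomic = query-side aggregates
```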
you could even use kafka as the event store
yeah I’m going to take another look at s3
yeah
that’s what I’d been planning to do
but kafka is a terrible event store in my experience
just store forever in kafka
again, it depends upon what you want to do with it
if you only want to use it as backup, it's fine
so what issues have you had with it from the event storage perspective? I’ve seen some rumblings along these lines lol
if you want to allow your data scientists to query the event store directly, it sucks
if you use kafka as the event store, imho it's not a great tool for ad-hoc querying and data exploration
well right, but I thought typically, those guys would build their own views etc
ah but I see what you’re saying
if you put it on s3, you get a ton of extra tools like athena for free
querying ‘into’ the store itself
yes
as opposed to just in-order reading into something more suitable
also, more tools integrate with s3 than kafka
yeah interestingly
this opens up some other possibilities
i’d been trying to push as much mgmt overhead as possible to aws
so i’d been looking at kinesis etc
i've used kinesis firehose for years, it's solid
but the fact that there’s no ‘store’
was pushing me back to kafka
kinesis firehose can easily stream everything to s3 as well
yeah
so then s3 becomes your store, again
yeah and now i’m having some ideas, which is always dangerous lol
as long as the ideas are good... 🙂
lol
because I can get my in-order semantics, i guess, out of athena
and again, datomic would actually make for an awesome aggregate store
you can make order semantics explicit
none of that ‘look at the snapshot, then grab the last few events’ stuff
did you see this ? https://yuppiechef.github.io/cqrs-server/
the guy never actually implemented it
yeah I’ve played with it actually
the stuff that was there lol
as in, never ran in production
and to your point
we’ve got some more options now
🙂
so to your point kinesis/kafka could give us the required serialization/ordering
so that stuff shows up in s3 correctly
ah that’s another thing
who decides what the correct ordering is ?
are you in your case just using the ‘put time’ for order?
when you have multiple kafka partitions / brokers
how would you manage ordering ?
right, that’s another thing i was working through lol
that makes it less than suitable in some scenarios
since this is business stuff as opposed to just streams of data from IOT or something
so what i do is not depend upon ordering
i use the onyx epoch id
i tag that
so i can deduplicate
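One way to sketch that not-depending-on-ordering approach: each event carries the epoch/id it was tagged with, and the consumer tracks which ids it has already applied, so redelivered events (retries, replays) become no-ops. Key names here are hypothetical, and `apply-event` is whatever state-transition function you already have:

```clojure
;; Deduplication sketch: remember which event/epoch ids have been
;; applied so redelivered events are no-ops. Key names hypothetical.
(defn apply-once
  [apply-event {:keys [seen state] :as acc} {:keys [event/id] :as event}]
  (if (contains? seen id)
    acc                                    ;; already applied: skip duplicate
    {:seen  (conj seen id)
     :state (apply-event state event)}))

;; usage, given your own state-transition fn:
;; (reduce (partial apply-once my-apply-fn)
;;         {:seen #{} :state {}}
;;         events)
```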
ok yeah
because I have to have some notion of it
but, for explicit ordering of multiple commands / retries that would conflict
because i’ve got financial transactions, etc going on
i came to the conclusion that the only reliable way to deal with it is eventual consistency
yeah and de-duping/idempotency have to be in the mix as well
if your aggregate processors detect a conflicting operation (e.g. deleting the same user twice), they do conflict resolution at that point
right
well that’s where datomic could be super useful
imho the cqrs / event sourcing pattern demands those kinds of conflict resolution. you cannot achieve strong consistency like an rdbms anymore.
yes that is true
tag the transaction
yes, but that's actually similar to onyx' epoch
right
right so it could be done there as well
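For the datomic side, tagging the transaction would look something like this. Datomic lets you assert attributes on the transaction entity itself via the reserved `"datomic.tx"` tempid; `:event/epoch-id` is a hypothetical attribute you’d add to your schema:

```clojure
(require '[datomic.api :as d])

;; Sketch: tag the reified Datomic transaction with the epoch/event id
;; that produced it, so the aggregate store records which events it has
;; already absorbed. :event/epoch-id is a hypothetical schema attribute.
(defn transact-tagged!
  [conn epoch-id tx-data]
  @(d/transact conn
               (conj tx-data
                     {:db/id "datomic.tx"       ;; the transaction entity
                      :event/epoch-id epoch-id})))

;; later, to check whether an epoch was already applied:
;; (seq (d/q '[:find ?tx :in $ ?e :where [?tx :event/epoch-id ?e]]
;;           (d/db conn) epoch-id))
```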
i would explore it, because if it's possible, you give yourself more freedom in choice of database
ok
yeah i’m going to give that a whirl
anyway this is all my opinionated advice, take it with a grain of salt 🙂
Yeah I’m dealing with 97 things on this project lol
nah man this is super helpful
so I’m getting this worked out
i learned one thing: one does not simply implement cqrs
but I’m also pushing for a ‘clojure all the way down’ approach
Clojurescript in the browser, this backend stuff we’re discussing, EDN/Transit all over, but basically where, throughout the system, :application/id means the same thing, can be validated the same way, etc. Then, to the extent possible, automated translators for typical REST/GraphQL for clients who aren’t fortunate enough to be using this cool stuff lol
yeah I’ve been down this road a few times myself lol
I used Axon on a couple projects
as well as Lightbend’s Lagom
both have a lot of nice batteries included stuff
but they bring simple vs easy to mind lol
yes, so now you need to make a lot of choices yourself
@lmergen I'm using the SQL plugin to get some initial values, and then a downstream task uses those to call out to SQL. I'm thinking now that the downstream task is actually the issue here. I'm still struggling to get it to consistently call out to SQL from within a function task. I'm not sure how the SQL plugin is able to do it so consistently with the PooledDataSource, but for me, even with an input sequence, it's not able to get consistent results back from a SQL call