architecture

marciol 2020-04-25T16:43:01.189100Z

I'll repeat this question here given that it touches the architecture aspect. Hi all. I'm thinking of increasing our usage of Datomic, but I have some doubts about patterns of usage in a distributed microservices setting. It's common to see in the wild Datomic as the source of truth and the final place where all our data should live. There is a set of good practices related to the persistence layer in the microservices approach, and one of them is to use a database per bounded context to avoid coupling, but that doesn't seem to apply when using Datomic, given that Datomic allows distributed peers. Can anyone shed more light on this subject? Blog posts and articles are very welcome.
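For concreteness, here's a minimal sketch (assuming Datomic on-prem; the database names and attributes are made up) of how "database per bounded context" can still apply: one transactor and one storage can host several databases, and the coupling question becomes whether peers are allowed to query across them.

```clojure
(require '[datomic.api :as d])

;; One transactor/storage, one database per bounded context:
(def billing-uri  "datomic:dev://localhost:4334/billing")
(def shipping-uri "datomic:dev://localhost:4334/shipping")

(d/create-database billing-uri)
(d/create-database shipping-uri)

;; Each service's peers connect only to their own context's database...
(def billing-conn  (d/connect billing-uri))
(def shipping-conn (d/connect shipping-uri))

;; ...but nothing stops a peer from joining across databases,
;; which is exactly where the coupling can creep back in:
(d/q '[:find ?invoice ?parcel
       :in $b $s
       :where [$b ?invoice :billing.invoice/order-id ?id]
              [$s ?parcel  :shipping.parcel/order-id ?id]]
     (d/db billing-conn)
     (d/db shipping-conn))
```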

👍 3
marciol 2020-04-28T19:12:15.191600Z

Hey @vemv I’m a little bit biased towards self-contained systems, and it’s pretty awesome that Datomic allows this kind of use case. @jaihindhreddy maybe Datomic Cloud can help with this? In this setup the only limit is the throughput of the preferred transactor machine, combined with the limits of DynamoDB.

apbleonard 2020-05-06T13:54:57.192600Z

Like everything it's a sliding scale of choices with trade-offs at either end. At one end you have a monolith, with its well-known long-term problems. Not far from that, you have separate services but a single shared (Datomic) database under them. You get all the power of (arguably) the best database out there, with its cross-querying and data-modelling power, built-in immutability, etc. You will make great progress very quickly. But you also get all the bad things of the "Shared Database Pattern": all the services start cross-querying each other's data and there's little separation between the models. Yes, you can deploy services individually, but the data layer is joined up, and if Datomic is no longer your company's bag, shifting away from Datomic will be very painful.

At the other end you split all state into separate tiny databases that cannot cross-query each other and use CQRS to aggregate read models (sketched below), preferably with Kafka or a messaging system that guarantees non-duplicated, in-order delivery (Amazon MQ / SQS FIFO) and not e.g. Rabbit, which does not. That is really expensive and error-prone once the realities of distributed systems kick in, and it encourages such fine-grained differences in data models that you are hurt by myriad translation layers and misunderstandings between the multiple teams you need to build all this stuff, causing difficulties with Product Owners and BAs, who have difficulty signing off API-focused, fine-grained stories a world away from the end-to-end functionality they originally wanted. (Disclaimer: this is where my company is right now, so I'm biased against this particular option!) On the upside of this approach, however, all the services are separately deployable; they can be independently moved to the cloud, for instance, taking their database with them, which is very useful.

My advice would be:
• Start at the monolith+Datomic end so you can focus on the data model that fits your business problems and make progress fast. I would involve BAs and POs in defining and/or understanding that data model, by the way.
• If two different parts of the business are using two different halves of your "system" with different concerns, language, etc., these are different DDD bounded contexts, so they are worth splitting into two "systems" (not "microservices"), i.e. separate monoliths with their own databases (hopefully Datomic 🙂), with their own teams, Product Owner, etc. These should hopefully be narrowly integrated in some way using messaging, APIs, Kafka, etc.
• Split each individual "system" into microservices tactically, perhaps where a module has a narrow API hiding its own complexity and state, and where cross-querying of state models is not likely, but don't expect your BAs or POs to care.
• If your entire system can be modularised into such modules, so that the whole thing can be split up into separate microservices with their own DBs, then you're golden 🙂 But... is your state really that separated? Most average businesses introduce awkward cross-module and indeed cross-domain requirements long after the design decisions have been taken that make such separation really hard. The simplest report, or even a query that well-formed, joined-up data at rest could easily handle, becomes super hard in these environments, and once built this way it's hard to go back.

As I mentioned above, if you go too far along this sliding scale... 🙂 ...your teams will grow and everything takes a long time to build and get right, even though deployment is relatively easy. Sorry for the long post.
I hope I'm speaking to your question a little and not being too general (or reactive to my own woes!)
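To make the CQRS end concrete, here is a minimal sketch of a read-model aggregator, using the plain Java Kafka client from Clojure; the topic names and the shape of the aggregate are made up, and a real one also has to deal with re-delivery, ordering across topics, and rebuilding from scratch:

```clojure
(import '(org.apache.kafka.clients.consumer KafkaConsumer)
        '(java.time Duration))

(def consumer
  (KafkaConsumer.
    {"bootstrap.servers"  "localhost:9092"
     "group.id"           "orders-read-model"          ; hypothetical group
     "key.deserializer"   "org.apache.kafka.common.serialization.StringDeserializer"
     "value.deserializer" "org.apache.kafka.common.serialization.StringDeserializer"}))

;; Hypothetical per-context topics feeding one aggregated view:
(.subscribe consumer ["billing-events" "shipping-events"])

(def read-model (atom {}))  ; order-id -> vector of events seen so far

(defn run []
  (while true
    (doseq [record (.poll consumer (Duration/ofMillis 1000))]
      ;; Fold each event into the aggregate keyed by order id;
      ;; duplicates and cross-topic ordering are ignored here.
      (swap! read-model update (.key record) (fnil conj []) (.value record)))))
```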

💯 2
marciol 2020-05-06T23:19:21.193Z

I really understood your point, and I’m biased as well, as I feel that complexity is unavoidable and needs to go somewhere. I started to see that in a microservice setting it is very hard to get the big picture of our system, and how complex it is to establish communication with multiple POs, squads, etc. We are already at that point, but it feels so natural to most people; maybe because I came from a time when it was pretty common to deliver value using a monolith, and even as a distributed macroservice using a monorepo strategy with configurable builds. By the way, I’ll take all your advice into consideration. Thanks.

jaihindhreddy 2020-04-25T19:16:09.189400Z

Nubank does this kind of thing. They have lots of services, each with their own Datomic DB, and Kafka acting as the mode of communication between the services. I suggest you check out this talk: https://www.youtube.com/watch?v=ct5aWqhHARs

👍 3
vemv 2020-04-25T19:38:59.189700Z

A single Datomic instance can be nice. One can use namespace qualifiers on attributes to denote ownership. Note that in a few ways Datomic already is an event log / event-sourced system, so making it talk to Kafka can be redundant. As per usual, avoiding distributed programming can save a lot of early headaches. If things are well designed, they can grow to be distributed if/when/where needed.
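A rough illustration of both points — namespaced attributes marking ownership, and the transaction log doubling as the event stream (the connection URI and attribute names are hypothetical):

```clojure
(require '[datomic.api :as d])

(def conn (d/connect "datomic:dev://localhost:4334/app")) ; hypothetical URI

;; Ownership by namespace: the billing team owns :billing.* attributes,
;; the shipping team owns :shipping.*, all in the same database,
;; e.g. :billing.invoice/amount vs :shipping.parcel/weight.

;; Datomic's Log API already gives you an ordered, replayable
;; event stream — what a Kafka topic would otherwise provide:
(doseq [{:keys [t data]} (d/tx-range (d/log conn) nil nil)]
  (println "tx" t "with" (count data) "datoms"))

;; And you can block on new transactions as they commit:
(def tx-queue (d/tx-report-queue conn))
(let [{:keys [tx-data]} (.take tx-queue)]
  (println "new tx with" (count tx-data) "datoms"))
```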

💯 4
jaihindhreddy 2020-04-25T19:50:23.189900Z

I completely agree. But beyond a certain scale, write throughput starts to become a limitation, warranting a distributed setup.