Pretty much every action we take is recorded in a database table -- as an append-only log -- in addition to whatever transactional data we record (in the business sense). That includes actions we take on behalf of our members (customers) as well. That means that if we perform cleanup on transactional data, because some part of the process failed -- i.e., we "manually" roll back the changes -- we can still see in the logging tables a) what actions we successfully took up to that point and b) what actions failed (because we log all failures).
Some people just stream their logs to some other system, or a non-traditional database, but we've found value in having them all together so we can more easily run reports that contain both transactional and non-transactional data. The "downside" is that we have several tables with hundreds of millions of rows (but MySQL has been surprisingly good, even at that scale).
It also makes it easier to design and run batch processes that can identify discrepancies and reconcile them after the fact if needed. We have a number of "data migration" processes we've built over the years that have allowed us to correct transactional data that reflected bugs in our systems, even if the bug wasn't discovered for a couple of years, because we often also have the raw action/event logs, and can then patch the system forward from that point. An ad hoc event streaming system, in effect.
If I understand correctly you would have “business tables” (cart, product etc) and “log tables”. For every transaction to any of the business tables, you have a corresponding log to the “log tables”? Then, if something fails during a transaction, you manually rollback (thanks to try/catch) but know what you were originally trying to do because they are recorded in the “log tables”. yes? Would you also log to the log tables when performing a manual cleanup?
> It also makes it easier to design and run batch processes that can identify discrepancies and reconcile them after the fact if needed Yes, this makes sense and was one of the Q’s I had left over. With the logs though, it would be easier to fix the issues, I imagine.
Right, yes, we log actions we are taking and we log failures of those actions (and successes if we need the result of those actions). The "manual" recovery in the code usually has the data on hand to know what to do in terms of updating the transactional data but we don't rollback the log table data -- we may well log the failure that occurred, but we don't necessarily need to query the log table to figure out how to recover.
So we update
transactional data but we do not update log data, we just write additional log data saying what update action we took. Does that help?
So a transaction
table may get a row describing a pending transaction and a logtransaction
table may get a row describing the transaction we intend to perform. Then the payment settles and we update transaction
to settled
or whatever and logtransaction
gets a new row indicating the settlement information.
(reduce transactional-change initial-state event-stream)
🙂
(our system isn't quite like that but that's the parallel I would draw -- we're not a true event sourcing setup, because we got started with this a long time ago: the v1 was built in 2001, v2 broke ground in 2009, and I'd say we're most of the way into v3 at this point, having migrated and transformed all that data from v1 to v2 to v3 -- the joy of long-lived systems)
Yes, I believe that’s enough to go on 🙏