We've released Datahike patch version 0.3.1 today, containing some bugfixes and updates. The changes can be found at https://github.com/replikativ/datahike/releases/tag/v0.3.1. Enjoy. 🙂
Hi! 👋 I'm trying to use Datahike with AWS Lambdas and DynamoDB. First of all, I know I'll have to write a datahike-dynamodb backend. Keeping in mind that several Lambdas can be executed concurrently, the problem is finding a way to linearize reads/writes to avoid concurrency problems. I'd appreciate any observations, like 'this is impossible/too complex because of this' or 'you could try synchronizing this way'.
How do I use retract?
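For anyone searching later, here is a minimal sketch of retraction using the Datomic-style transaction syntax that Datahike follows. This assumes `conn` is an existing Datahike connection; the entity id and attribute names are hypothetical examples, and the exact entity-level retraction keyword may differ between versions.

```clojure
(require '[datahike.api :as d])

;; retract one specific attribute value from an entity
(d/transact conn [[:db/retract entity-id :user/email "old@example.com"]])

;; entity-level retraction also exists (spelled :db.fn/retractEntity
;; in DataScript-derived versions); check your version's docs
(d/transact conn [[:db.fn/retractEntity entity-id]])
```

After the transaction, queries against the new database value no longer see the retracted datom, while older snapshots still do.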
@franquito who are the players accessing your drawers? And how do Clojure atoms not prevent the problem outright?
An atom can only be modified atomically, so nobody can access it mid-transaction.
Any persistent storage requires the use of atoms in Clojure-land... forgive me if this is common knowledge and you're asking a different question.
Probably, but are you talking about normal coding conventions or technicalities?
I would go so far as to say that what you're suggesting isn't even necessarily a "normal coding convention" in Clojure. Just as an example, Datomic databases don't use atoms in any part of their interface. It's cool/interesting that DataScript used simple atoms for state, but that doesn't make it a typical pattern. More generally, Clojurists read and write data files all the time, and there's almost never a reason to use atoms in the process (unless the point of the process is to stick the data in an atom for other reasons).
Moreover, you don't want non-idempotent operations (such as file writes) to take place in functions passed to swap!, since they can get executed more than once if multiple calls to swap! happen concurrently for a given atom.
Again, I may have simply misunderstood what you were trying to say here; if so, please let me know.
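The retry behavior described above is easy to demonstrate in plain Clojure. This runnable sketch counts how many times the function passed to swap! is invoked under contention; any invocation beyond one per logical update is a retry, which is exactly why side effects inside swap! are dangerous.

```clojure
;; Count actual invocations of the update function with a thread-safe
;; counter, separate from the atom being updated.
(def calls (java.util.concurrent.atomic.AtomicLong. 0))
(def a (atom 0))

(let [threads (doall
                (for [_ (range 8)]
                  (Thread. #(dotimes [_ 1000]
                              (swap! a (fn [v]
                                         ;; side effect inside swap! --
                                         ;; runs again on every retry
                                         (.incrementAndGet calls)
                                         (inc v)))))))]
  (doseq [t threads] (.start t))
  (doseq [t threads] (.join t)))

;; the atom ends up at exactly 8000, but under contention the
;; update function is typically invoked more than 8000 times
(println @a (.get calls))
```

If the "side effect" had been a file write, the extra invocations would have written the file multiple times for a single logical update.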
You are saying several different instances will have a live in-memory db that synchronizes with a central server? This is possible if the client-side data is not canonical. For example, if you kept Slack open for 6 months, this room would accumulate very long logs, but when you refresh the page you get a fixed number of messages. In a similar way, your connected lambdas will have their own state, and provided it is periodically refreshed by the server, and their egregious mistakes stay local and are not codified to the central authority, it is not problematic. It only becomes problematic when two lambdas try to write to the same drawer. If you have multiple in-memory dbs actively running, you must have a merging strategy or just force a merge, but ideally there will be no overlapping keys between them, and the database will ensure that two writers cannot modify the same point. Provided they are all accessing the same database, there is no danger in the concurrency glitches you may foresee. A lot of thoughtful pre-engineering (like with atoms and append-only, nigh-immutable data types) helps eliminate problems down the line. So yeah, there's a treatise for you. Was I close?
We could talk about the design a little bit if you like @franquito .
Hi! Thanks, if it's ok with you I'll ping you once I have an idea about how all the tools work together (I'd like to check how hitchhiker trees, konserve and Datahike play together). Although right now I'm curious how datahike-server plans to solve the concurrency problems that Datomic transaction functions solve.
Let's first discuss a simpler example. If you have a web server with an in-memory Datahike, there's a concurrency problem that appears if you, for example, try to increase a counter atomically (imagine several HTTP requests trying that at the same time). In Datomic I could use transaction functions to solve this problem. Instead, in Datahike I could use locking from clojure.core to avoid collisions.
Now imagine each HTTP request runs in its own environment (AWS Lambda) and Datahike uses DynamoDB as the storage. In this scenario I can't use locking, because the processes run completely isolated from each other.
@konrad.kuehne is currently working on https://github.com/replikativ/datahike-server/ and on connection management. You will always have to serialize in a setting like this, and the easiest way is to do it through one process in one place, i.e. the transactor.
@franquito That would be very cool! There is some prior work you are probably aware of: https://github.com/csm/konserve-ddb-s3, https://github.com/alekcz/konserve-faraday and https://github.com/replikativ/datahike/pull/89.
@alekcz360 is interested in AWS support as well.
Reads against a snapshot are automatically consistent; the only thing that needs to happen is to make sure that the writes are all funneled through one process. Unfortunately I have not used Lambdas yet, since I am mostly working in an academic setting at the moment, but if you describe the details a bit more we can work through it together.
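One minimal way to sketch "funnel all writes through one process" in Clojure is a single agent playing the role of the transactor: queued writes are applied strictly one at a time, while readers work off immutable snapshots. This is an illustration of the pattern, not Datahike's actual transactor; the data shapes are made up.

```clojure
;; the "transactor": one agent holding the current database value
(def transactor (agent {:tx-count 0 :log []}))

(defn transact! [tx-data]
  ;; send-off queues the write; the agent applies queued functions
  ;; one at a time, so writes are serialized by construction even
  ;; when transact! is called from many threads
  (send-off transactor
            (fn [db]
              (-> db
                  (update :tx-count inc)
                  (update :log conj tx-data)))))

(transact! [[:db/add 1 :counter 1]])
(transact! [[:db/add 1 :counter 2]])
(await transactor)

;; a reader dereferences an immutable snapshot; later writes
;; cannot change what this snapshot contains
(def snapshot @transactor)
(println (:tx-count snapshot) (:log snapshot))
```

In the Lambda setting, the analogous design is one dedicated writer process (or service) receiving transaction requests, with the Lambdas acting as readers plus write-request submitters.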
Yes, or, if you do care more about storage cost than throughput, the dynamodb + s3 combination in https://github.com/csm/konserve-ddb-s3.
We have by now implemented most of the features that were built into this codebase in the underlying libraries, so to get it production-ready a few things need to be refactored. But I think it should still work as it is. I have never used it myself, though.
Hi Whilo! AWS Lambdas are one-shot processes that you can use (with some limitations) to do virtually anything. I'm using them as an HTTP request processor (because it's really cheap 😅). The problem is you lose some functionality that is common in ordinary web servers. For example, you can't implement in-memory session stores, because each Lambda runs isolated from the others.
I didn't know about most of the resources you just sent! Thank you! I'm starting to read about the Datahike-related libraries. It looks like konserve-faraday is what I need to hook Datahike to DynamoDB, is this correct?