datahike

https://datahike.io/, Join the conversation at https://discord.com/invite/kEBzMvb, history for this channel is available at https://clojurians.zulipchat.com/#narrow/stream/180378-slack-archive/topic/datahike
kkuehne 2020-06-24T08:30:36.417200Z

We've released Datahike patch version 0.3.1 today, containing some bugfixes and updates. The changes can be found at https://github.com/replikativ/datahike/releases/tag/v0.3.1. Enjoy. 🙂

frankitox 2020-06-24T18:21:32.419600Z

Hi! 👋 I'm trying to use datahike with AWS Lambdas and DynamoDB. First of all, I know I'll have to write a datahike-dynamodb. Keeping in mind that several lambdas can be executed concurrently, the problem is finding a way to linearize reads/writes to avoid concurrency problems. I'd appreciate any observations like 'this is impossible/too complex because of this' or 'you could try synchronizing this way'.

sova-soars-the-sora 2020-06-24T21:46:45.420200Z

how to use retract?

sova-soars-the-sora 2020-06-24T21:47:52.421Z

@franquito who are the players accessing your drawers ? .. and how do clojure atoms not prevent the problem outright?

sova-soars-the-sora 2020-06-24T21:48:13.421300Z

an atom can only be atomically modified, so nobody can access it mid transaction

sova-soars-the-sora 2020-06-24T21:48:48.422100Z

any persistent storage requires the use of atoms in clojureland... forgive me if this is common knowledge and you're asking a different question.

sova-soars-the-sora 2020-06-25T14:42:47.432500Z

probably but are you talking normal coding convention or technicalities

2020-06-25T18:40:51.436Z

I would go so far as to say that what you're suggesting isn't even necessarily a "normal coding convention" in Clojure. Just as an example, Datomic databases don't use atoms in any part of the interface. It's cool/interesting that DataScript used simple atoms for state but that doesn't make this a typical pattern. More generally though, Clojurists read and write data files all the time, and there's almost never a reason to use atoms in the process of that (unless the point of the process is to stick in an atom for other reasons).

2020-06-25T18:44:31.436200Z

Moreover, you don't want non-idempotent operations (such as file writing) to take place in functions passed to swap!, since they can get executed more than once if multiple calls to swap! are happening concurrently for a given atom.
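
To make the retry behaviour concrete, here is a minimal sketch in Python. It is a toy model of compare-and-set retry semantics, not Clojure's actual implementation: the `Atom` class and `increment_and_log` function are hypothetical names, and the "other thread" is simulated in a single thread so the run is deterministic.

```python
class Atom:
    """Toy model of an atom: swap() retries until compare-and-set succeeds."""

    def __init__(self, value):
        self.value = value

    def compare_and_set(self, expected, new):
        # Only install the new value if nobody changed it in the meantime.
        if self.value == expected:
            self.value = new
            return True
        return False

    def swap(self, fn):
        while True:
            old = self.value
            new = fn(old)  # fn may run several times
            if self.compare_and_set(old, new):
                return new


counter = Atom(0)
side_effects = []  # stands in for a non-idempotent action such as a file write


def increment_and_log(n):
    side_effects.append(n)  # side effect performed inside the update function
    if len(side_effects) == 1:
        # Simulate a concurrent writer winning the race during the first attempt.
        counter.value = 10
    return n + 1


result = counter.swap(increment_and_log)
```

In this run the update function is called twice, so the side effect happens twice even though only one `swap` was requested. That is the reason to keep update functions pure.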

2020-06-25T18:45:08.436400Z

Again, I may have simply misunderstood what you were trying to say here, so please let me know if I misunderstood your intentions.

sova-soars-the-sora 2020-06-24T21:52:56.426Z

You are saying several different instances will have a live in-memory db that synchronizes with a central server? This is possible if the client-side data is not the canonical copy. For example, Slack as a chatroom: if you had it open for 6 months it would accumulate many long logs in this room, but when you refresh the page it gets a fixed number. In a similar way, your connected lambdas will have their own state, but provided it is periodically refreshed by the server, and their egregious mistakes are local and not codified to the central authority, it is not problematic. It is only problematic when two lambdas try to write to the same drawer. If you have multiple in-memory dbs actively running, you must have a merging strategy or just force a merge, but ideally there will be no overlapping keys between them, and the database will ensure that two people cannot modify the same point. Provided they are all accessing the same database, there is no danger in the concurrency glitches you may foresee. A lot of thoughtful pre-engineering (like with atoms and append-only data, nigh-immutable types) helps eliminate problems down the line. So yeah, there's a treatise for you, was I close?

kkuehne 2020-06-25T13:59:31.431800Z

We could talk about the design a little bit if you like @franquito .

frankitox 2020-06-25T14:08:04.432Z

Hi! Thanks, if it's ok with you I'll ping you once I have an idea about how all the tools work together (I'd like to check how hitchhiker trees, konserve and datahike play together). Although right now I'm curious to know how datahike-server plans to solve the concurrency problems that Datomic transaction functions solve.

frankitox 2020-06-24T22:32:14.426500Z

Let's first discuss a simpler example. If you have a web server with in-memory datahike, there's a concurrency problem that appears if you, for example, try to increase a counter atomically (imagine several HTTP requests that try that at the same time). In Datomic I could use transaction functions to solve this problem. In datahike I could instead use locking from clojure.core to avoid collisions.
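
As a rough illustration of the single-process case, here is a minimal Python sketch in which one lock guards the read-modify-write of a counter. The names `counter`, `lock` and `handle_request` are hypothetical, a plain dict stands in for the in-memory database, and `threading.Lock` plays the role that locking would play in Clojure.

```python
import threading

counter = {"value": 0}   # stands in for the in-memory database
lock = threading.Lock()  # one lock shared by every request handler


def handle_request():
    # The read and the write happen under the same lock,
    # so no other handler can interleave between them.
    with lock:
        current = counter["value"]
        counter["value"] = current + 1


threads = [threading.Thread(target=handle_request) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This only works because all handlers share one process and therefore one lock object.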

frankitox 2020-06-24T22:35:51.426800Z

Now imagine each HTTP request runs in its own environment (AWS Lambda) and datahike uses DynamoDB as the storage. In this scenario I can't use locking because the processes run completely isolated from each other.

whilo 2020-06-24T22:59:46.427100Z

@konrad.kuehne is currently working on https://github.com/replikativ/datahike-server/ and connection management. You will always have to serialize in a setting like this, and the easiest way is to do it through one process in one place, i.e. the transactor.

whilo 2020-06-24T23:03:12.428Z

@franquito That would be very cool! There is some prior work on implementing this that you are probably aware of: https://github.com/csm/konserve-ddb-s3, https://github.com/alekcz/konserve-faraday and https://github.com/replikativ/datahike/pull/89.

whilo 2020-06-24T23:03:25.428300Z

@alekcz360 is interested in AWS support as well.

whilo 2020-06-24T23:05:14.429800Z

Reads against a snapshot are automatically consistent; the only thing that needs to happen is to make sure that the writes are all funneled through one process. Unfortunately I have not used Lambdas yet, because I am mostly working in an academic setting at the moment, but if you describe the details a bit more we can work through it together.
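
A minimal Python sketch of that idea, under loud assumptions: `Transactor`, `transact` and `snapshot` are hypothetical names and not the datahike or datahike-server API, threads stand in for separate client processes, and a dict stands in for the database. All writes go through one queue and one worker, while readers keep whatever snapshot they already hold.

```python
import queue
import threading


class Transactor:
    """Hypothetical single writer: applies queued transactions one at a time."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._snapshot = {"counter": 0}
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        while True:
            tx_fn, done = self._inbox.get()
            # Apply the transaction to a copy and publish a new snapshot;
            # snapshots handed out earlier are never mutated.
            self._snapshot = tx_fn(dict(self._snapshot))
            done.set()

    def transact(self, tx_fn):
        done = threading.Event()
        self._inbox.put((tx_fn, done))
        done.wait()  # block until the single writer has applied it

    def snapshot(self):
        return self._snapshot


def increment(db):
    db["counter"] += 1
    return db


transactor = Transactor()
before = transactor.snapshot()

clients = [
    threading.Thread(target=transactor.transact, args=(increment,))
    for _ in range(20)
]
for c in clients:
    c.start()
for c in clients:
    c.join()

after = transactor.snapshot()
```

The snapshot taken before the writes still shows the old counter, while the one taken afterwards reflects all 20 increments, none of them lost.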

whilo 2020-06-25T23:50:30.436700Z

Yes, or, if you do care more about storage cost than throughput, the dynamodb + s3 combination in https://github.com/csm/konserve-ddb-s3.

whilo 2020-06-25T23:51:34.437Z

We have implemented most of the features that have been built into this codebase in the underlying libraries now, so to get production ready a few things need to be factored. But I think it should still work as it is. I have never used it though.

frankitox 2020-06-24T23:22:05.429900Z

Hi Whilo! AWS Lambdas are one-time processes that you can use to do (with some limitations) virtually anything. I'm using them as an HTTP request processor (because it's really cheap 😅). The problem is you lose some functionality that web servers usually have. For example, you can't implement in-memory session stores because each Lambda runs isolated from the others.

frankitox 2020-06-24T23:24:05.430100Z

I didn't know about most of the resources you just sent! Thank you! I'm starting to read about the datahike-related libraries. It looks like konserve-faraday is what I need to hook datahike up to DynamoDB, is this correct?