The plan at this point is to implement the block abstraction over various types of storage. Mapped block files are specifically for a single-machine setup on the JVM. IndexedDB is how we’re doing a single-machine setup in JavaScript.
My hope is that if we implement blocks on a system that does distribution, then scaling out gets managed for us.
@quoll Have you also looked at the https://github.com/replikativ/konserve stuff?
I'm not deeply familiar with it, just wondered, as this is used by datahike
Nope 🙂 But I can
The design was informed by changes I was trying to make in Mulgara, along with some thinking about what Datomic does with provisioning
One issue is that querying has been written synchronously. That’s making working with things like IndexedDB awkward. I see that konserve is doing everything asynchronously
Yes, Datahike is fully async by default.
If I recall correctly, they have flags to make the runtime sync/async (at the hh-tree level at least), then use macros to do error handling & co, since they cannot rely on just one way of doing things
Add cljs support into the mix and that can get quite confusing (but necessary)
Doing IO with some async facade is a good approach in this context, though
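(For the curious, a rough sketch of that flag-plus-macro pattern; hypothetical names, not Datahike’s actual code. A compile-time flag picks between a core.async go block and a plain synchronous body, so call sites stay identical in both modes.)
```
;; Not Datahike's actual code: a rough sketch of the flag-plus-macro idea.
;; A compile-time flag decides whether a body expands into a core.async go
;; block or plain synchronous code.
(require '[clojure.core.async :as async])

(def ^:const use-async? false) ;; hypothetical flag; true where IO must be async

(defmacro maybe-go
  "Expands to (go body...) when use-async? is set, otherwise to (do body...)."
  [& body]
  (if use-async?
    `(async/go ~@body)
    `(do ~@body)))
```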
Hey @quoll, @borkdude and @mpenet. This is a good point to introduce myself. I am Christian, one of the architects of Datahike. We, the team at lambdaforge, have had a look at your storage layout whitepaper lately and your work looks very good. We have some questions about your design choices and would love to compare and combine your AVL ideas with the hitchhiker-tree concepts to mutually improve our storage layers. Regarding konserve, we have designed it exactly to be a minimum viable portable abstraction for asynchronous storage. Since you map your AVL tree to binary layouts directly, konserve's edn serialization facilities might not be needed, but other than that it should be usable and is a potential point of collaboration. As @mpenet has pointed out, to port our stack we have introduced a restricted, core.async-based, monadic async/await-like asynchronous DSL that we can compile away on the JVM, because synchronous IO turns out to be much faster there. Based on this abstraction, @grounded_sage has had a running prototype of Datahike in cljs since yesterday, and we are very interested in your thoughts and work on asynchronous programming in cljs.
Since we seem to have very similar long-term goals in terms of reach and functionality, and our core objective is not the promotion of the current implementation details of Datahike, we are in fact open to collaborating on any level you see fit, including a potential joint project and shared funding.
I haven’t been on the CLJS work recently. That’s being worked on by @noprompt. I’m trying to wrap up the JVM code so that I can join him
Well, in my case, a lot of it grew out of frustration with the Mulgara codebase (since it is all Java), and wanting to make some changes to the design. I have a strong bias to basing this work on the Mulgara architecture, since that implementation has a record of being very fast. In particular, the AVL trees turned out to be a very effective choice
I saw David Greenberg’s talk on hitchhiker trees at Strangeloop, and I’ve wanted to try them out. I haven’t yet though
That makes sense. @mpenet has helped to improve the performance of the hitchhiker-tree quite a bit btw.
Right now, my opportunity to work on these has been fortuitous for me. I was doing it in my evenings and weekends. But then I mentioned it at work, and my manager, and eventually HIS manager thought it sounded like a great idea, and told me to work on it
Since we also have our own implementation, we can compare it with the AVL tree.
Well, the AVL trees are implemented here: https://github.com/threatgrid/asami/blob/storage/src/asami/durable/tree.cljc
Does this mean you are constrained by your management in what Asami can become as a project?
It stores the node balance in the top two bits of the “left” pointer. I’m thinking I should probably change this to the topmost bit of both the left and the right. That’s because these numbers will never be negative, so there’s no issue with using those bits, and it makes it easier to switch between 64 bits and 32 bits if we want to
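(For illustration, a minimal JVM-Clojure sketch of that proposed scheme; hypothetical names, not Asami’s actual tree.cljc. Because the pointers are never negative, the topmost bit of each child pointer is free to hold one bit of the node’s balance.)
```
;; Illustrative sketch only (not Asami's tree.cljc). Pointers are non-negative
;; longs, so the topmost (sign) bit of each child pointer can carry one bit
;; of the node's balance information.
(def ^:const flag-bit Long/MIN_VALUE)     ;; 0x8000000000000000, the top bit
(def ^:const pointer-mask Long/MAX_VALUE) ;; 0x7FFFFFFFFFFFFFFF, the low 63 bits

(defn pack-pointer
  "Stores one balance flag alongside a non-negative child pointer."
  [pointer flag?]
  (if flag? (bit-or pointer flag-bit) pointer))

(defn pointer-value
  "Recovers the raw child pointer, dropping the flag bit."
  [packed]
  (bit-and packed pointer-mask))

(defn flag-set?
  "True when the balance flag is set on this packed pointer."
  [packed]
  (neg? packed))
```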
Sort of. It’s a weird situation 🙂
The story is that I was building a rules engine, and showed my manager. He loved it and told me to work on it during working hours. I said that it was Open Source, and he was happy to do that. That became Naga. He also said that he wasn’t interested in using a commercial database (I was planning on Datomic as my first back-end). “Can you build your own?” So I built a minimal in-memory store. Then he asked for more features. One of them was to port to ClojureScript. Then more features. And more. Eventually, I decided to pull it out and turn the storage into its own project. That is Asami. Other members of the team use Asami, and I try to be as responsive as possible to them. Working with them it became clear that they were more familiar with the Datomic API, and so I wrapped the Graph/query API in a new namespace that started to present something similar to what Datomic looks like. Most of this is being driven by what I want to do next. But my primary focus is always to make it useful for my team.
Ok, that sounds reasonable to me.
Right now, the primary focus is durability. As soon as I can get a release out for that, I’ll be moving back to the public API, for functions like with, and also to clear the bug/feature backlog
Would you be interested then in comparing the durability bits and see whether we can help each other there?
sure
Which format of discussion would be most appropriate in your opinion? We are currently doing a lot of shared programming/discussion sessions and we could do one of those together, for example.
It will depend on timing 🙂 I’m on the East coast of the USA (near Washington DC) (UTC-5)
West Coast PST
I am in Vancouver, BC (PST), and the rest of the team is in Europe (CET). So mornings in PST work well at the moment, e.g. 8 or 9 am.
@noprompt Would this work for you?
Unfortunately, no. I have 3 children I’m responsible for at those times. 🙂
However, I am content to communicate asynchronously here and elsewhere. Paula and I are also on the same team.
Hi. I am Chrislain and also a member of the lambdaforge team. I’ll be happy to join the call.
Will konserve stick to a hard dependency on core.async?
@noprompt, @alekcz360 Besides removing the core.async dependency we might need to add additional protocols to facilitate low-level block based access to konserve.
Removing core.async shouldn't be an issue. The code is structured well enough for it to be done painlessly, and I think it'd be worthwhile, though it would be worth providing an optional namespace with that kind of interface for current users.
Yes, we should discuss this together, I think.
I’m happy to join the call as well.
@noprompt as far as I am aware we are open to async alternatives. Our main focus has been keeping the codebase cross-platform with as few divergences as possible, and good error handling. But I would defer to @whilo for a more detailed answer to your question.
The interface can easily be made callback-based (which is general), including with core.async. If you do not want the internals to use core.async, then we would need to rewrite everything in a CPS/callback style. Which programming model would you prefer?
Personally, I prefer the CPS/callback style, and in particular the promise-style pattern of passing [resolve reject], as it is trivial to implement combinators, etc., while minimizing the logic that is common when using “handlers”, i.e. (fn [error value] (if error ,,,))
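(A minimal sketch of that [resolve reject] shape; illustrative names only, not konserve’s API. The point is that a combinator is just a function wrapping two other functions.)
```
;; Illustrative only, not konserve's API. An async op takes a resolve and a
;; reject callback; a combinator simply wraps those functions.
(defn get-block
  "Looks up k in the in-memory store, calling resolve on success and
   reject if the key is missing."
  [store k resolve reject]
  (if (contains? store k)
    (resolve (get store k))
    (reject (ex-info "missing key" {:key k}))))

(defn then
  "Chains f onto an op of shape (fn [resolve reject] ...), preserving that shape."
  [op f]
  (fn [resolve reject]
    (op (fn [v] (resolve (f v))) reject)))

;; Usage
((then (partial get-block {:a 1} :a) inc)
 (fn [v] (println "got" v))
 (fn [e] (println "failed" (ex-message e))))
;; prints "got 2"
```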
I see. Have you had bad experiences with core.async?
Yes, however, those experiences are few.
My perspective is that of a consumer. It is occasionally not desirable, as a consumer, to take on core.async as a dependency.
Fully agree!
Build the tooling in a low-level way so that, optionally, people can build core.async on top of it. Callbacks are fine for this.
Also, CPS/Callbacks merely rely on functions and assume little else which is very inclusive.
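(One way that optional layer could look; a sketch assuming the callback shape from the earlier example, not an existing konserve namespace.)
```
;; Hypothetical optional namespace: wraps a callback-style op in a core.async
;; channel for consumers who want one. The library itself never requires
;; core.async.
(require '[clojure.core.async :as async])

(defn op->chan
  "Takes an op of shape (fn [resolve reject] ...) and returns a promise-chan
   that receives the result, or an ex-info wrapping the error.
   (A nil result would need wrapping; ignored in this sketch.)"
  [op]
  (let [ch (async/promise-chan)]
    (op (fn [v] (async/put! ch v))
        (fn [e] (async/put! ch (ex-info "operation failed" {} e))))
    ch))

;; e.g. (async/<!! (op->chan (partial get-block {:a 1} :a)))  ;; => 1
```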
I made the "mistake" of coupling babashka.pods async functions to core.async, but I changed my mind and switched to callbacks. I don't want to force core.async on consumers.
Core.async is quite heavy, you're pulling in tools.analyzer, etc. And who knows, a few years from now there's going to be another Clojure async thing. Callbacks will still be there.
Maybe project loom will bring interesting things in this regard
Loom is JVM only. I quite like core.async personally, but for konserve callbacks might be good enough. Callbacks are ok as long as you're not doing a lot of composition with async values; when you are, things become hairy fast imho.
That said, konserve just needs some form of Promise; it could be CompletableFuture on the JVM and js/Promise on cljs otherwise. Or just callbacks
I quite like core.async btw, that's not the point
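(For reference, a minimal .cljc sketch of that idea; hypothetical names, not konserve’s actual design: one constructor and one chaining function, backed by CompletableFuture on the JVM and js/Promise in ClojureScript.)
```
;; Hypothetical .cljc sketch, not konserve's actual design.
(defn resolved
  "Returns a platform promise already resolved with value."
  [value]
  #?(:clj  (java.util.concurrent.CompletableFuture/completedFuture value)
     :cljs (js/Promise.resolve value)))

(defn then-p
  "Chains f onto a platform promise, returning a new promise."
  [p f]
  #?(:clj  (.thenApply ^java.util.concurrent.CompletableFuture p
                       (reify java.util.function.Function
                         (apply [_ v] (f v))))
     :cljs (.then p f)))
```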
Speaking from experience, I was confronted with a decision earlier this year about basing a workflow interpreter we use on top of promises, but ultimately decided that functions accepting resolve and reject callbacks gave us the most cross-platform flexibility because, again, they are just functions.
Yeah, that’s my feeling too @borkdude. It’s a great dependency for an app but not necessarily for a library, unless, say, it was specifically meant to augment/enhance it.
@borkdude not sure what you mean, I think we're all saying essentially the same thing
Agreed :)
I think on hh-tree the discussion about this same issue concluded that we actually needed better ways to compose async values, but that's not needed at the konserve level; hh-tree/Datahike could turn whatever konserve returns into what it wants.