@quoll I’m curious if you have any high-level thoughts on the differences in approach between Asami, Datomic, and Crux in terms of their choices regarding documents (records) vs. triples? Crux opts for documents for ease of use and partly for easier support of bitemporality (I believe), whereas Datomic’s triples and schema tend to trip people up but might provide for richer datasets later on. I have the feeling that documents are largely the right way to go for most services that would use one of these databases, but that’s largely still just a feeling. It feels like triple-stores haven’t made it far beyond the semantic web / let’s-document-the-whole-museum/library kind of domains, but I was also wondering if you knew of counter-examples in the wild?
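For readers skimming past: a rough sketch of the document-vs-triples contrast being asked about, with made-up ids and attributes.

```clojure
;; Illustrative only: the same "record" as a Crux-style document versus
;; the triples a Datomic-style store decomposes it into. Ids/attributes
;; are made up.

;; Document view: the entity is stored and retrieved as a single map.
(def person-doc
  {:crux.db/id :person/ada          ; Crux documents carry an explicit id
   :name       "Ada"
   :email      "ada@example.com"})

;; Triple view: each attribute becomes its own [entity attribute value] fact.
(def person-triples
  [[:person/ada :name  "Ada"]
   [:person/ada :email "ada@example.com"]])
```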
UK Government data. We have quite a lot of examples of environmental data and government statistics as RDF. (Disclaimer: the company I work for builds and/or supports these…) A small sample (there are more):
- Scotland's official statistics: https://statistics.gov.scot/home
- Loads of data for the UK's Environment Agency and DEFRA (Department for Environment, Food and Rural Affairs), covering pollution incidents, river catchments, water quality, fish ecology, bathing waters, and assets and their metadata (by which they mean things like river banks, reservoir dams, etc.): https://environment.data.gov.uk/appgallery
- Ministry of Housing, Communities and Local Government data: https://opendatacommunities.org/home
- Geographical reference data for the ONS: http://statistics.data.gov.uk/

Many more too.
@rickmoynihan Thanks! I think this is definitely a step out of "museum" territory, though all of these examples are still pretty firmly planted in the "sustained" half of "sustained vs. ephemeral", where the adjective describes the data set/model, not an individual datum. I assume some businesses have tried to build data models that might only last a few years on top of RDF and/or triple-stores, especially as Datomic matures. We invested a lot of time and money into Datomic for a big e-commerce experimentation platform I worked on 6 years ago... and it didn't pan out. The failure was a performance ceiling, and it's quite possible Datomic would work in that environment now, but it took us ages just to discover the performance problems because modelling took so long.
not sure what you mean by the “sustained vs ephemeral data-set/model / individual-datum” comment
@steven427 when you say “performance ceiling” were you talking about the throughput for Datomic, or are you talking about from a modeling perspective (or something else again)? I’m just asking because Datomic is good at some use cases and not at others where different graph DBs can perform better. But if it was to do with the architecture and data modeling then I can understand hitting walls there too.
@rickmoynihan Museum, library, and government stats data isn't going anywhere. The "model" doesn't change from one year to the next (or it changes only incrementally). This isn't usually the case for educational institutions, though it depends what a person is building. It definitely isn't the case for business units — whether startups or departments of an older corporation. To go back to the Experimentation Platform example, e-commerce moves quickly enough (and haphazardly enough) that Datomic's schema requirements were quite a pain in the butt. Datomic might be fast enough now to build an EP with it (I'm not sure) but modelling the domain would still be painful. If I think back to nilenso's client list from the time I worked there, I imagine Datomic would be just as painful for (secondary) education software, entertainment, publishing, transportation, and new forms of healthcare (http://simple.org). Whether business or non-profit, the domain model often moves quickly (sometimes it's downright disposable) because it's uncertain what the software team is building from the outset. I've heard the opinion that some triple-stores might actually provide better flexibility for that sort of work (assuming a hard schema isn't required), but I think @quoll’s work at Cisco (tools for threat/breach analysis) is the only example I've ever heard of.
What you’re describing is definitely a frustration with Datomic (a huge frustration for me), but it’s managed quite well by RDF stores
Asami is really just an RDF store with more flexibility in the roles (i.e. any kind of data in any position), and better integration for Clojure 🙂
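A minimal sketch of what that Clojure integration looks like, based on Asami's documented API (the in-memory store; data is illustrative and details should be treated as assumptions, not a reference):

```clojure
;; A minimal sketch based on Asami's documented API; uses the in-memory store.
(require '[asami.core :as d])

(def conn
  (let [uri "asami:mem://example"]
    (d/create-database uri)
    (d/connect uri)))

;; Transact plain Clojure maps; no schema is required up front.
@(d/transact conn {:tx-data [{:name  "Ada"
                              :email "ada@example.com"}]})

;; Query with Datomic-style datalog.
(d/q '[:find ?name
       :where [?e :name ?name]]
     (d/db conn))
```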
> when you say “performance ceiling” were you talking about the throughput for Datomic, or

@quoll Sorry, yes: the "performance ceiling" was throughput. It was two separate issues:
1. Building the software was slow (programmer time) due to modelling mismatches.
2. Executing the software was slow (runtime), in large part because Datomic was only one year old.

The first informed the second, though: because it took us so long to build the system, it took a long time to load test it, and we found out much too late that Datomic would never be fast enough. This is possibly a bad example, because the EP we were building wanted Datomic for temporality and immutability, not triples/graph stuff. The system was rebuilt on Postgres, first-class temporality went out the window, the system was fast enough, and the data science team dealt with temporality later (in time-series / analytics databases). That said, this is my only production experience with a triple-store.
> any kind of data in any position Yeah, this sort of flexibility I'm quite curious about. Are there folks out there consuming Asami who do work on shorter-life-cycle data models? Or perhaps other RDF stores that people use in this way?
Yes, Datomic is aimed at a particular kind of use case, and it does very well there. Outside of that, it’s the proverbial hammer (everything looks like a nail). You can certainly use it in a lot of other use cases, but it doesn’t necessarily do well. Many other systems are better in that regard
I honestly don’t know. I haven’t been getting much feedback from people using Asami. I honestly don’t know if anyone does! However, my group at Cisco does, so I’m focusing on their needs. They are certainly working with short-life-cycle data, though the models are static
As for other stores… I don’t know. Most RDF applications I’m aware of tend to focus on large datasets and querying with few updates
Having not taken a serious look at Datomic since 2014, I suppose I'm not even sure what use case it does particularly well with. Back then, it was sort of marketed as "clojure data structure immutability... but on disk". It feels like they've backed away from that message now but I'm not 100% sure where Datomic fits, then.
It focuses on entity-level data access, both for retrieval and for updates. It does not handle large ingests particularly well, but instead focuses on small transactions which update entities.
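A hedged sketch of that entity-level style, using the Datomic peer API (assumes an existing connection `conn` and a unique `:user/email` attribute; all names are illustrative):

```clojure
;; Hedged sketch of the entity-level style (Datomic peer API). Assumes an
;; existing connection `conn` and a :db/unique :user/email attribute.
(require '[datomic.api :as d])

;; A small transaction that updates a single entity's attributes.
@(d/transact conn [{:db/id       [:user/email "ada@example.com"]
                    :user/status :active}])

;; Entity-level read: pull the whole entity back as a map.
(d/pull (d/db conn) '[*] [:user/email "ada@example.com"])
```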
> I haven’t been getting much feedback from people using Asami. Building useful open source means you'll never know 99% of your users, which I suppose is a very concrete sort of Selfless Service. It's kind of beautiful, in a way... but not very helpful for case studies. 😉
@quoll That sounds like more or less the same story as 2014 so... it's consistent, at least.
The best feedback I’ve ever had was in having a chat with someone at a previous job and my project came up. I mentioned that I was concerned that it wasn’t meeting his group’s needs because I never heard from him. His response was, “That’s because it just works. We ask it to do something and it gives us the right response every time.”
I’ve never felt more proud of my work than I did in that moment 🙂
😄 I'm not sure I've ever reached that level of success, but I've definitely set that bar for myself. I always describe it to coworkers as "becoming a utility" — you only think about the water or electricity utility when it fails you. Good software is the software people take for granted.
> That sounds like more or less the same story as 2014 so... it’s consistent, at least.

This was what I picked up back in that timeframe as well, but it was reiterated by Marshall Thompson when he was interviewed on the Cognicast (https://www.cognitect.com/cognicast/156) last month
I was looking for the timestamp of when he said it, but I didn’t find it. It was later in the interview, but not at the end. I don’t want to listen to the whole thing again. I’m at work! 🙂
@steven427 Thanks for explaining what you meant by sustained vs ephemeral. I think that’s a fair characterisation of most of the data we work with; but I’m not sure I buy the argument that triplestores/RDF are somehow unsuited to ephemeral data. I think the bias towards those applications exists because RDF treats modelling as a worldwide, global problem; i.e. the community encourages people to share data by defining and targeting ontologies, and the technologies think about and support the implications of allowing anyone to say anything about data. Hence applications and developers in that space tend to bias towards accretive changes, and towards stability in old terms. I think this is mainly because the RDF community primarily targets the data integration problem, and to do that properly you don’t want to change things radically, because consumers (often consumers you’re unaware of) will be broken. However, if you picked RDF as a backend for a closed-world application, there’s little about it that fundamentally makes it unsuited to ephemeral “update in place” systems. There may be a bias in the ecosystem against supporting those sorts of applications through additional tooling etc., though. I can certainly think of a few things that are a little more awkward without sufficient API/library support for ephemeral apps; e.g. update in place can be a little awkward, depending on how you do it, because triples are idempotent values, not entities. There are patterns / libraries for handling this kind of thing, though; a rough sketch below.
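```clojure
;; Sketch: there is no "update" primitive on triples, so changing a value
;; means retracting the old fact and asserting the new one (Datomic/Asami
;; style tx data; the entity id and values are made up).
[[:db/retract 42 :email "old@example.com"]
 [:db/add     42 :email "new@example.com"]]
```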
@rickmoynihan That all makes a lot of sense when you describe it that way. Thanks!
That’s also not to say that there aren’t better technology choices if you’re building an arbitrary “ephemeral closed world app”; there almost certainly are, depending on your requirements. But I suspect that’s not due to fundamental deficiencies in RDF itself related to ephemerality. Some other reasons to choose RDF (even for a closed ephemeral app) would include:
1. It’s really the only game in town for standards-based graph databases (multiple implementations, open and commercial).
2. Leveraging existing modelling work and documentation in 3rd-party ontologies, or particular technologies already built on RDF.
3. Stability due to standards (though not necessarily maturity of implementations).
4. A feature of a particular RDF triplestore implementation.

That said, these usually aren’t going to be primary requirements, so most people will find something more suited to their needs elsewhere.
Yes... NASA. Stardog has had a lot of work with them
If you have a need to work with documents, then storing docs and converting/indexing them into triples seems like a good approach. But if you want to traverse the graph, or just query docs (as opposed to retrieving them), then the more semweb approach is more naturally aligned, IMO.
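As a toy illustration of the "store docs, index them as triples" approach, a sketch that flattens a flat, single-valued document map into [entity attribute value] triples (real converters have to handle nesting, cardinality, id generation, etc.):

```clojure
;; Toy illustration: flatten a flat, single-valued document map into
;; [entity attribute value] triples.
(defn doc->triples [id doc]
  (for [[k v] (dissoc doc :id)]
    [id k v]))

(doc->triples :person/ada
              {:id    :person/ada
               :name  "Ada"
               :email "ada@example.com"})
;; => ([:person/ada :name "Ada"] [:person/ada :email "ada@example.com"])
```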
There’s no definitive answer to this. It depends on what your data looks like and how you want to interact with it
Still a fan of SemWeb, personally 🙂
hm.

> But if you want to traverse the graph …. then the more semweb approach is more naturally aligned

I’d love to hear an explanation of why you feel this way. Is it a factor of implementation (for the db implementers, I mean)? Or does it feel more natural to you as a db user, somehow? Or both?
A combination of the data being broken into triples for storage, and SPARQL having syntax to enable traversal
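For the Clojure side of that: Asami's wiki describes a SPARQL-like transitive-attribute syntax (the `+` suffix below). Treat this as an assumption about the feature rather than a definitive reference; the data and attribute names are made up, and it reuses `d`/`conn` from the earlier Asami sketch.

```clojure
;; SPARQL 1.1 expresses traversal with property paths, e.g.
;;   ?p ex:parent+ ?ancestor
;; Asami's analogous transitive-attribute syntax (assumed from its wiki):
(d/q '[:find ?ancestor
       :where [?p :name "Ada"]
              [?p :parent+ ?ancestor]]
     (d/db conn))
```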
Yeah, another way to look at it might be that a document-oriented view makes entities first-class, whereas in RDF/triples, properties are really more the first-class thing.
Which is analogous to how idiomatic Clojure (and spec) targets map keys individually rather than the composites that hold them, i.e. the design of s/keys. A quick illustration below.
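```clojure
;; Standard clojure.spec idiom: specs attach to individual keys, and the
;; "entity" spec merely composes them.
(require '[clojure.spec.alpha :as s])

(s/def ::name string?)     ; properties specified independently...
(s/def ::email string?)

(s/def ::person (s/keys :req [::name ::email]))  ; ...entity = composition

(s/valid? ::person {::name "Ada" ::email "ada@example.com"})
;; => true
```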
Yeah. That analogy makes a lot of sense, actually.
Interesting that Stardog’s GitHub has a hard fork of RocksDB with a steady trickle of activity in it.