rdf

Steven Deobald 2020-10-14T07:12:10.137200Z

Reading this thread has left me with another naive, open-ended question: Are any of you familiar with how linguistics systems deal with the raw components of language?

The best tool available for Pali (at the moment) is the DPR: https://www.digitalpalireader.online ...and although it's incredibly detailed, it just sort of brute-forces word compounds against a massive dictionary. That dictionary includes components used in compounds, but the component relationships, and the relationships to Sanskrit and Latin, are just hard-coded within word definitions, as far as I can tell.

Wiktionary has some very basic understanding of word components: https://en.wiktionary.org/wiki/-%E0%A4%A6%E0%A4%BE%E0%A4%B0 https://en.wiktionary.org/wiki/Category:Hindi_words_suffixed_with_-%E0%A4%A6%E0%A4%BE%E0%A4%B0 Their SPARQL endpoint seems to be down (or inaccessible from Kashmir) at the moment: http://wiktionary.dbpedia.org/sparql

I don't think this granularity would ever apply to the data on http://pariyatti.org but it will ultimately be required for Pariyatti's sister project, https://www.tipitaka.org

I see https://linguistics.okfn.org but I'm never sure about the significance of projects like this. Are there others I should be reading about?

2020-10-14T07:55:32.147400Z

Linguistics isn’t really my area of expertise, so please take my comments with a pinch of salt. The biggest model I know of in the knowledge representation (KR) of linguistics is WordNet. It’s a long-standing project to essentially provide a machine-readable thesaurus of natural-language terms, and give some idea of their proximity to each other etc. My understanding is that it’s a great dataset, with bindings into many ecosystems, and it’s still widely used, especially to add some knowledge of synonyms to search engines. Being KR, these days it’s probably considered old hat, with ML language models taking centre stage; however, I think there’s a lot of progress in hybrid approaches that combine KR and ML, so I don’t think it will go anywhere anytime soon. Like you, I’d be somewhat sceptical of the long-term viability and maintenance of the OKFN stuff. They have their fingers in a lot of pies, and I suspect, like many people, they are forced to chase income streams. That’s not to say that they don’t do good work; they absolutely do.
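To make the "named relations" idea concrete, here is a toy Python sketch of a WordNet-style thesaurus, where relations like hypernymy are first-class and traversable. All data and names here are invented for illustration; the real WordNet model is far richer.

```python
# Toy sketch of the WordNet idea: terms linked by named, traversable relations.
# Data is invented for illustration.
relations = {
    ("dog", "hypernym"): ["canine"],
    ("canine", "hypernym"): ["mammal"],
    ("dog", "synonym"): ["domestic dog", "Canis familiaris"],
}

def follow(term, relation, depth=1):
    """Walk a named relation transitively, up to `depth` hops."""
    results, frontier = [], [term]
    for _ in range(depth):
        frontier = [t for f in frontier
                    for t in relations.get((f, relation), [])]
        results.extend(frontier)
    return results

print(follow("dog", "hypernym", depth=2))  # ['canine', 'mammal']
```

Because the relations are reified rather than baked into definition strings, adding a new relation type (meronymy, antonymy, ...) needs no change to the traversal code.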

2020-10-14T07:55:54.147600Z

https://wordnet.princeton.edu/

Steven Deobald 2020-10-14T10:51:22.148Z

Oh yeah, WordNet... I remember that.

simongray 2020-10-14T13:38:58.148200Z

I still find it weird and not very ergonomic that, in a system where knowledge is otherwise defined using named relations, for some reason this particular information has to be hardcoded into strings. 😛 But thank you for the in-depth history lesson.

simongray 2020-10-14T13:46:40.148500Z

@quoll btw Paula, if I may ask, what is the end goal of Asami? The readme says it is inspired by RDF, but it doesn’t really mention RDF otherwise. If I wanted to use it as a triplestore for an existing dataset, I guess I would have to develop code for importing RDF files and other necessary functionality?

quoll 2020-10-14T13:47:28.149400Z

That’s right, you would. Though I have an old project that would get you some of the way there

quoll 2020-10-14T13:48:55.149600Z

Ummm… the end goal. I only have vague notions right now. I can tell you why I started and where it’s going 🙂

simongray 2020-10-14T13:49:06.149800Z

Please do 🙂

quoll 2020-10-14T13:50:14.150Z

It was written for Naga. Naga was designed to be an agnostic rule engine for graph databases. Implement a protocol for a graph database, and Naga could execute rules for it

quoll 2020-10-14T13:51:08.150200Z

I thought I would start with Datomic, then implement something for SPARQL, OrientDB… etc

quoll 2020-10-14T13:51:47.150400Z

But I made the mistake of showing my manager, and he got excited, and asked me to develop it for work instead of evenings and weekends. I agreed, so long as it stayed open source, which he was good with

quoll 2020-10-14T13:53:27.150600Z

But then he said that he wanted it to all be open source, and he wasn’t keen on Datomic for that reason. So could I write a simple database to handle it? Sure. I had only stopped working on Mulgara because I don’t like Java, so restarting with Clojure sounded like a good idea (second systems effect be damned!) 🙂

quoll 2020-10-14T13:54:24.150800Z

Initially, Asami only did 3 things:
• indexed data
• inner joins
• query optimizing
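Those three features can be caricatured in a few lines of Python. This is only an invented sketch of the general technique (a predicate index plus an inner join over shared variables), not Asami's actual implementation.

```python
from collections import defaultdict

# Invented example data; a real store keeps several orderings (SPO, POS, OSP...).
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", 30),
]

# 1. Indexed data: here, just an index by predicate.
by_pred = defaultdict(list)
for s, p, o in triples:
    by_pred[p].append((s, o))

def match(pattern):
    """Resolve one (?s p ?o) pattern to variable bindings."""
    s, p, o = pattern
    return [{s: subj, o: obj} for subj, obj in by_pred[p]]

# 2. Inner joins: merge bindings that agree on their shared variables.
def join(left, right):
    return [{**l, **r} for l in left for r in right
            if all(l[k] == r[k] for k in l.keys() & r.keys())]

# Who do alice's acquaintances know?
result = join(match(("?x", "knows", "?y")), match(("?y", "knows", "?z")))
print(result)  # [{'?x': 'alice', '?y': 'bob', '?z': 'carol'}]
```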

simongray 2020-10-14T13:54:46.151Z

hah, ok, so it’s mainly because your manager dislikes closed source software? That is a fantastic 1st world problem to have.

quoll 2020-10-14T13:55:03.151200Z

yup

quoll 2020-10-14T13:55:15.151400Z

But I did it in about a week, so it wasn’t a big deal

simongray 2020-10-14T13:55:20.151600Z

nice

quoll 2020-10-14T13:55:36.151800Z

The majority of that was the query planner

quoll 2020-10-14T13:56:21.152Z

you could argue that it wasn’t needed (Datomic doesn’t have one), but:
a) I’d done it before
b) rules could potentially create queries that were in suboptimal form. I’ve been bitten by this in the past
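The kind of reordering a planner does can be sketched like this. This is a hypothetical greedy heuristic with invented cardinality estimates, not Asami's actual planner: resolve the most selective pattern first, then prefer patterns that share a variable with what is already bound.

```python
def plan(patterns, cardinality):
    """Greedy join ordering; `cardinality` estimates rows per pattern."""
    remaining = sorted(patterns, key=cardinality)
    ordered, bound = [], set()
    while remaining:
        # cheapest pattern that joins with already-bound vars, if any do
        joinable = [p for p in remaining
                    if bound & {t for t in p if t.startswith("?")}]
        best = (joinable or remaining)[0]
        remaining.remove(best)
        ordered.append(best)
        bound |= {t for t in best if t.startswith("?")}
    return ordered

# Invented estimates: a rule engine might emit the patterns in any order.
estimates = {
    ("?x", "type", "Person"): 10_000,   # broad
    ("?x", "email", "?e"): 9_000,       # broad
    ("?e", "equals", "a@b.c"): 1,       # very selective
}
ordered = plan(list(estimates), estimates.get)
print(ordered[0])  # ('?e', 'equals', 'a@b.c') runs first
```

Executing the broad patterns first would materialize thousands of intermediate rows; leading with the selective one keeps every subsequent join small, which is exactly the "suboptimal form" problem rules can create.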

quoll 2020-10-14T13:56:44.152200Z

Some time later, he called me and asked me to port it to ClojureScript. So it moved into the browser

quoll 2020-10-14T13:57:07.152400Z

Since then, I’ve been getting more requests for more features. Right now it handles a LOT

quoll 2020-10-14T13:57:31.152600Z

That’s when I started a new pet project (evenings and weekends)

simongray 2020-10-14T13:58:04.152800Z

It seems like a lot of work is happening in this space at the moment with Asami, Datalevin, Datahike, Datascript. Kind of exciting.

quoll 2020-10-14T13:58:17.153Z

This is for backend storage. It is loosely based on Mulgara, but with a lot of innovations, and new emphasis

quoll 2020-10-14T13:58:48.153200Z

Honestly, if I’d known about Datascript (which had already started by then), I would have just used that

quoll 2020-10-14T13:59:24.153400Z

Anyway… I mentioned the backend storage, and several managers all got excited about it. So THAT is now my job

simongray 2020-10-14T13:59:34.153600Z

HAHA

quoll 2020-10-14T13:59:44.153800Z

And for the first time, they’ve given me someone else to help

quoll 2020-10-14T14:00:29.154Z

He’s doing the ClojureScript implementation (over IndexedDB)

quoll 2020-10-14T14:01:16.154200Z

I’m doing the same thing on memory-mapped files. But it’s behind a set of protocols which makes it all look the same to the index code

quoll 2020-10-14T14:02:12.154400Z

I also hope to include other options, like S3 buckets. These will work, because everything is immutable (durable, persistent, full history, etc)

simongray 2020-10-14T14:02:34.154600Z

Do you see a future where a common protocol like ring can be developed for all of these Datomic-like databases? So much work is happening in parallel.

quoll 2020-10-14T14:02:56.154800Z

That was actually exactly the perspective that Naga has!

quoll 2020-10-14T14:04:14.155Z

The protocol that Naga asks Databases to implement is oriented specifically to Naga’s needs, but it works pretty well
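As a sketch of the idea, in Python rather than Clojure, and with invented method names rather than Naga's actual protocol: a rule engine only needs a small storage abstraction, and any backend that implements it becomes a target.

```python
from abc import ABC, abstractmethod

# Hypothetical protocol; the method names are invented for illustration.
class GraphStore(ABC):
    @abstractmethod
    def resolve_pattern(self, pattern):
        """Return the triples matching one pattern ('?'-prefixed = variable)."""

    @abstractmethod
    def assert_data(self, triples):
        """Add triples, returning a (possibly new) store value."""

class MemoryStore(GraphStore):
    """A trivial in-memory implementation; a Datomic or SPARQL adapter
    would implement the same two methods over its own query API."""
    def __init__(self, triples=()):
        self.triples = list(triples)

    def resolve_pattern(self, pattern):
        return [t for t in self.triples
                if all(p.startswith("?") or p == v
                       for p, v in zip(pattern, t))]

    def assert_data(self, triples):
        return MemoryStore(self.triples + list(triples))

store = MemoryStore().assert_data([("a", "knows", "b")])
print(store.resolve_pattern(("a", "knows", "?x")))  # [('a', 'knows', 'b')]
```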

simongray 2020-10-14T14:04:31.155200Z

I see. So perhaps it’s just a question of willingness to integrate.

quoll 2020-10-14T14:06:11.155400Z

Well, the way I’ve done it in Naga has been as a set of package directories which implement the protocol for each database. Unfortunately, I’ve been busy, so I only have directories for Asami and Datomic

quoll 2020-10-14T14:06:18.155600Z

But they both work 🙂

quoll 2020-10-14T14:06:27.155800Z

I imagine that it wouldn’t be hard to do Datascript

quoll 2020-10-14T14:07:09.156Z

The main thing that Datascript/Datomic miss is a query API that allows you to do an INSERT/SELECT (which SPARQL has)
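For reference, SPARQL 1.1 Update spells that as a single operation: an INSERT template driven by a WHERE pattern, so new data is derived from data already in the store. A minimal example:

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Derive new triples from a query in one operation:
# everyone who knows someone is declared a foaf:Person.
INSERT { ?s a foaf:Person }
WHERE  { ?s foaf:knows ?o }
```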

simongray 2020-10-14T14:11:52.156200Z

I need to get some real work done before heading “home” for today, i.e. moving from the desk to the sofa. Thanks for an interesting conversation. I’m keeping an eye on Asami (and now naga). Really interesting projects.

quoll 2020-10-14T14:12:42.156500Z

Thank you

quoll 2020-10-14T14:13:13.157400Z

They look quiet right now because I’m working on the storage branch

2020-10-14T14:57:06.157600Z

@quoll: Sounds like you’ve both had a very interesting career, and currently have a dream job. Most managers would never entertain the need to implement a new database; though it sounds like you’ve done it many times. :thumbsup:

@eric.d.scott spoke here a while back about doing something that sounded similar: providing some common abstraction across RDF and other graph stores / libraries. I definitely see the appeal, but I don’t really understand the real-world use case. Why is it necessary for your business? Swapping out an RDF database for a different RDF one can be enough work as it is (due to radically different performance profiles), let alone moving across ecosystems. Or am I misunderstanding the purpose of the abstraction; is it to make more backends look like graphs? That’s a use case I totally get 👌.

Regardless, I’d love to hear more about your work

quoll 2020-10-14T14:57:41.157800Z

only twice: Mulgara and now Asami

quoll 2020-10-14T14:58:15.158Z

At work, there is no impetus to be able to swap things out 🙂

quoll 2020-10-14T14:58:47.158200Z

but any libraries that use a graph database have motivation to do it

quoll 2020-10-14T14:59:09.158400Z

particularly if the library is supposed to have broader appeal than for just the team developing it

quoll 2020-10-14T14:59:51.158700Z

For instance… there is no need for Asami to have a SPARQL front end, but it’s a ticket, because I’d like to make it more accessible to people

2020-10-14T15:00:16.158900Z

yeah ok that’s fair

quoll 2020-10-14T15:00:23.159100Z

Besides, if I don’t implement a SPARQL front end, it will be embarrassing!!!

2020-10-14T15:00:37.159300Z

lol

quoll 2020-10-14T15:00:50.159500Z

For anyone reading… I was on the SPARQL committee

2020-10-14T15:00:56.159700Z

I don’t know how you could live with yourself… 😆

quoll 2020-10-14T15:01:12.160Z

exactly!

2020-10-14T15:01:23.160200Z

ahh well in that case… I don’t know how you could live with yourself 😁

2020-10-14T15:04:52.160400Z

If you don’t mind me asking, if you could re-live being on that committee, knowing what you do now, what would you do differently?

quoll 2020-10-14T16:12:21.160600Z

Well, it was a learning experience for me. A number of interests were on the committee to push the standard in a direction that most suited their existing systems. So rather than introducing technical changes, or working against specific things, I would have focused more on communication with each member of the committee. Not that I think I did a terrible job, but I could have done better

quoll 2020-10-14T16:19:41.160800Z

From a technical perspective, I would have liked to see a tighter definition around aggregates, with algorithmic description.

quoll 2020-10-14T16:20:48.161Z

But that’s just because there’s a bit of flexibility in some of the edge cases there. Also, having a default way to handle things, even if it’s not the ideal optimized approach, would have been nice to have

quoll 2020-10-14T16:21:18.161200Z

That said, that’s essentially what Jena sets out to do. They try to be the reference implementation, and they most certainly don’t take the optimized approach

quoll 2020-10-14T16:22:19.161400Z

The early versions of Jena saved triples as a flat list, and resolved patterns as filters against them 😖

quoll 2020-10-14T16:22:58.161600Z

Andy had some long conversations with me about Mulgara’s storage while he was planning out Fuseki

quoll 2020-10-14T16:26:31.161800Z

Also @rickmoynihan:
> Sounds like you’ve both had a very interesting career, and currently have a dream job
Yes! I have certainly been spoiled! I honestly don’t know how I have managed to keep coming back to these things, but I’m happy that I have. Of course, I’ve done other things in between, but even those can be informative (for instance, I’ve had opportunities to work with both Datomic and OrientDB)

quoll 2020-10-14T16:26:58.162Z

Oh! I just thought of something I could have mentioned in the SPARQL committee that continues to frustrate me… transactions!

quoll 2020-10-14T16:28:30.162200Z

It’s possible to send several operations through at once, e.g. an insert, an insert/select, a delete. But there are limits on what you can manage there. There are occasions where transactions are important.

quoll 2020-10-14T16:30:22.162400Z

Datomic is frustrating that way too, because Naga needs it. (I manage it by using a with database, and once I’m done, I replay the accumulated transactions with transact)
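That workaround can be modelled abstractly like so. This is a Python toy with an invented API, standing in for Datomic's speculative `with` and the final `transact`; it only illustrates the accumulate-then-replay pattern described above.

```python
# Toy model: rules run against speculative values; nothing is committed
# until the accumulated transactions are replayed in one transact.
class Conn:
    def __init__(self):
        self.facts = set()

    def with_(self, facts, tx):
        """Speculative: a new value including tx, without touching the conn."""
        return facts | set(tx)

    def transact(self, tx):
        self.facts |= set(tx)

conn = Conn()
pending, db = [], conn.facts
for tx in ([("a", "p", "b")], [("b", "p", "c")]):  # txes a rule engine emits
    db = conn.with_(db, tx)   # each step sees earlier speculative results
    pending.extend(tx)

assert conn.facts == set()    # nothing committed yet
conn.transact(pending)        # replay the accumulated transactions once
print(sorted(conn.facts))     # [('a', 'p', 'b'), ('b', 'p', 'c')]
```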