rdf

2015-08-27T00:06:50.000093Z

@joelkuiper: this is actually going to change - currently Grafter parses a lang string into a reified object - you can str it to get the string, but if you want the tag out you have to (.getLanguage (->sesame-rdf-type my-lang-string-obj)) ... This bit is a bit broken - and we've had a ticket to fix it for a while... it's a pretty simple fix though... The plan is to implement a Literal record type -- so basically a map like @jamesaoverton says -- but with some polymorphic benefits that ensure it can coerce to the Sesame (and maybe one day Jena) types properly... it'll have the string itself, the datatype URI, and if it's a lang string a language keyword, e.g. :en/:fr (we use keywords for language tags already and it works well). Right now you can build lang strings with (s "hola" :es)
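A minimal sketch of what such a Literal record might look like - all names here are assumptions for illustration; the chat only says it carries the string itself, the datatype URI, and an optional language keyword:

```clojure
;; Hypothetical sketch - not Grafter's actual implementation.
(defrecord Literal [raw datatype-uri language])

(defn s
  "Build a string literal; with a language keyword, e.g. (s \"hola\" :es),
  it becomes a language-tagged string."
  ([raw]
   (->Literal raw "http://www.w3.org/2001/XMLSchema#string" nil))
  ([raw lang]
   (->Literal raw "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString" lang)))
```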

2015-08-27T00:09:43.000094Z

@joelkuiper: @jamesaoverton: just reading your discussion -- remember SPARQL 1.1 doesn't really have FULL support for quads... i.e. you can't CONSTRUCT a quad... you can't put a GRAPH pattern in the CONSTRUCT template... I personally think this is a real shame, as there are quad serialisation formats (e.g. TriG/TriX etc...). This might be why you can't just get quads from a model

2015-08-27T00:16:38.000095Z

@joelkuiper: just looking at your type coercion code -- Grafter also has both a Triple record and a Quad record... and I consider this a mistake... one we're going to undo. I think you really only want (defrecord Quad [s p o g]) - then create a constructor function for triple which returns a Quad with a nil :g. Otherwise you'll get into a load of bother where #Quad {:s 1 :p 1 :o 1 :g nil} is not= to #Triple {:s 1 :p 1 :o 1}
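A sketch of the single-record approach being described (constructor names are assumed for illustration):

```clojure
;; One type for both arities: a triple is just a Quad whose :g is nil,
;; so plain Clojure value equality works across the two.
(defrecord Quad [s p o g])

(defn quad [s p o g]
  (->Quad s p o g))

(defn triple
  "Construct a 'triple' as a Quad with a nil graph."
  [s p o]
  (->Quad s p o nil))

;; (= (triple 1 2 3) (quad 1 2 3 nil)) is true, since both are Quads.
```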

2015-08-27T00:17:02.000096Z

simply because they're different types

2015-08-27T00:26:24.000097Z

obviously it's easy enough to resolve but it's a small pain... - yes there are still problems with this model where RDF semantics don't map directly onto Clojure value semantics... However we recently had a discussion on the Sesame developers mailing list where we convinced the core committer to change Sesame's policy to use value semantics when testing equality - rather than RDF-style equality, where a quad of :s1 :p1 :o1 :g1 .equals a triple of :s1 :p1 :o1. This should be coming in a future release... Not sure what Jena's policy is here

2015-08-27T00:35:49.000099Z

@jamesaoverton: early on in the first version of Grafter - because I initially wanted a terse syntax for expressing triple patterns - I also chose to represent URIs as strings - as URIs are the primary data type - so in Grafter string literals have to be built with the s function. Again this is something I'm going to change -- raw Java strings should probably not automatically coerce into RDF - or if they do, they should coerce to RDF strings in the default language... any Java URI type you might reasonably use should probably be made to work.

2015-08-27T00:35:51.000100Z

https://github.com/Swirrl/grafter-url

jamesaoverton 2015-08-27T00:42:34.000102Z

Yeah, well… It’s something I’ve thought a lot about, and in the end I really like working with plain, literal EDN data everywhere I can. For what I do, IRIs are opaque, and I don’t need to get their protocol or query params. So I end up using strings for IRI, keywords for CURIEs/QNames, and maps for Literals.
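Those conventions might look something like this in practice (illustrative values only, not taken from EDN-LD's docs):

```clojure
;; Plain EDN: IRIs as opaque strings, CURIEs/QNames as keywords,
;; literals as maps.
{:subject   "http://example.com/alice" ; IRI as a plain string
 :predicate :foaf/name                 ; CURIE as a keyword
 :object    {:value "Alice"            ; literal as a map
             :lang  :en}}
```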

jamesaoverton 2015-08-27T00:43:23.000103Z

I work with EDN as long as possible, and only convert to other formats at the very end.

2015-08-27T00:46:25.000104Z

yes - records can add noise - but I think you can actually override print-method to print them shorter e.g. you might be able to do #URI "<http://foo.com>"
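A sketch of that print-method override; the URI record and its field name are assumptions for illustration:

```clojure
;; print-method is a multimethod dispatching on class, so a record can
;; opt into a terser printed form.
(defrecord URI [uri])

(defmethod print-method URI [v ^java.io.Writer w]
  (.write w (str "#URI \"<" (:uri v) ">\"")))

;; (pr-str (->URI "http://foo.com")) now yields #URI "<http://foo.com>"
```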

jamesaoverton 2015-08-27T00:49:35.000105Z

Then you need to provide a reader function to cast the string to that type. I’m glad EDN has typed literals, but I haven’t found that they’re worth the hassle.

2015-08-27T00:50:14.000106Z

yes I know - it definitely adds some friction

jamesaoverton 2015-08-27T00:50:37.000107Z

I think that Transit has a native URI type, which would be more convenient.

2015-08-27T00:51:21.000108Z

ooo interesting

jamesaoverton 2015-08-27T00:51:29.000109Z

Yeah: https://github.com/cognitect/transit-clj

2015-08-27T00:56:32.000111Z

what exactly do you use edn-ld for @jamesaoverton ?

jamesaoverton 2015-08-27T00:58:21.000113Z

The library itself is a recent refactoring of some patterns I’ve developed over the last three years. So I’ve only used that particular code in a few projects so far, but I’ve used its predecessors in a larger number of projects.

jamesaoverton 2015-08-27T00:58:57.000114Z

And although I’m allowed to share those other projects, I’ve never had the time to clean them up and put them on GitHub...

jamesaoverton 2015-08-27T00:59:55.000115Z

But this is an example of some of the stuff that I do: https://github.com/jamesaoverton/MRO

2015-08-27T01:00:58.000117Z

cool

jamesaoverton 2015-08-27T01:01:02.000118Z

The Clojure code takes a table from an SQL database that contains a very dense representation of MHC class restrictions, AKA some biology stuff.

jamesaoverton 2015-08-27T01:02:12.000119Z

The goal is to convert that table into an OWL ontology. The ontology has several branches, with specific relationships.

jamesaoverton 2015-08-27T01:02:52.000120Z

There’s an Excel spreadsheet that specifies templates for different branches at different levels.

jamesaoverton 2015-08-27T01:03:25.000121Z

Then I read the source table and the template table, and zip them together into a sequence of maps defining OWL classes.

jamesaoverton 2015-08-27T01:03:45.000122Z

Finally, I convert that EDN data into an RDF/XML file.

2015-08-27T01:04:35.000123Z

what makes it EDN, rather than just CLJ? :simple_smile:

jamesaoverton 2015-08-27T01:04:57.000124Z

There are really two parts. The first is ripping the source table into a number of branch-specific tables. Then I use a Java tool I wrote, ROBOT, to convert those tables to OWL.

jamesaoverton 2015-08-27T01:05:18.000125Z

It’s not the best example, but it’s on GitHub.

jamesaoverton 2015-08-27T01:05:56.000126Z

To answer your question: I’m pretty convinced by this “Data are better than Functions, are better than Macros” thing that Clojure people talk about.

jamesaoverton 2015-08-27T01:06:58.000127Z

The MRO project doesn’t use the EDN-LD library because it’s for OWL and not just RDF. I haven’t figured out a general way to describe OWL in EDN, but I’ve been talking to Phil Lord about it.

2015-08-27T01:07:26.000128Z

yeah Phil and I have spoken in the past too

2015-08-27T01:09:28.000129Z

what in the MRO example is data?

2015-08-27T01:09:46.000130Z

that's not functions/macros, just general Clojure

jamesaoverton 2015-08-27T01:10:45.000131Z

The source table from SQL, and the Excel spreadsheet under src/mro. Those are converted to all the branch-specific CSV files at the top level.

2015-08-27T01:12:21.000132Z

sorry, I meant where is the data - in the EDN-LD "Data > Functions > Macros" sense - presumably by that you meant that EDN-LD represents transformations as Clojure data? Not symbols/functions/macros

jamesaoverton 2015-08-27T01:13:23.000135Z

The previous version of the MRO code had a separate function for each level of each branch.

jamesaoverton 2015-08-27T01:14:50.000136Z

EDN-LD is mostly just conventions for representing RDF in EDN, and then some functions for working with those representations.

2015-08-27T01:15:04.000137Z

and now you have a map - essentially in place of a cond?

jamesaoverton 2015-08-27T01:16:56.000138Z

In the MRO example, there’s a sequence of maps representing templates, and a sequence of maps from the source table (SQL). Then the smarts are in the apply-template function, that applies each template to each row of the source table.
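The pattern described - every template applied to every row of the source table - could be sketched like this. The apply-template here is a hypothetical stand-in that fills {field} placeholders from the row, not MRO's actual implementation:

```clojure
(require '[clojure.string :as str])

(defn apply-template
  "Fill {field} placeholders in the template's string values with the
  corresponding values from row (a map keyed by keyword)."
  [template row]
  (into {}
        (map (fn [[k v]]
               [k (if (string? v)
                    (str/replace v #"\{(\w+)\}"
                                 (fn [[_ field]]
                                   (str (get row (keyword field)))))
                    v)]))
        template))

(defn expand
  "Apply every template map to every source row."
  [templates rows]
  (for [row rows, t templates]
    (apply-template t row)))
```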

jamesaoverton 2015-08-27T01:17:18.000139Z

So there’s a smaller number of higher-level functions, in the end, and I find it easier to reason about.

2015-08-27T01:18:01.000140Z

for what it's worth - your MRO code seems broadly similar to Grafter pipelines... in that you have a sequence of rows which you effectively process in row form... and then templatize. Is that fair?

2015-08-27T01:18:13.000141Z

oh sorry you just said that

jamesaoverton 2015-08-27T01:18:40.000142Z

I agree with that.

2015-08-27T01:18:53.000143Z

Grafter basically works the same

jamesaoverton 2015-08-27T01:20:28.000144Z

In the MRO case, the Clojure code is table-to-table, then ROBOT (my Java tool) is used for the table-to-OWL part.

jamesaoverton 2015-08-27T01:21:00.000145Z

At the end of the day, pretty much all the code I write is a pipeline. :^)

2015-08-27T01:21:30.000146Z

cool

2015-08-27T01:21:46.000147Z

same for a lot of the stuff we do

jamesaoverton 2015-08-27T01:21:58.000148Z

Some day I’ll publish a cleaner example :^)

2015-08-27T01:22:02.000149Z

that and tools around them

2015-08-27T01:22:06.000150Z

lol - same

jamesaoverton 2015-08-27T01:22:41.000151Z

You made a good point about Quad equality above. I’ll think more about that.

jamesaoverton 2015-08-27T01:22:55.000152Z

It was good talking, but I’ve got to go now.

jamesaoverton 2015-08-27T01:23:07.000153Z

Later!

2015-08-27T01:23:21.000154Z

cool

2015-08-27T01:23:22.000155Z

night

joelkuiper 2015-08-27T08:24:00.000156Z

so as far as I’m aware there’s no real way to use SPARQL 1.1 to get Quads, but there might be in the future, so I’ll just leave it nil I guess.

joelkuiper 2015-08-27T08:25:48.000157Z

As far as type/data coercion goes… well I don’t really want to invent another class/type model for RDF. So I’ve chosen to represent results/triples as simple maps and records, with strings for URIs and JSON-LD-ish maps as best I can for the rest. If that’s not your cup of tea you can always just use the Jena objects 😉 and forget about the lazy-seq stuff 😛

joelkuiper 2015-08-27T08:28:15.000159Z

If Commons RDF solves this problem I might consider implementing that, but for now it’s just too much of a mess to match the RDF semantics to Clojure, and the simplest thing I could think of was {:type "typeURI" :lang "@lang" :value "Jena coerced POJO"}

joelkuiper 2015-08-27T08:29:01.000160Z

or a string for the URI; I may consider wrapping that in a java.net.URI though, bit unsure still

2015-08-27T08:56:46.000162Z

@joelkuiper: I'd be tempted to go with a record for Quads and Literals... it makes writing and extending coercions easier (admittedly you can use a multimethod for this too -- but you'll probably just end up dispatching on type anyway, and you can always use a multimethod on a record too if you want)... Also multimethod dispatch is quite a bit slower than record dispatch... and you'll probably end up dispatching on millions of quads
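A sketch of the record-plus-protocol approach being suggested; the protocol and its names are illustrative:

```clojure
(defprotocol ICoerce
  (->backend [this] "Coerce a value to the backing store's type."))

(defrecord Quad [s p o g])

;; Protocol dispatch resolves on the record's class directly, which is
;; cheaper per call than a multimethod's dispatch-fn plus hierarchy
;; lookup - it matters when coercing millions of quads.
(extend-protocol ICoerce
  Quad
  (->backend [q]
    ;; real code would build a Sesame/Jena statement here
    [(:s q) (:p q) (:o q) (:g q)]))
```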

2015-08-27T08:57:06.000163Z

when users come to process results

joelkuiper 2015-08-27T08:58:07.000164Z

well, Triples 😛 since there’s no real way of getting Quads 😉

joelkuiper 2015-08-27T08:59:04.000165Z

So a Literal of [type, value, lang] -> [String, Object, Keyword] or something?

2015-08-27T09:01:50.000166Z

type => String, value => String, lang => Keyword

joelkuiper 2015-08-27T09:02:47.000167Z

why value as a string?

2015-08-27T09:03:02.000168Z

there might not be a way to query for a Quad -- but I think on the processing side it makes sense to have a quad -- because you can set it to non-nil yourself and serialise nquads etc easier

joelkuiper 2015-08-27T09:03:25.000169Z

Jena has excellent support for making sense of a lot of the XSD types into java objects

2015-08-27T09:03:55.000170Z

ahh ok sorry - by Object you mean Integer/Float/Double/Date etc...

joelkuiper 2015-08-27T09:03:57.000171Z

that’s a fair point

2015-08-27T09:03:58.000172Z

then yes I agree

joelkuiper 2015-08-27T09:04:02.000173Z

yep :simple_smile:

2015-08-27T09:04:14.000174Z

definitely coerce the types out where you can

2015-08-27T09:04:23.000175Z

but where you can't you'll need to fall back to string

joelkuiper 2015-08-27T09:04:29.000176Z

right, that’s what I do now

2015-08-27T09:04:44.000177Z

thats what we're doing with grafter

joelkuiper 2015-08-27T09:05:23.000178Z

yeah I saw that :simple_smile:

2015-08-27T09:05:26.000179Z

did you read the stuff I wrote here last about Triple/Quad equality etc?

joelkuiper 2015-08-27T09:05:58.000180Z

yup, interesting stuff; I’ll probably change it to Quad for those reasons. Makes sense

2015-08-27T09:06:19.000181Z

It's definitely a trade-off -- but I think it's the better one

joelkuiper 2015-08-27T09:06:58.000182Z

could also just use a map I guess

2015-08-27T09:08:41.000183Z

yes but it'll have the same issues -- i.e. (= {:s :s1 :p :p1 :o :o1 :g nil} {:s :s1 :p :p1 :o :o1}) => false

joelkuiper 2015-08-27T09:09:33.000184Z

yeah, that’s true. it’s a silly problem 😛

2015-08-27T09:10:01.000185Z

it's not a big deal - it's just annoying -- and can cause hard-to-find bugs

joelkuiper 2015-08-27T09:11:43.000186Z

it’s one of those things that would be easy enough to solve with a custom Equals method though

2015-08-27T09:15:27.000187Z

yes but I think it's more pragmatic to retain value semantics

2015-08-27T09:15:50.000188Z

even in java

joelkuiper 2015-08-27T09:17:51.000189Z

I’ve gone back and forth on that topic in Java projects; either can create hard to find bugs, especially if done inconsistently across developers 😛

2015-08-27T09:25:49.000190Z

yeah it definitely depends on what you're doing

2015-08-27T09:26:10.000191Z

but I think programming with values is generally better

2015-08-27T10:21:51.000192Z

@joelkuiper: any reason to use "@en" strings rather than :en keywords for language tags? (I know obviously that SPARQL and various serialisations represent them that way...)

2015-08-27T10:22:25.000193Z

keywords share memory when you have lots of them

joelkuiper 2015-08-27T11:14:45.000194Z

no strong opinion, it’s closer to JSON-LD

joelkuiper 2015-08-27T11:14:46.000195Z

which is nice

joelkuiper 2015-08-27T11:33:22.000196Z

switched it to keywords 😉, probably the last I’ll work on it for the week at least!

2015-08-27T14:02:04.000197Z

cool

quoll 2015-08-27T15:08:14.000201Z

I want to think on it some more, but I agree that we should have: (not= {:s :s1 :p :p1 :o :o1 :g nil} {:s :s1 :p :p1 :o :o1})

quoll 2015-08-27T15:08:26.000202Z

rather than a custom = function, I’d like to see another function that explicitly calls out that it’s handling some kind of equivalence instead

quoll 2015-08-27T15:09:11.000204Z

such as: (equiv {:s :s1 :p :p1 :o :o1 :g nil} {:s :s1 :p :p1 :o :o1})

2015-08-27T15:57:46.000206Z

I personally think it's better to have one type - even if it has a nil field a lot of the time - instead of two types for essentially the same thing

2015-08-27T15:59:01.000207Z

I think it's a good idea to have a custom equivalence function that implements RDF semantics

2015-08-27T15:59:53.000208Z

so the not= case won't arise in normal usage

quoll 2015-08-27T16:00:24.000209Z

on the second point, yes. Clojure needs to have = semantics that are separate to what is needed for RDF

quoll 2015-08-27T16:01:23.000210Z

for instance, I want to be able to say things like: (matches {:s :s1 :p :p1 :o :o1} {:s :s1 :p :p1 :o :o1 :g :g1})

quoll 2015-08-27T16:02:03.000211Z

because the triple in the first arg does match the triple-in-a-graph found in the second arg

2015-08-27T16:02:41.000212Z

@quoll: I think the best thing is to have a Quad record -- with a triple constructor that essentially gives you a nil in the :g

2015-08-27T16:03:15.000213Z

so (matches (triple :s1 :p1 :o1) (quad :s1 :p1 :o1 :g1)) => true
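One way such a matches function could be sketched - plain value = is left untouched, and a nil :g on either side acts as a wildcard (an illustration of the idea, not anyone's actual API):

```clojure
(defn matches
  "Like = on :s/:p/:o, but a nil :g on either side matches any graph."
  [a b]
  (and (= (:s a) (:s b))
       (= (:p a) (:p b))
       (= (:o a) (:o b))
       (or (nil? (:g a)) (nil? (:g b)) (= (:g a) (:g b)))))

(matches {:s :s1 :p :p1 :o :o1}
         {:s :s1 :p :p1 :o :o1 :g :g1}) ;; => true
```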

quoll 2015-08-27T16:03:29.000214Z

it’ll depend on usage. I’ve never needed quads, except when storing multiple graphs in a single file

quoll 2015-08-27T16:03:45.000215Z

I’m a “triples” person myself :simple_smile:

2015-08-27T16:04:39.000216Z

we use both 50/50 - one representation simplifies things for everyone... if you don't care about the nil :g - you don't need to...

quoll 2015-08-27T16:04:55.000217Z

when I say “storing”, I also mean “loading”, since you get quads back when you read, and they need to go to various graphs

2015-08-27T16:05:00.000218Z

the Quad record will seamlessly coerce into a Sesame/Jena triple/quad respectively

2015-08-27T16:06:22.000219Z

yes -- we use quads a lot -- because most of our work is writing pipelines that generate RDF... and we usually want to derive the graph from the data we're loading in

quoll 2015-08-27T16:06:52.000220Z

and you’re working with multiple graphs at once?

2015-08-27T16:06:56.000221Z

yes

2015-08-27T16:07:21.000222Z

the fact you can't in other tools is one reason we created grafter

2015-08-27T16:07:43.000223Z

we have tens of thousands of graphs

quoll 2015-08-27T16:07:56.000224Z

ah. You’re one of those :simple_smile:

2015-08-27T16:08:44.000225Z

we manage lots of data for many customers

2015-08-27T16:09:23.000226Z

so a lot of the time its out of our hands

2015-08-27T16:10:09.000227Z

graphs are also very useful for managing data

quoll 2015-08-27T16:10:19.000228Z

most RDF stores are optimized around triples, and then group statements into graphs. Those that treat graphs as an equal part of the quad take a small performance hit, and it often seems unjustified given that SPARQL treats graphs so differently

quoll 2015-08-27T16:10:34.000229Z

yes, I completely agree that graphs are great that way

2015-08-27T16:11:54.000230Z

@quoll: having used Fuseki, Sesame, Stardog, BigData and GraphDB/OWLIM I can say that statement's not true in my experience

2015-08-27T16:12:39.000231Z

on many stores you have to use graphs to get acceptable performance

quoll 2015-08-27T16:13:59.000234Z

I may not have been clear in what I was trying to say

2015-08-27T16:14:05.000235Z

I agree that it's unfortunate SPARQL only half implements graphs though

quoll 2015-08-27T16:17:27.000236Z

when RDF stores are storing data on disk, many of them will use a scheme that is based around subject/predicate/object. Graphs then get implemented as a separate structure (e.g. separate index files, or an index that refers to statements as a group, but not allowing arbitrary selection of subject/predicate/object/graph as single step index lookups).

quoll 2015-08-27T16:17:49.000237Z

Some stores do allow arbitrary lookup for quads

quoll 2015-08-27T16:17:58.000238Z

but then SPARQL hamstrings it

quoll 2015-08-27T16:19:26.000239Z

I mean, you can still work with it, but SPARQL presumes that you’ll be selecting only a couple of graphs, and working with triples from them. The syntax gets messier if you treat graphs as just another element of the quad

quoll 2015-08-27T16:19:57.000240Z

ironically, the stores that index symmetrically on the quad can handle the operations just fine. It’s SPARQL syntax that gets in the way

quoll 2015-08-27T16:20:25.000241Z

but because of this bias, many stores don’t index symmetrically around the quad

quoll 2015-08-27T16:20:54.000242Z

that’s usually OK, because many applications don’t ask for lots of graphs like that

quoll 2015-08-27T16:21:10.000243Z

but some do… hence my statement that you’re “one of those” :simple_smile:

2015-08-27T16:21:50.000244Z

@quoll: yes you're right -- sorry, I was misunderstanding what you were saying... Yes that's definitely true... Graph performance can be spotty on some stores... I know - because we have some automatically generated queries which have well over 1000 graph clauses

2015-08-27T16:23:21.000245Z

but we actually sell a linked data management platform -- so it's unavoidable -- we frequently push the limits and assumptions of every triple store

quoll 2015-08-27T16:24:20.000246Z

I can’t recall now which stores index symmetrically around quads. I know ours does, but it’s in dire need of some love, and doesn’t even handle SPARQL 1.1 (i.e. indexing is great, but query/update functionality is not)

quoll 2015-08-27T16:27:01.000247Z

I think that the default indexing in Jena is symmetric

quoll 2015-08-27T16:27:19.000248Z

I should ask Mike about Stardog though

quoll 2015-08-27T16:27:56.000249Z

I’ve never contributed to the internals of Stardog (for obvious reasons). And the Clojure adapter was just a client

2015-08-27T16:28:09.000250Z

I'm guessing stardog does

quoll 2015-08-27T16:28:24.000252Z

I thought it did

quoll 2015-08-27T16:28:32.000253Z

I can ask… hang on

2015-08-27T16:28:59.000254Z

what store do you work on?

quoll 2015-08-27T16:29:45.000255Z

Mulgara

quoll 2015-08-27T16:29:51.000256Z

or rather… I did

quoll 2015-08-27T16:30:00.000257Z

I’ve been busy 😕

2015-08-27T16:30:50.000258Z

ahh yes I've been to this site before! :simple_smile:

quoll 2015-08-27T16:31:28.000259Z

Well… busy life, plus the fact that I’d been on it for over a decade. I’ve been trying new things lately

2015-08-27T16:32:12.000260Z

ahh you're the guy that implemented an RDF store on Datomic... I had that same thought the moment Rich released it... How did it go?

quoll 2015-08-27T16:32:43.000261Z

it’s been good, though I put it aside for other stuff. I’m trying to pick it back up again actually

quoll 2015-08-27T16:33:05.000262Z

Datomic is implemented in a very similar way to Mulgara’s indexes (persistent trees), so it seemed natural to me

quoll 2015-08-27T16:33:53.000263Z

OK, Al doesn’t know. He said I should ask Mike directly :simple_smile:

quoll 2015-08-27T16:34:41.000264Z

Mike is fun to talk to about this stuff, but I only have him on email, not IM :simple_smile:

2015-08-27T16:35:33.000265Z

Yes Mike and I have exchanged emails...they have a gitter channel now

2015-08-27T16:37:13.000266Z

what datomic schema does kiara use?

2015-08-27T16:37:41.000267Z

does it implement a schema for triples/literals - or does it somehow use vocabularies for a datomic schema?

quoll 2015-08-27T16:39:33.000269Z

literals are done in 2 ways

quoll 2015-08-27T16:41:07.000270Z

if they’re simple text or using one of a few xsd datatypes then they’re stored as native values (strings, longs, doubles, floats, dates, URIs)

quoll 2015-08-27T16:42:09.000271Z

anything else, and they become a structure with properties for value (a string) and datatype (a URI, since there aren’t any IRIs in xsd datatypes)

quoll 2015-08-27T16:42:58.000272Z

RDF properties get scanned for the values that they refer to, and the most general type required is found

quoll 2015-08-27T16:43:58.000273Z

this is because if you have a property of my:value and it refers to a xsd:long, then it’s a very rare schema that requires that property to also refer to a string, or something else

2015-08-27T16:44:52.000274Z

yes I'd say thats a fair assumption

quoll 2015-08-27T16:45:10.000275Z

but if that DOES happen, then the type for the property in the Datomic schema is set to refer to a structure, and that structure then refers to the final value, using different property names for each type

quoll 2015-08-27T16:45:23.000276Z

that’s a corner case, but it makes querying more complex 😕

2015-08-27T16:45:33.000277Z

no shit :simple_smile:

quoll 2015-08-27T16:45:36.000278Z

😄

quoll 2015-08-27T16:46:12.000279Z

I think I need to change how subjects work though

2015-08-27T16:46:46.000280Z

whats the performance on datomic like?

2015-08-27T16:47:08.000281Z

is there any hope of it being competitive?

quoll 2015-08-27T16:47:25.000282Z

for now, if they’re IRIs then I convert to QNames (ruthlessly, if necessary) :simple_smile: then convert the QNames to keywords and use those as the entity IDs. This works, but it uses RAM.

quoll 2015-08-27T16:47:52.000283Z

I have not pushed it to big datasets yet

quoll 2015-08-27T16:48:40.000284Z

Most of the big sets are in RDF/XML (which I despise), and I really want to avoid Jena (I love those guys, but Jena is bloated), so I’ve started on an RDF/XML parser in Clojure

quoll 2015-08-27T16:49:01.000285Z

I have a decent Turtle parser though, and that seems OK

2015-08-27T16:49:06.000286Z

cool

quoll 2015-08-27T16:49:07.000287Z

but I haven’t loaded anything really big through it

2015-08-27T16:49:12.000288Z

does it work with large files?

quoll 2015-08-27T16:49:49.000289Z

that’s another thing. Datomic recommends that you don’t try to do really big loads. They recommend chunking it up. That’s easy in Turtle, but not so much with RDF/XML

2015-08-27T16:50:19.000290Z

Jena do a good job - if you want a standards-compliant, free store... but yeah, the codebase is a mess... Sesame's code is so much better to work with

quoll 2015-08-27T16:50:20.000291Z

besides that, I hate the idea of multiple transaction points at arbitrary locations in a load. But it’s pragmatic, so I guess I need to

quoll 2015-08-27T16:50:33.000292Z

yes, I’ve contributed to Jena

2015-08-27T16:51:05.000293Z

@quoll: yeah chunking sucks

quoll 2015-08-27T16:51:45.000294Z

Mulgara is actually faster if you don't

quoll 2015-08-27T16:52:16.000295Z

annoyingly people would chunk their data, and then get annoyed at Mulgara for performing badly

quoll 2015-08-27T16:52:53.000296Z

but every chunk becomes a new transaction, which means that it requires a new root to the persistent tree

quoll 2015-08-27T16:53:14.000297Z

if you load 1M triples, then you just have a simple tree

2015-08-27T16:53:41.000298Z

so I'm guessing you need to reindex if that happens

quoll 2015-08-27T16:54:06.000299Z

if you load 100K triples 10 times, then you end up with most of the nodes in the first tree being duplicated while inserting the second 100K, and so on for each chunk

quoll 2015-08-27T16:54:23.000300Z

actually, Mulgara does not do background indexing (which is something I started work on, but never finished)

quoll 2015-08-27T16:54:38.000301Z

so when it’s finished loading, it’s fully available

quoll 2015-08-27T16:54:46.000302Z

but that makes loading slower

quoll 2015-08-27T16:55:49.000303Z

Stardog, for instance, loads immediately into a linear file, and then moves those triples (or quads) into the indexes in the background. Querying looks in both the indexes (fast) and the linear file (slow).

quoll 2015-08-27T16:56:04.000304Z

So loads are lightning fast, but querying sucks for a while

quoll 2015-08-27T16:56:20.000305Z

the longer you wait, the faster the querying gets

quoll 2015-08-27T16:57:42.000306Z

anyway, Mulgara isn’t as complex, but it does not need reindexing

quoll 2015-08-27T17:00:53.000307Z

Just got a response on twitter: yes, Stardog is symmetrically indexed (I thought it was)

2015-08-27T17:01:43.000308Z

thats interesting

ricroberts 2015-08-27T17:09:47.000310Z

hello.

2015-08-27T17:09:50.000311Z

Welcome @ricroberts

joelkuiper 2015-08-27T17:13:53.000312Z

Hey :simple_smile:

wagjo 2015-08-27T17:37:59.000314Z

Hi

joelkuiper 2015-08-27T17:38:48.000315Z

Hey! Thought you might also be interested in this channel; we’ve also been discussing some YeSPARQL-related things :simple_smile:

wagjo 2015-08-27T17:39:22.000316Z

Definitely! Thanks for inviting me.