Are there any docs on tabling in core.logic and how it works?
Backstory: I’ve implemented a SPARQL/Datalog-like query DSL for querying in-memory RDF graphs with core.logic. The DSL essentially tries to hide a lot of the miniKanren/relational-programming machinery and presents a simpler, “SPARQL-like” model, where you don’t need to fresh variables yourself, etc.
I use db-rel to get acceptable performance. Currently I compile the Basic Graph Pattern vectors my macro receives into calls to db-rel. What I have works quite well, but it does of course mean that you can’t generate queries at runtime. I’d like to explore doing the same but with functions, so I can also provide a way to build tabled queries dynamically.
Is this possible?
For reference this is the project: https://github.com/Swirrl/matcha
e.g. the graph pattern [[:foo ?p ?o]]
would currently compile into something like:
(clojure.core.logic/fresh [?p ?o]
  (grafter.matcha.alpha/triple :foo ?p ?o))
I can obviously define a relation that will recursively apply the triple relation, but won’t that eliminate the advantages of indexing?
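For what it’s worth, a recursive relation over triple can be memoised with core.logic’s `tabled`. A minimal sketch, assuming a pldb relation `triple` as in Matcha (the `reachable` name is illustrative; note the answer table persists across calls, so re-querying after changing the db needs care):

```clojure
(require '[clojure.core.logic :as l]
         '[clojure.core.logic.pldb :as pldb])

;; Assumed pldb relation, mirroring Matcha's triple relation.
(pldb/db-rel triple s p o)

;; Tabled transitive reachability: answers for each call pattern are
;; memoised, so repeated subgoals aren't re-derived and left recursion
;; over cyclic graphs terminates.
(def reachable
  (l/tabled [s o]
    (l/conde
      ;; directly connected by some predicate
      [(l/fresh [p] (triple s p o))]
      ;; or connected through an intermediate node m
      [(l/fresh [p m]
         (triple s p m)
         (reachable m o))])))
```
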
I’m also not entirely sure of all the semantics around using Clojure code inside relations etc.; my first instinct was to avoid using a macro.
@rickmoynihan - sounds like an interesting project - it would be great to have a way to query an in-memory graph without digging into miniKanren
I didn't quite follow what you mean by generating queries dynamically
I recently also used core.logic for graph traversal - but due to the large size of the db it wasn't as efficient
Sometimes you want to generate queries dynamically based on inputs.
Currently in Matcha you can’t do this kind of thing: (let [query-pattern (conj '[?s ?p] '?o)] ((select query-pattern) friends))
because select is a macro, so query-pattern has to be hardcoded as a literal value at compile time
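A function-based version could look roughly like this. It’s only a sketch of the idea, not Matcha’s actual API: `select*`, `pattern->goal`, and `query-var?` are hypothetical names, and I’m assuming a pldb relation `triple` as Matcha defines. The key moves are creating logic variables at runtime with `l/lvar` instead of `fresh`, and turning each runtime pattern vector into a goal:

```clojure
(require '[clojure.core.logic :as l]
         '[clojure.core.logic.pldb :as pldb]
         '[clojure.string :as str])

;; Assumed pldb relation, mirroring Matcha's triple relation.
(pldb/db-rel triple s p o)

(defn- query-var?
  "True for ?-prefixed symbols like ?s, ?p, ?o."
  [t]
  (and (symbol? t) (str/starts-with? (name t) "?")))

(defn- pattern->goal
  "Turn one runtime pattern like [:foo ?p ?o] into a triple goal,
  substituting logic variables from `env` for ?-symbols."
  [env [s p o]]
  (let [resolve-term #(if (query-var? %) (env %) %)]
    (triple (resolve-term s) (resolve-term p) (resolve-term o))))

(defn select*
  "Hypothetical function version of select: `patterns` is an ordinary
  vector built at runtime, not a compile-time literal."
  [projection patterns db]
  (let [env (into {} (for [v (filter query-var? (apply concat patterns))]
                       [v (l/lvar (name v))]))]
    (pldb/with-db db
      (l/run* [q]
        (l/== q (mapv env projection))
        (l/everyg #(pattern->goal env %) patterns)))))

;; e.g., with the pattern assembled at runtime:
;; (let [query-pattern [[:foo '?p '?o]]]
;;   (select* '[?p ?o] query-pattern facts))
```
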
ok I see what you mean - how does tabling help in this scenario?
it makes it orders of magnitude faster
and you want to use functions in place of macro
sorry - just trying to understand the problem 🙂 -
I don't know if it is applicable in this particular case - but for my particular use case of making graph traversal faster on a very large in-memory db, I went about rewriting some parts of pldb and db-rel, making the matching logic parallel - which seemed to work better than tabling, as the issue for me wasn't that the query was diverging
> and you want to use functions in place of macro @bajrachar Yes.
I think the problem is that without tabling I’d need to scan the database on each join, so performance becomes exponential?/quadratic?/something really bad. Making it parallel might help, but at best parallelisation will only give linear gains in the number of processors, rather than the orders-of-magnitude gains you can get through indexing/tabling/memoization/better algorithms. For me, querying just 1000 edges was taking my naive version longer than querying millions of edges with tabling
sounds interesting - definitely would like to see how you've used tabling
by join - I assume it's traversing the graph to depth n?
I mean join in the RDBMS sense. Effectively a triple store is a single table with three columns; every time you walk from one edge (row) to another through a ?variable binding, you’re effectively doing a self-join back on that table. But yes, each traversal is effectively another join.
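To make the self-join analogy concrete: a two-hop pattern like [[?s :knows ?x] [?x :knows ?o]] is just two goals against the same triple relation sharing ?x, which is analogous to joining the triples table back on itself on t1.o = t2.s. A sketch, assuming a pldb relation `triple` and an illustrative :knows predicate:

```clojure
(require '[clojure.core.logic :as l]
         '[clojure.core.logic.pldb :as pldb])

;; Assumed pldb relation, mirroring Matcha's triple relation.
(pldb/db-rel triple s p o)

(defn friends-of-friends
  "Two goals against the same 'table', joined through the shared x."
  [db]
  (pldb/with-db db
    (l/run* [s o]
      (l/fresh [x]
        (triple s :knows x)
        (triple x :knows o)))))
```
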
some references: http://scholarworks.iu.edu/dspace/bitstream/handle/2022/8777/Byrd_indiana_0093A_10344.pdf?sequence=1 http://code.google.com/p/iucs-relational-research/
I think I understand roughly how this works; but need to figure out the finer details
the second link in the source code 404's 😞
I think the first link is Byrd’s thesis… I’ve skimmed this before, but can’t recall the details of tabling
hmmm… maybe I’m misunderstanding the relationship between tabling and pldb