datomic

Ask questions on the official Q&A site at https://ask.datomic.com!
2020-09-21T02:28:54.030900Z

thanks, that could make sense

2020-09-21T02:29:04.031100Z

I believe I have a few runaway string values

2020-09-21T02:33:23.031300Z

does retraction affect the storage of large strings in the segments or do those stay for good?

jeff tang 2020-09-21T03:19:35.031700Z

does the not= predicate work for datalog queries? e.g.

(d/q '[:find ?uid ?order
            :in $ ?parent-eid [?source-uids ...]
            :where
            [?parent-eid :block/children ?ch]
            [?ch :block/uid ?uid]
            [?ch :block/order ?order]
            [(= ?order ?source-uids)]]
          @db/dsdb 48 #{0 1 2})
works but
(d/q '[:find ?uid ?order
            :in $ ?parent-eid [?source-uids ...]
            :where
            [?parent-eid :block/children ?ch]
            [?ch :block/uid ?uid]
            [?ch :block/order ?order]
            [(not= ?order ?source-uids)]]
          @db/dsdb 48 #{0 1 2})
does not work. To elaborate: = works for both value and collection comparisons, whereas not= only seems to work for value comparisons

2020-09-21T14:21:36.033900Z

there's a datalog-specific not impl here https://docs.datomic.com/on-prem/query.html#not-clauses

favila 2020-09-21T14:23:08.034800Z

This still isn’t what I expect, but note that in datalog it’s more idiomatic to use = and !=

favila 2020-09-21T14:23:20.035200Z

not= is clojure.core/not=, but those two are not necessarily clojure’s
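
A hedged sketch of the swap being suggested here (untested): the same query with the datalog built-in != predicate in place of clojure.core/not=. With the [?source-uids ...] collection binding it still compares ?order against each source uid separately, which is why the alternatives below may fit better.

(d/q '[:find ?uid ?order
       :in $ ?parent-eid [?source-uids ...]
       :where
       [?parent-eid :block/children ?ch]
       [?ch :block/uid ?uid]
       [?ch :block/order ?order]
       ;; != is resolved by the query engine, not clojure.core
       [(!= ?order ?source-uids)]]
     @db/dsdb 48 #{0 1 2})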

favila 2020-09-21T14:23:24.035600Z

Also, why not this?

favila 2020-09-21T14:23:47.035800Z

(d/q '[:find ?uid ?order
            :in $ ?parent-eid [?source-uids ...]
            :where
            [?parent-eid :block/children ?ch]
            [?ch :block/uid ?uid]
            [?ch :block/order ?source-uids]]
          @db/dsdb 48 #{0 1 2})

favila 2020-09-21T14:23:55.036200Z

or this for the negation?

favila 2020-09-21T14:24:10.036600Z

(d/q '[:find ?uid ?order
            :in $ ?parent-eid [?source-uids ...]
            :where
            [?parent-eid :block/children ?ch]
            [?ch :block/uid ?uid]
            (not [?ch :block/order ?source-uids])]
          @db/dsdb 48 #{0 1 2})

favila 2020-09-21T14:24:42.037Z

or, if you want to keep a set:

favila 2020-09-21T14:24:46.037200Z

(d/q '[:find ?uid ?order
            :in $ ?parent-eid ?source-uids
            :where
            [?parent-eid :block/children ?ch]
            [?ch :block/uid ?uid]
            [?ch :block/order ?order]
            [(contains? ?source-uids ?order)]]
          @db/dsdb 48 #{0 1 2})

favila 2020-09-21T14:25:38.037800Z

(which is faster in some cases)

jeff tang 2020-09-21T16:02:09.048800Z

@favila your first two codeblocks make sense to me! I tried your third codeblock earlier (`contains?`) but datascript didn't recognize my custom predicate for negation. It was fully qualified but idk

favila 2020-09-21T16:02:27.049Z

oh this is datascript?

jeff tang 2020-09-21T16:04:18.049200Z

yeah, not sure if custom predicates are different in that case
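
A workaround that sometimes helps in DataScript (a sketch only, not verified against this setup): pass the predicate in as a query input and call it through the bound variable, sidestepping namespace resolution entirely.

(d/q '[:find ?uid ?order
       :in $ ?parent-eid ?source-uids ?member?
       :where
       [?parent-eid :block/children ?ch]
       [?ch :block/uid ?uid]
       [?ch :block/order ?order]
       ;; ?member? is bound to contains? below, so no symbol lookup is needed
       [(?member? ?source-uids ?order)]]
     @db/dsdb 48 #{0 1 2} contains?)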

arohner 2020-09-21T09:40:48.032900Z

Is there a way to get a ‘projection’ of a database? For authZ purposes, I would like to run queries on a db that only contains the set of datoms that were returned from a query

cjsauer 2020-09-22T17:00:13.081100Z

@steveb8n interesting. What data does your middleware stack work with? Are the filters queries themselves, the results of which are then used as input to the next query, and so on? It seems with the client api you could end up performing large scans of the database if your filter was relatively wide...

cjsauer 2020-09-22T17:01:04.081700Z

(I think this may be part of the reason why the client api doesn’t support filter)

steveb8n 2020-09-22T22:52:18.083200Z

I went all out and added Pedestal interceptors in the proxy. The enter fns decorate the queries before execution with extra where clauses. In that way you can 1/ limit access 2/ maintain good performance
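
A minimal sketch of what such an enter fn could look like (illustrative only, not steveb8n’s actual code; the :query/:args locations on the context and the :user/owner attribute are assumptions):

(def restrict-to-current-user
  ;; Pedestal interceptor: :enter rewrites a map-form datalog query carried on
  ;; the request before it is executed, adding an ownership where clause and
  ;; the corresponding input. Assumes the query binds the entity as ?e.
  {:name  ::restrict-to-current-user
   :enter (fn [ctx]
            (let [user-id (get-in ctx [:request :user-id])]
              (-> ctx
                  (update-in [:request :query :in] (fnil conj '[$]) '?user)
                  (update-in [:request :query :where] conj '[?e :user/owner ?user])
                  (update-in [:request :args] (fnil conj []) user-id))))})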

steveb8n 2020-09-22T22:52:53.083400Z

That doesn’t work for d/pull, so in that case I filter the data in the leave fn instead

steveb8n 2020-09-22T22:53:25.083600Z

for writes, the enter fns check that all references can be read using the same filters as the reads

cjsauer 2020-09-23T23:04:59.014700Z

Ah clever, that’s a great use of queries as data. I can see how you could have a toolbox of interceptors for common things like [?e :user/id ?user-id-from-cookie]

cjsauer 2020-09-23T23:29:58.017200Z

Thinking more, you could even build :accessible/to into the schema, and assert it onto entities to authorize access by the referenced entity (ie a user). That might be generalized into an interceptor more gracefully.
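
For reference, a hedged sketch of installing such an attribute (names illustrative):

;; :accessible/to as a many-cardinality ref pointing at the entities
;; (e.g. users) allowed to read the entity it is asserted on
[{:db/ident       :accessible/to
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/many
  :db/doc         "Entities (e.g. users) permitted to access this entity."}]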

steveb8n 2020-09-24T02:29:49.017800Z

exactly. almost anything can be generalised with this design. It’s non-trivial but worth it imho

arohner 2020-09-24T11:02:20.001500Z

Where are the API docs for the proxy? I’m not finding anything

steveb8n 2020-09-24T22:54:44.005300Z

There aren’t any docs. This technique relies upon undocumented (i.e. unsupported) use of the api client protocols. you can see an example of this here https://github.com/ComputeSoftware/datomic-client-memdb/blob/master/src/compute/datomic_client_memdb/core.clj

steveb8n 2020-09-24T23:13:27.005600Z

in the (unlikely) event that Cognitect changes these protocols, you can always refactor using this technique (which is where I first tried the middleware idea) https://github.com/stevebuik/ns-clone

cmdrdats 2020-09-21T11:01:15.033300Z

You could excise the values? That should get rid of them

2020-09-21T14:19:32.033700Z

I'm going to try excising, though I remember reading that it's not made to reduce the size of stored data necessarily

Giovani Altelino 2020-09-21T14:22:12.034200Z

You could use an or-join

2020-09-21T14:22:38.034400Z

We use the transaction report queue to push data into a kinesis stream, then run lambdas on those events

2020-09-21T14:23:21.035400Z

Triggering side effects on DynamoDB writes is likely not what you want, since Datomic writes full blocks to storage (not a datom at a time)

favila 2020-09-21T14:26:45.038Z

It’s not, but if you have a too-large value that’s the only way to ensure it’s not written to segments again
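
For anyone following along, a sketch of the excision form being referred to (on-prem only; conn, the entity id, and the :note/text attribute are made up for illustration):

;; excise only the oversized string attribute from one entity; the data is
;; physically removed when the next indexing job runs
@(d/transact conn [{:db/excise       17592186045421
                    :db.excise/attrs [:note/text]}])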

favila 2020-09-21T14:27:31.038400Z

It’s actually OK to have item-too-large occasionally. All this means is that the item will be fetched from storage instead of memcache/valcache

favila 2020-09-21T14:27:38.038600Z

it will still be kept in object cache

favila 2020-09-21T14:27:56.038800Z

that said, there’s a reason they say to keep strings under 4k

Giovani Altelino 2020-09-21T14:29:59.039Z

[:find ?owner-name ?pet-name
 :with ?data-point
 :where [?owner :owner/name ?owner-name]
        [?owner :owner/pets ?pet]
        [?pet :pet/name ?pet-name]
        (or-join [?owner-name ?pet-name ?data-point]
          (and [(identity ?owner-name) ?data-point])
          (and [(identity ?pet-name) ?data-point]))]

Giovani Altelino 2020-09-21T14:30:37.039200Z

I guess something like this should work, although I don't have datomic installed right now to confirm

cjsauer 2020-09-21T14:37:40.040300Z

I know on-prem has something like this via d/filter but Cloud does not afaik
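
On-prem d/filter takes a predicate of db and datom and returns a filtered db value that can be queried as usual; a minimal sketch (the allowed-eids set is hypothetical):

(let [allowed-eids #{17592186045418 17592186045419}
      ;; datoms whose entity id is outside the set become invisible to queries
      authz-db     (d/filter (d/db conn)
                             (fn [_db datom]
                               (contains? allowed-eids (:e datom))))]
  (d/q '[:find ?e ?name
         :where [?e :user/name ?name]]
       authz-db))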

arohner 2020-09-21T14:45:25.040500Z

thanks

souenzzo 2020-09-21T15:12:30.040700Z

@bhurlow when running on multiple/scaled instances, how do you manage the tx-report-queue?

2020-09-21T15:13:07.040900Z

We run a single, global process which just subscribes to the queue and pushes events to kinesis

2020-09-21T15:13:20.041100Z

other Datomic traffic is scaled horizontally but doesn't invoke the queue

2020-09-21T15:13:52.041300Z

Kinesis -> Lambda integration works reasonably well

2020-09-21T15:14:17.041500Z

one bonus is you can do one queue to many lambda consumers
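
A rough sketch of that single consumer process (on-prem peer API; put-record! stands in for whatever Kinesis client call is actually used):

(require '[datomic.api :as d])

(defn forward-tx-reports!
  "Blocks forever, taking tx reports off the queue and handing the datoms to
   put-record! (a hypothetical Kinesis writer)."
  [conn put-record!]
  (let [queue (d/tx-report-queue conn)]   ; java.util.concurrent.BlockingQueue
    (loop []
      (let [{:keys [tx-data]} (.take queue)]
        (put-record! (mapv (fn [d]
                             {:e (:e d) :a (:a d) :v (:v d)
                              :tx (:tx d) :added (:added d)})
                           tx-data))
        (recur)))))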

souenzzo 2020-09-21T15:14:36.041700Z

@bhurlow can you share which instance size you use for this report-queue?

2020-09-21T15:15:40.041900Z

subscribing to the tx report queue and putting into lambda is not a very intensive process

2020-09-21T15:15:48.042100Z

t3.large would be fine imo

2020-09-21T15:17:16.042400Z

thanks. In this case the data size was accidental

souenzzo 2020-09-21T15:19:04.042700Z

thanks @bhurlow

joshkh 2020-09-21T15:28:31.042900Z

hmm, i don't suppose you know if something like the "transaction report queue" is available on Datomic Cloud, do you? i have often been in need of exactly what souenzzo mentioned, but instead settled for querying / sipping the transaction log on a timer

2020-09-21T15:34:13.043100Z

I'm not sure about cloud, have only used the above in on-prem

2020-09-21T15:34:22.043300Z

I'd assume it's inside the system but possibly not exposed

val_waeselynck 2020-09-21T15:37:31.043500Z

Clients don't have a txReportQueue indeed. Polling the Log is usually fine IMO (and having a machine dedicated solely to pushing events seems wasteful, and it's also fragile as it creates a SPoF).
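
For the client api, a sketch of that polling approach (where the cursor is persisted is up to you; handle-tx! is a hypothetical callback):

(require '[datomic.client.api :as d])

(defn poll-log!
  "Processes every transaction after last-t with handle-tx! and returns the
   new cursor, suitable for calling again on a timer."
  [conn last-t handle-tx!]
  (reduce (fn [cursor {:keys [t data]}]
            (handle-tx! data)
            (max cursor t))
          last-t
          (d/tx-range conn {:start (inc last-t)})))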

souenzzo 2020-09-21T15:38:55.043700Z

I work with datomic cloud and datomic on-prem (on different products). IMHO, datomic on-prem is still way easier and more flexible than cloud. Cloud has too many limitations: you can't edit IAM, for example, and if you do, you break any future updates.

joshkh 2020-09-21T15:40:07.044Z

thanks guys

val_waeselynck 2020-09-21T15:42:25.045900Z

One interesting construction might be using AWS Step Functions + Lambda for polling the Datomic Log into Kinesis, using the Step Functions state to keep track of where you are in consuming the Log

joshkh 2020-09-21T15:49:06.048600Z

is there a more efficient way to find all entities Y with any tuple attribute that references X?

(d/q '{:find  [?tuple-entity]
       :in    [$ ?target-entity]
       :where [[?tuple-attr :db/valueType :db.type/tuple]
               [?tuple-entity ?tuple-attr ?refs]
               [(untuple ?refs) [?target-entity ...]]]}
     db entity-id)
this runs in ~500ms given a few hundred thousand ?tuple-entitys, which isn't too slow for its purpose, but i am worried that it won't scale with my data

Joe Lane 2020-09-21T17:13:28.051700Z

@joshkh What problem do you have that necessitates that kind of schema structure?

joshkh 2020-09-21T17:16:26.052200Z

i knew someone would ask that 😉

joshkh 2020-09-21T17:35:52.053300Z

i'm working with one database that has been modelled in such a way that entities with tuple attributes that are unique are no longer "valid" when any one of their tuple reference values is retracted. one drawback to having unique tuples is that you can end up with {:enrollment/player+server+board [p1 s1 nil]} after retracting a board, and then any subsequent retraction of another board will fail due to a uniqueness constraint so long as there is another enrollment for [p1 s1 b2]. i have implemented a business layer API for retracting different "kinds" of entities that cleans up any tuples known to be "about" them. but in my real data i have many, many different kinds of entities, and many tuples that could be about any one+ of them. so when adding a new tuple to the schema, or transacting an existing tuple that includes a new kind of entity, there is a feeling of technical debt when the developer must know which retraction API functions to update. since the schema was designed in such a way that tuples should not exist with nil values, i was hoping for a "catch all" transactor function that can clean up related tuples without making complicated decisions about which ones to look for.

joshkh 2020-09-21T17:36:55.053500Z

(another option i explored was having component references from all entities back to tuples so that they are automatically retracted, but this proves to be just as tedious on the other end when transacting new entities)

favila 2020-09-21T17:55:12.054400Z

what if instead you wrote your own retractentities function which does what you want?

favila 2020-09-21T17:56:28.054600Z

This is possible if an enrollment becomes invalid (i.e. should be completely retracted) when any of player, server, or board is not asserted

favila 2020-09-21T17:56:38.054800Z

is that true?

favila 2020-09-21T17:57:57.055Z

I think that’s what you mean by this:

> entities with tuple attributes that are unique are no longer "valid" when any one of their tuple reference values is retracted

favila 2020-09-21T18:00:55.055200Z

then you could query for [?referring ?attr ?e] (vaet index), see if the attr is one of your special ones, and if so, emit [:db/retractEntity ?referring]
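
A hedged sketch of that VAET walk as a peer-side helper (favila suggests the transactor-level version for safety; cascade-attr-ids is assumed to already be a set of attribute entity ids):

(defn retract-with-cascade
  "Retraction data for eid plus every entity that points at it through one of
   cascade-attr-ids."
  [db eid cascade-attr-ids]
  (into [[:db/retractEntity eid]]
        (keep (fn [datom]
                ;; VAET lookup: every datom returned here has eid as its value
                (when (contains? cascade-attr-ids (:a datom))
                  [:db/retractEntity (:e datom)])))
        (d/datoms db :vaet eid)))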

joshkh 2020-09-21T18:01:43.055400Z

> your own retractentities function
as in an API layer function or a transactor level function?

favila 2020-09-21T18:01:55.055600Z

You could look at tupletype membership, but I think it’s going to be less surprising to have either a hardcoded list or your own annotation on the attribute, e.g. :required-for-entity-validity?
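
That annotation could be as small as a boolean attribute asserted on the composing attrs themselves; a sketch, with attr names guessed from the player+server+board example above:

;; 1) install the marker attribute
[{:db/ident       :required-for-entity-validity?
  :db/valueType   :db.type/boolean
  :db/cardinality :db.cardinality/one}]

;; 2) in a later transaction, flag the attrs whose retraction should cascade
[{:db/ident                      :enrollment/board
  :required-for-entity-validity? true}]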

favila 2020-09-21T18:02:14.055800Z

transactor level function would be safest

joshkh 2020-09-21T18:05:06.056Z

agreed, and that's where i'm at. but if i'm understanding you correctly, the problem still stands of knowing which tuple attributes reference which entities if i want to shorten the list of possible matches. in my case, nearly any tuple can reference nearly any entity.

favila 2020-09-21T18:06:33.056200Z

you don’t need to know about the tuples, but the attributes that compose the tuple

favila 2020-09-21T18:07:16.056400Z

since you know you are retracting, if you retract an attribute which is a member of a tuple, you know the tuple is going to get a null in it, so you can retract the entire referring entity

joshkh 2020-09-21T18:07:48.056600Z

oh hey, that just might work...

joshkh 2020-09-21T18:07:54.056800Z

thank you for clarifying 🙂

favila 2020-09-21T18:10:47.057Z

I still think it’s probably safer to annotate/enumerate attributes which you want this cascading behavior on

joshkh 2020-09-21T18:11:49.057200Z

yes, i'm with you on that. ideally it's something i can update via annotations on the schema rather than in the codebase.

favila 2020-09-21T18:11:56.057400Z

correct

favila 2020-09-21T18:12:09.057600Z

This is just wanting one piece of isComponent’s behavior

joshkh 2020-09-21T18:12:31.057800Z

i've always thought of it as a "reverse component reference" :man-shrugging:

joshkh 2020-09-21T18:13:07.058Z

which isn't 100% accurate, but for some reason it's stuck in my head

unbalanced 2020-09-21T23:57:58.058200Z

I'm actually surprised the 2nd and 3rd examples work. I would've expected [(clojure.core/= :b ?o-type) ?is-b]] to fail on unification when false