datomic

Ask questions on the official Q&A site at https://ask.datomic.com!
joshkh 2021-05-21T12:32:24.006Z

i'm sure it's for a very good reason and so i'm just curious: if Ion lambdas proxy requests to the compute/query groups, then what is the reason for running them in a JVM runtime rather than something with a quicker cold start?

1☝️
joshkh 2021-05-21T12:35:13.008400Z

i'm also asking because quite a few of my ion lambdas are synchronous where response time matters, and there are even some hard limits in AWS (for example Cognito has a fixed 5 second timeout on its post confirmation lambda trigger). i can use lambda concurrency to solve the problem at a price 🙂

tatut 2021-05-21T12:39:50.008500Z

have you tried http direct?

joshkh 2021-05-21T12:57:39.011700Z

i have yes, and it works really well. in this case i'm referring to (ion) lambdas that should be lambdas by design: handling Cognito triggers, glueing together Step Functions and pipelines, handlers for AppSync resources etc.

joshkh 2021-05-21T12:58:57.011900Z

unless i'm missing something and http direct can help there?

tatut 2021-05-21T13:03:46.012100Z

ok, I assumed you meant web… but nevermind 😄

tatut 2021-05-21T13:03:56.012300Z

glad to hear http direct works well, I’ve yet to try it out

cjsauer 2021-05-21T13:05:14.012800Z

I’ve been wondering about this myself. HTTP direct required the prod topology because of the NLB requirement, but would it be possible to spin up an NLB manually and use it with solo?

Cameron Kingsbury 2021-05-21T17:55:46.016100Z

So I can use

[?entity1 ?attrname1 ?attrval]
[?entity2 ?attrname2 ?attrval]
in a :where clause to get ?entity1 and ?entity2 where there exists an ?attrval that matches --- I'm using
[(q '[:find (seq ?attrval)
      :in $ ?entity ?attrname
      :where [?entity ?attrname ?attrval]]
    db ?entity1 ?attrname1) [[?attrvals1]]]
[(q '[:find (seq ?attrval)
      :in $ ?entity ?attrname
      :where [?entity ?attrname ?attrval]]
    db ?entity2 ?attrname2) [[?attrvals2]]]
(not-join [?attrvals1 ?attrvals2]
          [(seq ?attrvals1) [?element ...]]
          (not [(contains? ?attrvals2 ?element)]))
to get ?entity1 and ?entity2 where all attrvals for ?entity1 exist for ?entity2. Is there a more performant way to do this?? (This feels like a directional "and" to the implicit "or" being applied to each attrval matching in the first case)

2021-05-21T18:07:29.017100Z

What should I make of finding tx-ids with no associated txInstant? (:db/txInstant (d/entity db (d/t->tx t-time))) ;; => nil

Joe Lane 2021-05-21T18:37:46.023500Z

Hey Cameron, I’m on mobile now so please forgive the brevity and any possible misunderstanding. Instead of two subqueries, try putting both of the where clauses from each subquery into one top level query and then adding a final clause of [(not= ?attrval1 ?attrval2)]. I believe there is no need for the nested subqueries, the not-join, nor the boxing and unboxing via seq and [?element ...]. I’ll try and double check this when I get back at a computer.

Cameron Kingsbury 2021-05-21T18:38:25.023700Z

sweet! already got rid of the subqueries I think

Joe Lane 2021-05-21T18:39:14.024300Z

I hope I’m understanding it correctly haha

Cameron Kingsbury 2021-05-21T18:39:20.024500Z

the double not is used to produce an and essentially

Cameron Kingsbury 2021-05-21T18:39:45.024700Z

so I am not sure how it would be achieved with only the not=

Cameron Kingsbury 2021-05-21T18:46:49.024900Z

also tried

(not-join [?cat ?dog]
          [?cat :cat/paws ?cat-paw]
          (not-join [?cat-paw ?dog]
                    [?dog :dog/paws ?dog-paw]
                    [?cat-paw :paws/smaller-than ?dog-paw]))
but it's timing out with large numbers of paws 😉

Cameron Kingsbury 2021-05-21T18:47:29.025100Z

and ?cat and ?dog need to be bound, where they didn't need to be in the subqueries...

Cameron Kingsbury 2021-05-21T18:48:31.025300Z

the above query testing that all the paws on the cat have a :paws/smaller-than relationship with any paw on the dog
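
As an aside for readers, the "double not" above can be sketched outside of Datalog. The snippet below is a plain-Clojure illustration under assumed, made-up data (the maps `cat-paws`, `dog-paws` and the set `smaller-than` are hypothetical and are not part of the original query). It only shows that "for all X there exists Y" is the same predicate as "there is no X for which no Y exists".

```clojure
;; Hypothetical in-memory data: paw ids per animal, plus a set of
;; [smaller-paw larger-paw] pairs standing in for :paws/smaller-than.
(def cat-paws {:cat-1 #{:c1 :c2} :cat-2 #{:c3}})
(def dog-paws {:dog-1 #{:d1 :d2}})
(def smaller-than #{[:c1 :d1] [:c2 :d2]})

(defn all-paws-smaller?
  "True when every paw of cat is smaller than at least one paw of dog."
  [cat dog]
  (every? (fn [cp]
            (some (fn [dp] (contains? smaller-than [cp dp]))
                  (dog-paws dog)))
          (cat-paws cat)))

(defn all-paws-smaller-double-not?
  "Same predicate phrased as: no cat paw exists that has no matching dog paw.
  This mirrors the shape of the nested not-join."
  [cat dog]
  (not-any? (fn [cp]
              (not-any? (fn [dp] (contains? smaller-than [cp dp]))
                        (dog-paws dog)))
            (cat-paws cat)))
```

Both functions agree on this data: `:cat-1` qualifies against `:dog-1`, while `:cat-2` does not because `:c3` has no matching dog paw.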

2021-05-21T19:14:59.025500Z

hmm. we appear to be missing txInstants on the large majority of tx entities:

#_(let [end-t (d/basis-t db) ;; => current basis-t: 104753910
        missing-tx-instant? #(nil? (:db/txInstant (d/entity db (d/t->tx %))))]
    (count (filter missing-tx-instant? (range 0 end-t))))
;; => 84492058

Cameron Kingsbury 2021-05-21T19:15:01.025700Z

this seems to be 10x slower than the original

Joe Lane 2021-05-21T19:19:47.026Z

Can I see the actual, full query you're trying to run?

Joe Lane 2021-05-21T19:22:35.026200Z

Range probably isn't what you want. The contract is that T is guaranteed to be increasing, not that it always increases by exactly 1.

2021-05-21T19:23:11.026400Z

ha! damn. You know I kept wondering about that assumption of mine. Thank you!!!

1👍
2021-05-21T19:27:20.026900Z

sheesh 🤦 hence datomic.api/next-t

Cameron Kingsbury 2021-05-21T19:29:48.027100Z

sure one sec

favila 2021-05-21T19:37:51.027300Z

Internally there is a single T counter incremented for newly-minted entity ids (when a tempid needs a new entity id). transaction temp ids are just one of the consumers of that counter

favila 2021-05-21T19:38:10.027500Z

so there is an invariant that for all entity ids in the system, none share a T
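
A toy model may help show why most T values have no txInstant. This is a sketch under stated assumptions, not Datomic's actual implementation: `t-counter`, `next-t!` and `transact!` are hypothetical names, and the order in which a transaction and its new entities draw from the counter is assumed for illustration.

```clojure
;; Hypothetical model, not Datomic internals: one counter hands out a T
;; to every newly minted entity id, including each transaction entity.
(def t-counter (atom 1000))

(defn next-t! [] (swap! t-counter inc))

(defn transact!
  "Simulates a transaction that mints n-new-entities; returns the tx's own T."
  [n-new-entities]
  (let [tx-t (next-t!)]
    (dotimes [_ n-new-entities] (next-t!))
    tx-t))

;; Three simulated transactions creating 2, 0 and 5 new entities.
(def tx-ts (mapv transact! [2 0 5]))
;; tx-ts => [1001 1004 1005]

;; T values in the covered range that do not belong to any transaction.
(def non-tx-ts (remove (set tx-ts) (range 1001 1011)))
;; (count non-tx-ts) => 7
```

In this model, walking `(range 1001 1011)` and looking for a transaction at each T finds one only 3 times out of 10, which is the same effect as the large count of "missing" txInstants reported above.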

2021-05-21T19:39:56.027700Z

thanks for the inside scoop!

2021-05-21T19:43:29.028Z

We haven't upgraded to have qseq, so I'm having to break a tx-ids query into a set of smaller ranges. I was calculating these smaller ranges with simple arithmetic -- is that still acceptable, or do I need to ensure that the start-t and end-t handed to tx-ids are bona fide t-times?

favila 2021-05-21T19:49:03.028300Z

It should be fine, but why not something like (->> (d/seek-datoms db :aevt :db/txInstant (d/t->tx start-t)) (map :e) (map d/tx->t) (take-while #(< % end-t)) (partition-all 10000))

favila 2021-05-21T19:49:35.028500Z

i.e., just seek part of the :db/txInstant index to get the tx ids?

2021-05-21T20:05:41.028800Z

Ahh, nice. If I understand your implication, all that would be left is for me to get the start and end of each partition as follows:

2021-05-21T20:07:59.029200Z

I like this approach much better, thanks! Just to be clear though, it should be okay to fabricate a non-existent t-time near the target time when in doubt? It always appeared to work for me, but then maybe I was just being sloppy.

favila 2021-05-21T20:10:51.029400Z

It depends on what you’re doing

favila 2021-05-21T20:11:31.029600Z

d/tx-range, d/seek-datoms are ok with it, because they’re using your number to do array bisection

favila 2021-05-21T20:11:57.029800Z

d/as-of and d/since are ok with it because they’re using it for filtering.
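
The point about bisection and filtering can be illustrated with a sorted collection in plain Clojure. This is only an analogy: the sorted set `tx-ts` and the function `ts-in-range` are hypothetical stand-ins, not Datomic calls. It shows that range bounds which do not correspond to any real transaction still select the right elements.

```clojure
;; Hypothetical T values of real transactions, kept sorted.
(def tx-ts (sorted-set 1001 1004 1005 1012 1020))

(defn ts-in-range
  "Transactions with start-t <= t < end-t. The bounds need not be members
  of tx-ts; they are only used to locate a position in the sorted data."
  [start-t end-t]
  (subseq tx-ts >= start-t < end-t))

(ts-in-range 1002 1013)
;; => (1004 1005 1012)

;; Chunk boundaries computed by simple arithmetic (every 10 Ts) still
;; visit each transaction exactly once.
(def chunked
  (mapcat (fn [[s e]] (ts-in-range s e))
          (partition 2 1 (range 1000 1031 10))))
```

Here neither 1002 nor 1013 exists as a transaction, yet the result is exactly the transactions between them, and the arithmetic chunks together reproduce the full set in order.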

2021-05-21T20:12:11.030Z

2021-05-21T20:12:14.030400Z

I was feeding each range to this ^

favila 2021-05-21T20:13:04.030600Z

yeah should be fine

2021-05-21T20:13:21.030800Z

Thanks for the help!!

favila 2021-05-21T20:13:43.031Z

this would be more efficient without query though

favila 2021-05-21T20:13:54.031200Z

why not use d/tx-range directly?

2021-05-21T20:14:37.031400Z

Sure, I'm not opposed to it. Would it just benefit readability or something more than that?

favila 2021-05-21T20:15:28.031600Z

query needs to realize and retain the intermediate result sets. that’s why you were chunking in the first place, right?

favila 2021-05-21T20:15:33.031800Z

d/tx-range is lazy

2021-05-21T20:16:09.032Z

bingo. no qseq required

favila 2021-05-21T20:16:27.032200Z

not sure qseq would help

2021-05-21T20:16:34.032400Z

it did in my testing

2021-05-21T20:16:53.032600Z

if I pass a month range to the function i shared above I run out of memory before I can process it

2021-05-21T20:17:16.032800Z

the same didn't occur with qseq

favila 2021-05-21T20:18:15.033Z

that’s surprising because qseq doesn’t AFAIK defer any processing except pull

Joe Lane 2021-05-21T20:18:17.033200Z

qseq still needs to realize and retain the intermediate result sets like @favila is saying, it just supports lazy transformations (like pull) which can consume an enormous amount of memory when done eagerly

2021-05-21T20:19:01.033400Z

well, then I didn't test what I thought I was testing.

favila 2021-05-21T20:21:00.033600Z

Your query is the same as this:

(->> (d/tx-range log start-t end-t)
     (mapcat :data)
     (filter #(contains? attr-ids (:a %)))
     (map :e)
     (distinct))

1💯1🙏
favila 2021-05-21T20:21:49.033800Z

except this is evaluated lazily and incrementally, so memory use is bounded
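
To see the laziness in isolation, here is a sketch that runs the same pipeline shape over a fake log. `fake-tx-range` is a hypothetical stand-in for d/tx-range, and datoms are simplified to maps whose `:a` is a keyword rather than an attribute id; both are assumptions made for illustration.

```clojure
;; Hypothetical stand-in for d/tx-range: a lazy seq of maps with :t and :data.
;; Each datom is simplified to a map with :e and :a.
(defn fake-tx-range [start-t end-t]
  (for [t (range start-t end-t)]
    {:t t
     :data [{:e (mod t 3) :a :user/name}
            {:e t :a :db/txInstant}]}))

;; Attributes of interest (keywords here; attribute ids in a real log).
(def attr-ids #{:user/name})

(defn changed-entities
  "Lazily yields distinct entity ids touched on the attributes in attr-ids."
  [start-t end-t]
  (->> (fake-tx-range start-t end-t)
       (mapcat :data)
       (filter #(contains? attr-ids (:a %)))
       (map :e)
       (distinct)))

;; Only the first few transactions are ever realized, despite the huge range.
(take 3 (changed-entities 0 1000000000))
;; => (0 1 2)
```

One caveat worth keeping in mind: `distinct` remembers every value it has already emitted, so memory grows with the number of distinct entities seen, not with the number of datoms scanned.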

2021-05-21T20:24:58.034100Z

Welp, color me doubly embarrassed then. I must have been testing with a larger range of time when I was using d/q than when I tested d/qseq

2021-05-21T20:28:19.034700Z

Thank you @favila and @lanejo01

1💯