The Java SDK or Cognitect aws-api work for S3 access.
Just give the compute group's EC2 role permissions to the bucket.
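A minimal sketch of the aws-api route, assuming the code runs on a compute node whose EC2/IAM role already grants access to the bucket (the bucket and key names below are placeholders):

;; deps: com.cognitect.aws/api, com.cognitect.aws/endpoints, com.cognitect.aws/s3
(require '[cognitect.aws.client.api :as aws])

;; Credentials come from the node's IAM role, so no keys are configured here.
(def s3 (aws/client {:api :s3}))

;; Placeholder bucket/key; replace with your own.
(aws/invoke s3 {:op :GetObject
                :request {:Bucket "my-bucket" :Key "path/to/object.edn"}})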
Are there docs anywhere about the expected CPU use of queries vs transactions? Our current setup doesn't yet have query groups, and we're performing a lot more writes (i.e. transacts) than we are queries. I'm seeing CPU hitting 98+% on the transactors, and then everything falls over. I'm curious if creating a query group to offload the queries could/would drop CPU on the transactors a lot more than the ratio of queries/transacts would suggest, because maybe queries are a lot more CPU intensive?
Also, is there documentation anywhere on all the standard graphs on the Datomic Cloud dashboard? Like TxBytes: is that a per-second average or an aggregate of all the data transmitted since the last datapoint? I'm assuming the latter, as changing the dashboard period, and therefore the interval between datapoints, alters the value significantly.
A wish question (I wish-and-hope-this-exists): Does anyone have something that allows me to edit a Datomic cloud database as a spreadsheet? Or as a simple CRUD app? We have a bunch of static information that we display to the internal users on Metabase - and they want to change the values they see.
https://github.com/hyperfiddle/hyperfiddle This may be what you're looking for!
In Datomic, what is the best-practice way to model this relationship: object A contains references to (i.e. many instances of) object B, and we want a field in object B to be unique within the context of object A. From the documentation, it does not seem like :db/unique (either with :db.unique/identity or :db.unique/value), by itself, is appropriate. Wondering how to correctly model this constraint within the Datomic schema.
@mafcocinco Look into using :db.unique/identity tuples for this, either heterogeneous or composite.
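A rough schema sketch of the composite-tuple route (the attribute names :b/parent, :b/name, and :b/parent+name are made up for illustration):

;; Composite tuple: Datomic maintains :b/parent+name automatically from its
;; :db/tupleAttrs, and :db.unique/identity enforces "name unique within parent".
[{:db/ident       :b/parent
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/one}
 {:db/ident       :b/name
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one}
 {:db/ident       :b/parent+name
  :db/valueType   :db.type/tuple
  :db/tupleAttrs  [:b/parent :b/name]
  :db/cardinality :db.cardinality/one
  :db/unique      :db.unique/identity}]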
Also, depending on how many "many instances" is, maybe B should point to A?
True. It doesn't matter which direction the index points, and that would probably be easier.
How many is "many instances"? The answer to which direction it should go depends on the required selectivity of the access patterns. Again, all predicated on "many instances".
Is there a way for me to know which Datomic Cloud query group node a client api request went to?
xy problem
groans "what are you actually trying to solve?"
Actually lol'ed. Knew this was coming.
I'm sensing a new precursor to "Everybody drink"
We are receiving ~20 datomic client timeouts all on the exact same d/pull call within a 3 minute window, which is surprising because that call doesn't actually pull that much data. I was curious if the node those client api requests went to was overwhelmed.
Check your dashboard, do you have any throttle events?
Not at that time. The query is set to a 15s timeout and it's hitting that on every one of those calls.
I thought it was a pull?
It's a query with a pull. e.g.,
(d/q {:query '[:find (pull ?p [* {::props-v1/filter-set [*]}])
               :where
               [_ :customer/prop-group1s ?p]]
      :args [db]
      :timeout 15000})
Were these against the same database?
All but 2.
does that same exact pull call happen at other times of the day?
Yes
That query will always return a seq of 3 maps with < 20 total datoms.
how long does it ordinarily take outside the problem window?
< 200ms
cool cool...
avg maybe 50ms.
can you launch that pull concurrently (futures / threads) and reproduce the issue?
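Something like this minimal sketch could do it, assuming query-map is the exact query/args/timeout map shown above (the names here are illustrative):

(require '[datomic.client.api :as d])

;; Fire n copies of the same query concurrently and collect the results,
;; to check whether concurrency alone reproduces the timeouts.
(defn run-concurrently [n query-map]
  (let [futs (doall (repeatedly n #(future (d/q query-map))))]
    (mapv deref futs)))

;; e.g. (run-concurrently 20 query-map)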
Try ^^ against a different QG of size 1 and look at its dashboard.
one of the lovable perks of infinite read scaling
that will at least tell you if the synchronicity is significant
Maybe your on-demand DDB table wasn't provisioned for that demand?
From looking at the query group dashboard, I can see that the group was overwhelmed at the time: min CPU of 99 & max of 100. There were only 2 nodes in the group. I also observe that at least one other query resulted in a 50.4k count. The overwhelmed system simply manifests itself in those frequent, but small, queries. Thinking the fix is to scale the system up at the time of the 50.4k query. Separately, does the Query Result Counts graph show the number of datoms a query returns or something else?
That graph shows the number of results, not datoms. A result can be many datoms.
Instead of scaling the qg up, can you make a separate QG for that other query so they don't affect each other?
So if that query is pull'ing in the :find, it could actually be some scalar * the reported number?
Assuming all the results are uniform, yes, that many datoms would be returned. Datoms isn't really the right measurement here though.
Yes, that is an option. I'd like a bit more data on which queries are causing that huge result set. I have a couple ideas but need more data to know how to split. Why would you tend to prefer splitting over scaling?
Caching?
Yep, but beyond that, these sound like different kinds of workloads.
"that many" is scalar * reported number, assuming uniform?
10 or less.
If I know each pull returns exactly 3 datoms, then the total datoms returned is: reported number * 3 = "that many datoms"
As a guess: A is an environment for our testing platform and B is the metadata for each service that will be tested in that environment. Our platform currently consists of ~8 services and I don't see that number going up significantly.
Yeah, they kind of are.
Then performance doesn't matter here and you should do whatever is most convenient for you. That entire dataset will fit in memory, yay!
Another option I've been considering is "filling out" my query group with spot instances. It's likely that would solve this problem as well, at a fraction of the cost.
Is one of them a scheduled batch job? You can always spin the QG up just for that job.
"this problem" <- you know what I'm going to ask.
https://clojurians.slack.com/archives/C03RZMDSH/p1620410343010200
Getting timeouts due to hitting peak capacity.
e.g., cpu spikes to near 100, some small number of queries timeout, then the event is over.
> Getting timeouts due to hitting peak capacity
^^ That is a symptom, and we still don't know why it occurred, do we?
FWIW, a shorter timeout on your pulls with retry wrapped around it would also alleviate the above symptom because the request would (eventually, but how unlucky can you be?) be routed to a different node.
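For example, a rough sketch of that shorter-timeout-plus-retry idea (the parameter names and the catch-everything handling are illustrative, not a vetted pattern):

(require '[datomic.client.api :as d])

;; Retry the query with a short timeout; a retried request may land on a
;; different query-group node than the one that timed out.
(defn q-with-retry
  [query-map {:keys [timeout-ms max-attempts] :or {timeout-ms 2000 max-attempts 3}}]
  (loop [attempt 1]
    (let [res (try
                {:ok (d/q (assoc query-map :timeout timeout-ms))}
                (catch Exception e
                  (if (< attempt max-attempts) ::retry (throw e))))]
      (if (= ::retry res)
        (recur (inc attempt))
        (:ok res)))))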
So I can reproduce that query result count by calling count on the result of d/q?
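Roughly, yes; e.g. (assuming query-map is the map from above):

;; Number of result rows, which is what the Query Result Counts graph reports;
;; each row may itself contain many datoms via pull.
(count (d/q query-map))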
Fair. My hypothesis is those 50.4k queries. I'm betting there are multiple of them.
& there's only 2 nodes in the group at the event time. So if both nodes are processing 1+ 50.4k queries, perhaps pretty unlucky.
Yep
So there are only 2 nodes in the QG and there are 2 queries returning 50.4k results being issued at the same time?
I don't know for certain since I don't have that data instrumented right now but, yes it is likely. There's up to 5 queries that could all run in the same 10s window that are of that size.
d/pull is not included then?
Datomic Cloud currently uses the older launch configuration setup in creating ASGs, so a mixed group of Spot & On-Demand is not possible. I created a feature request here: https://ask.datomic.com/index.php/607/use-launch-template-instead-of-launch-configuration.