Read/write units really are confusing. By default the UI proposes 5 units for both, but for dev and testing you can happily drop them to the minimum (1) to save costs.
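For example, a minimal sketch with the Node aws-sdk (TypeScript; the table name and region are just placeholders) that drops an existing table to 1/1:

```typescript
// Sketch: scale a dev/test table down to the minimum provisioned capacity.
// "my-dev-table" and the region are placeholders for your own setup.
import { DynamoDB } from "aws-sdk";

const dynamodb = new DynamoDB({ region: "eu-west-1" });

async function scaleDownForDev(tableName: string): Promise<void> {
  await dynamodb
    .updateTable({
      TableName: tableName,
      ProvisionedThroughput: { ReadCapacityUnits: 1, WriteCapacityUnits: 1 },
    })
    .promise();
}

scaleDownForDev("my-dev-table").catch(console.error);
```

(You can do the same from the console; the API call is just easier to script into dev tooling.)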
Another confusing thing is indexes. You need those if you want to query efficiently by any attribute other than the primary key.
The best thing about DynamoDB is that it’s fully managed (like S3). If you can live with the limitations, it’s a very carefree solution.
I read that DynamoDB shards tables into 10 GB chunks and then splits the read/write units over the chunks.
So if you have a 100 GB table and 10 write units, what happens is that there are 10 shards and each shard gets only 1 write unit. Is that true? If so, it sounds really limiting.
That’s true. If your dataset is really huge, you also need to worry about the underlying partitioning. You need to choose your hash (partition) key so that your data gets evenly distributed across the partitions.
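To get a feel for the effect, here’s a back-of-envelope sketch (TypeScript). It uses the rule of thumb the docs gave at the time, roughly one partition per 10 GB of data or per 3000 RCU + 1000 WCU of provisioned throughput, whichever needs more; treat those constants as an assumption rather than a guarantee:

```typescript
// Back-of-envelope estimate of per-partition throughput.
// Assumed rule of thumb (from older DynamoDB docs, may have changed):
//   partitions ≈ max(ceil(sizeGB / 10), ceil(rcu / 3000 + wcu / 1000))
function estimatePerPartition(sizeGB: number, rcu: number, wcu: number) {
  const bySize = Math.ceil(sizeGB / 10);
  const byThroughput = Math.ceil(rcu / 3000 + wcu / 1000);
  const partitions = Math.max(bySize, byThroughput, 1);
  return {
    partitions,
    rcuPerPartition: rcu / partitions,
    wcuPerPartition: wcu / partitions,
  };
}

// The 100 GB / 10 WCU example above: 10 partitions, 1 WCU each.
console.log(estimatePerPartition(100, 10, 10));
```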
Also, if you need to access data by several keys (indexes), the indexes consume their own capacity, not the table’s capacity, which practically means you pay extra for each index. Or to be precise, there are two kinds of secondary indexes, ‘local’ and ‘global’: global indexes consume their own provisioned capacity, while local indexes consume the table’s capacity.
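To make that concrete, here’s a rough createTable sketch (TypeScript + aws-sdk; all table, attribute and index names are made up). The GSI carries its own ProvisionedThroughput, the LSI doesn’t because it shares the table’s:

```typescript
// Sketch: one table with a local and a global secondary index.
// Attribute and index names are invented for illustration.
import { DynamoDB } from "aws-sdk";

const dynamodb = new DynamoDB({ region: "eu-west-1" });

dynamodb
  .createTable({
    TableName: "orders",
    AttributeDefinitions: [
      { AttributeName: "userId", AttributeType: "S" },
      { AttributeName: "createdAt", AttributeType: "S" },
      { AttributeName: "status", AttributeType: "S" },
      { AttributeName: "email", AttributeType: "S" },
    ],
    KeySchema: [
      { AttributeName: "userId", KeyType: "HASH" },
      { AttributeName: "createdAt", KeyType: "RANGE" },
    ],
    // the table's own capacity, shared by local secondary indexes
    ProvisionedThroughput: { ReadCapacityUnits: 1, WriteCapacityUnits: 1 },
    LocalSecondaryIndexes: [
      {
        IndexName: "userId-status-index",
        KeySchema: [
          { AttributeName: "userId", KeyType: "HASH" },
          { AttributeName: "status", KeyType: "RANGE" },
        ],
        Projection: { ProjectionType: "ALL" },
        // no ProvisionedThroughput here: an LSI consumes the table's capacity
      },
    ],
    GlobalSecondaryIndexes: [
      {
        IndexName: "email-index",
        KeySchema: [{ AttributeName: "email", KeyType: "HASH" }],
        Projection: { ProjectionType: "KEYS_ONLY" },
        // a GSI is provisioned (and billed) separately from the table
        ProvisionedThroughput: { ReadCapacityUnits: 1, WriteCapacityUnits: 1 },
      },
    ],
  })
  .promise()
  .catch(console.error);
```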
My experience is that if you have complex data access patterns, you need to plan really carefully how you’re going to query the data, considering all of DynamoDB’s limitations. If you can’t predict how you’ll need to access the data, it might be simpler to go with SQL and RDS. Another common solution is to use a search engine (e.g. Elasticsearch or CloudSearch) alongside DynamoDB: you can invoke Lambda functions for each operation on your table (DynamoDB Streams) and have them do the indexing into the search engine. CloudSearch also provides some kind of out-of-the-box solution to set up indexing automagically from DynamoDB, but at least a while ago it could deal only with flat data structures.
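The stream-to-search-engine part looks roughly like this (TypeScript sketch; indexDocument / removeDocument are hypothetical stand-ins for whatever Elasticsearch/CloudSearch call you’d actually make, and I’m assuming a plain "id" hash key):

```typescript
// Sketch: Lambda handler wired to a DynamoDB Stream that mirrors changes
// into a search engine. indexDocument/removeDocument are hypothetical helpers.
import { DynamoDBStreamEvent } from "aws-lambda"; // types from @types/aws-lambda
import { DynamoDB } from "aws-sdk";

async function indexDocument(id: string, doc: Record<string, unknown>): Promise<void> {
  // in a real setup: a (signed) HTTP request to your Elasticsearch domain
  console.log("index", id, doc);
}

async function removeDocument(id: string): Promise<void> {
  console.log("remove", id);
}

export const handler = async (event: DynamoDBStreamEvent): Promise<void> => {
  for (const record of event.Records) {
    const keys = DynamoDB.Converter.unmarshall(
      record.dynamodb!.Keys as DynamoDB.AttributeMap
    );
    const id = String(keys.id); // assuming the table's hash key is "id"
    if (record.eventName === "REMOVE") {
      await removeDocument(id);
    } else {
      const item = DynamoDB.Converter.unmarshall(
        record.dynamodb!.NewImage as DynamoDB.AttributeMap
      );
      await indexDocument(id, item);
    }
  }
};
```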
Oh, one more DynamoDB limitation I should mention is the lack of support for server-side encryption. The Java SDK supports client-side encryption, though.
Does anyone know if there’s some util/tool/framework to write CLJS, output it to Lambda-compatible JS, and package & deploy to AWS using the Serverless Framework or AWS SAM? Or would it be too trivial to create such a setup myself? Or does it sound like a bad idea?
https://github.com/nervous-systems/serverless-cljs-plugin is what I was after.
@valtteri (re last question): I have been using https://github.com/portkey-cloud/portkey, which, except for being clj instead of cljs, is exactly what you are asking for, and it’s amazing.
@valtteri: thanks for your detailed analysis of DynamoDB. As weird as it sounds, I don’t need SQL / relational queries and I can tolerate eventual consistency -- but it’s the provisioned IO that’s turning me away (afaik, Google Datastore / Bigtable has no such limitation).
For truly serverless: if there were no provisioned IO and it was just "pay for bandwidth / # of read/write ops", DynamoDB would be perfect.
Yes, I watched the presentation about Portkey and it seems really cool! However, I’d like to leverage the Serverless Framework’s power to set up the infra as well (using CloudFormation). Also, I’m a bit worried about the JVM’s slower startup in Lambda compared to Node. I was wondering if there’s some easy way to write Lambda code in cljs instead of JavaScript and then use the Serverless Framework to define the infra and manage deploys, logs etc. I’m currently using the Serverless Framework mostly with Node, but I’d really enjoy writing the code in clj/cljs instead.
The way I intend to get around the JVM startup time is: 1. keep a 512MB machine active at all times (and just pay for it), 2. let auto scaling take care of the rest... because if it starts spinning up at 70% capacity (or whatever the number), the new nodes should be up by the time the old ones get overwhelmed (except in cases of flash traffic, in which case all existing tech is going to stall anyway).