@viesti
> data+compute locality is a good thing I think, avoids shuffling of data.
don’t forget the fine print:
• when data is easily shardable (no broad join required)
• when you have a good enough idea of the data distribution to have balanced shards.
yeah
this paper spawned discussion elsewhere too, I skimmed (I have to learn to read papers too, instead of skimming :)) the PyWren paper and was thinking that in what they did, data placement in S3 was a good fit for Lambda autoscaling
there are different problems for which different solutions work well
I don’t know if it is good to generally label one approach “the best”
simple/definite answers to complex problems are always dubious
Could this be done with a more data-centric approach: https://aws.amazon.com/blogs/aws/boost-your-infrastruc