eeek down again =(
The problem seems to be related-ish to this one, or at least various parts of vmcore-dmesg.txt
match what’s in this GitHub issue: https://github.com/opencontainers/runc/issues/1857
For now I’ve made a system upgrade. If the problem persists I will try downgrading the kernel and Docker.
@martinklepsch, you have done a really nice job of describing ops, and the related ADRs help a ton in understanding the rationale. Is going with some sort of higher-level solution a viable alternative?
I’m not sure if you’re thinking of something specific? I think financially we’re probably good and if we need to we could “raise” some funds to cover expenses etc.
I’m generally pretty happy with the setup that we have now. Nomad really makes zero-downtime deployments easy, and if it weren’t for this odd Kernel <> Docker issue everything would work pretty smoothly 😄
If it is not a general pain point, then there’s no sense in changing. I have experience with AWS and was wondering if any of their higher level abstractions might make life easier. But that would take some study.
Because the SQLite database is filesystem backed (so far) we can’t just use Heroku-style systems if that’s what you’re thinking of as a higher level solution
We could switch to Postgres though and it shouldn’t be a very complicated thing to do
Yeah, I was wondering if a service that works at container/app level rather than OS level might be easier. But I would have to really dig in and understand cljdoc ops before I propose anything.
Yeah, something like this may be an alternative to the current setup. Switching DBs would be required but probably wouldn’t be a huge deal as mentioned above.
Certainly open to talking about it
must be a bit tricky weighing financial cost of operating vs other concerns?