Are there any discussions/blog-posts that talk about making a component fault tolerant? What happens if my database goes down, or isn’t up, and I want to defer startup until after the connection is available and working?
or halt a component and restart when the connection is available again
How would you handle dependencies in general on a component restart?
If you restart your db component, what about components that depend on it? Should they all be restarted too?
I might have my answer
I think an individual app can decide what to do in situations like these but I don't think Component itself can.
Hmm… I need to rethink
I shouldn’t be failing to start just because the real DB isn’t available.
I should gracefully report an error, and keep functioning…
For our Component-based apps, if the db is not available at startup, we'd want to fail hard.
If the db goes offline while we're running, that's a different issue but some of our processes should just die at that point.
I’m using docker and docker-compose… and the db starts slowly
I think I can solve it there, but I was trying to come up with a solution because if I grow this and break the services across multiple cloud instances, then network route drops might play a role and I’d want the “components” which face external services to tolerate interruption
So your app would need to maintain mutable state in the components to indicate whether they are "ready".
When not ready, respond to requests with a 50x, poll in the background for availability, and then bring myself back online.
I’m wondering if I should have an external coordinator service that can bring stuff up and down. Like a service in docker that sends docker commands to stop and start stuff...
Hmmm… I could add a ring handler to report status and then post to the same url to do a restart…
I have two pieces that solve this problem for me
I have a boot fn which combines building and starting a system and, if it gets errors thrown starting the database too many times, will reconfigure and start the system with the database disabled
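A rough sketch of what a boot fn along those lines might look like, using only clojure.core. This is an illustration, not the actual code described above: `start-fn` stands in for whatever builds and starts the system from a config map, and `:db-enabled?` is a made-up config key.

```clojure
;; Sketch only: `start-fn` and :db-enabled? are hypothetical names.
;; Note this retries on ANY exception, not just database errors; a real
;; version would probably inspect the exception before retrying.
(defn boot
  "Try to start with the given config up to max-attempts times.
  If every attempt throws, start once more with the database disabled."
  [start-fn config max-attempts]
  (loop [attempt 1]
    (let [result (try
                   {:system (start-fn config)}
                   (catch Exception e
                     {:error e}))]
      (cond
        (contains? result :system) (:system result)
        (< attempt max-attempts)   (recur (inc attempt))
        :else (start-fn (assoc config :db-enabled? false))))))
```

With Component, `start-fn` could be something like a fn that builds the system map from the config and calls `component/start` on it.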
I also have a manager wrapping my system that watches for config changes and restarts the system during a safe period
I like that. I’ve been thinking about building an application infrastructure that watches a datastore (git repo + config files) for changes and triggers behaviors off of it, while also being able to poll services (watch statistics) and either respond to certain behaviors (if I have a defined response) or just report on that state through some notification (email/text/slack).
thx donald and sean
Yeah, if you have an app that can run in both "connected" and "disconnected" mode, that pretty much has to be baked into your logic from the get-go. I don’t think there’s much support a generic library can provide you with there (other than provide structure around your "state").
If my system were under any real load, I’d like to build a piece that triggers config changes on exception rate and/or timeouts
Internal/External message queues might handle some of this also in a more tolerant way
☝️
really been thinking about that
We’re in the process of switching architectures to that approach.
doesn’t netflix have a database driver that helps with that?
in the end I want to be able to break apart a message request and compose it from the results of many different services, that’s going to require some type of queue pipelining functionality.
I have my web handlers describe the services on which they depend and respond automatically with 503s when they aren’t available
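One way that could be sketched, as plain Ring-style middleware with no library calls. `wrap-requires` and the `status` atom are illustrative names, not part of Ring or Component; something else (a background poller, say) is assumed to keep the atom up to date.

```clojure
;; Sketch only: a Ring handler is just a fn from request map to response map.
(defn wrap-requires
  "Wrap handler so it answers 503 unless every service in `required`
  is marked truthy in the `status` atom (a map of service key to boolean)."
  [handler status required]
  (fn [request]
    (let [down (remove #(get @status %) required)]
      (if (seq down)
        {:status  503
         :headers {"Content-Type" "text/plain"}
         :body    (str "Unavailable: " (pr-str (vec down)))}
        (handler request)))))
```

Usage would be along the lines of `(wrap-requires my-handler status [:db])`, so each handler declares what it needs at the point where it is wired up.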
Might be able to use a core.async channel to say “go” to the next component when its dependencies are ready...
ah, so you have a channel that is passed in as a dependency, and the components are pushed to that channel when you want to “scale” or “replace” ?
I was thinking that if I’m dependent, my start doesn’t get called until the previous component publishes “ready” to a channel it returns… then I’m free to go ahead
It’s an async callback, like a promise done Clojure’s way.
I do a lot of JavaScript that way, where I return a deferred instead of a result, and you get triggered when I come back. My templates render that way a lot.
My component’s start function doesn’t get called until all deps have declared that they are “ready”
That’s the way component’s systems work now, except without the asynchrony
exactly
My only problem is that when the database comes up, I verify/run the database table migrations (ragtime) and execute my own internal data-migration functions before the component says it’s ready to be used. That requires the connection to be active, which requires blocking.
I’m using wait-for-it.sh to block until the database server is running in my docker-compose config file, but that’s not as robust as doing something in my server code to be more tolerant of dropped connections.
I just tested and deployed that while we’re chatting and it seems to be fine
Sounds like you need something in your -main function that handles that prior to creating & starting the application’s Component?
Maybe, I might have to duplicate the config code and block until the database connection is present.
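For the "block until the connection is present" part, a minimal sketch of a polling helper that could run before the system is started. `wait-until` is a made-up name, and `ready?` is assumed to be a zero-arg fn that tries the database (for example, opens a connection and runs a trivial query).

```clojure
;; Sketch only: polls a caller-supplied check until it passes or times out.
(defn wait-until
  "Call (ready?) every interval-ms until it returns truthy or timeout-ms
  has elapsed. Exceptions thrown by ready? count as not ready.
  Returns true if it became ready, false on timeout."
  [ready? interval-ms timeout-ms]
  (let [deadline (+ (System/currentTimeMillis) timeout-ms)]
    (loop []
      (cond
        (try (ready?) (catch Exception _ false)) true
        (>= (System/currentTimeMillis) deadline) false
        :else (do (Thread/sleep (long interval-ms))
                  (recur))))))
```

Since the check is passed in as a fn, the config code would not necessarily need duplicating: the same config map could feed both the check and the system.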
Maybe I do something else on my own with channels in my start function for my component instead...
I could do what I said myself apart from system through the system map I think. Put a channel behind a key, and defer my web service from coming up until after the database connection is finished. That’s the solution for the existing architecture. Thanks for helping me think this through guys.
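A sketch of that "ready signal behind a key" idea. To keep it dependency-free it uses a clojure.core `promise` in place of a core.async channel, which is a swap from what was described; `connect!`, `migrate!` and `start-server!` are hypothetical fns supplied by the app.

```clojure
;; Sketch only: the db component returns immediately with a :ready promise,
;; and the web component blocks on that promise before serving.
(defn start-db
  "Connect and run migrations on a background thread.
  Delivers {:conn ...} or {:error ...} to the :ready promise."
  [connect! migrate!]
  (let [ready (promise)]
    (future
      (try
        (let [conn (connect!)]
          (migrate! conn)
          (deliver ready {:conn conn}))
        (catch Exception e
          (deliver ready {:error e}))))
    {:ready ready}))

(defn start-web
  "Wait up to timeout-ms for the db to report ready, then start the server."
  [db start-server! timeout-ms]
  (let [result (deref (:ready db) timeout-ms {:error :timeout})]
    (if (:error result)
      (throw (ex-info "Database not ready" {:reason (:error result)}))
      (start-server! (:conn result)))))
```

A promise can only be delivered once, so this covers deferred startup but not a connection that drops and comes back later; a channel or an atom would suit that case better.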
let us know how it works out. I’m very interested in this.
I’m having a similar challenge with components that hold tokens to third party services that expire after a certain time, and need to get refreshed.
Heh, I just solved a similar problem myself. My case is an internal keyring whose keys need to be rotated periodically.
Currently I have a worker go loop that starts when the component starts, stops when the component stops, and the keyring state is maintained in an atom
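Roughly the shape of that, as a sketch rather than the actual code: a plain daemon thread stands in for the go loop so it runs without core.async, and `fetch-keys` is a hypothetical zero-arg fn returning the fresh keyring. The two fns map onto a component's start and stop.

```clojure
;; Sketch only: state lives in an atom, a worker refreshes it on a timer.
(defn start-keyring
  "Load the keyring once, then refresh it every interval-ms in the background."
  [fetch-keys interval-ms]
  (let [state   (atom (fetch-keys))
        running (atom true)
        worker  (doto (Thread.
                       (fn []
                         (while @running
                           (try
                             (Thread/sleep (long interval-ms))
                             (reset! state (fetch-keys))
                             (catch InterruptedException _
                               (reset! running false))
                             ;; keep the old keys if one refresh fails
                             (catch Exception _ nil)))))
                  (.setDaemon true)
                  (.start))]
    {:state state :running running :worker worker}))

(defn stop-keyring
  "Signal the worker to stop, wake it if sleeping, and wait briefly for it."
  [{:keys [running worker] :as keyring}]
  (reset! running false)
  (.interrupt ^Thread worker)
  (.join ^Thread worker 1000)
  (dissoc keyring :worker))
```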
go loops and component lifecycle fns are a nice fit
That’s great to hear
especially since async channels are essentially queues so they fit with other models I want to use, like websockets and Message Queues.