clojure-dev

Issues: https://clojure.atlassian.net/browse/CLJ | Guide: https://insideclojure.org/2015/05/01/contributing-clojure/
john 2019-07-03T03:47:54.327800Z

What I want from a distributed application programming interface: 1. swap!/deref 2. some useful subset of the clojure language 3. nothing else

2019-07-03T05:03:35.329100Z

Do you want this capability in a distributed system, where the failure of one or a few machines does not inhibit the progress of the others? Or do you want it in a system of N machines where if 1 fails, the computation fails and/or locks up, and maybe needs restarting?

john 2019-07-03T12:42:50.335200Z

A little bit of both. I don't think all distributed system designs should have to worry about "Byzantine fault tolerance" and net splits. Unless you want to distribute work over a network of potentially untrusted peers or something.

john 2019-07-03T12:51:24.338800Z

Zookeeper is probably overkill for a lot of "distributed" use cases I think

john 2019-07-03T12:58:03.341300Z

And these days, if you fully control two networked computers and one isn't playing nice with the other, it's probably the application programmers fault and they should fix it.

john 2019-07-03T13:03:23.346Z

But anyway, machine redundancy stuff can be somewhat orthogonal to the distributed application programming interface on top that I'm thinking of. It'd probably make sense to have some rudimentary supervisor service running, like with Erlang. But if application logic breaks the system, application logic shouldn't do that.

john 2019-07-03T13:08:10.348200Z

The newer version of Tau has a supervisor that coordinates things and a watchdog service for health monitoring. Not much implemented there yet though.

2019-07-03T13:30:49.348800Z

failing machines are pretty far from worrying about untrusted adversaries.

john 2019-07-03T13:35:06.352600Z

Well, if a thread fails, that rarely locks up a whole program. And if it does, the programmer needs to build redundancy into that thread logic.

john 2019-07-03T13:35:37.353400Z

Using network compute resources should be as easy as using threads

john 2019-07-03T13:43:49.355800Z

And using clojure, I don't even really want to have to deal with those networked threads. I'll just lean on higher level abstractions to manage those networked threads

john 2019-07-03T13:46:43.358Z

Networked futures, agents, atoms, etc

2019-07-03T13:57:14.358100Z

Would each atom be 'owned' by one machine? If so, I don't see how you can write code using atoms where the whole computation doesn't just lock up when the owning machine dies. If the atom's "owning machine" can move, then you've got a fun implementation.

john 2019-07-03T14:52:47.358300Z

Depends. In some situations I've cached the atom on all subscribed nodes, and have a particular "owning" node control all updates and broadcast results. With that setup, it's pretty easy to change ownership to another node if the owning node stops responding to a watchdog.

john 2019-07-03T14:57:37.358500Z

In terms of "moving machines", I haven't thought about that much, but we're using "js isolates" here, so there may be a story with regard to moving js isolates between machines already.

2019-07-03T15:02:52.358700Z

It isn't difficult to have systems that usually work, but break under unusual but possible circumstances, e.g. the watchdog timer fires, and then the owner starts making progress at the same time. That's where Paxos/zookeeper/etc. shine, is avoiding incorrect behavior in those unusual corner cases.

john 2019-07-03T15:03:07.358900Z

But, if you want to spread ownership of an atom across multiple nodes, I'd just make sure ownership transfers unidirectionally across whatever shape of nodes you've built to do the work. And then have the result finalized at a last node (the UI maybe) or the one originating the request

john 2019-07-03T15:05:36.359100Z

Paxos may or may not be necessary, depending on the reliability of the system underneath. But I'd like to have a distributed thread abstraction on top where I don't have to care about the underlying coordination, just like a cpu.

john 2019-07-03T16:10:58.359300Z

Infiniband network drives usually "just work"