A question about the importance of data immutability on the Node.js server side. Consider a typical scenario where a Node.js app reads data from some data sources, applies business logic and returns data as JSON.
What kind of issues could arise if we don’t use immutable data?
I mean: the state is external to the app. So why is data immutability important in that specific case?
immutability leads to better local reasoning. if you have an immutable thing, you don't have to fear that some other part of the code will modify it from under you
I know, but that might sound theoretical to Node.js devs. I forgot to mention that the context of this question is a talk about the value of data immutability that I am going to give at a Node.js meetup next week
It is easier (at least for me) to articulate the value of immutability when the app has internal state
Maybe it's good to read the READMEs of several immutable JS libraries like Immutable.js
E.g. in the frontend
because they exist for a reason
ah, you mean Node.js as in backend JS apps. yeah, not sure if immutable JS libs are used a lot there, interesting question
I am gonna re-read the Immutable.js README. By the way, did you know that Immutable.js was kind of dead?
no, again? which lib has taken its place now?
immer
also lodash fp
and ramda
I don't follow all that BS and hype anymore. Just use CLJS :P
The book Fogus wrote about functional programming in JS in 2013 is now probably stale, while the book he wrote earlier, in 2010, still runs with Clojure 1.11
But maybe you could read that book for inspiration as well
Good idea!
Something else: I realized recently that all the libs that provide immutable data manipulation on top of native JS objects are efficient only with records, not with associative arrays
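One plausible reason, sketched below under the assumption that "efficient" refers to the cost of a single update: an immutable update of a plain JS object is a shallow copy, so it is cheap for a small record but costs O(n) for a large object used as a key/value map. Persistent maps such as Immutable.js's Map (a hash array mapped trie, like Clojure's hash maps) copy only a short path instead. The `assoc` helper is hypothetical.

```javascript
// Hypothetical helper: an immutable "update" of a plain object is a shallow copy.
// The spread visits every own key, so one update costs O(number of keys).
function assoc(obj, key, value) {
  return { ...obj, [key]: value };
}

// A record: a few fixed fields, so the copy is cheap.
const user = { id: 1, name: "Ada", role: "admin" };
const user2 = assoc(user, "role", "owner");

// An associative array: many dynamic keys (e.g. users indexed by id).
const usersById = {};
for (let i = 0; i < 10000; i++) {
  usersById[`u${i}`] = { id: i };
}

// Changing one entry still copies all 10,000 keys into a brand-new object.
const usersById2 = assoc(usersById, "u42", { id: 42, banned: true });

// Only the top level is copied; untouched values are shared by reference.
console.log(usersById2.u0 === usersById.u0); // true
console.log(Object.keys(usersById2).length); // 10000
```

This is why libraries built on native objects tend to feel fine for records and sluggish for big maps updated in a loop.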
@ericnormand do you have a take on the relevance of immutable data in nodejs backend side?
It's interesting that TypeScript became popular even though it's not immutable by default, whereas Clojure takes the opposite approach: dynamic typing + immutability
probably marketing though, TypeScript is pushed by M$FT
I think the statelessness of HTTP helps a lot
each request is handled largely independently, using its own state
it reduces the amount of sharing
so even if you use mutable data, it’s very local
until, of course, your code grows and it still gets out of hand
that’s all very general, though
I don’t have experience using Node.js
related: i think that’s one of the hidden values of microservices
the services don’t share memory
they make copies of anything that needs to be shared (by serializing and deserializing)
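A small sketch of copy-by-serialization, assuming JSON as the wire format; `sendToService` is a hypothetical stand-in for an HTTP call between two services.

```javascript
// Hypothetical service boundary: the payload is serialized on one side and
// deserialized on the other, so the receiver always gets its own copy.
function sendToService(payload) {
  const wire = JSON.stringify(payload); // what actually crosses the network
  return JSON.parse(wire);              // the receiving service's private copy
}

const order = { id: 7, items: [{ sku: "A1", qty: 2 }] };
const received = sendToService(order);

// The receiver mutates its copy freely...
received.items[0].qty = 99;

// ...and the sender's data is untouched: no shared memory, no surprises.
console.log(order.items[0].qty); // 2
```

Inside a single process, `structuredClone` (Node 17+) gives the same kind of independent deep copy without going through a string.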
Yeah. My question is not specific to nodejs
Feel free to address the broader question
the fact that each HTTP request is handled with very little sharing really helps
so, for instance, you copy data out of the DB, you mess with it all you want, then send a copy back to the client
no other request had access to that copy
another thing that helps is that most apps are partitioned by user
what do you mean?
race conditions are rare because, even with millions of users, they’re all reading and writing to different rows in the database
it’s a very rare case where you’ve got two windows open and quickly clicking buttons in both
that has more to do with DB concurrency than in-memory data structures
but from what I have seen, most web apps do not have concurrent access done right
in practice, though, because I’m modifying my documents and you’re modifying your documents, there isn’t much concurrency anyway
In a Google Docs-like scenario, there is concurrency
but if I logged in on a few phones and started messing with it, I’d probably find some bugs
yes, and in those cases, they are well-built
the whole Google Doc is a concurrent data structure
it’s not CRUD
What other concurrent use cases do we have out there?
less large-scale than Google Docs
chat rooms?
games?
let’s focus on chat rooms
One could implement a chat room with WebSockets. So it’s a good use case for Node.js, I guess
yes
you could have the chat log in memory
you could or you should?
could
i’m just trying to avoid using a DB in this scenario
most apps push their concurrency into the DB
I know
That’s why I am looking for a use case where it makes a lot of sense to have the state in memory
sessions are another one, but they are partitioned by user as well
what kind of concurrency issues would we have if we don’t use immutable data in a chat app?
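One concrete answer, as a sketch under stated assumptions: Node is single-threaded, but every `await` is a point where other handlers run. Below, a new client replays the chat history while a moderator deletes a message. `sendToClient` is a hypothetical stand-in for a slow WebSocket write, and the room objects are illustrative.

```javascript
// Hypothetical stand-in for a slow WebSocket write: an await point
// during which other handlers get to run.
const sendToClient = (sent, msg) =>
  new Promise((resolve) => setTimeout(() => { sent.push(msg); resolve(); }, 1));

// Mutable version: one array shared by every handler, edited in place.
async function replayMutable(room, sent) {
  for (let i = 0; i < room.log.length; i++) {
    await sendToClient(sent, room.log[i]); // other handlers run here
  }
}

// Immutable version: the log is a frozen array that updates replace wholesale.
async function replayImmutable(room, sent) {
  const snapshot = room.log; // nobody can change this array underneath us
  for (const msg of snapshot) {
    await sendToClient(sent, msg);
  }
}

async function demo() {
  // A new client joins and starts receiving the history...
  const mutableRoom = { log: ["hi", "spam", "bye"] };
  const sentMutable = [];
  const p1 = replayMutable(mutableRoom, sentMutable);
  mutableRoom.log.splice(0, 1); // ...while a moderator deletes "hi" in place
  await p1;

  const immutableRoom = { log: Object.freeze(["hi", "spam", "bye"]) };
  const sentImmutable = [];
  const p2 = replayImmutable(immutableRoom, sentImmutable);
  immutableRoom.log = Object.freeze(immutableRoom.log.filter((m) => m !== "hi"));
  await p2;

  // sentMutable skips "spam", which was never deleted: ["hi", "bye"]
  // sentImmutable is a consistent snapshot: ["hi", "spam", "bye"]
  return { sentMutable, sentImmutable };
}

demo().then(console.log);
```

With the mutable log, the index shifts under the replay loop and a message nobody deleted is silently skipped; with the immutable log, the replay works on a stable snapshot and new joiners simply see the newer array.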
@viebel I can give you a nightmare example of lack of immutability in a Node.js app. We use mongo and mongo has queries represented as data. So you construct a query based on various request parameters and send it off to mongo to execute. There's a bunch of middleware that goes between the original query and the execution, each of which will modify the query.
The problem with mutable data here is that during development, you can't know what's going on. Once you pass the original query off for execution, you can't reuse it to do a second execution.
We have had dozens of subtle bugs where people assumed the query was the original one and tried to extract parameters from it, reuse it, log it -- but instead they were dealing with a mutated one.
In fact, in a relatively big codebase, once you pass in that query to any function, all bets are off. Even if the function says that it will give you a new query back, there's no way to know unless you go in and review every step of the way.
Which, in a nutshell, is a manifestation of the local reasoning that @borkdude mentioned.
(add on top of all this the async nature of JS, and it can be a nightmare to figure who's mutating what)
In the end, to debug such bugs I had to add console.log every step of the way to capture the values of the query in an immutable place (stdout).
Very interesting @orestis
Could you elaborate a bit about the bunch of middlewares that modify the query?
@viebel Imagine if the maps that go through ring middleware were mutable. That would be a nightmare
Say that you get a query that says "give me all the posts". So you have a Mongo query that looks, naively, like {}, which matches all the documents. But then the business logic kicks in and says, "all the posts for this user means all the posts in the teams they are members of". So it adds {channel_id: {$in: [x, y, z]}}. Then another middleware adds "don't show drafts unless they're your own posts", so it adds {$or: [{status: "published"}, {author_id: foo}]}... and so on.
The way I write it, it sounds manageable, but in reality it's not 🙂
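A sketch of the same chain written as non-mutating middlewares, assuming each middleware is a function from query to query; the middleware names and arguments are hypothetical. Each step builds a new object, so the original query stays safe to log, reuse or re-execute.

```javascript
// Hypothetical middlewares mirroring the filters described above. Each one
// returns a NEW query object and never touches its input.
const restrictToTeams = (channelIds) => (query) => ({
  ...query,
  channel_id: { $in: channelIds },
});

// Note: this overwrites any existing $or; a real middleware would need to
// combine it with earlier conditions (e.g. via $and).
const hideOthersDrafts = (userId) => (query) => ({
  ...query,
  $or: [{ status: "published" }, { author_id: userId }],
});

// Compose the chain: the output of one middleware is the input of the next.
const applyMiddlewares = (query, middlewares) =>
  middlewares.reduce((q, mw) => mw(q), query);

const original = Object.freeze({}); // "give me all the posts"
const finalQuery = applyMiddlewares(original, [
  restrictToTeams(["x", "y", "z"]),
  hideOthersDrafts("foo"),
]);

console.log(original);   // {}: still safe to log, reuse or re-execute
console.log(finalQuery); // what would actually be sent to MongoDB
```

Freezing `original` is optional here; it just turns any accidental in-place edit (in strict-mode code) into an immediate TypeError.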
I see what you mean.
E.g. in this legacy codebase, we have a function named querySchema.validate. You would expect that this will, well, validate the query. But it actually mutates it.
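A hypothetical reconstruction of that pattern (the real `querySchema.validate` is not shown, so the default-filling behavior is an assumption), plus two defenses: freezing the input so the hidden write fails loudly, and a pure variant that returns a normalized copy.

```javascript
// Hypothetical reconstruction: a "validate" that silently fills in a
// default by writing to its argument.
function validateMutating(query) {
  "use strict"; // make writes to frozen objects throw instead of failing silently
  if (query.limit === undefined) query.limit = 20; // hidden side effect
  return true;
}

// A non-mutating alternative: return the verdict plus a normalized copy.
function validatePure(query) {
  return { valid: true, query: { limit: 20, ...query } };
}

const q1 = { author_id: "foo" };
validateMutating(q1);
console.log(q1.limit); // 20: the caller's query was changed

// Freezing the query turns the hidden mutation into a loud error.
const q2 = Object.freeze({ author_id: "foo" });
let caught = false;
try {
  validateMutating(q2);
} catch (e) {
  caught = e instanceof TypeError;
}
console.log(caught); // true

const { query: q3 } = validatePure(q2); // fine on frozen input
console.log(q2.limit, q3.limit); // undefined 20
```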
It's nothing that a little discipline can't fix (that's what Uncle Bob would say). But diving into a new codebase without any systemic guarantees... good luck.
When data is immutable, you can store in a variable each step of the process and inspect it or replay it as you wish. Libraries like https://github.com/vvvvalvalval/scope-capture cannot work in a mutable environment.
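scope-capture is a Clojure library; as a rough JavaScript analogue (helper names hypothetical), the sketch below records every intermediate step of a pipeline. With mutating steps, all recorded "snapshots" are the same object and show only the final state; with non-mutating steps, each entry is a distinct value you can inspect or replay.

```javascript
// Apply a list of steps to a value and record every intermediate result.
function runWithHistory(initial, steps) {
  const history = [initial];
  let current = initial;
  for (const step of steps) {
    current = step(current);
    history.push(current);
  }
  return history;
}

// Mutating steps: each one edits its argument in place and returns it.
const mutatingSteps = [
  (q) => { q.channel_id = { $in: ["x"] }; return q; },
  (q) => { q.status = "published"; return q; },
];

// Non-mutating steps: each one returns a fresh object.
const pureSteps = [
  (q) => ({ ...q, channel_id: { $in: ["x"] } }),
  (q) => ({ ...q, status: "published" }),
];

const mutHistory = runWithHistory({}, mutatingSteps);
const pureHistory = runWithHistory({}, pureSteps);

// With mutation, every "snapshot" is the same object, so all show the final state.
console.log(mutHistory.map((q) => JSON.stringify(q)));
// With immutable updates, each entry is a distinct, replayable step.
console.log(pureHistory.map((q) => JSON.stringify(q)));
```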
@orestis I’d like to claim that there are two approaches to embracing immutability in JavaScript: 1. Using a lib like Immutable.js => immutability at the level of the data structures; 2. Using a lib like Lodash FP, Ramda or Immer => immutability at the level of the way we manipulate data
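To make approach #2 concrete without depending on any library, here is a hypothetical hand-rolled `assocIn`, in the spirit of helpers such as Ramda's `assocPath` or lodash/fp's `set` (not their actual implementations). It works on plain native objects and copies only the objects along the path.

```javascript
// Hypothetical helper: return a copy of `obj` with `value` at `path`,
// copying only the objects along the path and sharing everything else.
// Assumes the path goes through plain objects (arrays would become objects).
function assocIn(obj, [key, ...rest], value) {
  const base = obj ?? {};
  return {
    ...base,
    [key]: rest.length === 0 ? value : assocIn(base[key], rest, value),
  };
}

const state = {
  user: { id: 1, prefs: { theme: "light" } },
  posts: [{ id: 10 }],
};
const next = assocIn(state, ["user", "prefs", "theme"], "dark");

console.log(state.user.prefs.theme);     // light: original untouched
console.log(next.user.prefs.theme);      // dark
console.log(next.posts === state.posts); // true: unchanged branch is shared
```

Nothing stops a caller from writing `state.user.prefs.theme = "dark"` directly, which is exactly the enforcement problem of approach #2.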
The problem with approach #1 is that it requires non-native objects
I'm not sure if the type system could help you here. Does TypeScript have a concept of immutable function arguments?
@orestis in what sense is the guarantee not that strong?
You can find numerous ways to work around it (based on that article)
The problem with approach #2 is that it is hard to enforce + it doesn’t scale well
Do you think that approach #2 would have solved the problems you encountered in your Node.js app?
I don’t think so. @borkdude?
No, not unless the original developers who put the system together understood the problems of mutability 😄
Why?
I actually don't know TypeScript
I mean if you forbid object field assignment
What could go wrong?
How would you forbid it?
(I'm not familiar so much with those libraries either)
Either by convention or with Object.freeze (deep)
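Two caveats worth knowing for the freeze route: `Object.freeze` is shallow, and outside strict mode a write to a frozen object is silently ignored rather than throwing (ES modules and class bodies are always strict). A minimal sketch of a hypothetical `deepFreeze` helper for plain objects and arrays:

```javascript
// Hypothetical deepFreeze helper for plain objects/arrays. Object.freeze is
// shallow, so recurse into nested values; `seen` guards against cycles.
function deepFreeze(value, seen = new WeakSet()) {
  if (value === null || typeof value !== "object" || seen.has(value)) return value;
  seen.add(value);
  for (const key of Reflect.ownKeys(value)) {
    deepFreeze(value[key], seen);
  }
  return Object.freeze(value);
}

function tryToMutate(query) {
  "use strict"; // in sloppy mode this write would fail silently instead
  query.where.status = "draft";
}

const query = deepFreeze({ select: "*", where: { status: "published" } });
let threw = false;
try {
  tryToMutate(query);
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw, query.where.status); // true published
```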
Right, so back to discipline 🙂
Yeah. But it’s much easier to catch during a PR
I imagine one could write a linter that checks that (js-kondo @borkdude?)
My opinion based on what I've seen in this codebase is that if things are possible, people will do it.
So any time you have a plain JS object, you cannot be sure that someone won't mutate it.
Perhaps the current team is disciplined and consistent. What about a 3rd-party library?
Unless you call Object.freeze
Well they will try to modify it and then it will throw at runtime, right? Marginally better but not ideal.
Using Immutable.js actually is a proper API contract. The moment you leave Immutable.js land (e.g. to use said 3rd-party library) you know you are entering the danger zone.
Which is the point of having immutability baked into the language. There's no danger zone 🙂
I need to run, thanks for giving me a soap box to vent my frustrations at this legacy codebase. Fortunately the transition to Clojure is going well 😄
Before you run, save this link for later https://github.com/tc39/proposal-record-tuple
One day JavaScript will have immutability at the level of the language
Thank you @orestis for sharing your insights
Defaults in a language matter. As someone mentioned above, on a single project you can be disciplined about avoiding mutability if everyone agrees, but as team members change, the project grows, etc., it becomes very difficult to enforce over time.
I have worked on large single-threaded C code bases with fairly extensive data structures kept in memory between client requests, and it becomes fear-inducing to look at some code that is 5 levels deep in the function call tree, with 10 more levels beneath it, and try to get any kind of assurance about which functions modify what, even in single-threaded code. Reasoning about correctness is very non-local: you pretty much need to understand the whole code base in order to know whether a change is correct (or whether the current code is correct)
Could you go into more detail about why reasoning about the correctness of a local function is non-local when data is mutable?
var valid = validate(list);
foo(list);
bar(list);
var valid2 = validate(list);
// valid could be true and valid2 could be false
// it entirely depends on the implementation of foo and bar
By contrast, if list were immutable, you would know that valid2 is true iff valid is true. If you want to change list, it also has to be done explicitly, which makes it easier to reason about what the code is doing.
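A runnable version of that snippet, with hypothetical definitions of validate, foo and bar (the original leaves them abstract). It also shows that a frozen array makes the hidden mutation fail loudly: Array.prototype.push on a frozen array throws a TypeError even in sloppy mode.

```javascript
// Hypothetical stand-ins for validate, foo and bar.
const validate = (list) => list.every((n) => n > 0);
const foo = (list) => { list.push(-1); }; // quietly mutates its argument
const bar = (list) => list.length;        // read-only, but the call site can't tell you that

const list = [1, 2, 3];
const valid = validate(list);
foo(list);
bar(list);
const valid2 = validate(list);
console.log(valid, valid2); // true false

// With a frozen array, foo can no longer change the list behind our back.
const frozen = Object.freeze([1, 2, 3]);
let fooFailed = false;
try {
  foo(frozen);
} catch (e) {
  fooFailed = e instanceof TypeError;
}
console.log(validate(frozen), fooFailed); // true true
```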
There are different kinds of locality
multithreading on the JVM means that "changes from under you" produces undefined behavior
but in a single threaded context there are still logical boundaries
const execute_lazy = (query) => {
return () => {
return execute(query);
};
}
const query_a = { select: '*', from: 'table' }
query_a['where'] = 'field > 0 && field < 100';
const results_a = execute_lazy(query_a);
console.log(results_a());
query_a['where'] = 'field > 100';
const results_b = execute_lazy(query_a);
console.log(results_b());
so this would work and produce no bugs
const query_a = { select: '*', from: 'table' }
query_a['where'] = 'field > 0 && field < 100';
const results_a = execute_lazy(query_a);
query_a['where'] = 'field > 100';
const results_b = execute_lazy(query_a);
console.log(results_a());
console.log(results_b());
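but this would not: execute_lazy only captures a reference to query_a, so by the time results_a() runs, where has already been overwritten and both calls execute 'field > 100'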
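One way to make the second ordering safe, sketched under the assumption that execute only needs the query's fields: take a snapshot when the thunk is created. `execute` here is a hypothetical stub that renders the query as a string; the snippet above does not define it.

```javascript
// Hypothetical stub: execute() just renders the query it receives.
const execute = (query) =>
  `SELECT ${query.select} FROM ${query.from} WHERE ${query.where}`;

// Snapshot the query when the thunk is created, not when it runs.
// A shallow copy is enough for this flat query; nested values would need
// a deep copy (e.g. structuredClone in Node 17+).
const execute_lazy = (query) => {
  const snapshot = Object.freeze({ ...query });
  return () => execute(snapshot);
};

const query_a = { select: '*', from: 'table' };
query_a['where'] = 'field > 0 && field < 100';
const results_a = execute_lazy(query_a);
query_a['where'] = 'field > 100';
const results_b = execute_lazy(query_a);

console.log(results_a()); // SELECT * FROM table WHERE field > 0 && field < 100
console.log(results_b()); // SELECT * FROM table WHERE field > 100
```

Copying at the boundary is the defensive-copy workaround; with immutable values by default, the closure could hold the original reference safely.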
anything that "stores" what it is given to refer to later is a potential boundary
either closures or objects or whatever
and in Node you still have concurrent "processes" (async tasks interleaving at each await), so they can share data
so say you have some piece of mutable data you put in a middleware shared between route handlers
state updates to that can cross the boundary into other "processes" when you await some request or whatever
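A sketch of that boundary crossing, assuming a middleware that stashes the current user on module-level state; all names are hypothetical. Two concurrent requests interleave at the await, and the first one ends up reading the second one's user.

```javascript
// Hypothetical middleware state shared by every request in this Node process.
const shared = { currentUser: null };

// Stand-in for a database call: an await point where other requests run.
const fakeDbCall = () => new Promise((resolve) => setTimeout(resolve, 5));

async function handleWithSharedState(user) {
  shared.currentUser = user;                // "middleware" writes shared state
  await fakeDbCall();                       // another request may run here...
  return `posts for ${shared.currentUser}`; // ...and overwrite currentUser
}

// Safer: build a frozen, per-request context and pass it along explicitly.
async function handleWithLocalContext(user) {
  const ctx = Object.freeze({ currentUser: user });
  await fakeDbCall();
  return `posts for ${ctx.currentUser}`;
}

async function demo() {
  const sharedResults = await Promise.all([
    handleWithSharedState("alice"),
    handleWithSharedState("bob"),
  ]);
  const localResults = await Promise.all([
    handleWithLocalContext("alice"),
    handleWithLocalContext("bob"),
  ]);
  // sharedResults: ["posts for bob", "posts for bob"] (alice sees bob's data)
  // localResults:  ["posts for alice", "posts for bob"]
  return { sharedResults, localResults };
}

demo().then(console.log);
```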
Sounds very interesting @emccue. Unfortunately, I gotta run 😞. Keep writing and I’ll read and respond later
yes, it has readonly
Cool!
"Could you get into more details about why reasoning about a local function correctness is non-local when data is mutable?" Imagine you have some graph data structures with nodes and edges in memory, mutable, and a single-threaded program handling requests and updating that graph data structure. It has a particular schema, and it is big. The code for modifying that graph in memory is not in a single function. You have a call tree of C functions with a single top level entry point, but the full call tree is a decent size tree with up to 10 levels of calls deep. Some of those functions only read things in the graph, but a large fraction of those functions can insert nodes, add edges, or mutate existing nodes or edges. If you have a picture on the board or in your head of exactly which of those hundred or so functions modify exactly what, and under what conditions, you can reason about how a certain change to the code will behave. If you do not have that knowledge in your head, then you are not sure whether a change to one of those functions will violate assumptions in 1, 2, or 7 other functions in those hundred.
I mean, with a large enough code base and immutable data, you could potentially also create something where local reasoning breaks down, but it breaks down in different ways. Mutation increases the number of ways you can be wrong.
Immutability at the very least lets you answer this question very quickly and easily: "If I call function foo and pass it these parameters, will it mutate those parameters, or anything they reference?" because the answer is always "no".
In a program where mutation is common and expected, that question can be extremely difficult to answer correctly.
Looks like it’s not that strong of a guarantee https://basarat.gitbook.io/typescript/type-system/readonly
And of course, as with all TypeScript, the guarantees go away at runtime. So again, the 3rd-party library story isn’t covered.
Even just a simple example like this breaks my brain
x == a; // true
// a changes here in some multi-thread environment
y == a; // true
x == y; // false...
err... There's a nice section in, I think, The Joy of Clojure, where the author talks about equality and how you can't really have equality in an environment where you have concurrency and mutation.
At best all your equality statements need qualifiers, i.e. x and y were equal, where equal means they both have the same value within a specific period of time, but how do you define that period of time? What if your values have some sort of STM, do you have to qualify equality with something like x and y were equal within a certain time period, and we don't care if x or y were in the process of a transaction that would result in a value where they weren't equal?
I read an article on some proposed new programming language where they discussed ideas for equality, and proposed that equals on mutable values should be explicitly called something different that could be read "equals now"
Yes, it was this paper: https://www.researchgate.net/publication/310823923_The_left_hand_of_equals. They didn't advocate going all immutable in the end for their programming language, but I like the idea of calling something "equals now"
By contrast, they call Baker's EGAL operation "equals always", which is what equals on immutable values is.
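JavaScript's `===` on objects only compares identity, so a sketch of value equality needs a helper; `equalsNow` below is a hypothetical name borrowed from the "equals now" idea above. Equality of mutable values is a statement about one moment, while for frozen values it holds for as long as the values exist.

```javascript
// Hypothetical structural equality for flat, JSON-like values.
// (Comparing with JSON.stringify assumes the keys were added in the same order.)
const equalsNow = (a, b) => JSON.stringify(a) === JSON.stringify(b);

// Mutable values: the answer is only valid at the moment you ask.
const x = { balance: 100 };
const y = { balance: 100 };
const before = equalsNow(x, y); // true
x.balance = 50;                 // some other code changes x later
const after = equalsNow(x, y);  // false

// Frozen values: if they are equal once, they stay equal, because neither can change.
const p = Object.freeze({ balance: 100 });
const q = Object.freeze({ balance: 100 });
const always = equalsNow(p, q);

console.log(before, after, always); // true false true
```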
Yes. I think that makes sense: if any two immutable things are equal at any point in time, then they are equal at all points in time.
Mutability: everybody has a plan until it punches them in the face.
;Start node CLJS REPL
;clj -Sdeps '{:deps {org.clojure/clojurescript {:mvn/version "RELEASE"}}}' -M -m cljs.repl.node
(defn mutable-danger-101 []
(let [obj #js{:x 42}]
(js/setTimeout
(fn []
(set! (.-x obj) :boom))
(rand 1000))
(js/setTimeout
(fn []
(println "What am I?" (.-x obj)))
(rand 1000))))
(dotimes [i 100]
(mutable-danger-101))
Each call randomly prints either What am I? :boom or What am I? 42, depending on which of the two timeouts fires first.
Sorry to get on my high horse like that, but to me this is the truth: if a person doesn’t understand the problem in the example above, they haven’t tried doing quality UI or backend development. I can only point them to the many Rich Hickey talks out there; he explains the problems of mutability very well. I think that to really see the problem, you must have experienced the pain and messed up a codebase at least once (while you really cared and wanted to do good work).