Would anyone like to jump on this discussion thread I created on the subreddit? (I'd prefer to keep it there so it doesn't disappear behind Slack's paywall) https://www.reddit.com/r/Clojure/comments/lpv8ok/spec_vs_malli/
The sentiment from the community so far has been leaning towards Malli. I'd love to get some alternative viewpoints in there.
@alex.joseph.whitt All I'll say is that I try hard to use "official" Cognitect stuff where it is available and we've been very happy with Spec for several years at work, although we have recently started using exoscale/coax in addition to Spec so that we can tease apart/replace our "coercing web specs" and use a more standard approach (i.e., separate coercion of strings, such as form input data, to stuff like long, bool, date... from the actual specs themselves).
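The shape of that separation is roughly this (a hand-rolled sketch to illustrate the idea, not coax's actual API or our real code):
```
(require '[clojure.spec.alpha :as s])

;; Specs describe domain values only -- no string handling in here.
(s/def ::age pos-int?)
(s/def ::active? boolean?)
(s/def ::signup (s/keys :req [::age ::active?]))

;; Coercing string input (form data, query params) is a separate, explicit step.
(defn coerce-signup [{:strs [age active]}]
  {::age     (some-> age Long/parseLong)
   ::active? (= "true" active)})

(s/valid? ::signup (coerce-signup {"age" "42" "active" "true"}))
;; => true
```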
I think your summary and the comments there are fair
I have not looked at Malli in any depth, but we did use Schema for a while (and abandoned it -- actually twice -- because its syntax felt non-idiomatic and, without generative testing, it was only as good as your testing strategy; we were happy with our switch from Schema to Spec). We also tried Typed Clojure (again, twice) and abandoned that for different reasons. Spec hits a sweet spot for us -- and we use it in a lot of different ways per https://corfield.org/blog/2019/09/13/using-spec/
if I could wave a magic wand and have a final design and impl for spec 2 I would, but it's tackling some hard problems (beyond what we tried to tackle in spec 1). I know it feels stalled but work really does continue on it and I have hope that we will pop the stack of other work in progress back to it "soon".
I'm glad y'all are taking your time. It's worth it.
Your comment about Malli embracing coercion would make me want to avoid it -- we went down that path with our own library and it was a mistake, hence our shift to Spec + coax now.
@seancorfield Can you explain why it was a mistake?
Complecting coercions with validations is just messy and it can hide errors.
Our specs are simpler now, and don't need custom generators as often.
The coercions are clear and separate. We can test coercing code and validating code separately.
I was one of those arguing in favor of having specs that coerced values for a long time. I was wrong about that.
I would love to add spec to babashka btw. But I don't feel certain about the alpha suffix, where it's going next, and how long that's going to take. It is a recurring question from users. Since it's part of current Clojure, people sometimes expect it to just be there. clojure.test.check is already in there, as a preparation step
I would also love to hear some insights from @ikitommi about the coercion philosophy in malli. Maybe it's one of those easy vs simple things, where a lot of people really just want the easy?
@seancorfield Do you use clojure.spec to validate incoming web requests, in an open-world way? @dominicm recently pointed out that this can be a security issue. This problem will be solved with spec2, which supports closed specs.
I disagree that it's a problem with spec. If you have a context where you must only accept a certain subset of keys, that's what select-keys is for.
We use spec as the "source of truth" for various things and we have situations where we derive the set of keys from the spec and then either explicitly restrict the data we process to that subset or we simply trim the keys present down to that subset, depending on whether we want to flag unwanted keys or not. But that trimming/checking is independent of how we validate incoming web requests.
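The trimming piece looks roughly like this (a minimal sketch that only handles :req-un/:opt-un, not our actual code):
```
(require '[clojure.spec.alpha :as s])

(s/def ::name string?)
(s/def ::email string?)
(s/def ::person (s/keys :req-un [::name] :opt-un [::email]))

;; Derive the allowed unqualified keys from the spec's own form...
(defn spec-keys [spec]
  (let [{:keys [req-un opt-un]} (apply hash-map (rest (s/form spec)))]
    (set (map (comp keyword name) (concat req-un opt-un)))))

(spec-keys ::person) ;; => #{:name :email}

;; ...then trim incoming data down to that subset (or flag the extras instead).
(defn trim-to-spec [spec m]
  (select-keys m (spec-keys spec)))

(trim-to-spec ::person {:name "Ada" :email "a@b.c" :admin? true})
;; => {:name "Ada", :email "a@b.c"}
```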
If it's not a problem, then why is spec2 introducing closed specs again? :thinking_face:
Because people whined about it not being in Spec 1 🙂
You could certainly do closed spec checking with Spec 1 -- you just have to do it manually.
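e.g., roughly this (a toy sketch, not production code):
```
(require '[clojure.spec.alpha :as s])

(s/def ::first-name string?)
(s/def ::last-name string?)
(s/def ::person (s/keys :req [::first-name] :opt [::last-name]))

;; The "closed" check is just an extra, explicit predicate on top of the open spec.
(def allowed-keys #{::first-name ::last-name})

(defn closed-valid? [m]
  (and (s/valid? ::person m)
       (every? allowed-keys (keys m))))

(closed-valid? {::first-name "Ada"})                   ;; => true
(closed-valid? {::first-name "Ada" ::twitter "@ada"})  ;; => false
```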
I think what @dominicm was hinting at was that when you allow any spec to be validated in a web request, this may trigger functions you did not expect to be executed, or something, which may lead to DoS, or whatever
That's a lot of hand-waving 🙂
I was asking questions, I did not take a position in this. Just wondering if you had considered it
I think it's a strawman argument to "blame" spec for something like that, frankly.
ok
If he has a specific, realistic scenario where just using Spec 1 to validate an incoming web request would cause a DoS, I'd be very interested to see that.
This is what Dominic said in #yada when asked why Yada does not have integration with Spec: "Because it gives attackers access to all your spec keys. Some of which might be very slow (and not run in production in normal circumstances). This opens you up to DOS attacks, the guidance from Alex on this was to validate data before passing it to spec. On top of that, you usually want to restrict keys in the web context as you're going to pass them into the database." -- I've covered the latter part (above) and I would expect in real-world code what you put in the database isn't necessarily going to be anything like the set of keys that you get passed in a web request because the persistence model and the API model aren't likely to be a 1:1 match anyway.
I've had cases where the data I wanted to spec had to enforce a closed nature, namely working with an HTTP API that throws an error if you give it data it isn't expecting. In that case we want a spec that validates that we're not including anything extra. The same would apply for a spec describing incoming data on a system that likewise wants to error out if the user sends unexpected fields (e.g., so users get warnings if they typo a parameter name, or send data they think is doing something when it is actually being ignored)
I guess you could ask the question: why should web APIs validate differently than functions?
I don't find the first part very convincing: if you have "very slow" specs that don't run as part of production code, why have them in your production code at all? Why not have them in test code (and if they're "very slow" they're not even going to be part of your unit tests).
As for Robert's case: I covered that above -- if you do want to fail validation on additional/unknown keys, that is possible with Spec 1. It's just not directly built in. I don't see Spec 2's "closed checking" as any sort of admission that Spec 1 was wrong -- I see it much more as a convenience that lets you write a little less code in certain situations. I think most people made a big ol' mountain out of wanting closed specs when it's really just a molehill.
(I think this is absolutely one of those cases where people pushed for "easy" rather than "simple")
Well, sometimes UX matters
@seancorfield just to clarify: malli separates coercion from validation, for the reasons you mentioned. You first "coerce" a value, then validate and explain (and humanize the explanation) if needed.
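Roughly like this (a quick sketch; error-message wording approximate):
```
(require '[malli.core :as m]
         '[malli.error :as me]
         '[malli.transform :as mt])

(def Person [:map [:name :string] [:age :int]])

;; 1. coerce (decode) stringly-typed input first
(m/decode Person {:name "Ada" :age "42"} (mt/string-transformer))
;; => {:name "Ada", :age 42}

;; 2. then validate the decoded value
(m/validate Person {:name "Ada" :age 42}) ;; => true

;; 3. and explain + humanize when it doesn't validate
(-> (m/explain Person {:name "Ada" :age "42"})
    (me/humanize))
;; => {:age ["should be an integer"]}  (wording approximate)
```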
Good to know -- that wasn't clear to me from @alex.joseph.whitt’s comment.
Tried following the link to the yada channel to see the conversation but perhaps it happened a while ago. Curious what validation was suggested before passing to spec.
I didn't dig far enough into the Zulip archives to see what Alex had actually said about that -- I'm pretty sure Alex didn't literally say folks should "validate data before passing it to spec" but he has talked about doing coercion on input data before passing it to spec (a separate thing).
Slack has a 10,000 message limit on the free plan but most channels are mirrored to Zulip http://clojurians.zulipchat.com which has an unlimited archive and search facility (for the open source plan we're on there).
Got it thanks 👍
Well, I don't know if we're any closer to a decision... so many good points brought up on both sides. I'd love to know if the Cognitect team ever considered implementing spec with a data-driven architecture like Malli's, and if so, what factors led to deciding against it. For some context, I've been an avid spec user for years, and only recently became aware of Malli. As we're starting a new project, I guess I'm experiencing some FOMO regarding some of Malli's features, but I can't escape thinking that a lot of Spec's design decisions were really on-point and forward-thinking.
spec is generally very opinionated, but once you "get" the design choices it's hard to disagree with it. It fits quite well with Datomic, the "Maximal Graph" & co approach to dealing with data, which imho is spot on (I am not a Datomic or Pathom user but I agree with the general ideas). Right now it's also quite bare bones, but there are a few libraries that fill the important gaps (coercion, error messages, etc). For me at least, spec2 would fix most of the important problems I have with v1 (mostly schema+select, or whatever these will be called), but even without that, spec1 is very usable. Most of the problems stated about spec 1 are imho very often overblown, or just a misunderstanding of how to use spec or of its goals. But I get the frustration of the malli authors who wanted spec2 now (I do too, but I'd rather have something carefully designed than a rushed draft), and the amount of work they pour into malli earns respect. Personally I don't agree with some of their design choices and, most importantly, with the fact that this will cause yet more fragmentation in that space. I would rather have had all that firepower put into spec libraries. I also had to deal with code bases that invested in spec-tools and broke some teeth in the process, so I prefer to be very conservative when choosing a library to fill that spot now.
about closed specs & data for me these are quite minor issues (if at all), I had to generate specs from another dsl on work I did recently and it's not a big deal at all. For more advanced composition I can't remember a case where i was blocked by the current api, in extreme cases you can rely on eval/macros to get to what you want but it's extremely rare in my experience at least. That said if spec2 improves that, great. For closed specs it's the same, writing a strict-keys version of s/keys is quite easy a naive version with poor error reporting is a couple of lines and a more involved one with nice error messages more work but it's under 50 loc
having multiple choices might be good; that could be a sign we have a "healthy" community
About the data driven nature of malli vs spec: I think it's fair to say that malli schemas are a better fit if you want to have a programmatic way of creating/changing schemas (in spec you end up with macros for this usually) and when you want to have (de)serialization of your schemas. That's not to say that you can't do this with spec, but the UX of malli is probably better for this kind of use.
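A small sketch of what that looks like on the malli side (illustrative only):
```
(require '[malli.core :as m]
         '[clojure.edn :as edn])

;; Schemas are plain data, so building one programmatically is just building a vector...
(def fields [[:id :int] [:email :string]])
(def Row (into [:map] fields))

(m/validate Row {:id 1 :email "a@b.c"}) ;; => true

;; ...and (de)serializing a schema is just printing/reading EDN.
(= Row (edn/read-string (pr-str Row))) ;; => true
```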