architecture

seancorfield 2020-11-17T00:57:34.172100Z

I always wonder, with Jan Stepien's approach (in that video) -- which I've seen advocated a few times elsewhere -- whether it scales to anything approaching a complex app? I mean, you need lots of protocols (one for each abstraction that the use cases need injected from the adapter layer) and those protocols need a method for every interaction with those abstractions. In a system that has several dozen entities, all needing some sort of CRUD operation, that's going to be a lot of boilerplate.

lukas.rychtecky 2020-11-18T11:29:44.310800Z

You are right, when the system grows you would have more protocols etc. But that’s good because that’s the architecture. It makes the architecture explicit, communicates the entities and relationships, etc. It also separates the domain from the infrastructure, because the boundaries are explicit. I have seen many Clojure projects where the presentation layer was tied directly to the database. So the system was hard to maintain and understand (the classic scenario of what happens with Rails projects).

seancorfield 2020-11-18T17:25:40.311900Z

I dislike seeing protocols that have only a single implementation, which that sort of architecture leads to. Using protocols simply so you can mock components is a bad choice, IMO.

lukas.rychtecky 2020-11-18T17:34:14.312100Z

OK, how would you separate domain logic from side-effect implementation?

seancorfield 2020-11-18T17:45:56.312300Z

Generally by having the domain logic return a description of changes it needs made to the "system" and an orchestration layer that calls the domain logic and then calls the side-effecting code. Overall tho', I am not much of a "purist" when separating some of those things out and having DB inserts/updates in the middle of a chain of business logic doesn't bother me as much as it bothers some people.
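To sketch the first part (hypothetical names; update-row! and send-email! just stand in for whatever side-effecting code you have):

;; Pure domain logic: returns a description of the changes it needs made.
(defn approve-order
  [order]
  (if (:order/paid? order)
    [[:db/update :order (:order/id order) {:order/status :approved}]
     [:email/send {:to       (:order/customer-email order)
                   :template :order-approved}]]
    [[:email/send {:to       (:order/customer-email order)
                   :template :payment-required}]]))

;; Orchestration layer: runs the domain logic, then performs the effects.
(defn process-order!
  [db mailer order]
  (doseq [[effect & args] (approve-order order)]
    (case effect
      :db/update  (apply update-row! db args)    ; stand-in for real DB code
      :email/send (apply send-email! mailer args))))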

seancorfield 2020-11-18T17:47:29.312500Z

Most of the proposed separations work fine in-the-small but really don't scale in-the-large -- and I'm working with 105K lines of Clojure that spans a decade of evolution of both the functionality and the skill level of the team members that have worked on it across that time period.

seancorfield 2020-11-18T17:48:35.312700Z

We started that journey by embedding Clojure in legacy apps and, specifically, using Clojure for JDBC stuff -- so our Clojure code was mostly side-effecting library code at first 🙂

lukas.rychtecky 2020-11-19T07:40:39.319Z

I agree that you can use the same approach as Re-frame with interceptors. But at the end it adds another abstraction layer too. The separation of the domain and side effects leads to better architecture and maintenance. It’s also easier when onboarding new people to the project. Yeah, I’m pretty sure that making this change on a legacy project is a huge amount of work. The quality of my Clojure code over the last 5 years looks very different, because my experience of how to make things better has evolved over time.

seancorfield 2020-11-17T01:01:07.174100Z

By the time you have a system with 100 entities, all persisting to 100 tables potentially, you're going to have 100's of protocol functions and all of their implementations, and then for any mocking you do for testing, you have to reimplement all of those against whatever mock system you use...
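i.e. something like this, repeated for every entity (made-up names, just to show the shape), plus an in-memory version of each for testing:

;; One protocol per entity, one method per interaction...
(defprotocol BookingStore
  (save-booking   [store booking])
  (find-booking   [store id])
  (update-booking [store id attrs])
  (delete-booking [store id]))

;; ...the real implementation (insert-row! etc. stand in for real SQL helpers)...
(defrecord PostgresBookingStore [datasource]
  BookingStore
  (save-booking   [_ booking]  (insert-row! datasource :booking booking))
  (find-booking   [_ id]       (select-row! datasource :booking id))
  (update-booking [_ id attrs] (update-row! datasource :booking id attrs))
  (delete-booking [_ id]       (delete-row! datasource :booking id)))

;; ...and then every method again for the mock used in tests.
(defrecord InMemoryBookingStore [state]
  BookingStore
  (save-booking   [_ booking]  (swap! state assoc (:booking/id booking) booking))
  (find-booking   [_ id]       (get @state id))
  (update-booking [_ id attrs] (swap! state update id merge attrs))
  (delete-booking [_ id]       (swap! state dissoc id)))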

2020-11-17T13:49:20.179200Z

The talk is called "Introducing Structure". So let's talk in terms of structure and think about structure and its opposing force, unstructured/flexibility. To increase one, we must necessarily decrease the other. Even if that's not the definition traditionally, we need to define it precisely if we have any hope of communicating. Highlighting one of several places where the presenter makes such a trade-off:

(extend-protocol book-table/SaveBooking
  Postgres
  (save-booking [postgres booking]
    (execute-sql postgres
                 "insert into bookings...."
                 booking)))

2020-11-17T13:49:20.179400Z

How might we talk about this in terms of structure or flexibility? Counter to the talk's title, we have now decreased structure. Before, save-booking always threw a "no implementation" error; now it might do more. The system as a whole is now more flexible.

Now, can we judge this choice? From what measure? Let's pretend for a moment that the presenter is both the developer and the business owner, and that he is in the business of just saving bookings and business is booming. His family has done this for generations and they have a huge amount of control of the save-bookings industry. Under that context, this is an excellent choice. The structure embodies the goal, it protects it, it sits on it like a dragon on a mountain of gold. So, when you see this, you naturally ask: what if we had to do more than save-bookings (e.g. unsave-bookings)? To which our hypothetical dragon would promptly exert his wrath upon you. The dragon knows his rule lies in focusing on his position of power and not letting others distract him from it.

Under a different context, however, one where our protagonist wasn't a mighty dragon but a young, ignorant traveler, it would be very unclear if building a temple to save-bookings was worth the time. To be more precise, given that his knowledge of the goal is unclear, any time spent on imposing structure or flexibility is at risk of being a trade-off that isn't worth making. So what is he to do? To move forward he must do something! So he picks a path and boldly sets out. Wisdom precludes boldness. The correct path forward is one that stares not only into the distance (toward an end you only vaguely know), but also takes small, correct steps in its direction.

Coming back down to the software domain, the idea here is reasonable only in a really, really unlikely context. Given any system I have ever worked on, it's slightly too big a step, as it assumes structure around a specific part of a business domain is important. As you point out, this would lead to a protocol function for each table. Which would imply that the structure/flexibility is useless: you keep having to build more, so the original one obviously didn't help...

The fact that it becomes unclear when to say something increased structure vs flexibility points to an underlying issue. One that I can't articulate (as evidenced by my rantings last night). The misalignment is partially between human and machine. We are wired to get cues from humans, not machines, but as developers we talk to machines far more. If you approach the problem from what a human wants and forget what the machines want, it's easy to end up with extravagantly complex models that only you can understand. You become a dragon on a large mountain with no gold inside it. So what machine must care about books? Well, Postgres. Only when talking to Postgres must you care about books. Who else might care about books? Well, your users via the UI. So then.. curious. Now our business domain seems to be on the outer part of our circle. The yin/yang and OOP/FP.

So then we need a word which captures both. I like to think of it in terms of composability. Will our SQL queries need to compose? What is the price if we assume they do? What if we write code that implies they don't? Does this composability obscure meaning? Or can we easily extract the meaning through evaluation?

Thanks for reading, hopefully you get as much out of it as I did writing it. 🙂

2020-11-17T02:31:49.177500Z

I'm only a couple minutes in, but the issue I see is conflating how humans distinguish versus how a machine distinguishes. The word "book" is not a book. To a machine it's just a string. If you have N categories which are incidentally treated as functionally different within your system, then everything you do in that system is made incidentally more complex by a factor of N. I think over-modeling on the business domain is a real danger. It's very easy to latch on to an easy problem, like how "books" differ from "students", and avoid talking about the mountain of inhuman issues that really slow down progress.

2020-11-17T02:38:03.177800Z

But it's so contextual it defies offering generic advice. Which plays into the fear that often motivates a lot of discussion around protective coding. People want to know what won't get them into trouble, and they substitute solutions to that in the small for the larger question of how to progress.

seancorfield 2020-11-17T03:01:00.178Z

I can't tell whether you're agreeing with me or disagreeing @drewverlee?

2020-11-17T03:04:29.178200Z

Agreeing. I'm falling asleep over here so I'm probably not explaining myself very well.

seancorfield 2020-11-17T03:04:42.178400Z

OK, cool. Was a bit hard to follow.

simongray 2020-11-17T13:52:40.180Z

I have been tasked with making a facsimile viewer for the web. The viewer itself is a reagent single-page application that takes a list of facsimiles (images from scanned source materials) and transcriptions of these facsimiles in an XML file. The frontend SPA then converts the XML into hiccup and puts it inside a reagent component together with the facsimile, to navigate between different pages.

What would be the optimal API/database combination for an API that should basically just be a datastore for several hundred GBs worth of images (scanned letters) and XML files (corresponding transcriptions of the scanned letters)? They need to be searchable based on the contents of the XML files. I am currently unsure if it even makes sense to put the XML documents themselves in a database or if it's a better idea to extract some metadata from them, put this in a database, and leave the documents on disk. What do you think?

In case it makes any difference: I also need to make it possible for users to associate comments with specific elements in the XML documents and put these in some sort of queue until approved by a curator. When retrieving the XML documents I will need to retrieve the list of approved comments too. I was thinking the database should support this.

As for the API, I am putting it inside a Pedestal web service which also serves my single-page application made with reagent. The SPA itself is contained in the index.html page and bundled JS/CSS files, but what about the API endpoint(s)? Should I go with something RESTful, or does it make sense to go with EQL, GraphQL, or some other solution? In the past I've made a transit-based API endpoint with my own custom protocol, but I wouldn't mind standardising on something.

So what kind of API/database combo makes sense for me? Sorry for the wall of text. I am just looking for suggestions.

simongray 2020-11-17T13:59:06.180200Z

TL;DR - I need to make an API exposing a datastore that
• contains thousands of images and XML files
• is searchable by relevant metadata found in the XML files
• supports associating comments with specific elements in the XML files (note: not XML comments, just comments like you would find on a blog)
What kind of API should I make and what database should I use?

2020-11-17T14:18:34.184200Z

@simongray
> What would be optimal API/database combination
• a simple one-layered key-value store (S3, Google bucket) for the images.
> They need to be searchable based on the contents of the XML files.
Define "searchable" and you have your answer for what you need for the second database.

2020-11-17T14:19:47.184900Z

Do you mean you want a word match? Does searchable mean the system understands what you meant, similar to how a human would?

2020-11-17T14:21:11.186100Z

If the XML has fields which are searchable then you should put those in a database with a rich query language (postgres / datomic)

2020-11-17T14:22:39.187800Z

From there you can, depending on your read/write requirements, either put the transcriptions in their own database (speed for space) or just do the search in memory (space for speed).

2020-11-17T14:26:27.190100Z

If you're very unsure where the whole thing is going, then don't use any database to start; just write a really obvious program and save the files on a filesystem? I take no personal responsibility for how that turns out 😆

😂 1
simongray 2020-11-17T14:28:11.191900Z

@drewverlee searchable just means that a list of documents needs to be retrieved based on some filters. The XML documents contain a header element containing some metadata which will definitely need to be part of the search interface, but the actual textual content itself will probably also need to be.

2020-11-17T14:31:33.195100Z

Search implies filter; you have to say what the filter criteria are. That defines the functionality. E.g. if you only need to support an exact string match on an XML field (the header?), then you can store that in Postgres and easily query directly by it. Postgres also likely supports extensions that can do more than an exact string match. But the sky is the limit; Google Search takes into account my age and location when I search.

simongray 2020-11-17T14:35:53.198800Z

Well, various criteria. Some should be exact string matches (e.g. filter documents with a specific author) while some are dates or numeric (e.g. filter documents written between 1933 and 1948).

2020-11-17T14:38:20.201Z

Yea, those should be handled by a database with a schema. If you're not sure what performance characteristics any unstructured search will need, or if they exist at all, then your best bet is to do as little as possible and see what people need and ask for.
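e.g. a plain metadata table plus parameterized queries goes a long way (rough sketch with next.jdbc; table and column names are made up):

(require '[next.jdbc :as jdbc])

;; One metadata row per XML document; the document itself stays on disk/S3.
;; CREATE TABLE document (
;;   id      serial PRIMARY KEY,
;;   author  text,
;;   written date,
;;   path    text
;; );

(defn documents-by-author [db author]
  (jdbc/execute! db ["SELECT * FROM document WHERE author = ?" author]))

(defn documents-between [db from to]
  (jdbc/execute! db ["SELECT * FROM document WHERE written BETWEEN ? AND ?" from to]))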

simongray 2020-11-17T14:39:28.202Z

Ok. So postgres for storing metadata, but access the files through the filesystem?

2020-11-17T14:42:54.204800Z

I'm using the term "file system" fairly loosely. Unless you plan on supporting search by image content, e.g. "give me all the cats" based solely on the information in the picture and not a text label, then what you need from a query perspective is just a key (the file name) and its value (the image). So anything that can do an acceptably fast lookup (I don't know what that means here) and is cheap enough (again, no idea) will work.

simongray 2020-11-17T14:45:26.208300Z

Ok, that makes sense. I know that postgres supports adding XML as a datatype, but I am unsure of the benefits. My first hunch was also just to keep the files on disk and simply associate their file paths with some metadata in the database.

lukasz 2020-11-17T14:46:01.209400Z

The product I'm working on is in large part what you're describing (with differences of course: we store audio, video and import written content from various sources such as Zendesk, Google Drive, Intercom and more). All static assets are in S3, Postgres stores content in our own format (jsonb in PG) and all of the metadata and content is indexed in Elasticsearch.

2020-11-17T14:50:06.213900Z

Yep, those are reasonable modern solutions. It's potentially safe to bang something out in clj and store some files in something like S3 with replication until you define how much you need Postgres and Elasticsearch though.

simongray 2020-11-17T14:50:08.214Z

Maybe I should mention that this is meant to be used by a relatively small userbase with very few concurrent users and I expect everything to be running on a single machine.

simongray 2020-11-17T14:50:35.214900Z

It’s a research project, so only highly specialised researchers will have access to it.

lukasz 2020-11-17T14:50:36.215Z

Ah, no need for ES then :-)

simongray 2020-11-17T14:50:45.215500Z

that’s what I thought 🙂

2020-11-17T14:50:46.215600Z

Oh then yea. Just bang something out in an hour in pure clj and evolve it over time.

lukasz 2020-11-17T14:50:54.216Z

PG's full text search is going to be plenty

lukasz 2020-11-17T14:51:13.216900Z

I believe PG has a special XML data type as well, so it might also support some interesting query patterns à la jsonb
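e.g. something like this, if I remember the xpath() function right (untested sketch via next.jdbc; table and element names are made up):

;; CREATE TABLE letter (id serial PRIMARY KEY, doc xml);
(jdbc/execute! db
  ["SELECT id FROM letter
    WHERE (xpath('/letter/header/author/text()', doc))[1]::text = ?"
   "Some Author"])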

lukasz 2020-11-17T14:51:42.217900Z

So potentially you could squeeze all of that there with very few moving parts

2020-11-17T14:51:47.218100Z

You can likely get away without even using Postgres though. Just make sure you have some way to replicate the data so the chances of losing it are really low.

simongray 2020-11-17T14:52:15.218500Z

thanks for the input guys. 🙂

2020-11-17T14:52:35.219Z

use something like datascript and just recompute the index every time you search.

2020-11-17T14:52:51.219300Z

If it's too slow then move to a database.

simongray 2020-11-17T15:04:10.222100Z

@drewverlee I think Datascript would be fine if I only had to serve the XML, but I also need persistence for the comments.

2020-11-17T15:05:53.223800Z

Persist the XML file and the image, but you can just read all of them at search time. If it's like 20 XML files then downloading them every time won't be that big of a deal. And it's something you can finish in an hour.
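e.g. roughly (sketch with clojure.data.xml; <author> is a made-up element and I'm assuming no XML namespaces):

(require '[clojure.data.xml :as xml]
         '[clojure.java.io :as io])

;; Parse every stored XML file and filter in memory at search time.
(defn parse-doc [file]
  (xml/parse-str (slurp file)))

(defn author-of [parsed]
  ;; text of the first <author> element in the document
  (->> (xml-seq parsed)
       (filter #(= :author (:tag %)))
       first
       :content
       (apply str)))

(defn search-by-author [dir author]
  (->> (file-seq (io/file dir))
       (filter #(.endsWith (.getName %) ".xml"))
       (filter #(= author (author-of (parse-doc %))))))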

emccue 2020-11-17T21:55:46.225200Z

@jon920 It's been 2 days, but to add on to what you are thinking

emccue 2020-11-17T21:56:43.226300Z

if you want "encapsulation" in clojure you aren't going to get it, but you can easily make a "boundary" for a system where users are meant to use public functions to access and work with things

emccue 2020-11-17T21:56:50.226600Z

and not access map keys directly

emccue 2020-11-17T21:57:06.227Z

the easiest way to signal something like this would be to use namespaced keys

emccue 2020-11-17T21:57:10.227200Z

in the same way this Python

emccue 2020-11-17T21:57:50.228500Z

class Apple:
    def __init__(self, color):
        self.__color = color
    
    @property
    def color(self):
        return self.__color

emccue 2020-11-17T21:58:08.228900Z

interprets the __color field as _Apple__color

emccue 2020-11-17T21:58:45.229900Z

and thus it is clear that outside the definition of that class, it would be a faux pas to read or modify that field directly

emccue 2020-11-17T21:59:10.230300Z

you can use namespaced keywords in clojure to achieve the same effect

emccue 2020-11-17T21:59:43.231200Z

(ns my.ns)

(defn create-apple [color]
  {::color color})

(defn color [apple]
  (::color apple))

emccue 2020-11-17T21:59:55.231500Z

where ::color expands to :my.ns/color

emccue 2020-11-17T22:00:19.232Z

so it is a signal to other namespaces only to mess with that key if it is documented how to do so

emccue 2020-11-17T22:01:18.233200Z

the value of doing that for everything is kinda questionable - especially when it is just data

emccue 2020-11-17T22:01:27.233500Z

and it all falls under "techniques that only work if everyone agrees to them"

emccue 2020-11-17T22:02:19.234100Z

but it is at least a way to make some things "private"

emccue 2020-11-17T22:04:38.236700Z

@simongray Is it only thousands of XML files?

emccue 2020-11-17T22:05:16.238Z

kinda a dumb approach, but you can just buffer that junk in memory and do straight filters

emccue 2020-11-17T22:05:36.238700Z

then access corresponding images in s3

fubar 2020-11-17T22:08:34.241700Z

@emccue Thanks, good advice! I was reading my DDD book more today and came across a section on functional programming. It says that “the anemic domain model pattern is actually a fundamentally useful concept when using functional programming as opposed to being an anti-pattern … the most important domain concepts are verbs — not the nouns like a bank account, but the verbs like transferring funds. With functional programming and the anemic model, you still have the power to fully express domain verbs, and consequently to have meaningful conversations with domain experts… when building functional domain models, it is still possible to have structures that represent domain concepts, even when using the anemic domain model pattern. Significantly though they are just data structures with no behavior -- so a behavior-rich, object-oriented BankAccount entity (with Deposit() and IncreaseOverdraft()) would be modeled only as a pure immutable data structure (shows a struct without those methods). Having reduced objects into pure data structures, behavior then exists as pure functions… challenge is to cohesively group and combine them aligned with the conceptual domain model. One effective option is to group functions into aggregates”

So it sounds like an “anemic” domain model, where you have dumb structs and put all of the domain logic into domain-layer service modules, could be the way to go with FP. Then like you say it's a faux pas to modify the structs directly; it should be done through these domain service (verb) modules.
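In Clojure terms I'm picturing something like this (toy example I made up, not from the book):

;; The "entity" is just an immutable map...
(def account
  {:account/id              1
   :account/balance         100M
   :account/overdraft-limit 0M})

;; ...and the domain verbs are pure functions grouped into a namespace,
;; e.g. bank.account/transfer-funds
(defn transfer-funds [from to amount]
  (if (>= (:account/balance from) amount)
    {:from (update from :account/balance - amount)
     :to   (update to   :account/balance + amount)}
    (throw (ex-info "Insufficient funds"
                    {:account (:account/id from) :amount amount}))))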

emccue 2020-11-17T22:11:45.242300Z

My eyes glaze over whenever I hear the word domain, but yeah

fubar 2020-11-17T22:11:57.242800Z

And the aggregates… which I haven’t figured out yet but I’ll get there

emccue 2020-11-17T22:12:11.243200Z

Here is kinda sorta an example from a project I am working on

emccue 2020-11-17T22:12:29.243700Z

;; ----------------------------------------------------------------------------
(defn by-id [db id]
  (jdbc/execute-one! db ["SELECT * FROM post WHERE id=?" id]))

;; ----------------------------------------------------------------------------
(defn created-by
  "Returns the user the post was created by."
  [db post]
  (jdbc/execute-one! db
                     ["SELECT * FROM post
                       INNER JOIN page ON post.page_id = page.id
                       INNER JOIN \"user\" ON page.user_id = \"user\".id
                       WHERE post.id = ?
                       LIMIT 1"
                      (:post/id post)]))

;; ----------------------------------------------------------------------------
(defn can-access?
  [db post user]
  (or
    (not (get-in post [:post/content :options :hidden]))
    (= (:user/id (created-by db post))
       (:user/id user))))


;; ----------------------------------------------------------------------------
(defn reaction-counts
  "Returns a list of reactions and their counts - non-nil.
  Each element is
    :token - the emoji react
    :count - the number of reactions of that kind."
  [db post]
  (let [reaction-counts (jdbc/execute!
                          db
                          ["SELECT token, COUNT(*) FROM reaction
                            WHERE post_id = ?
                            GROUP BY token"
                           (:post/id post)])]
    (mapv
      #(utils/rename-key % :reaction/token :token)
      reaction-counts)))

emccue 2020-11-17T22:12:46.244100Z

I have a post namespace that has functions that deal with posts directly

emccue 2020-11-17T22:13:09.244500Z

and all the keywords are namespaced with their table name in sql

emccue 2020-11-17T22:13:37.245Z

but I'm not treating those as "private" things even though they are namespaced keywords

emccue 2020-11-17T22:13:50.245300Z

other parts of the code can see :post/id if they want to

emccue 2020-11-17T22:14:07.245700Z

in the same way this namespace feels free to look into :user/id

emccue 2020-11-17T22:14:37.246100Z

But when I have a more complicated, stateful system

emccue 2020-11-17T22:14:49.246500Z

;; ----------------------------------------------------------------------------
(defn create-chat-subsystem
  "Creates an object that holds the info required to manage
  the chat subsystem, including sending notifications to
  users when messages are sent."
  [db ^JedisPool redis-client-pool]
  (let [;; Map of user-id to callbacks to call when a
        ;; new message comes through for them.
        connections (atom {})
        subsystem {::connections connections
                   ;; Objects needed to manage subscribing to redis
                   ;; for messages posted on other nodes.
                   ::redis-client (.getResource redis-client-pool)
                   ::redis-pub-sub (chat-messages-listener db connections)
                   ::subscription-executor (Executors/newSingleThreadExecutor
                                             (-> (BasicThreadFactory$Builder.)
                                                 (.namingPattern "chat-subsystem-%s")
                                                 (.build)))}]
    (.submit (::subscription-executor subsystem)
             (reify Runnable
               (run [_]
                 (utils/restart-on-failure
                   (.psubscribe (::redis-client subsystem)
                                (::redis-pub-sub subsystem)
                                (into-array [(message-key "*" "*")]))))))
    subsystem))

;; ----------------------------------------------------------------------------
(defn shutdown-chat-subsystem! [chat-subsystem]
  (log/info ::shutdown-step "Unsubscribing from channels.")
  (.punsubscribe (::redis-pub-sub chat-subsystem))

  (log/info ::shutdown-step "Returning redis client to the pool.")
  (.close (::redis-client chat-subsystem))

  (log/info ::shutdown-step "Shutting down the executor")
  (.shutdownNow (::subscription-executor chat-subsystem)))

emccue 2020-11-17T22:15:04.246900Z

I use keywords namespaced with the full namespace in the code

emccue 2020-11-17T22:15:11.247100Z

and I do treat those as private

emccue 2020-11-17T22:16:18.248Z

;; ----------------------------------------------------------------------------
(defn attach-user-session! [chat-subsystem user-id callback]
  (swap! (::connections chat-subsystem)
         (fn [users]
           (update users user-id conj callback))))

;; ----------------------------------------------------------------------------
(defn remove-user-session!
  [chat-subsystem user-id callback]
  (swap! (::connections chat-subsystem)
         (fn [users]
           (let [new-callbacks-for-user (remove #{callback} (users user-id))]
             (if (empty? new-callbacks-for-user)
               (dissoc users user-id)
               (assoc users user-id new-callbacks-for-user))))))

emccue 2020-11-17T22:16:42.248400Z

and then the public functions in the namespace are my contract for interaction

emccue 2020-11-17T22:19:27.249600Z

but if there were invariants between fields in data, like a bank account, being less transparent would also make sense

emccue 2020-11-17T22:19:47.250Z

all of which is to say I am lightyears from a consistent opinion

emccue 2020-11-17T22:24:37.250400Z

maybe I should have

emccue 2020-11-17T22:24:42.250800Z

(defn id [user]
  (:user/id user))

;; or (def id :user/id) - that would work too since keywords are fns

emccue 2020-11-17T22:24:54.251200Z

and use (user/id ...) in other namespaces

fubar 2020-11-17T22:27:29.251700Z

Nice I like that technique

fubar 2020-11-17T22:35:57.255800Z

This book also mentioned a “memento” pattern for encapsulation when doing functional C#, where you have a function that returns a “memento” struct which is a version of the original domain struct but with all of the private attributes filtered out. Though they mention that as more of an option for passing up to the UI rather than a tool for protecting invariants inside the domain. So I think your technique would be better for that

seancorfield 2020-11-17T22:38:56.257200Z

Many OOP patterns just disappear in FP: immutable values and (higher order) functions mean that a lot of those OOP patterns aren't needed because the inherent complexity just isn't there in FP.

seancorfield 2020-11-17T22:39:37.258300Z

(there are some patterns in FP that have no equivalence in OOP as well)

2020-11-17T22:39:59.258800Z

It's good to think about "encapsulation" for what it gives you, and not just for the sake of it. Like, why do you want to encapsulate things to begin with? Once you answer that, you can more easily think... OK, how can I get this property in Clojure? Do I need encapsulation? Can some other thing give me the same property?

👍 1
seancorfield 2020-11-17T22:41:05.259700Z

Yeah, I don't like to see the equivalent of "getters" in Clojure code unless they add specific value over just accessing data fields directly.

➕ 1
lukasz 2020-11-17T22:43:27.263100Z

This is pretty old, but shows how most OOP patterns are just unnecessary in Clojure: http://mishadoff.com/blog/clojure-design-patterns/#intro

seancorfield 2020-11-17T22:43:28.263200Z

(at work we have a code base that stretches back over a decade and some areas have "getters" because we want to hide the fact that some parts of the code traffic in lowercase keys, some parts in camelCase keys, and some parts in qualified/keys as we refactor and modernize the code)

seancorfield 2020-11-17T22:44:55.264600Z

Ah, Pedro and Eve -- that's a fun blog post!

seancorfield 2020-11-17T22:45:35.265500Z

And for FP-specific stuff: https://www.infoq.com/presentations/Clojure-Design-Patterns/ is a good talk

lukasz 2020-11-17T22:46:52.266900Z

🤦 Only just now I got the names of people in that article

fubar 2020-11-17T22:46:57.267Z

Well the encapsulation gets you guarantees on invariants so that you don’t end up with a FlightBooking with a null departure date or a BarCustomer with an age under 21. In a statically typed language a lot of that can be done with the type system like in F#. Otherwise I’m thinking of performing data validation during runtime at the boundaries of the domain when outer layers are accessing the data, with something like Clojure spec. So you could still violate the invariants from within the domain layer, but it won’t escape outside of the system as long as you validate it at every point of egress

lukasz 2020-11-17T22:47:40.267500Z

"every point of egress" is an overkill, you validate where it matters

2020-11-17T22:48:37.268500Z

@jon920 OK, so what you want is to assert your data invariants? Then it's not "encapsulation" per se that you want. So now the question is: how, in Clojure, can you assert data invariants and protect yourself against data modification that would corrupt your data invariants?

2020-11-17T22:49:13.268800Z

In my opinion, like you mentioned, Spec is the way to go.
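E.g. for the examples you gave earlier, something like (toy sketch):

(require '[clojure.spec.alpha :as s])

(s/def :flight-booking/departure-date inst?)
(s/def ::flight-booking (s/keys :req [:flight-booking/departure-date]))

(s/def :bar-customer/age (s/and int? #(>= % 21)))
(s/def ::bar-customer (s/keys :req [:bar-customer/age]))

;; the invariants are checked wherever you choose to check them:
(s/valid? ::bar-customer {:bar-customer/age 18})        ;; => false
(s/explain-data ::bar-customer {:bar-customer/age 18})  ;; tells you why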

fubar 2020-11-17T22:50:27.269500Z

Sure spec can work, I just have to (ideally) ensure that the data is validated at every point before it can be persisted or used to perform a calculation

fubar 2020-11-17T22:51:55.271900Z

With OOP + encapsulation or a type system enforcing those invariants, if you have a class/data type that guarantees those invariants, you can persist or use it without having to check. Without that it seems like I'd have to identify every place where that could happen and run the validations. Maybe using macros?

fubar 2020-11-17T22:52:36.273Z

Sorry I have to eat but I’ll check back, thanks for all the great advice so far

2020-11-17T22:52:44.273200Z

Ya, it's not that much effort. In my opinion, it's much better than data encapsulation. With data encapsulation, you only have a "best intention" protection, in that you hope that all devs who change the functions that are allowed to change the data know what all the invariants should be, and don't mess up their code change in a way that would break them. With Spec you get a formal language to define the invariants, and automatically validate them, so even if a dev messes up, it'll be caught, and your prod data won't be corrupted.

2020-11-17T22:58:29.278400Z

And, you can still provide some "relaxed form" of encapsulation. Like, say you have a Domain Aggregate: have some namespace for it. Put in that namespace the functions that are supposed to modify the entities and value types of the aggregate. And use that namespace everywhere. Now sure, your colleague could decide: "Screw Jon and his stupid abstraction, I'm going to directly call the DB without going through his domain aggregate." And you know what, they can do that in Haskell or Scala or Java or F# just as well. At this point, have some team standards, hope the CR catches things, make it obvious that this data should be changed by functions in that namespace, etc. Same thing if your data is not in a DB: if it's just in-memory, make the variable private to that namespace, or make it obvious what the right way to modify it is. You can try to add all kinds of extra "pain" for someone to bypass your guards, but your colleague always has a way to do it; since they control the source, they can just change the guard's source code as well, or pull any other shenanigans.
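E.g. for in-memory state, something like (toy sketch):

(ns bank.accounts)

;; Private to this namespace; other code is expected to go through
;; the public functions below rather than touching the atom directly.
(defonce ^:private accounts (atom {}))

(defn deposit! [id amount]
  (swap! accounts update-in [id :balance] (fnil + 0M) amount))

(defn balance [id]
  (get-in @accounts [id :balance]))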

2020-11-17T22:59:32.279300Z

What Clojure does though, which is WAYY better than Java, is that by having it all immutable, it becomes hard to accidentally break the data through inadvertent mutation.

2020-11-17T23:02:11.281200Z

Oh, and one last point: while static types can assert some invariants (and it's cool they do it statically), in practice they're just not powerful enough to truly protect my data from corruption. So I still find Spec is a great tradeoff here: yes, you move the validation to runtime, but you can be much more sure that your invariants are held, since you can model them much more precisely.

seancorfield 2020-11-17T23:05:55.284800Z

I think there's also a lot of nervousness from folks who come from a statically typed background, especially with OO languages, where they're used to the type system and enforced encapsulation preventing a lot of mistakes that would otherwise be easy to make with mutable data -- and there's a temptation for them to view Spec as a "replacement" for a "type system" (it isn't) and to overuse it so they have a "type signature" on lots of their functions, and they add lots of s/valid? calls in places where they wouldn't validate data in their "home" language. That Clojure is so very, very different from that world takes some folks a long time to accept and really internalize.

👍 1
2020-11-17T23:08:03.285Z

For sure. The learning to ride a bike analogy applies well here. It'll take someone a while after they take off the training wheels to not be scared they're going to fall and hurt themselves. They need to gain more confidence that... oh ya, they don't actually fall down anymore, and ya, those training wheels were actually useless at this point.

2020-11-17T23:09:27.285200Z

I see it a lot... But what about type errors? How will I not make them! And it's like... Relax, you're not going to push code to prod that has type errors; you're smart, you'll catch them at the REPL or in your tests, don't worry about it.

seancorfield 2020-11-17T23:11:14.285400Z

(and part of that security comes from developing a good REPL workflow, which can also take a while since it is so very different from how you work in other languages)

➕ 1
2020-11-17T23:11:35.285600Z

I also like how the trade-offs for the bike analogy are the same too. You take the training wheels off when you need to go faster and take sharper turns. Which is pretty much the same benefit I see with Clojure being dynamically typed.

2020-11-17T23:17:10.290100Z

I've been telling people that you should only add s/valid? in places that send data to the outside, and potentially places that receive data from the outside (unless you know the sender has already run s/valid? on it, or vice versa).

That's because, when you go to prod, all your internal data flows have been tested by you at the REPL, with QA, by your unit tests, by your integ tests, in beta use, in staging, on your pre-prod stage, etc. So at that point, you should be very confident that your internal data flow is correct and has no bug. It doesn't need to be validated anymore. Which is why you should instrument in those cases, but not once you go to prod.

But outside interaction you cannot predict; who knows what data the user is going to send you. What you do know is, given valid data, everything works, but you don't know what happens given invalid data. So instead of trying to handle invalid data, you just run s/valid? and reject invalid data.

This happens to be true even for strongly, statically typed languages. The types can't assert statically at compile time that outside actors won't send you invalid data when in prod. And if you only use your static type definitions to perform runtime validation of it, you don't get very good coverage, since most likely not the entire range of the possible String type is valid input. So most of the time, people need to add ad-hoc validation code on top of their type definitions. It's kind of annoying: now your data specification is actually in two places, partially in the types, and then in some custom validation functions.
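E.g. at the edge, roughly (sketch; save-booking! is a placeholder, and :body-params is whatever key your middleware puts the parsed body under):

(defn create-booking-handler [request]
  (let [booking (:body-params request)]
    (if (s/valid? ::flight-booking booking)
      {:status 201 :body (save-booking! booking)}
      ;; reject invalid outside data at the point it enters the system
      {:status 400 :body (s/explain-data ::flight-booking booking)})))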

fubar 2020-11-17T23:42:36.291500Z

> And, you can still provide some "relaxed form" of encapsulation. Like say you have a Domain Aggregate, have some namespace for it. Put in that namespace the functions that are supposed to modify the entities and value types of the aggregate. And use that namespace everywhere.
> make it obvious that this data should be changed by functions in that namespace
I like those ideas, that is what I'm leaning toward right now