malli

https://github.com/metosin/malli :malli:
ikitommi 2020-11-09T06:47:03.460700Z

@borkdude tested malli + edamame with location-preserving metadata. A few notes:
• the current impl of transform is not aware of the path where the transformation happens. If it were, the solution would be almost trivial
• the good thing is that we have other tools to get around the limitation: we can walk the schema and inject :in into all subschema properties, then use :compile in a transformer to read this info at decoder creation time and collect all value path -> loc pairs into a request-scoped atom as we decode the Wrapped values
• transformers compose, so we can also do string->edn etc. in a single sweep
• for the transformed value, we call explain and, in case of errors, attach the locs from the atom, as explain errors already know both the schema and value paths
• the solution has a lot of boilerplate, and I found a :set explain bug (filed https://github.com/metosin/malli/issues/294 ); with that fixed, this could even work 😎 https://gist.github.com/ikitommi/e3229a0bcef532d1fa032321713227d3

ikitommi 2020-11-09T06:48:27.461400Z

good thing about using a lookup-table is that it doesn’t need to throw and reports all errors.

ikitommi 2020-11-09T06:51:55.464100Z

also, if edamame could accumulate and provide the full path to a given element in :postprocess, it could be used here too, e.g. extra key + value in Wrapped like :in [:tags "address"].
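
A minimal sketch of the Wrapped idea mentioned above, using edamame's :postprocess option. The `Wrapped` record and `parse-with-locs` are illustrative names, not taken from the gist. Values that can carry metadata get the location as metadata; everything else (strings, numbers, keywords) is wrapped together with its location. Note that this does not provide the full path to the element, which is the missing piece discussed above.

```clojure
(require '[edamame.core :as e])

;; Hypothetical wrapper for values that cannot carry metadata.
(defrecord Wrapped [obj loc])

(defn parse-with-locs
  "Parses an EDN string so that every node keeps its source location:
  as metadata when the value supports it, otherwise inside a Wrapped record."
  [s]
  (e/parse-string
    s
    {:postprocess
     (fn [{:keys [obj loc]}]
       (if (instance? clojure.lang.IObj obj)
         (vary-meta obj merge loc)   ; collections, symbols
         (->Wrapped obj loc)))}))    ; strings, numbers, keywords

;; The vector keeps its location as metadata, the number inside is wrapped.
(parse-with-locs "[1]")
```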

ikitommi 2020-11-09T06:53:39.464600Z

(def Address
  [:map
   [:id string?]
   [:tags [:set keyword?]]
   [:address
    [:map
     [:street string?]
     [:city string?]
     [:zip int?]
     [:lonlat [:tuple double? double?]]]]])

;; string->edn, no coercion
(let [coerce (coercer Address)]
  (coerce (slurp "schema.edn")))
;{:schema [:map
;          [:id string?]
;          [:tags [:set keyword?]]
;          [:address
;           [:map
;            [:street string?]
;            [:city string?]
;            [:zip int?]
;            [:lonlat [:tuple double? double?]]]]],
; :value {:id "Lillan",
;         :tags #{":hotel" :coffee :artesan},
;         :address {:lonlat [61.4858322 23.7854658],
;                   :city "Tampere",
;                   :street "Ahlmanintie 29",
;                   :zip "33100"}},
; :errors (#Error{:path [:tags 0], 
;                 :in [:tags 0], ;; <--- the set value paths are incorrect #294
;                 :schema keyword?,
;                 :value ":hotel",
;                 :loc {:row 2, :col 10, :end-row 2, :end-col 18}}
;           #Error{:path [:address :zip],
;                  :in [:address :zip],
;                  :schema int?,
;                  :value "33100",
;                  :loc {:row 5, :col 17, :end-row 5, :end-col 24}})
; :string "{:id \"Lillan\"
;           :tags #{:artesan :coffee \":hotel\"}
;           :address {:street \"Ahlmanintie 29\"
;                     :city \"Tampere\"
;                     :zip \"33100\"
;                     :lonlat [61.4858322, 23.7854658]}}
;          "}

;; string->edn, with malli string-coercion
(let [coerce (coercer Address (mt/string-transformer))]
  (coerce (slurp "schema.edn")))
; => nil

ikitommi 2020-11-09T14:02:17.467200Z

@borkdude #294 is fixed in master and the edamame-walking works now and is a bit simpler:
1. parse with edamame
2. prewalk twice to get both the original EDN and the path-vec -> loc lookup table
3. glue things together
4. kudos to @nilern for a working walker
5. https://gist.github.com/ikitommi/e3229a0bcef532d1fa032321713227d3

ikitommi 2020-11-09T14:05:21.469100Z

it automatically binds a transformer named :edamame, so you can add custom decoding hints to schemas:

[:string {:decode/edamame str/upper-case}]
… and if sci is enabled, the schemas can be read from files too.
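
A small sketch of how such a decoding hint is picked up. It assumes a plain named transformer as a stand-in for the one the gist binds; `edamame-transformer` is an illustrative name. A transformer named :edamame looks for :decode/edamame in the schema properties and applies the function found there.

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt]
         '[clojure.string :as str])

;; Assumption: a bare named transformer, not the gist's actual implementation.
(def edamame-transformer (mt/transformer {:name :edamame}))

;; The :decode/edamame property is applied when decoding with that transformer.
(m/decode [:string {:decode/edamame str/upper-case}]
          "tampere"
          edamame-transformer)
```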

2020-11-09T18:53:14.471700Z

i have a particularly complex schema where the initialization of e.g. (m/validator) or (m/transformer) is fairly slow, about 300ms. one way to deal with this would be to cache these. is my understanding correct that using a registry will effectively do this? or will i need to write my own caching layer on top of it?

2020-11-09T18:54:23.472100Z

or are registries just a very simple way of organizing stuff, without any pre-parsing going on?

ikitommi 2020-11-09T19:09:43.474700Z

@lmergen the schema creation will get a 10x boost soon, the slow part being m/schema. If you add Schema instances into the registry, it happens just once. Or you can just use a var:

(def Address (m/schema [:map [:street :string]]))

ikitommi 2020-11-09T19:10:34.476100Z

… but, for super fast validation, you should just create the m/validator once and reuse that. it returns a pure and optimized function.

2020-11-09T19:12:25.478600Z

right, I think I’ll just go for that last option. fairly often various of these validators are used in hot code paths, so I’ll probably write something to cache validators instead.
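
One possible shape for such a cache, as a sketch only: memoize m/validator so each distinct schema form is compiled once. `cached-validator` and `valid?` are illustrative names. The cache is keyed on schema equality and never evicts, so it suits a fixed set of schemas better than dynamically generated ones.

```clojure
(require '[malli.core :as m])

;; Compiles a validator once per distinct schema and reuses it afterwards.
(def cached-validator (memoize m/validator))

(defn valid?
  "Validates value against schema using the cached, precompiled validator."
  [schema value]
  ((cached-validator schema) value))

(valid? [:map [:street :string]] {:street "hämeenkatu"})
```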

2020-11-09T19:13:52.479700Z

but if I wrap things in an m/schema call, it’ll already do a lot of preprocessing, right?

ikitommi 2020-11-09T19:15:44.480300Z

right. I’ll run some flamegraphs. just a sec.

ikitommi 2020-11-09T19:24:30.480600Z

(time
  (prof/profile
    (dotimes [_ 50000]
      (m/validate [:map [:street :string]] {:street "hämeenkatu"}))))
;; "Elapsed time: 10472.153783 msecs"

(let [schema (m/schema [:map [:street :string]])]
  (time
    (prof/profile
      (dotimes [_ 500000]
        (m/validate schema {:street "hämeenkatu"})))))
;; "Elapsed time: 231.093848 msecs"

(let [validate (m/validator [:map [:street :string]])]
  (time
    (prof/profile
      (dotimes [_ 500000]
        (validate {:street "hämeenkatu"})))))
;; "Elapsed time: 59.743646 msecs"

hoynk 2020-11-13T18:02:18.045800Z

May I ask what profile lib you are using?

hoynk 2020-11-13T18:50:59.046400Z

thx

2020-11-09T19:25:06.481Z

right, this makes a lot of sense.

ikitommi 2020-11-09T19:26:55.482800Z

m/schema uses satisfies?, which has a perf issue; most of the time is spent there.

ikitommi 2020-11-09T19:30:10.483800Z

or was it two orders of magnitude? satisfies? seems to take at least 95% of the time.

2020-11-09T19:37:40.484600Z

ok, this is very helpful

2020-11-09T19:41:49.488800Z

I really do find the validators to be significantly faster than spec validate: about 3x faster for my fairly insane schema (the same one that takes 300ms to parse). better yet, and this was unexpected: the generators are also much faster. I’m not 100% certain yet whether this is because Malli takes some shortcuts, but I seem to be able to avoid a few annoying gen/such-that generators with Malli, which causes a very large speed-up.

ikitommi 2020-11-10T09:21:33.496500Z

this is interesting. I tried to avoid such-that, e.g. by setting min & max when known, but have not tested against spec gen.

ikitommi 2020-11-09T21:43:47.489600Z

new flames with cache

ikitommi 2020-11-09T21:45:04.490700Z

10472ms => 568ms (18x faster)