@borkdude tested malli + edamame with location-preserving metadata. A few notes:
• the current impl of transform is not aware of the path where the transformation happens. If it were, the solution would be almost trivial
• the good thing is that we have other tools to work around the limitation: we can walk the schema and inject :in to all subschema properties and use :compile in a transformer to read this info at decoder creation time and to collect all value path -> loc into a request-scoped atom as we decode the Wrapped values
• transformers compose, so we can also do string->edn etc in a single sweep
• for the transformed value, we call explain and in case of errors, attach the locs from the atom, as explain errors already know both schema and value paths
• the solution has a lot of boilerplate, and I found a :set explain bug (wrote https://github.com/metosin/malli/issues/294 ), with that fixed, this could even work 😎
• https://gist.github.com/ikitommi/e3229a0bcef532d1fa032321713227d3
the good thing about using a lookup-table is that it doesn’t need to throw and reports all errors.
also, if edamame could accumulate and provide the full path to a given element in :postprocess, it could be used here too, e.g. extra key + value in Wrapped like :in [:tags "address"].
(def Address
  [:map
   [:id string?]
   [:tags [:set keyword?]]
   [:address
    [:map
     [:street string?]
     [:city string?]
     [:zip int?]
     [:lonlat [:tuple double? double?]]]]])
;; string->edn, no coercion
(let [coerce (coercer Address)]
  (coerce (slurp "schema.edn")))
;{:schema [:map
;          [:id string?]
;          [:tags [:set keyword?]]
;          [:address
;           [:map
;            [:street string?]
;            [:city string?]
;            [:zip int?]
;            [:lonlat [:tuple double? double?]]]]],
; :value {:id "Lillan",
;         :tags #{":hotel" :coffee :artesan},
;         :address {:lonlat [61.4858322 23.7854658],
;                   :city "Tampere",
;                   :street "Ahlmanintie 29",
;                   :zip "33100"}},
; :errors (#Error{:path [:tags 0],
;                 :in [:tags 0], ;; <--- the set value paths are incorrect #294
;                 :schema keyword?,
;                 :value ":hotel",
;                 :loc {:row 2, :col 10, :end-row 2, :end-col 18}}
;          #Error{:path [:address :zip],
;                 :in [:address :zip],
;                 :schema int?,
;                 :value "33100",
;                 :loc {:row 5, :col 17, :end-row 5, :end-col 24}})
; :string "{:id \"Lillan\"
; :tags #{:artesan :coffee \":hotel\"}
; :address {:street \"Ahlmanintie 29\"
; :city \"Tampere\"
; :zip \"33100\"
; :lonlat [61.4858322, 23.7854658]}}
; "}
;; string->edn, with malli string-coercion
(let [coerce (coercer Address (mt/string-transformer))]
  (coerce (slurp "schema.edn")))
; => nil
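a minimal sketch of the lookup-table idea described above, in plain Clojure with no malli or edamame calls. The Wrapped record, unwrap, path->loc and attach-locs are hypothetical names for illustration (the gist has the real implementation), and sets are left out because their value paths were the subject of #294:

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical stand-in for a parsed value plus its source location.
(defrecord Wrapped [obj loc])

(defn unwrap
  "Strips every Wrapped layer, returning plain EDN."
  [x]
  (walk/prewalk #(if (instance? Wrapped %) (:obj %) %) x))

(defn path->loc
  "Builds a {value-path loc} table for every Wrapped node in maps and vectors."
  ([x] (path->loc x [] {}))
  ([x path acc]
   (let [wrapped? (instance? Wrapped x)
         acc      (if wrapped? (assoc acc path (:loc x)) acc)
         v        (if wrapped? (:obj x) x)]
     (cond
       (map? v)        (reduce-kv (fn [a k child]
                                    (path->loc child (conj path (unwrap k)) a))
                                  acc v)
       (sequential? v) (reduce (fn [a [i child]]
                                 (path->loc child (conj path i) a))
                               acc
                               (map-indexed vector v))
       :else           acc))))

(defn attach-locs
  "Adds :loc to each error map by looking up its :in value path in the table."
  [table errors]
  (map #(assoc % :loc (get table (:in %))) errors))

(def parsed
  (->Wrapped
   {:id      (->Wrapped "Lillan" {:row 1 :col 6})
    :address (->Wrapped {:zip (->Wrapped "33100" {:row 5 :col 17})}
                        {:row 3 :col 10})}
   {:row 1 :col 1}))

(unwrap parsed)
;; => {:id "Lillan", :address {:zip "33100"}}

(attach-locs (path->loc parsed) [{:in [:address :zip] :value "33100"}])
;; => ({:in [:address :zip], :value "33100", :loc {:row 5, :col 17}})
```

since the table is only read after explain has run, nothing has to throw while decoding and every error can be reported with its location.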
@borkdude #294 is fixed in master and the edamame-walking works now and is a bit simpler:
1. parse with edamame
2. prewalk twice to get both the original EDN and the path-vec -> loc lookup table
3. glue things together
4. kudos to @nilern for a working walker
5. https://gist.github.com/ikitommi/e3229a0bcef532d1fa032321713227d3
it automatically binds a transformer named :edamame, so you can add custom decoding hints to schemas:
[:string {:decode/edamame str/upper-case}]
… and if sci is enabled, the schemas can be read from files too.
i have a particularly complex schema where the initialization of e.g. (m/validator) or (m/transformer) is fairly slow -- about 300ms. one way to deal with this would be to cache these -- is my understanding correct that using a registry will effectively do this? or will i need to write my own caching layer on top of it?
or are registries just a very simple way of organizing stuff, without any pre-parsing going on?
@lmergen the schema creation will get a 10x boost soon, the slow part being m/schema. If you add Schema instances into a registry, it happens just once. Or you can just use a var:
(def Address (m/schema [:map [:street :string]]))
… but, for super fast validation, you should just create the m/validator once and reuse that. it returns a pure and optimized function.
right, I think I’ll just go for that last option. fairly often various of these validators are used in hot code paths, so I’ll probably write something to cache validators instead.
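a minimal sketch of such a validator cache, assuming schemas are passed as plain data forms so they can serve as cache keys. cached-validator and valid? are hypothetical helper names, not part of malli; only m/validator is the library's own function:

```clojure
(require '[malli.core :as m])

;; Hypothetical helper: memoize compiles each distinct schema form once
;; and returns the same validator function on every later call.
;; Note: the cache is unbounded, which is fine for a fixed set of schemas.
(def cached-validator (memoize m/validator))

(defn valid?
  "Validates value against schema, reusing a previously compiled validator."
  [schema value]
  ((cached-validator schema) value))

(valid? [:map [:street :string]] {:street "hämeenkatu"})
;; => true
```

in a hot code path it is still cheaper to hold the validator in a var or a let binding, as that avoids hashing the schema form on every call.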
but if I wrap things in an m/schema call, it’ll already do a lot of preprocessing, right?
right. I’ll run some flamegraphs. just a sec.
(time
  (prof/profile
    (dotimes [_ 50000]
      (m/validate [:map [:street :string]] {:street "hämeenkatu"}))))
;; "Elapsed time: 10472.153783 msecs"
(let [schema (m/schema [:map [:street :string]])]
  (time
    (prof/profile
      (dotimes [_ 500000]
        (m/validate schema {:street "hämeenkatu"})))))
;; "Elapsed time: 231.093848 msecs"
(let [validate (m/validator [:map [:street :string]])]
  (time
    (prof/profile
      (dotimes [_ 500000]
        (validate {:street "hämeenkatu"})))))
;; "Elapsed time: 59.743646 msecs"
May I ask what profile lib you are using?
mostly https://github.com/clojure-goes-fast/clj-async-profiler and https://github.com/hugoduncan/criterium
thx
right, this makes a lot of sense.
m/schema uses satisfies? which has a perf issue, most of the time is spent there.
or was it two orders of magnitude? satisfies? seems to take at least 95% of the time.
ok, this is very helpful
I really do find the validators to be significantly faster than spec validate: it’s about 3x faster for my fairly insane schema (the same one that takes 300ms to parse). better yet, and this was unexpected: the generators are also much faster. I’m not 100% certain yet whether this is because Malli takes some shortcuts, but i seem to be able to avoid a few annoying gen/such-that? generators with Malli, which causes a very large speed-up.
this is interesting. tried to avoid such-that?, e.g. setting min & max when known, but have not tested against spec gen.
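a small sketch of the difference between filtering and bounded generation, using test.check directly plus malli's generator for a schema with :min and :max properties. The range 1000..2000 and the var names are made up for illustration, and this is not a claim about what explains the speed-up reported above:

```clojure
(require '[clojure.test.check.generators :as gen]
         '[malli.generator :as mg])

;; Filtering: generate any integer, then discard the ones outside the range.
;; Sampling this is likely to throw, because most candidates are rejected
;; and such-that gives up after a limited number of tries.
(def filtered (gen/such-that #(<= 1000 % 2000) gen/large-integer))

;; Constructive: only values inside the range are ever produced.
(def bounded (gen/choose 1000 2000))

;; Malli: the bounds live in the schema properties.
(def from-schema (mg/generator [:int {:min 1000 :max 2000}]))

(every? #(<= 1000 % 2000) (gen/sample bounded 100))
;; => true
(every? #(<= 1000 % 2000) (gen/sample from-schema 100))
;; => true
```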
new flames with cache
10472ms => 568ms (18x faster)