@ryan.is.gray We've historically used s/conformer
at work and even published a library of "conforming specs" that are useful for validating (and lightly transforming) form data -- from strings to numbers, booleans, etc. But we're looking at switching to https://github.com/wilkerlucio/spec-coerce which derives coercions from specs so that you can write your specs for just the target data you want and then run your string input through spec-coerce
and then into Spec itself, keeping the two concerns separated.
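For anyone following along, a minimal sketch of that workflow (based on the spec-coerce README; ::age is just an illustrative spec):
(require '[clojure.spec.alpha :as s]
         '[spec-coerce.core :as sc])
(s/def ::age int?)
(sc/coerce ::age "42")                   ;; => 42, coercion derived from the spec
(s/valid? ::age (sc/coerce ::age "42"))  ;; => true, then Spec validates separately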
> data meat grinder That's so metal! \m/ >.< \m/
my scenario is that I don't know what functions a namespace provides. I use the repl for hints.
right, in my code I didn't need to know either
it iterates the namespace and creates the completions
but readline (because of the way it is used) is limited to analyzing what the repl prints to you and what you type in, for anything smarter I think you need an integrated tool
(for your use case, instead of just iterating the namespace, it could also iterate the ns-aliases)
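roughly like this (a sketch; completions is a made-up name):
(defn completions [ns-sym]
  ;; public vars in the namespace itself, plus the aliases visible from it
  (concat (keys (ns-publics ns-sym))
          (keys (ns-aliases ns-sym))))
(completions 'user)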
I strongly suggest using coax instead @seancorfield. It's more complete/battle tested. It's also way more performant
(it started as a fork of spec-coerce)
Ah, good to know @mpenet -- I'll look into it.
Oh, the exoscale library? I have looked at it before so thanks for reminding me!
thanks @seancorfield 🙂
I've got a problem:
I have a protocol in a library dependency. I have a defrecord that extends the protocol. When I run (satisfies? Protocol record)
it says false, which it really shouldn't. But when I reload the namespace then suddenly it does extend the protocol.
> when I reload the namespace The namespace that defines the protocol? Or the one that defines the record?
the one that defines the record
I know that reloading the namespace that defined the protocol breaks existing implementations, but in this case I start at this state
I start the REPL and ask satisfies?
it will return false, even though the protocol implementation is in the defrecord definition
that is before doing any reloads whatsoever
Something in your workflow is probably implicitly reloading the protocol namespace. I can't really say anything else without an MRE.
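fwiw, that reload failure mode is easy to reproduce in a fresh REPL:
user=> (defprotocol P (f [x]))
P
user=> (defrecord R [] P (f [x] :ok))
user.R
user=> (satisfies? P (->R))
true
user=> (defprotocol P (f [x]))  ; what a reload of the protocol ns effectively does
P
user=> (satisfies? P (->R))     ; R still extends the *old* protocol
false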
why would it be implicitly reloaded? The require
directives don't load already loaded namespaces, right?
You said you were using some library that defines that protocol. Do you know for sure that that library never reloads anything?
Also, did you try to reproduce it using just clj
as your REPL, with nothing else?
I can try that now
I swapped a couple of branches and did a clean and now the error is gone…
weirdest thing
https://clojure.org/reference/refs says Clojure refs implement snapshot isolation. Is it stated anywhere what kind of concurrency issues this isolation level can cause, specifically in Clojure code? Has anyone experienced such issues in practice?
The concurrency issue you are most likely to see with refs is write skew (because read-only refs are not part of the default ref set that can cause a transaction to retry). But that's easily worked around when it's an issue by using ensure
instead of deref
to add the ref to the ref set even on read.
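In code, the difference is just this (a sketch):
(def a (ref 1))
(def b (ref 2))
;; plain deref reads b from the transaction's snapshot but leaves it out of the ref set:
(dosync (ref-set a @b))
;; ensure reads b *and* guards it, so b cannot change out from under the transaction:
(dosync (ref-set a (ensure b)))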
Pulling at straws here @roklenarcic, but a dissoc on a record field returns a map. Any chance you did a dissoc on your record?
Clojure 1.10.1
user=> (defprotocol MyProtocol (my-fn [this a]))
MyProtocol
user=> (defrecord MyRecord [my-field] MyProtocol (my-fn [this a] a))
user.MyRecord
user=> (def r (->MyRecord "field-value"))
#'user/r
user=> (satisfies? MyProtocol r)
true
user=> (my-fn r 42)
42
user=> (def r2 (dissoc r :my-field))
#'user/r2
user=> (satisfies? MyProtocol r2)
false
user=> (my-fn r2 42)
Execution error (IllegalArgumentException) at user/eval145$fn$G (REPL:1).
No implementation of method: :my-fn of protocol: #'user/MyProtocol found for class: clojure.lang.PersistentArrayMap
user=> (type r2)
clojure.lang.PersistentArrayMap
user=> (type r)
user.MyRecord
no, but that's a good trick, never thought of that…
Is there anything written down about the decision to use an explicit āensureā rather than track all deref? For my curiosity
adding read-only refs to the ref set means your transactions have a greater chance of failure and retry. but it's not always necessary. so the current setup gives you the option and a way to choose your semantics. if they were always included, you would have no way to weaken that constraint when needed.
like say you had two refs - one for an account balance and one for transaction fee amount. you have a transaction that updates the balance and assesses the fee (which only needs to be read). if the fee changes infrequently and the exact moment when a change starts being applied is not important to the business, it's fine to just deref the fee ref. but if it's really important that the fee change takes effect immediately, you could ensure that ref
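concretely, that scenario is just (a sketch, names illustrative):
(def balance (ref 100))
(def fee (ref 10))
;; fee read from the snapshot; a concurrent fee change can slip past this txn:
(dosync (alter balance - @fee))
;; fee guarded with ensure; the fee applied is exactly the committed value:
(dosync (alter balance - (ensure fee)))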
What kind of maps are struct maps good for reducing the size of? My (limited) experiments so far show array maps to be consistently smaller.
I thought those were deprecated
@noisesmith Nope. Just better served by records, https://clojure.org/reference/data_structures#StructMaps
I think struct-maps were basically a first pass experiment that lead to defrecords
I would not really expect them to be better than anything at anything
Unfortunately records don't support namespaced keys or any other kind of key for that matter.
If you are looking into implementation level details of why certain data structures use the amount of memory that they do, and want something that can draw pictures of JVM objects and references between them for you, you might enjoy tinkering with the cljol library: https://github.com/jafingerhut/cljol
I have not used it to investigate struct maps before, and haven't had an occasion to delve into struct map implementation. array maps are good for memory utilization, for sure, but they do have O(n) lookup time, so something to keep in mind if you ever want to make a big one (that and as soon as you take a large array map and create an updated version of it with operations like assoc, etc., you will typically get back a hash map)
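The 'fragile' part is easy to see at the REPL:
user=> (def m (apply array-map (range 18)))  ; 9 entries, still an array map
#'user/m
user=> (type m)
clojure.lang.PersistentArrayMap
user=> (type (assoc m :x 1))  ; one assoc past the threshold promotes it
clojure.lang.PersistentHashMap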
defrecords do support other kinds of keys and namespaced keys, they just don't get turned into object fields
@hiredman but doesn't that negate optimizations?
@hiredman In fact, it adds an extra 8 bytes of overhead! :p
user=> (defrecord Foo [])
user.Foo
user=> (->Foo)
#user.Foo{}
user=> (assoc (->Foo) ::a 1)
#user.Foo{:user/a 1}
user=>
not supporting as well as you would like is not the same thing as not supporting at all
@hiredman Sure. But there's no size optimization to be had by using one that way.
user=> (mm/measure (assoc (->Foo) ::a 1))
"264 B"
user=> (mm/measure (assoc {} ::a 1))
"232 B"
I would be surprised if you found a built-in Clojure data structure for maps that is lower memory than array-map and also supports qualified keywords as keys. But I haven't done the measurements you are doing -- just a guess from knowledge I do have. array-maps are O(n) lookup time, as I mentioned above, and 'fragile', as mentioned above. Note that keywords are interned, i.e. only stored in memory once, so the mm/measure results you are showing probably contain all of the objects for the keyword once, but if you did a similar measurement for 1000 objects that use the same keyword repeatedly, the keyword memory is only counted once overall (as it should be, since it is only stored once in memory)
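e.g. (mm is clj-memory-meter as above; no numbers here since they vary by JVM):
(mm/measure (mapv #(hash-map ::a %) (range 1000)))
;; walks the object graph, so the single interned ::a keyword is counted
;; once across all 1000 maps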
Is there a java-y solution for spinning up multiple workers running the same callable over and over or just an infinite length task? I'm imagining a threadpool where you define the size and have a method to interrupt the whole pool.
there's probably an Executor that makes this easy, they can own pools
interruption on the jvm is tricky, period, unless you use one of a specific set of predefined interruptable methods, or are OK with checking a sentinel value and shutting down manually at execution boundaries
I thought interruption was okay: you check isInterrupted and catch the exception, and either way, you quit?
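fwiw, a rough interop sketch of that shape (start-workers is a made-up name):
(import '(java.util.concurrent Executors))

(defn start-workers [n work-fn]
  (let [pool (Executors/newFixedThreadPool n)]
    (dotimes [_ n]
      (.submit pool
               ^Runnable (fn []
                           (try
                             ;; run the same callable over and over until interrupted
                             (while (not (Thread/interrupted))
                               (work-fn))
                             (catch InterruptedException _)))))
    pool))
;; (.shutdownNow pool) interrupts every worker in one call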
I've missed the executor if it exists :(
I don't think that is good
Like, in general, you want structured concurrency, tree shaped task graphs, forkjoin, etc
What you are asking for is extremely unstructured
It doesn't even have the structure of iteration where previous results feed back in, just the same callable over and over
It basically demands side effects as the only way to have results
The goto and labels of concurrency
@hiredman isn't this a common pattern for core async where you might have multiple go-loops?
No
That is in no way equivalent to running the same callable over and over
How would you model concurrency for workers reading from a queue and then writing state out somewhere, e.g. a database?
Not an in memory queue that is.
It depends on the queue implementation, but usually it is better to have a single thread (sometimes for limiting work in progress, sometimes for doing blocking IO; lots of reasons this usually ends up better) pulling items from the queue and then running a handler or whatever per item
Basically the same pattern as writing a socket server
Single threaded generator pushing into a thread pool, you mean?
You have a loop accepting connections and hand connections off to workers
Yes
And the workers are not invoking the same callable over and over
Right, yeah. Makes sense. So you only need one go-loop. Although I guess core async doesn't provide much in the way of rate-limiting pushes to consumers the way a thread pool would.
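re: rate limiting, an unbuffered hand-off channel gives you that shape (a sketch; names illustrative):
(require '[clojure.core.async :as a])

(defn start [input-ch handler n-workers]
  (let [work-ch (a/chan)]  ; unbuffered: the producer parks until a worker is free
    ;; the single "accept loop", pulling items and handing them off
    (a/go-loop []
      (when-some [item (a/<! input-ch)]
        (a/>! work-ch item)
        (recur)))
    ;; a fixed number of workers is the rate limit
    (dotimes [_ n-workers]
      (a/thread
        (loop []
          (when-some [item (a/<!! work-ch)]
            (handler item)
            (recur)))))))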
I'm not using core async, so just observing the parallels.
The workers might be core async loops
I use core.async a lot
I haven't used it in a couple years. But I've seen the pattern of starting multiple go-loops as a sort of pool of consumers, which then had complex cancel channels managed across all of them with pub/sub and such. Difficult stuff.
@hiredman if I had plenty of network cards and cores, would you still advise against multiple queue readers?
It really depends, my point is just none of those cases map to "invoking the same callable over and over"
Actually the closest thing it maps to is the lowest level behavior of an executor
E.g. each thread an executor is managing is conceptually running the same code over and over in a loop: pull a runnable from the executor's queue and run it
so like, writing an executor on top of an executor
Yeah. Exactly. Although that's still a single producer really.
the "gotos and labels" of concurrency.
code compiles to gotos and labels, but we write function calls, concurrency happens on threadpool threads running a loop, but you try to write higher level stuff
Could you have a memory-map-backed map?
That would offload the memory to disk
It'd be cool actually if there was one that implemented all the Clojure map interfaces
there are disk-backed implementations of java.util.Map; the tricky thing about clojure maps is they are immutable, so you never have a single map on disk, you have a forest, and then you need to manage that
datomic is sort of that, and the way it manages the forest of trees is by exposing it as history
That's neat. Wouldn't you be able to just mmap the backing trie?
Or you mean you'd need some sort of GC for it?
it depends
you would need to manage it some way, which might look like a gc
but at this point you are kind of halfway to a database with mvcc like postgresql
halfway is overly generous, but it presents a lot of the same issues as mvcc
I also wonder, what about a hybrid, where the trie is kept in memory but the leaves are mmapped?
what you want is a block cache
which of course the OS already has, but you might want more
I think datomic caches both "blocks" of raw storage and deserialized objects
I've never particularly had this use case, but I can imagine someone say who'd want to load up like a large amount of data in some map to do some report on it or whatever, and if it doesn't fit, but somehow they need it all or something of that sort. But then again maybe there's just a way to get Java to put its whole HEAP on disk
just use derby
I have done this, basically reinventing swap by spilling data into derby when it is too large for processing in memory, it is ok, this was a batch system so the performance was likely terrible, but no one was waiting for the results in realtime
Ya, but there's something nice about a change that wouldn't require any code change. You know, like say you started and it would fit in memory, and suddenly you try to process an even larger file. Instead of like rewriting things to adapt to using derby or some other thing.
just start off using the in-memory derby storage 🙂
Fair fair, still think it would be a cool little project though, even if I don't need it lol
https://github.com/Factual/durable-queue and https://github.com/Factual/riffle are things vaguely in this area
Cool, I'll give them a look
Is it possible to declare custom metadata on defn
directly?
user=> (defn foo ^{:custom "Custom metadata!"} [] 'foo!)
#'user/foo
user=> (:custom (meta #'foo))
nil
i.e. for this to return "Custom metadata!" instead
the place to put it is on the name
a type hint is the only thing that can go on the arg vector
(and the type hint can go on either the name or the arg vector)
what do you mean by that?
not really, you can put any metadata on the arg vector, but it's not reflected on the var, it's reflected on the arglists key
user=> (defn foo ^:bar [])
#'user/foo
user=> (-> #'foo meta :arglists first meta)
{:bar true}
but as @hiredman says, if you want metadata on the var, hint the var name
^
means put the following metadata on the next thing
so ^{:custom "whatever"} []
means put that metadata map on that vector
in this case the vector you are attaching the metadata to is the arglist vector for the function
oh that makes sense
(defn ^:foo bar [] ,,,)
right before that you have the name you are defing, any metadata you attach to that symbol will be copied to the defined var
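i.e.:
user=> (defn ^{:custom "Custom metadata!"} foo [] 'foo!)
#'user/foo
user=> (:custom (meta #'foo))
"Custom metadata!"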
Thanks, I thought I'd tried that but probably just did it wrong 🙂
working now!