clojure

New to Clojure? Try the #beginners channel. Official docs: https://clojure.org/ Searchable message archives: https://clojurians-log.clojureverse.org/
seancorfield 2020-12-11T00:20:17.359100Z

@ryan.is.gray We've historically used s/conformer at work and even published a library of "conforming specs" that are useful for validating (and lightly transforming) form data -- from strings to numbers, booleans, etc. But we're looking at switching to https://github.com/wilkerlucio/spec-coerce which derives coercions from specs so that you can write your specs for just the target data you want and then run your string input through spec-coerce and then into Spec itself, keeping the two concerns separated.
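The workflow described (coerce string input first, then validate with Spec) reads roughly like this sketch. The spec `::age` and the input value are made up for illustration; `sc/coerce` is the entry point shown in the spec-coerce README:

```clojure
;; Hypothetical form-data example: specs describe only the *target*
;; data; spec-coerce derives string->value coercions from them.
(require '[clojure.spec.alpha :as s]
         '[spec-coerce.core :as sc])

(s/def ::age int?)

;; coerce the raw string input first, then validate with Spec itself:
(let [input   "42"
      coerced (sc/coerce ::age input)]
  (s/valid? ::age coerced))
```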

scottbale 2020-12-11T01:30:57.359300Z

> data meat grinder
That's so metal! \m/ >.< \m/


🤘 1
2020-12-11T01:46:40.359600Z

my scenario is that I don't know what functions a namespace provides. I use the repl for hints.

2020-12-11T01:50:31.359800Z

right, in my code I didn't need to know either

2020-12-11T01:50:43.360Z

it iterates the namespace and creates the completions

2020-12-11T01:51:45.360200Z

but readline (because of the way it is used) is limited to analyzing what the repl prints to you and what you type in, for anything smarter I think you need an integrated tool

2020-12-11T01:52:34.360400Z

(for your use case, instead of just iterating the namespace, it could also iterate the ns-aliases)
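A minimal sketch of what such a completion source could iterate, combining the namespace's public vars with the aliases visible from it (the function name here is made up for illustration):

```clojure
;; Candidate symbols a REPL could offer for a namespace: its public
;; vars plus the aliases visible from it.
(require 'clojure.string)

(defn completion-candidates
  [ns-sym]
  (concat (keys (ns-publics ns-sym))
          (keys (ns-aliases ns-sym))))

(take 3 (sort (completion-candidates 'clojure.string)))
```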

mpenet 2020-12-11T04:07:05.364800Z

I strongly suggest using coax instead @seancorfield. It's more complete/battle tested. It's also way more performant

mpenet 2020-12-11T04:07:49.365500Z

(it started as a fork of spec-coerce)

seancorfield 2020-12-11T04:24:40.368Z

Ah, good to know @mpenet -- I'll look into it.

seancorfield 2020-12-11T04:26:14.368700Z

Oh, the exoscale library? I have looked at it before so thanks for reminding me!

rdgd 2020-12-11T05:34:28.369Z

thanks @seancorfield 👀

roklenarcic 2020-12-11T10:59:54.371900Z

I've got a problem: I have a protocol in a library dependency. I have a defrecord that extends the protocol. When I run (satisfies? Protocol record) it says false, which it really shouldn't. But when I reload the namespace then suddenly it does extend the protocol.

p-himik 2020-12-11T11:18:37.372Z

> when I reload the namespace
The namespace that defines the protocol? Or the one that defines the record?

roklenarcic 2020-12-11T11:42:48.372200Z

the one that defines the record

roklenarcic 2020-12-11T11:43:32.372400Z

I know that reloading the namespace that defined the protocol breaks existing implementations, but in this case that's the state I start in

roklenarcic 2020-12-11T11:47:03.372600Z

when I start the REPL and call satisfies?, it returns false, even though the protocol implementation is in the defrecord definition

roklenarcic 2020-12-11T11:47:10.372800Z

that is before doing any reloads whatsoever

p-himik 2020-12-11T11:47:43.373Z

Something in your workflow is probably implicitly reloading the protocol namespace. I can't really say anything else without an MRE.

roklenarcic 2020-12-11T11:48:39.373200Z

why would it be implicitly reloaded? The require directives don't load already loaded namespaces, right?

p-himik 2020-12-11T11:52:31.373400Z

You said you were using some library that defines that protocol. Do you know for sure that that library never reloads anything? Also, did you try to reproduce it using just clj as your REPL, with nothing else?

roklenarcic 2020-12-11T11:58:41.373600Z

I can try that now

roklenarcic 2020-12-11T12:23:20.373800Z

I swapped a couple of branches and did a clean and now the error is gone…

roklenarcic 2020-12-11T12:23:29.374Z

weirdest thing

jumar 2020-12-11T13:56:08.375800Z

https://clojure.org/reference/refs says Clojure refs implement snapshot isolation. Is it stated anywhere what kinds of concurrency issues this isolation level can cause (specifically in Clojure code)? Has anyone experienced such issues in practice?

alexmiller 2020-12-11T14:24:01.379600Z

The concurrency issue you are most likely to see with refs is write skew (because read-only refs are not part of the default ref set that can cause a transaction to retry). But that's easily worked around when it's an issue by using ensure instead of deref to add the ref to the ref set even on read.

šŸ‘ 1
lread 2020-12-11T15:03:27.379700Z

Pulling at straws here @roklenarcic, but a dissoc on a record field returns a map. Any chance you did a dissoc on your record?

Clojure 1.10.1
user=> (defprotocol MyProtocol (my-fn [this a]))
MyProtocol
user=> (defrecord MyRecord [my-field] MyProtocol (my-fn [this a] a))
user.MyRecord
user=> (def r (->MyRecord "field-value"))
#'user/r
user=> (satisfies? MyProtocol r)
true
user=> (my-fn r 42)
42
user=> (def r2 (dissoc r :my-field))
#'user/r2
user=> (satisfies? MyProtocol r2)
false
user=> (my-fn r2 42)
Execution error (IllegalArgumentException) at user/eval145$fn$G (REPL:1).
No implementation of method: :my-fn of protocol: #'user/MyProtocol found for class: clojure.lang.PersistentArrayMap
user=> (type r2)
clojure.lang.PersistentArrayMap
user=> (type r)
user.MyRecord

roklenarcic 2020-12-11T15:04:05.379900Z

no, but that's a good trick, never thought of that…

lilactown 2020-12-11T16:44:42.381800Z

Is there anything written down about the decision to use an explicit "ensure" rather than track all derefs? For my curiosity

alexmiller 2020-12-11T16:53:39.382Z

adding read-only refs to the ref set means your transactions have a greater chance of failure and retry. but it's not always necessary. so the current setup gives you the option and a way to choose your semantics. if they were always included, you would have no way to weaken that constraint when needed.

alexmiller 2020-12-11T16:56:47.382400Z

like say you had two refs - one for an account balance and one for transaction fee amount. you have a transaction that updates the balance and assesses the fee (which only needs to be read). if the fee changes infrequently and the exact moment when a change starts being applied is not important to the business, it's fine to just deref the fee ref. but if it's really important that the fee change takes effect immediately, you could ensure that ref
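The fee/balance scenario above can be sketched with two refs. `ensure` puts the read-only fee ref into the transaction's ref set, so a concurrent fee change forces a retry instead of silent write skew (names and amounts are made up for illustration):

```clojure
;; Two refs: one written (balance), one only read (fee).
(def balance (ref 100M))
(def fee     (ref 2M))

(defn charge!
  "Subtract the current fee from balance inside a transaction."
  []
  (dosync
    ;; ensure makes this transaction retry if `fee` changes
    ;; concurrently; use @fee instead if a stale fee is acceptable.
    (let [f (ensure fee)]
      (alter balance - f))))

(charge!)
@balance ; => 98M
```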

dominicm 2020-12-11T17:38:05.382800Z

What kind of maps are struct maps good for reducing the size of? My (limited) experiments so far show arraymaps to be consistently smaller.

2020-12-11T17:39:10.383400Z

I thought those were deprecated

dominicm 2020-12-11T17:39:31.384300Z

@noisesmith Nope. Just better served by records, https://clojure.org/reference/data_structures#StructMaps

2020-12-11T17:39:37.384600Z

I think struct-maps were basically a first-pass experiment that led to defrecords

2020-12-11T17:39:55.385200Z

I would not really expect them to be better than anything at anything

dominicm 2020-12-11T17:40:19.385700Z

Unfortunately records don't support namespaced keys or any other kind of key for that matter.

2020-12-11T17:40:57.386700Z

If you are looking into implementation level details of why certain data structures use the amount of memory that they do, and want something that can draw pictures of JVM objects and references between them for you, you might enjoy tinkering with the cljol library: https://github.com/jafingerhut/cljol

2020-12-11T17:42:44.389700Z

I have not used it to investigate struct maps before, and haven't had an occasion to delve into struct map implementation. array maps are good for memory utilization, for sure, but they do have O(n) lookup time, so something to keep in mind if you ever want to make a big one (that and as soon as you take a large array map and create an updated version of it with operations like assoc, etc., you will typically get back a hash map)
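The two array-map caveats above are easy to see at the REPL: lookup cost aside, growing an array map past a small threshold silently returns a hash map (a sketch; the exact promotion threshold is an implementation detail):

```clojure
;; Small array maps stay array maps under assoc...
(def small (array-map :a 1 :b 2))
(type small)                                ; PersistentArrayMap

;; ...but adding many entries promotes the result to a hash map.
(def big (into small (map (fn [i] [(keyword (str "k" i)) i])
                          (range 20))))
(type big)                                  ; PersistentHashMap
```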

2020-12-11T17:42:51.389900Z

defrecords do support other kinds of keys and namespaced keys, they just don't get turned into object fields

dominicm 2020-12-11T17:43:04.390Z

@hiredman but doesn't that negate optimizations?

dominicm 2020-12-11T17:43:36.390100Z

@hiredman In fact, it adds an extra 8 bytes of overhead! :p

2020-12-11T17:43:59.390400Z

user=> (defrecord Foo [])
user.Foo
user=> (->Foo)
#user.Foo{}
user=> (assoc (->Foo) ::a 1)
#user.Foo{:user/a 1}
user=>

2020-12-11T17:44:23.390900Z

not supporting as well as you would like is not the same thing as not supporting at all

dominicm 2020-12-11T17:44:54.391Z

@hiredman Sure. But there's no size optimization to be had by using one that way.

dominicm 2020-12-11T17:45:35.391100Z

user=> (mm/measure (assoc (->X) ::a 1))
"264 B"
user=> (mm/measure (assoc {} ::a 1))
"232 B"

2020-12-11T17:54:25.394200Z

I would be surprised if you found a built-in Clojure data structure for maps that is lower memory than array-map, and also supported qualified keywords as keys. But I haven't done the measurements you are doing -- just giving guess from knowledge I do have. array-maps are O(n) lookup time, as I mentioned above, and 'fragile', as mentioned above. Note that keywords are interned, i.e. only stored in memory once, so the mm/measure results you are showing probably contain all of the objects for the keyword once, but if you did a similar measurement for 1000 objects that use the same keyword repeatedly, the keyword memory is only counted once overall (as it should be, since it is only stored once in memory)

dominicm 2020-12-11T20:18:34.396400Z

Is there a java-y solution for spinning up multiple workers running the same callable over and over or just an infinite length task? I'm imagining a threadpool where you define the size and have a method to interrupt the whole pool.

2020-12-11T20:21:03.397Z

there's probably an Executor that makes this easy, they can own pools

2020-12-11T20:22:19.398100Z

interruption on the jvm is tricky, period, unless you use one of a specific set of predefined interruptable methods, or are OK with checking a sentinel value and shutting down manually at execution boundaries
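Mechanically, the pieces mentioned so far (a fixed-size pool, workers looping over the same task, interruption as the stop signal) might look like this sketch. `start-workers` is a hypothetical helper, not an existing API:

```clojure
(import '(java.util.concurrent Executors ExecutorService))

(defn start-workers
  "Start n threads that each run task over and over until the pool
  is shut down. Returns the ExecutorService so callers can stop it."
  [n task]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n)]
    (dotimes [_ n]
      (.execute pool
                (fn []
                  (while (not (Thread/interrupted))
                    (try
                      (task)
                      (catch InterruptedException _
                        ;; interrupted inside a blocking call: restore
                        ;; the flag so the while test sees it and exits
                        (.interrupt (Thread/currentThread))))))))
    pool))

;; usage sketch:
;; (def p (start-workers 4 #(do-some-work)))
;; (.shutdownNow p) ; interrupts every worker
```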

dominicm 2020-12-11T20:25:15.399200Z

I thought interruption was okay: you check isInterrupted and catch the exception; if either happens, you quit?

dominicm 2020-12-11T20:25:38.399700Z

I've missed the executor if it exists :(

2020-12-11T20:28:24.400200Z

I don't think that is good

2020-12-11T20:29:46.402100Z

Like, in general, you want structured concurrency, tree shaped task graphs, forkjoin, etc

2020-12-11T20:30:13.402900Z

What you are asking for is extremely unstructured

2020-12-11T20:31:37.404300Z

It doesn't even have the structure of iteration where previous results feed back in, just the same callable over and over

2020-12-11T20:32:05.405200Z

It basically demands side effects as the only way to have results

2020-12-11T20:32:42.405700Z

The goto and labels of concurrency

dominicm 2020-12-11T20:37:10.406500Z

@hiredman isn't this a common pattern for core async where you might have multiple go-loops?

2020-12-11T20:37:52.406700Z

No

2020-12-11T20:38:38.408700Z

That is in no way equivalent to running the same callable over and over

dominicm 2020-12-11T20:38:49.408900Z

How would you model concurrency or workers reading from a queue and then writing state out somewhere, e.g. Database?

dominicm 2020-12-11T20:39:09.409300Z

Not an in memory queue that is.

2020-12-11T20:41:51.412400Z

It depends on the queue implementation, but usually it is better to have a single thread (sometimes for limiting work in progress, sometimes for doing blocking IO, just lots of reasons this usually ends up better) pulling items from the queue and then running a handler or whatever per item

2020-12-11T20:42:06.412900Z

Basically the same pattern as writing a socket server
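That "accept loop handing work to workers" shape might be sketched like this, assuming an in-process `LinkedBlockingQueue` as a stand-in for whatever real queue is in play (`start-consumer` is a hypothetical helper):

```clojure
(import '(java.util.concurrent Executors ExecutorService
                               LinkedBlockingQueue))

(defn start-consumer
  "One thread pulls items from queue and hands each to pool -- the
  same shape as an accept loop handing connections to workers."
  [^LinkedBlockingQueue queue ^ExecutorService pool handler]
  (doto (Thread.
          (fn []
            (try
              (while true
                (let [item (.take queue)]        ; blocks, interruptible
                  (.execute pool #(handler item))))
              ;; .take throws InterruptedException on interrupt,
              ;; which serves as the shutdown signal
              (catch InterruptedException _ nil))))
    (.start)))

;; usage sketch:
;; (def q (LinkedBlockingQueue. 16))
;; (def pool (Executors/newFixedThreadPool 4))
;; (def t (start-consumer q pool prn))
;; (.put q {:job 1})
;; (.interrupt t) ; stop the consumer
```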

dominicm 2020-12-11T20:42:23.414Z

Single threaded generator pushing into a thread pool, you mean?

2020-12-11T20:42:34.414400Z

You have a loop accepting connections and hand connections off to workers

2020-12-11T20:42:36.414600Z

Yes

2020-12-11T20:43:05.416200Z

And the workers are not invoking the same callable over and over

dominicm 2020-12-11T20:43:33.416900Z

Right, yeah. Makes sense. So you only need one go-loop. Although I guess core async doesn't provide much in the way of rate limiting push to consumers like a thread pool would.

dominicm 2020-12-11T20:43:54.417700Z

I'm not using core async, so just observing the parallels.

2020-12-11T20:43:59.417900Z

The workers might be core async loops

2020-12-11T20:44:07.418200Z

I use core.async a lot

dominicm 2020-12-11T20:48:32.421100Z

I haven't used it in a couple years. But I've seen the pattern of starting multiple go loops to be consumers as a sort of pool of workers, which then had complex cancel channels managed across all of them with pub sub and such. Difficult stuff.

dominicm 2020-12-11T20:49:33.422400Z

@hiredman if I had plenty of network cards and cores, would you still advise against multiple queue readers?

2020-12-11T20:50:55.424500Z

It really depends, my point is just none of those cases map to "invoking the same callable over and over"

2020-12-11T20:52:15.426400Z

Actually the closest thing it maps to is the lowest level behavior of an executor

2020-12-11T20:53:28.428500Z

E.g. each thread an executor is managing is conceptually running the same code over and over in a loop: pull a runnable from the executors queue and run it

2020-12-11T21:07:52.428900Z

so like, writing an executor on top of an executor

dominicm 2020-12-11T21:10:59.431Z

Yeah. Exactly. Although that's still a single producer really.

2020-12-11T21:11:13.431100Z

the "gotos and labels" of concurrency.

2020-12-11T21:12:34.432400Z

code compiles to gotos and labels, but we write function calls, concurrency happens on threadpool threads running a loop, but you try to write higher level stuff

2020-12-11T21:14:50.432900Z

Could you have a memory mapped backed map?

2020-12-11T21:15:03.433400Z

That would offload the memory to disk

2020-12-11T21:15:39.434100Z

It be cool actually if there was one that implemented all the Clojure Map interfaces

2020-12-11T21:17:07.435200Z

there are disk-backed implementations of java.util.Map; the tricky thing about clojure maps is they are immutable, so you never have a single map on disk, you have a forest, and then you need to manage that

2020-12-11T21:18:10.436400Z

datomic is sort of that, and the way it manages the forest of trees is by exposing it as history

2020-12-11T21:20:03.436600Z

http://www.mapdb.org/

2020-12-11T21:27:59.437Z

That's neat. Wouldn't you be able to just MMAP the backing trie ?

2020-12-11T21:28:26.437200Z

Or you mean you'd need some sort of GC for it?

2020-12-11T21:30:21.437400Z

it depends

2020-12-11T21:30:36.437600Z

you would need to manage it some way, which might look like a gc

2020-12-11T21:31:46.437800Z

but at this point you are kind of halfway to a database with mvcc like postgresql

2020-12-11T21:32:53.438Z

halfway is overly generous, but it presents a lot of the same issues as mvcc

2020-12-11T21:41:02.438200Z

I also wonder, what about a hybrid, where the trie is kept in memory, but the leaves are MMAPed?

2020-12-11T21:51:23.438400Z

what you want is a block cache

2020-12-11T21:51:47.438600Z

which of course, the os already has one, but you might want more

2020-12-11T21:52:18.438800Z

I think datomic caches both "blocks" of raw storage and deserialized objects

2020-12-11T21:54:04.439Z

I've never particularly had this use case, but I can imagine someone who'd want to load up a large amount of data in some map to run a report on it, and it doesn't fit in memory but they somehow need it all. But then again, maybe there's just a way to get Java to put its whole heap on disk

2020-12-11T21:55:11.439200Z

just use derby

2020-12-11T21:56:32.439400Z

I have done this, basically reinventing swap by spilling data into derby when it is too large for processing in memory. It is ok; this was a batch system so the performance was likely terrible, but no one was waiting for the results in realtime

2020-12-11T21:56:41.439600Z

Ya, but there's something nice about a change that wouldn't require any code change. You know, like say you started and it would fit in memory, and suddenly you try to process an even larger file. Instead of like rewriting things to adapt to using derby or some other thing.

2020-12-11T21:57:07.439800Z

just start off using the in memory derby storage 🙂

2020-12-11T21:57:35.440Z

Fair fair, still think it would be a cool little project though, even if I don't need it lol

2020-12-11T22:01:33.440200Z

https://github.com/Factual/durable-queue and https://github.com/Factual/riffle are things vaguely in this area

2020-12-11T22:03:32.440600Z

Cool, I'll give them a look

coby 2020-12-11T22:07:58.442100Z

Is it possible to declare custom metadata on defn directly?

user=> (defn foo ^{:custom "Custom metadata!"} [] 'foo!)
#'user/foo
user=> (:custom (meta #'foo))
nil
i.e. for this to return "Custom metadata!" instead

2020-12-11T22:10:47.442300Z

the place to put it is on the name

2020-12-11T22:11:15.442800Z

a type hint is the only thing that can go on the arg vector

2020-12-11T22:12:06.443400Z

(and the type hint can go on either the name or the arg vector)

coby 2020-12-11T22:12:39.443700Z

what do you mean by that?

bronsa 2020-12-11T22:13:01.444200Z

not really, you can put any metadata on the arg vector, but it's not reflected on the var, it's reflected on the arglists key

user=> (defn foo ^:bar [])
#'user/foo
user=> (-> #'foo meta :arglists first meta)
{:bar true}

bronsa 2020-12-11T22:13:47.445100Z

but as @hiredman says, if you want metadata on the var, hint the var name

2020-12-11T22:13:48.445200Z

^ means put the following metadata on the next thing

2020-12-11T22:14:18.445900Z

so ^{:custom "whatever"} [] means put that metadata map on that vector

2020-12-11T22:14:35.446600Z

in this case the vector you are attaching the metadata to is the arglist vector for the function

coby 2020-12-11T22:14:41.447Z

oh that makes sense

lilactown 2020-12-11T22:14:50.447600Z

(defn ^:foo bar [] ,,,)

2020-12-11T22:15:10.448300Z

right before that you have the name you are defing, any metadata you attach to that symbol will be copied to the defined var
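Tying the thread together, metadata placed on the *name* symbol lands on the var, which is what the original question was after:

```clojure
;; Metadata on the defn *name* (not the arg vector) is copied to
;; the resulting var.
(defn ^{:custom "Custom metadata!"} foo [] 'foo!)

(:custom (meta #'foo)) ; => "Custom metadata!"
```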

coby 2020-12-11T22:17:07.448800Z

Thanks, I thought I'd tried that but probably just did it wrong 🙂

coby 2020-12-11T22:18:07.449100Z

working now!