clojure-dev

Issues: https://clojure.atlassian.net/browse/CLJ | Guide: https://insideclojure.org/2015/05/01/contributing-clojure/
Ben Sless 2020-10-21T14:28:08.085700Z

What's the reason Keyword doesn't implement IObj and can't carry metadata?

favila 2020-10-21T14:30:25.085800Z

at the very least, you can’t intern them directly

bronsa 2020-10-21T14:33:34.086100Z

keywords are interned

alexmiller 2020-10-21T14:35:09.086400Z

they're cached and reused so metadata in one place would affect other uses

👍 1
2020-10-21T14:56:28.087600Z

When saying they are interned, or they are cached, is that the same thing as saying: "we want equality to be fast, determined by whether references/pointers to them are equal", or is there any more nuance to it than that?

alexmiller 2020-10-21T14:57:44.088Z

fast equality (identity) checks and memory impact

borkdude 2020-10-21T14:58:36.088600Z

it also means they won't ever be garbage collected, so creating a gazillion different keywords in a long running process isn't a good idea maybe?

borkdude 2020-10-21T14:58:56.089100Z

that's not the typical clojure program of course

favila 2020-10-21T14:59:34.089500Z

The clj implementation uses weakrefs to intern, which can be GCed

borkdude 2020-10-21T14:59:43.089700Z

ah, very good

favila 2020-10-21T15:00:00.090Z

but the map holding them I guess could still get very big

alexmiller 2020-10-21T15:01:52.091800Z

under gc pressure, they are gc'ed

👍 1
favila 2020-10-21T15:02:52.092400Z

just as a btw, cljs will “intern” (make a fixed map of) keywords discovered statically during compilation, but keywords created dynamically are not because JS has no facility to intern them without leaking. So cljs doesn’t guarantee equal keywords are identical.

💯 1
borkdude 2020-10-21T15:03:12.092700Z

that's an interesting detail, thank you

➕ 1
favila 2020-10-21T15:03:43.093200Z

it has an extra predicate keyword-identical? to get some of the performance back

borkdude 2020-10-21T15:04:46.094400Z

ah that's where that comes from. yeah. in .cljc I usually write (defn kw-identical? [k v] #?(:clj (identical? k v) :cljs (keyword-identical? k v)))

2020-10-21T15:07:21.095400Z

I guess one could make the Symbol -> Reference to Keyword map smaller under GC pressure, too, but that would require some kind of sweep over that map to remove entries from it, and some kind of trigger to call that sweep?

borkdude 2020-10-21T15:08:51.096100Z

hmm, so the corresponding symbols still won't get GC-ed... what's the point then of these weak refs?

2020-10-21T15:09:47.096600Z

Hmmm, perhaps Clojure already does make that map smaller, e.g. in its method clearCache ...

borkdude 2020-10-21T15:10:29.096800Z

aha!

2020-10-21T15:10:59.097500Z

with the trigger to call it for the Symbol->Reference to Keyword map being an intern call on a keyword that finds a null reference

borkdude 2020-10-21T15:13:42.097900Z

so for each new (non-identical) keyword that trigger is called?

2020-10-21T15:16:02.099400Z

The trigger appears to be calling intern on a Symbol that already has an entry in the map, but its weak reference has been made null, which is evidence that an earlier GC run freed the Keyword object. If you never do an intern call on such a Symbol, there will be no call to clearCache that I can see: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Keyword.java#L34-L37

borkdude 2020-10-21T15:17:07.100800Z

right, that makes a lot of sense

2020-10-21T15:17:12.101Z

One could imagine trying to call clearCache in other situations, but it appears to be a computation time vs. space tradeoff, e.g. having a time-based periodic trigger to call ClearCache would be a waste of CPU time in most situations.

2020-10-21T15:19:03.102600Z

I suppose the "perfect" trigger would be some thread whose only job was to be an an infinite loop doing a blocking remove() call on the reference queue of GC'ed WeakReferences, and call clearCache every time that remove() returned something. Then you'd need synchronization on that map.

2020-10-21T15:19:40.103Z

errr, or maybe the ConcurrentHashMap is safe for that use already

favila 2020-10-21T15:20:07.103600Z

doesn’t this actually only clearcache when there’s a keyword miss?

2020-10-21T15:20:22.104Z

That is what it appears to me, yes.

borkdude 2020-10-21T15:21:29.104400Z

hmm

favila 2020-10-21T15:26:13.106900Z

well, nm, it runs intern again after removing the entry, and that ends up clearing out the table and reference queue

2020-10-21T15:28:47.109500Z

What line(s) of code are you referring to when you say "it runs intern again after removing the entry"?

favila 2020-10-21T15:29:09.109800Z

if(existingk != null)
		return existingk;
	//entry died in the interim, do over
	table.remove(sym, existingRef);
	return intern(sym);

favila 2020-10-21T15:30:04.110900Z

existingk=null means a cached item was GCed (the table has an empty weakref in it), so it removes the entry and runs intern again; this time table.get(sym) will be null, so it runs clearCache

2020-10-21T15:32:08.113100Z

Suppose some Keywords were GC'ed, and every call you made to intern after that were for Symbol's that had Keyword's that were never GC'ed. It appears to me that clearCache would never be called. Do you see a way that clearCache could be called in that scenario?

favila 2020-10-21T15:33:20.113600Z

no, but that would also not be a case with memory pressure

2020-10-21T15:34:06.114400Z

It could be. When I said "some Keywords were GC'ed", it could be 99% of a billion of them that were GC'ed.

favila 2020-10-21T15:35:01.115400Z

eventually the GC will collect a keyword that was only not in use for a brief time and gets used again; at that point it will be detected

favila 2020-10-21T15:35:22.115700Z

if the GC never bothers to do that, then it has spare memory

favila 2020-10-21T15:36:13.117200Z

the worst case I can think of would be 1) create a large number of unique keywords 2) then stop and never use a new keyword again

2020-10-21T15:36:19.117400Z

The scenario I am describing is one where the keywords that are GC'ed, are never used again by the application. I know there are applications that will not behave that way, but that is the scenario I was trying to describe above.

favila 2020-10-21T15:38:10.119100Z

correct, but if there’s memory pressure from those empty entries, eventually it seems that some keyword somewhere that the application still uses will eventually be GCed, if the application is still using keywords at all that don’t have permanent lifetimes

2020-10-21T15:38:39.119700Z

Ah, I see what you mean. clearCache is called either if you re-intern an old GC'ed Keyword, OR if you intern a brand new keyword. The only situation where you avoid it indefinitely is by sticking with the Keywords you have now, forever.

favila 2020-10-21T15:39:24.120400Z

not only sticking with them, but keeping them “alive”

2020-10-21T15:39:30.120600Z

right

favila 2020-10-21T15:39:40.120900Z

so, it’s possible but seems very unlikely

favila 2020-10-21T15:40:06.121600Z

doesn’t mean you won’t have a bad time when clearCache gets called though

2020-10-21T15:40:55.122100Z

It is a kind of stop-the-world GC on that map, yes.

2020-10-21T15:41:20.122600Z

or at least stop-the-thread-calling-intern, not the world

favila 2020-10-21T15:44:46.123Z

I think the lesson here is don’t dynamically create large numbers of unique short-lived keywords

cgrand 2020-10-21T15:51:39.124900Z

So don’t create keywords on user input (form, json, edn, xml… parsing).

favila 2020-10-21T15:56:01.125500Z

edn billion laughs attack

😀 1
borkdude 2020-10-21T16:01:03.126400Z

It would be fun to get some statistics on the amount of interned keywords in various clojure apps

alexmiller 2020-10-21T16:41:53.126700Z

I have done that in the past

alexmiller 2020-10-21T16:42:22.127300Z

I also have a benchmark (based on some real world cases) that is importing json with a lot of unique keywords to push this case

alexmiller 2020-10-21T16:43:06.128Z

the last set of changes done were specifically to make that case better (used to be kind of slow to hit the gc conditions)

alexmiller 2020-10-21T16:44:06.128500Z

this is the (in)famous aphyr ticket (https://clojure.atlassian.net/browse/CLJ-1439)