Please upvote or join this discussion: https://github.com/oracle/graal/discussions/3476
Wow. Thanks for getting in there for the rest of us.
Can anyone explain some of the finer details here?
I know java static initializers are code blocks that execute at class initialization; so I'm assuming for native image that --initialize-at-build-time just means those blocks are executed when the native image is built (hence, for example, any side effects there will happen at build time, not runtime).
What I don't fully understand is how the clojure compiler uses static initializers, and exactly what the implications of this are for clojure.
I'm assuming it's because clojure's a lisp, and even aot'd clojure code still has to apply effects at runtime, e.g. ns initialisation presumably runs as a static initializer etc.
so presumably that's why this impacts all clojure code
I've tried several times to build without this option. It's educational, but I can't really put words to this to explain it in detail, but due to how Clojure is set up / compiles things, it's needed.
I would say: try without the option and perhaps come up with a better explanation, I think that would be very useful.
so build a clojure hello world, with and without this option? And inspect the clojure generated class files with javap to see if we can explain it?
yes
Clojure emits these kinds of things for functions:
static {
const__0 = RT.var("clojure.core", "println");
}
which could be related to this
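For readers following along, a minimal Java sketch of the pattern in that snippet may help. VarRegistry is a hypothetical stand-in for clojure.lang.RT (it is not Clojure's actual implementation), and HelloFn only mimics the shape of a compiled function class. The point it illustrates: the static block runs once, when the JVM first initializes the class, and that is the step --initialize-at-build-time moves into the image build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for clojure.lang.RT: a global registry of named vars.
final class VarRegistry {
    static final Map<String, Object> VARS = new ConcurrentHashMap<>();
    static final List<String> LOG = new ArrayList<>();

    static Object var(String ns, String name) {
        LOG.add("lookup " + ns + "/" + name);          // record every lookup
        return VARS.computeIfAbsent(ns + "/" + name, k -> new Object());
    }
}

// Mimics the shape of a class the Clojure compiler emits for a function.
final class HelloFn {
    static final Object const__0;

    static {
        // Runs once, when HelloFn is initialized (on its first use).
        const__0 = VarRegistry.var("clojure.core", "println");
    }

    static Object invoke() {
        return const__0;                                // no lookup on the call path
    }
}

final class StaticInitDemo {
    public static void main(String[] args) {
        HelloFn.invoke();
        HelloFn.invoke();
        // Two calls, but the registry was only consulted by the static block.
        System.out.println(VarRegistry.LOG);
    }
}
```

If the lookup in the static block needs something that only exists on a JVM (a class loader, a classpath resource), it has to happen while the image is being built rather than when the binary starts.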
yeah I've seen those before… presumably static linking changes those too?
Try to build a graalvm hello world program without the option and see how far you get
I've done such a process here: https://github.com/oracle/graal/issues/3251#issuecomment-842305171
If you don't initialize at build time then you will get:
Caused by: java.io.FileNotFoundException: Could not locate clojure/core__init.class, clojure/core.clj or clojure/core.cljc on classpath.
at clojure.lang.RT.load(RT.java:462)
which might be coming from here: https://github.com/clojure/clojure/blob/b1b88dd25373a86e41310a525a21b497799dbbf2/src/jvm/clojure/lang/RT.java#L338
this can be "fixed" by including clojure.core (init.class) in the resources
interesting
Firstly it's unsurprising that the line in RT is in a static initializer
but that uses the dynamicclassloader, etc, which won't work in a native-image anymore anyway
so I think that already explains why you need build time
so we can quit here already
yeah, I was going to say something similar, though hopefully you can clear it up for me if I'm inaccurate/wrong… My reasoning was essentially:
1. clojure is a single pass compiler… i.e. essentially your whole program is a flattened require tree / repl session… all deps "essentially concatenated".
2. therefore if we're loading clojure/core there, at some point after that we'll be loading all of your app's dependencies in a similar initializer block.
3. hence we can quit here
Though I guess the dynamic class loader essentially just implements what I said
i.e. resolving clj / class files, and compiling clj into .class etc… essentially controlling "Read (compile) Eval"
yeah. another way to put it: you can't "dynamically load classes" at runtime in GraalVM native-image, but Clojure does this in static initializer blocks, hence these must be initialized at build time.
Yeah that's a good way to put it
Perhaps this could be resolved if you make a Java class which does all the loading in a static initializer block and you only initialize that one at build time and the rest of your classes could be initialized at runtime, but this would probably require changes to Clojure itself
But interestingly these changes can be accomplished using substitutions as well perhaps.
Here is an example: https://github.com/borkdude/clj-reflector-graal-java11-fix#the-solution
I was wondering why the clojure compiler needs to use the dynamic class loader for AOT'd code? Presumably it could (at least in a graal compilation context) avoid that? Or is that essentially what you're describing?
yeah, that's what I was trying to describe
in an AOT-ed (native-image) setting you know which namespaces you want, so you could just write that out explicitly
Yeah
Backing up for a second to the graal issue, re thomaswue's point:
> Specifying the option for all classes in a specific jar file seems quite reasonable. Would it be OK for this to only work in such broad manner if an uber jar is created first or is that too limiting?
Why do we need to bundle into an uberjar? Could we not also just give them a classpath?
yes, you can already do this, but the original problem in that topic is that they want to get rid of the option without explicitly specifying the classes for which you want build time initialization
and here he offers some kind of compromise to be able to say for which .jar you want it. so if you provide an uberjar you will get all the classes again
Feel free to follow up the discussion
Yeah… I'm just trying to understand the toing and froing of the conversation.
So to summarise the thread: they don't want people to mark every class for build-time initialization, because for some classes it'll screw things up.
For most idiomatic clojure code we need build-time initialization. Though for some clojure code that also won't work (e.g. an ns with (def data (fetch-data-from-postgres ,,,)) will need to either be rewritten or opt in to runtime initialisation).
If we default a classpath into build-time initialization we may build invalid "build time" state into the runtime (adinn's point) for java deps etc.
Is that the general gist of it?
correct. but imo taking away this option will make it harder on Clojure developers since for most CLIs I built this stuff worked fine (or I was able to work around it). Occasionally a library like httpkit would give problems: https://github.com/http-kit/http-kit#native-image
But perhaps listing all clojure-related namespaces through some script is possible, I was just trying to make sure Clojure projects would still be able to run
Yeah I agree with that. The default for .clj(c) files should be build time, because of the nature of clojure
@borkdude: Yeah I was literally typing: presumably we could use something like mranderson to move all clj code under a new top level package/ns, and then flag that to default as build time
blegh, I don't like that solution. I like the uberjar solution much better
mranderson is a hack to make multiple versions of the same library work together
how do you avoid mixing java library / classes into the uberjar though?
@rickmoynihan well you don't have to, you could of course just make a jar with your project code + clojure libs and put the Java code into another jar
(I agree the mranderson thing would be a hack)
but personally I would just go with everything initialized at build time and figure out the exceptions
the tools I build are usually CLI tools and not huge micronaut web server things which I think the issue is more concerned about
Perhaps we can figure out a good pattern to initialize only clojure classes at build time
> but personally I would just go with everything initialized at build time and figure out the exceptions
well to be fair that is how any approach in clj will eventually end up working; the main difference would be starting from a point where you didn't pick the wrong default for java libs.
@borkdude: Yeah I was going to say the issue is that there's no tooling that knows what a clojure lib is vs a java lib.
We'd need something that knew how to biject clj files into their class files… essentially mapping munge over the .clj(c) classpath.
well, that is certainly doable
indeed
but I was trying to avoid getting into this, it all works beautifully now
yeah
it would be nice to avoid having to have another step
btw, I'm trying these flags:
"--initialize-at-build-time=clojure."
"--initialize-at-build-time=clojure.core.server"
but I'm still getting errors about clojure.core.server(trying this in refl)
with graal 22?
no, 21
ok, for refl this seems to work:
"--initialize-at-build-time=clojure,refl"
but it's a small project without any dependencies
yeah that's essentially equivalent to listing all of the top level namespaces you use there.
it's good to prove what they're suggesting will work for us… it's just a shame it's more clunky.
I will try with the httpkit library
unfortunately there the clojure and java package overlaps
:/
@rickmoynihan yeah, so this works with httpkit (2.5.3):
"--initialize-at-build-time=clojure,refl,org.httpkit"
"--initialize-at-run-time=org.httpkit.client"
which doesn't buy you anything really
since you still have to make it explicit because of the overlapping package name
but at least, it seems doable, but annoying
I might try for babashka, which is a way bigger project
later this week
it seems a namespace refl.main makes a package refl and a class main inside of it, so you have to use the package name refl to get all the related classes refl.main__init, etc.
so perhaps a "simple" all-ns with some munging/post-processing could be all that's needed
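As a rough illustration of that mapping, here is a small Java sketch. The class NsNames and its methods are hypothetical helpers, and munge here only turns hyphens into underscores, which is a simplification of what Clojure's own munge does.

```java
// Hypothetical helper: derives the JVM names that matter for native-image
// from a Clojure namespace name.
final class NsNames {
    // Simplified stand-in for Clojure's munge (assumption: only '-' needs mapping).
    static String munge(String s) {
        return s.replace('-', '_');
    }

    // Package that holds the namespace's classes: every segment but the last.
    // Returns null for a single-segment namespace, which has no package.
    static String packageOf(String ns) {
        int lastDot = ns.lastIndexOf('.');
        return lastDot < 0 ? null : munge(ns.substring(0, lastDot));
    }

    // Loader class generated for an AOT-compiled namespace, e.g. refl.main__init.
    static String initClassOf(String ns) {
        return munge(ns) + "__init";
    }

    public static void main(String[] args) {
        System.out.println(packageOf("refl.main"));
        System.out.println(initClassOf("refl.main"));
        System.out.println(packageOf("rewrite-clj.node.protocols"));
    }
}
```

The package name is what goes into --initialize-at-build-time, since the namespace's functions and its __init class all live under it.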
@rickmoynihan Something like this:
user=> (->> (map ns-name (all-ns)) (remove #(str/starts-with? % "clojure")) (map #(str/split (str %) #"\.")) (keep butlast) (map #(str/join "." %)) distinct (map munge) (cons "clojure"))
("clojure" "refl" "org.httpkit")
which is what I used for refl + httpkit
for babashka:
("clojure" "sci.impl" "selmer" "babashka.nrepl" "babashka.impl.clojure.java" "babashka.impl" "rewrite_clj.node" "bencode" "rewrite_clj.parser" "babashka.impl.clojure" "org.httpkit" "rewrite_clj.custom_zipper" "rewrite_clj.zip" "borkdude.graal" "babashka.nrepl.impl" "babashka.pods" "cognitect" "babashka" "edamame.impl" "cheshire" "rewrite_clj" "hiccup" "sci" "borkdude" "flatland.ordered" "babashka.pods.impl" "clj_yaml" "babashka.impl.clojure.core" "datascript" "hf.depstar" "babashka.impl.tools" "sci.addons" "babashka.impl.clojure.test")
(could probably clean this up by looking at the existence of a prefix in others)
but you get the gist
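The cleanup mentioned above (dropping entries that already have a prefix in the list) could look roughly like this in Java. BuildTimeFlag is a hypothetical helper, and it assumes that a package entry also covers its subpackages, the way the single "clojure" entry appears to in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: shrinks a package list and renders the native-image flag.
final class BuildTimeFlag {
    // Drops every package already covered by a shorter entry,
    // e.g. "sci.impl" is dropped when "sci" is present.
    static List<String> reduce(List<String> packages) {
        List<String> kept = new ArrayList<>();
        // Sorted order puts a parent package before its subpackages.
        for (String candidate : new TreeSet<>(packages)) {
            boolean covered = false;
            for (String parent : kept) {
                if (candidate.startsWith(parent + ".")) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    static String flag(List<String> packages) {
        return "--initialize-at-build-time=" + String.join(",", reduce(packages));
    }

    public static void main(String[] args) {
        System.out.println(flag(Arrays.asList(
            "clojure", "sci.impl", "sci", "babashka.impl", "babashka", "org.httpkit")));
    }
}
```

Entries come out sorted alphabetically, which also makes the generated flag stable between builds.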
https://github.com/babashka/babashka/commit/207e22a6fa04184e609f2aa5af73a382efddc19a
ok, that leads to:
Exception raised in scope ForkJoinPool-2-worker-25.ClosedWorldAnalysis.AnalysisGraphBuilderPhase: org.graalvm.compiler.java.BytecodeParser$BytecodeParserError: com.oracle.graal.pointsto.constraints.UnsupportedFeatureException: No instances of com.fasterxml.jackson.core.io.SerializedString are allowed in the image heap as this class should be initialized at image runtime. To see how this object got instantiated use --trace-object-instantiation=com.fasterxml.jackson.core.io.SerializedString.
kind of demonstrating that it would be painful to have to do this exercise for every graalvm project
This jackson thing seems to be the only problem though
so here's what I ended up with:
"--initialize-at-build-time=clojure,sci.impl,selmer,babashka.nrepl,babashka.impl.clojure.java,babashka.impl,rewrite_clj.node,bencode,rewrite_clj.parser,babashka.impl.clojure,org.httpkit,rewrite_clj.custom_zipper,rewrite_clj.zip,borkdude.graal,babashka.nrepl.impl,babashka.pods,cognitect,babashka,edamame.impl,cheshire,rewrite_clj,hiccup,sci,borkdude,flatland.ordered,babashka.pods.impl,clj_yaml,babashka.impl.clojure.core,datascript,hf.depstar,babashka.impl.tools,sci.addons,babashka.impl.clojure.test"
"--initialize-at-build-time=com.fasterxml.jackson"
so it seems it's feasible
Sorry, was afk for lunch
> unfortunately there the clojure and java package overlaps
What do you mean? Clojure and java code inhabiting the same package/ns? Meaning the java classes are defaulted into build time init?
yes, for org.httpkit for example
I'm guessing for babashka you just ran that at a repl and pasted the output into the shell script; but would plan to automate it at some point (or convince the graal folk to do something different)
yes
@rickmoynihan are you on linux btw?
macos
ok. in #babashka-circleci-builds there are new binaries compiled on the init-at-build-time branch. I wonder if this would impact startup time
I don't see a real difference on macos yet
> if this would impact startup time
In which direction were you thinking?
perhaps it's slower if more work has to be done at run time?
Shouldn't we be expecting essentially the same coverage? i.e. all clojure code (except the few exceptions) to be initialised at build time?
yes
perhaps when you're doing interop it's going to be different
but perhaps it's not really significant
so it's good to have a working solution now and be prepared for 22
yeah assuming both builds behave the same wrt correctness, I'd expect there not to be a significant difference in startup time… If there were it'd probably mean we weren't covering everything we needed to.
Do you think any of this changes how the graal thread has been left?
> Specifying the option for all classes in a specific jar file seems quite reasonable. Would it be OK for this to only work in such broad manner if an uber jar is created first or is that too limiting?
I already responded in that thread
He seems to be in favor of that
ah thanks, just refreshed
What are the use cases for the uberjar case thomaswue is pushing for? I'm not even sure for clojure it's sufficient
I usually tend to compile and collect all the code into an uberjar first and then feed that to graalvm
you don't have to do this, but I find this easier, since you just know what code you're dealing with after the uberjar step
also I distribute the uberjars so people who want to make nixos derivations etc can use them
I could also say in case of an issue to a graalvm dev: here you have the uberjar, I do this to compile it, but it doesn't work
without him/her having to install clojure, etc
Yeah I get that it's useful for your other requirements (you want uberjars anyway etc). But an uberjar is just a reified/flattened classpath… so why can't they just take a classpath?
You should ask this to Thomas, I don't know his reasoning
I should probably ask them
jinx
His reasoning could be:
Just want to check that I'm not arguing against what you want
Libraries aren't allowed to say: everything at build time
but if you have a fat jar, you're not a library owner saying this, you are the end user
yeah ok
that makes sense
(actually I was meaning to ask you about this for another reason… I'll start another thread on the channel for it though as it's a change of topic)
I think this also relates to clojure startup time. Perhaps we attempt a clojure-side compilation flag that solves (or makes progress towards) both issues at AOT time?
meaning when this flag is in effect the clojure compiler generates different byte code and this byte code both starts up faster and works with graal native without needing --initialize-at-build-time.
@chris441 do you have any concrete ideas of what can be done differently?
Not without more careful consideration I do not.
what Clojure does in static initializers is resolve vars, load classes, etc.
delaying the class loading to run time won't work in a native image
I will look a lot more closely; I just know those are two related things and my profilers always show var initialization as one of the startup issues so somehow compiling that data down into something perhaps more concrete that loads faster is an interesting issue that seems related.
Also interesting for dalvik.
I know this is an area smart people have looked at before.
delaying var initialization to run time will make native images slower to start up right?
I don't want to delay anything, I want AOT to produce data as a side effect that can be quickly loaded to initialize vars during runtime initialization.
ok, but now these are already initialized in the image heap, so that work has already been done when starting the image
Exact opposite of delaying.
My point is: moving work from build to run time makes things slower
Well, for example in your javap above:
const__0 = RT.var("clojure.core", "println");
Yes, I agree and that is not what I am suggesting. const__0 being initialized to a static class instance in your example above would make things faster as it would bypass the RT.var mechanism.
right
it could directly reference the AOT-ed class which represents the println var right?
Yes, in this case. You also have the case where something is initialized via a complex function that produces a persistent datastructure and in this case the data can be saved in resources and found via a hashtable lookup or straight array lookup in constant time eliding the generating code. I haven't looked at this in huge depth but for instance I was extremely careful with dtype-next and it still takes some time even after an AOT run to pull in, for instance, the ND system via require. This is a solvable problem.
My thought is more of the form move --initialize-at-build-time into the clojure compiler and allow anything that it did in the graal vm system to be done during the AOT step. Then --init-at-build-time should be a noop if done during graal vm compilation.
As clojure.core is AOT-ed by default anyway, I guess Compiler could be instrumented in such a way that it can reference these classes directly when generating more code. For core vars only it would already be a win
This is complicated by the fact that bytecode files aren't general data storage mechanisms (at least as far as I know) which means you need some level of sidecar file generated at build time for pure data.
The Compiler could keep track of what vars map to which classes
It would be nasty and error prone. Definitely a YMMV pathway but with time it could work well.
If it opened up both simpler Graal native and more dalvik development that would IMO be a very solid win worth real investment.
Well, I guess if nubank agrees it may be worth real investment.
What problems does Dalvik currently have with Clojure?
I'm seeing that Dalvik is now replaced with something else
maybe just a detail. Does ART (Android Runtime) interpret Java bytecode directly?
I am referring to this article:
Interesting article, thanks for sharing
@chris441 https://clojureverse.org/t/do-clojure-still-have-rooms-to-improve-at-compiler-level/7802/12?u=borkdude