What is the break-even point for bb, where it becomes too slow and I have to switch to plain Clojure?
@dennisa That's hard to say in general, but as a rule of thumb I would say, scripts that take longer than 5 seconds are probably worth running on the JVM. Are you hitting any limits?
@borkdude Btw, I was thinking about this. I think you discovered that a Graalvm process can be replaced with another process, right? Could this mean that you could start a babashka process together with a clj process and have the clj process take over after X time? This assumes of course that there are no port collisions or other side effects
Correct
"after X time", this would just be an explicit call. you could e.g. do the CLI parsing in bb and then hand over control to the clj process
Nice
Can imagine it is even nice for user feedback. E.g. prepend a spinner to a slow starting jvm process https://github.com/clj-commons/spinner
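A minimal sketch of the hand-over idea discussed above: do the cheap work (CLI parsing) in the fast-starting process, then replace it with the slow-starting one through an explicit exec call. It is written in Python, whose `os` exec functions come up later in this thread, not in bb. The function name, the `--help` handling and the target command are hypothetical.

```python
import os
import sys


def parse_then_exec(argv, target):
    """Handle cheap CLI cases here, otherwise replace this process with `target`.

    `target` is a hypothetical command prefix such as ["clj", "-M", "-m", "my.main"].
    os.execvp only returns if the exec itself fails, in which case it raises OSError.
    """
    if "--help" in argv:
        print("usage: tool [--help] ARGS...")
        return 0  # answered without ever starting the slow process
    sys.stdout.flush()  # buffered output is lost once the process image is replaced
    os.execvp(target[0], list(target) + list(argv))
```

Note that the target still pays its full startup cost after the exec; nothing is warmed up in the background.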
@borkdude I've got a lot of map/filter code that runs per se fast enough ~ . But printing becomes super slow after a while, especially in repl, it takes 500 ms for one line. And printing is important for my scripting.
When you say "in the repl", are you working from Emacs by any chance? If you print a lot, it does tend to become laggy... clearing the REPL output helps in this case (usually)
I work with VSC. The repl printing becomes so slow I have to restart the Repl, and VSC at some point. I am afraid it will happen in CLI/prod and become a bottleneck
@dennisa Is it possible for you to make a minimal repro for this? I may have an idea where a performance problem with println could come from and I might be able to optimize it
It may just be an issue with your editor, so I'd like to have some kind of editor-independent repro
Like, how many lines / items are you printing
I need to find a way to separate the business logic from the printing code. Do you have ideas how to do it?
as a first step, you could try to run your scripts outside of the editor and see if your problem is editor related
@dennisa printing in VSC (you probably mean VS Code + Calva?) does indeed become slower and slower with more lines in the output.calva-repl "file".
But in this case you still have to pay the startup cost of the jvm, it's not like it's starting up "in the background", since it's an exec call.
Is there documentation anywhere that compares bash scripting to bb? Would like to point coworkers to this to make it easier for them to try bb. If not, was thinking of starting a wiki page
Can you maybe mitigate (in editor) by shrinking some backlog setting?
btw @dennisa in case it's really the issue with Calva and its slow appending to output.calva-repl, I've already tried doing some optimizations in Calva in the past. You may check this archive: https://clojurians-log.clojureverse.org/calva/2021-03-30
Basically I just added batching into the append function inside results-doc.ts. It made quite a big difference, especially when you append many lines in one-by-one fashion.
Here's a youtube video https://www.youtube.com/watch?v=GufgU7C4n6s showing the slowness and how it might be optimized. Unfortunately I didn't have time back then to fully finish this effort. If this slowness is what you're experiencing, then we should probably continue the conversation in #calva channel instead.
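To illustrate the batching idea mentioned above, here is a small Python sketch. It is not Calva's code (that lives in TypeScript in results-doc.ts); the class name, the `batch_size` parameter and the `sink` object are made up for the example. The point is only that many one-line appends become a few larger writes.

```python
class BatchedAppender:
    """Collect lines and hand them to the sink in one write per batch."""

    def __init__(self, sink, batch_size=100):
        self.sink = sink              # any object with a write(str) method
        self.batch_size = batch_size
        self.pending = []

    def append(self, line):
        self.pending.append(line)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # Write everything collected so far in a single call, then start over.
        if self.pending:
            self.sink.write("".join(line + "\n" for line in self.pending))
            self.pending.clear()
```

A caller has to invoke `flush()` once at the end (or on a timer) so that a partially filled batch is not left unwritten.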
Nice that babashka has exec on its roadmap
I can have some usages for it probably
@cldwalker The wiki is open I believe.
There is also a github discussion about this. I'm also willing to incorporate this in the book at some point, but I'd be fine if someone else took initiative on this as well or maintained some page
@thiagokokada it's honestly not so hard to add, I'm just more worried that people use it in a way to shoot themselves in the foot
e.g. when using this with tasks, the tasks aren't supervised anymore, e.g. when one dependency uses exec, the entire tree of tasks will suddenly become that process
It is still useful. Python has this in its stdlib, and when you need exec it is the only option
right
so what's something you would use this for as opposed to just create a child process and wait for it to finish?
I needed to call a second program once where I didn't want to pay the memory consumption of my own program, so exec was the answer (also, I didn't need the result of the program, just calling it)
So I used exec
but the memory consumption of bb is very little
Still, I didn't need it
Like I said, I didn't need the result of the program
Created https://github.com/babashka/babashka/wiki/Tasks:-Bash-and-Babashka-equivalents as a first pass. Happy to move to the book at some point. Fixes and more contributions welcome
Renamed to https://github.com/babashka/babashka/wiki/Bash-and-Babashka-equivalents
Also, I needed to return the real exit code of the exec'd program
And I can do this with subprocess, but exec does this without needing special handling
correct. do you think this function belongs in babashka.process?
Not sure, in python it is part of os
os.exec
(I would argue that technically not, because exec is not a subprocess)
exec is a Unix system call, so it is better fitted to a place that groups system calls
babashka.core? ;P
babashka.system?
we could do babashka.os
babashka.os seems great
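On the exit-code point raised earlier in this thread, a Python sketch of the two approaches side by side (the function names are illustrative only): with a child process the caller has to forward the exit code itself, while with exec the exit code the parent sees is simply that of the new program.

```python
import os
import subprocess
import sys


def run_as_child(cmd):
    """Spawn `cmd`, wait for it, and return its exit code.

    The caller must forward the code explicitly, e.g. sys.exit(run_as_child(cmd)).
    """
    return subprocess.run(cmd).returncode


def run_via_exec(cmd):
    """Replace the current process with `cmd`; does not return on success.

    Whoever started us sees the exit code of `cmd` directly, with no forwarding.
    """
    sys.stdout.flush()  # buffered output is lost once the process image is replaced
    os.execvp(cmd[0], cmd)
```

The child-process variant also keeps the original program in memory while it waits, which is the memory cost mentioned above.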
I remembered the discussion about setenv now
correct
we didn't add setenv because it would be very confusing since the env is cached in the jvm
BTW, I think exec may compose badly with other parts of Babashka. Like, you can't set an environment
yes
this is why I'm not eager to add it yet
there may be reasons the java folks don't support this
exec in Java doesn't really make sense
If exec was possible in JVM you would exit JVM
so? if exec is possible in bb you would exit bb. same for python. what's the difference?
JVM needs to do cleanup, exec is like a kill -9
This would probably break something
(Not saying that this doesn't in Python, it is just that Python programs generally have a good behavior on kill -9)
But maybe it is just that Java folks don't want to be too coupled to Unix either
Both setenv and exec are kind of Unix specific (environments exist in Windows but their behavior is different)
if Java needs clean up, how is this different for bb?
I just figured that native programs don't need as much cleanup as a VM as big as Java
But this is just an assumption, maybe my second reasoning about Unix specific calls makes more sense
I will leave the issue open to collect more info
Anyway, I still find it bizarre that getenv is cached in Java
This seems wrong to me for some reason
It is not like getenv is slow
maybe getting the entire environment map is slow
Maybe it is slow in some specific *nix?
And this is why it is cached?
> maybe getting the entire environment map is slow
Yeah, this is the part that doesn't make sense to me. AFAIK, getenv in Linux is fast
don't know, who is the developer from 96 to ask this?
The only thing I can think is like, getenv being slow in Solaris or HP-UX or whatever
@cldwalker Good start! Perhaps explain what shell is since not all people might be familiar with bb.edn's tasks setup. The shell function comes from babashka.tasks which is based on babashka.process/process
BTW, I found a bug report about this issue of System.getenv: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8173654
Interesting that it is marked as fixed (but we either hit another issue or something else, since this issue was resolved in 2017)
we didn't set the env var via jni
Yep, but the issue is similar in that System.getenv is returning the old value (and the explanation is what the developer from GraalVM said)
Here is a more complete history of the issue: https://bugs.openjdk.java.net/browse/JDK-8173654
Good reading BTW
> But independent of that, the caching of the environment on first use (and its immutability except when creating a subprocess) was a deliberate design decision back in the 5.0 days. So no JDK bug here. OTOH ... I don't think the caching behavior was ever specified, and it might be useful to users to know the rules.
hmm weird that we came across it in graalvm then, if this is supposed to be fixed
It is still caching though, it is very clear by the issue discussion
can you summarize it for me? I'm doing other stuff meanwhile :)
So I think they just fixed whatever part of the code broke this specifically for JNI
(adding higher order function arity linting to clj-kondo)
oh so perhaps they could also fix it for the graalvm specific interop
btw, I think changing the dir may have pretty weird side effects on relative classpaths
> can you summarize it for me? I'm doing other stuff meanwhile :)
Sure:
- Like you said, this issue was a regression with calling setenv in JNI. Used to work before JDK 8u60, stopped working after this version
- Martin Buchholz says that there is cache-on-first-access for System.getenv (actually, this seems to come from code in ProcessBuilder that System.getenv reuses)
- The cache is an explicit design decision, however it is not documented
- Also, changing environments using JNI is unsupported and may crash the JVM (I think this is highly unlikely unless you change some environment variable that JVM itself uses, but well)
- The issue is fixed without explanation, and I can only assume they fixed whatever caused the regression in 8u60, but this wasn't the expected behavior anyway since the cache is an explicit design decision
> oh so perhaps they could also fix it for the graalvm specific interop
Maybe it would be a good idea to open a similar issue in the GraalVM issue tracker and see what the GraalVM devs think
ok, so all in all, if you try to implement this, you're pretty much on your own
Pretty much
@thiagokokada what you could hack is an intermediate C-style function that will call setenv and then exec, to get around the setenv problem
I mean, I think our original setenv would work for exec, for passing through env variables?
that's an assumption
but you could offer this in a combined API, like exec + env vars
and expose it only there
Or maybe having babashka.os.setenv documented with "if you use this function and expect the change to be visible in the current process, please use babashka.os.getenv instead of System/getenv"
but you could also hack a bash script that sets envs and then does exec
and exec to that bash script from bb
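For comparison, the "exec + env vars" combination suggested above exists as a single call in Python's `os` module, so no mutation of the current process's environment is needed. A minimal sketch; the wrapper name is hypothetical, and this says nothing about what a bb API would look like.

```python
import os
import sys


def exec_with_env(cmd, extra_env):
    """Replace the current process with `cmd`, adding `extra_env` to its environment."""
    env = dict(os.environ)   # copy, so the current process's own view stays untouched
    env.update(extra_env)
    sys.stdout.flush()       # buffered output is lost once the process image is replaced
    os.execvpe(cmd[0], cmd, env)
```

Because the environment is passed directly to the new program, the question of whether the current process can read the new values back never comes up.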
> I mean, I think our original setenv would work for exec, for passing through env variables?
Yeah, it should work, the only issue I see with setenv is with System/getenv, which we could work around with a wrapper around getenv from C
yeah. there was also the Windows incompatibility with setenv/getenv, the Windows c lib calls this differently
all in all, it's a bit of a pain to maintain
Maybe we can see how Python implements this :thinking_face: ?
yeah, please do look it up
Yeah, kinda a pain to maintain (lots of code):
- This is the easy part, POSIX: https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Modules/posixmodule.c#L10944-L10963
- And this is the ugly part, Win32: https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Modules/posixmodule.c#L10856-L10908
But basically it is a bunch of #if
But in the end, as far as I understand the code, you either call setenv() or _wputenv()
so even the public API is different in Python?
No, they offer the same public API
os_putenv_impl is the actual implementation, where on POSIX it compiles to call setenv(), while on Windows it compiles to _wputenv()
Since setenv() takes 3 arguments, it calls setenv(env_var, value, 1), while on Windows they do _wputenv(env_var + "=" + value)
AFAIK
yep, when I saw that I was like: eeeeeh, I'm having second thoughts
(The call to _wputenv() ends up doing a bunch of validation because of this concat, this is why the code is so big)
TL;DR: Win32 sucks
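As a side note on the Python API discussed above: the public `os.putenv` shows a caveat similar to the System.getenv caching described earlier. The real process environment changes, but `os.environ` (a mapping captured at startup) does not. A small POSIX-only sketch; the function name and the variable name `BB_CHAT_DEMO` are made up, and this is an analogy in Python, not a statement about the JVM or GraalVM internals.

```python
import os


def putenv_vs_cached_view(name="BB_CHAT_DEMO", value="1"):
    """Return (value seen through os.environ, whether a child process saw `value`)."""
    os.environ.pop(name, None)      # make sure the name starts out unset
    os.putenv(name, value)          # changes the real process environment only
    cached = os.environ.get(name)   # os.environ was captured at startup: still unset
    # os.system runs the command through the C library, so the child
    # inherits the real environment, including the putenv change.
    status = os.system('test "${}" = "{}"'.format(name, value))
    return cached, status == 0
```

Assigning through `os.environ[name] = value` updates both the mapping and the real environment, which is roughly the kind of wrapper proposed above for a bb-level getenv/setenv pair.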
I can give it a try if you want @borkdude, I mean, I am probably the only person interested in this right now
Since you already did the hard work figuring out how to call C code in the setenv branch, I think now it is mostly writing C+Java code
No promises though, but should be a nice weekend project
OK, I merged the master branch into the set-env branch. No promises that if you make it work, that I will merge the branch, but feel free to try it :)
Yeah, please review the code and draw your own conclusions. I mean, it is a pretty niche case
I want to ask if this is a well-known thing before opening an issue/continuing a discussion; I've done a cursory search through the issues and the book... running "one-liners" (passing forms on the command line without -e) on Windows will throw if the form contains "illegal" path characters, e.g. bb "(zero? 1)" will throw because of the '?'
@highpressurecarsalesm This may just be a shell-specific thing? Which shell is this, powershell or cmd.exe?
@highpressurecarsalesm Ah yes, I see the issue
For now just use explicit -e
bb -e "(some? 1)"
true
but I think it's good to fix
I'll open an issue so it's sort of 'written down' if that's ok, and then go from there
yep, I like that approach
thanks!
Would it be possible to create a babashka pod from a clojure library that depends on Java libraries, for instance javax.xml.stream?
absolutely, as long as the libraries are compatible with graalvm native-image (if you want to create a pod with fast startup)
It's likely you are right guys. I had 11K lines in the calva output and once removed the performance is back up. : ) will give it a try and get back