babashka

https://github.com/babashka/babashka. Also see #sci, #nbb and #babashka-circleci-builds.
dabrazhe 2021-06-16T06:45:58.059600Z

What is the break-even point for bb, where it becomes too slow and I have to switch to plain Clojure?

borkdude 2021-06-16T07:03:04.060300Z

@dennisa That's hard to say in general, but as a rule of thumb I would say, scripts that take longer than 5 seconds are probably worth running on the JVM. Are you hitting any limits?

2021-06-16T07:19:11.062100Z

@borkdude Btw, I was thinking about this. I think you discovered that a Graalvm process can be replaced with another process, right? Could this mean that you could start a babashka process together with a clj process and have the clj process take over after X time? This assumes of course that there are no port collisions or other side effects

borkdude 2021-06-16T07:46:59.062200Z

Correct

borkdude 2021-06-16T07:47:54.062400Z

https://github.com/babashka/babashka/issues/858

1 🤯
borkdude 2021-06-16T07:49:06.062800Z

"after X time", this would just be an explicit call. you could e.g. do the CLI parsing in bb and then hand over control to the clj process
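
A minimal sketch of the "parse the CLI cheaply, then hand over" idea, written in Python because its stdlib exposes exec directly. The function names, the `--port` option and the target command are assumptions made up for illustration; this is not babashka's API.

```python
import argparse
import os


def build_handover_command(argv):
    """Parse CLI args in the fast-starting process and build the heavy command."""
    parser = argparse.ArgumentParser(prog="launcher")
    parser.add_argument("--port", type=int, default=8080)  # hypothetical option
    parser.add_argument("script")
    args = parser.parse_args(argv)
    # Illustrative target only: run a script with the Clojure CLI.
    return ["clojure", "-M", args.script, "--port", str(args.port)]


def hand_over(argv):
    """Replace the current process with the target; does not return on success."""
    command = build_handover_command(argv)
    os.execvp(command[0], command)
```

Calling `hand_over(sys.argv[1:])` would be the explicit handoff point: nothing after it runs in the original process.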

2021-06-16T07:56:42.063100Z

Nice 🙂

2021-06-16T08:05:44.063300Z

Can imagine it is even nice for user feedback. E.g. prepend a spinner to a slow starting jvm process https://github.com/clj-commons/spinner

dabrazhe 2021-06-16T09:58:29.066100Z

@borkdude I've got a lot of map/filter code that runs fast enough per se. But printing becomes super slow after a while, especially in the REPL: it takes 500 ms for one line. And printing is important for my scripting.

2021-06-16T10:39:31.066300Z

When you say "in the repl", are you working from Emacs by any chance? If you print a lot, it does tend to become laggy... clearing the REPL output helps in this case (usually)

dabrazhe 2021-06-16T10:47:49.066500Z

I work with VSC. The repl printing becomes so slow I have to restart the Repl, and VSC at some point. I am afraid it will happen in CLI/prod and become a bottleneck

borkdude 2021-06-16T10:52:15.066800Z

@dennisa Is it possible for you to make a minimal repro for this? I may have an idea where a performance problem with println could come from and I might be able to optimize it

borkdude 2021-06-16T11:06:41.067Z

It may just be an issue with your editor, so I'd like to have some kind of editor-independent repro

borkdude 2021-06-16T11:06:51.067200Z

Like, how many lines / items are you printing

dabrazhe 2021-06-16T12:15:44.067400Z

I need to find a way to separate the business logic from the printing code. Do you have ideas how to do it?

borkdude 2021-06-16T12:16:30.067600Z

as a first step, you could try to run your scripts outside of the editor and see if your problem is editor related

Tomas Brejla 2021-06-16T12:23:09.067800Z

@dennisa printing in VSC (you probably mean vs code + Calva?) does indeed become slower and slower with more lines in output.calva-repl "file".

grazfather 2021-06-16T12:49:50.069900Z

But in this case you still have to pay the startup cost of the jvm, it’s not like it’s starting up ‘in the background’, since it’s an exec call.

2021-06-16T12:50:32.070600Z

Is there documentation anywhere that compares bash scripting to bb? Would like to point coworkers to this to make it easier for them to try bb. If not, was thinking of starting a wiki page

grazfather 2021-06-16T12:50:47.070700Z

Can you maybe mitigate (in editor) by shrinking some backlog setting?

Tomas Brejla 2021-06-16T13:41:54.070900Z

btw @dennisa in case it's really the issue with calva and its slow appending to output.calva-repl, I've already tried doing some optimizations in Calva in the past. You may check this archive: https://clojurians-log.clojureverse.org/calva/2021-03-30 Basically I just added batching to the append function inside results-doc.ts. It made quite a big difference, especially when you append many lines one by one. Here's a youtube video https://www.youtube.com/watch?v=GufgU7C4n6s showing the slowness and how it might be optimized. Unfortunately I didn't have time back then to fully finish this effort. If this slowness is what you're experiencing, then we should probably continue the conversation in the #calva channel instead.
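
To make the batching idea concrete, here is a small Python sketch, assuming a size-based batch and a hypothetical `sink` callback. It is not Calva's results-doc.ts code, only the general technique of handing lines over in groups instead of one at a time.

```python
class BatchedAppender:
    """Collect lines and pass them to `sink` in batches instead of one by one."""

    def __init__(self, sink, batch_size=100):
        self.sink = sink              # callable that receives a list of lines
        self.batch_size = batch_size
        self.pending = []

    def append(self, line):
        self.pending.append(line)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is pending, for example when the producer is done."""
        if self.pending:
            self.sink(self.pending)
            self.pending = []
```

With `batch_size=3`, appending seven lines and then calling `flush()` results in three calls to the sink rather than seven.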

kokada 2021-06-16T13:43:20.071400Z

Nice that babashka has exec on its roadmap

kokada 2021-06-16T13:43:27.071600Z

I can have some usages for it probably

borkdude 2021-06-16T13:50:03.071900Z

@cldwalker The wiki is open I believe.

borkdude 2021-06-16T13:51:11.072800Z

There is also a github discussion about this. I'm also willing to incorporate this in the book at some point, but I'd be fine if someone else took initiative on this as well or maintained some page

borkdude 2021-06-16T13:55:46.072900Z

@thiagokokada it's honestly not so hard to add, I'm just more worried that people use it in a way to shoot themselves in the foot

borkdude 2021-06-16T13:56:21.073100Z

e.g. when using this with tasks, the tasks aren't supervised anymore, e.g. when one dependency uses exec, the entire tree of tasks will suddenly become that process

kokada 2021-06-16T13:58:28.073300Z

It is still useful. Python has this in its stdlib, and when you need exec, it is the only option

borkdude 2021-06-16T14:06:06.073500Z

right

borkdude 2021-06-16T14:12:22.073700Z

so what's something you would use this for as opposed to just create a child process and wait for it to finish?

kokada 2021-06-16T14:17:54.074700Z

I needed to call a second program once where I didn't want to pay the memory consumption of my own program, so exec was the answer (also, I didn't need the result of the program, just calling it)

kokada 2021-06-16T14:17:58.074900Z

So I used exec

borkdude 2021-06-16T14:19:41.075600Z

but the memory consumption of bb is very little

kokada 2021-06-16T14:20:21.075800Z

Still, I didn't need it

kokada 2021-06-16T14:20:43.076Z

Like I said, I didn't need the result of the program

2021-06-16T14:21:04.076500Z

Created https://github.com/babashka/babashka/wiki/Tasks:-Bash-and-Babashka-equivalents as a first pass. Happy to move to the book at some point. Fixes and more contributions welcome 🙂

kokada 2021-06-16T14:21:22.076600Z

Also, I needed to return the real exit code of the exec'd program

kokada 2021-06-16T14:21:42.076800Z

And I can do this with subprocess, but exec does this without needing special handling
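
The difference can be sketched in Python, assuming a POSIX system. The function names are invented for this example: with a child process the exit code has to be forwarded explicitly, while after exec the program's exit code simply is the process's exit code.

```python
import os
import subprocess
import sys


def run_as_child(command):
    """Run command as a child process, wait for it, and return its exit code."""
    return subprocess.run(command).returncode


def forward_exit_code(command):
    """Child-process variant: the 'special handling' is this explicit sys.exit."""
    sys.exit(run_as_child(command))


def run_via_exec(command):
    """exec variant: this process becomes command, so there is nothing to forward."""
    os.execvp(command[0], command)
```

The child-process variant keeps the parent alive for the whole run; the exec variant leaves only the new program.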

borkdude 2021-06-16T14:21:58.077Z

correct. do you think this function belongs in babashka.process?

kokada 2021-06-16T14:22:14.077200Z

Not sure, in python it is part of os

kokada 2021-06-16T14:22:20.077500Z

os.exec* (e.g. os.execv)

kokada 2021-06-16T14:22:49.077700Z

(I would argue that technically not, because exec is not a subprocess)

kokada 2021-06-16T14:23:29.077900Z

exec is a Unix system call, so it is better fitted to a place that groups system calls

borkdude 2021-06-16T14:25:59.078100Z

babashka.core? ;P

borkdude 2021-06-16T14:26:08.078300Z

babashka.system?

borkdude 2021-06-16T14:26:16.078500Z

we could do babashka.os

kokada 2021-06-16T14:26:51.078700Z

babashka.os seems great

kokada 2021-06-16T14:27:07.078900Z

I remembered the discussion about setenv now 🙂

borkdude 2021-06-16T14:27:51.079100Z

correct

borkdude 2021-06-16T14:28:08.079400Z

we didn't add setenv because it would be very confusing since the env is cached in the jvm

kokada 2021-06-16T14:28:11.079600Z

BTW, I think exec may compose badly with other parts of Babashka. Like, you can't set an environment 😅

borkdude 2021-06-16T14:28:20.079800Z

yes

borkdude 2021-06-16T14:28:29.080Z

this is why I'm not eager to add it yet

borkdude 2021-06-16T14:28:45.080200Z

there may be reasons the java folks don't support this

kokada 2021-06-16T14:29:05.080400Z

exec in Java doesn't really make sense

kokada 2021-06-16T14:29:15.080600Z

If exec was possible in JVM you would exit JVM

borkdude 2021-06-16T14:29:48.080800Z

so? if exec is possible in bb you would exit bb. same for python. what's the difference?

kokada 2021-06-16T14:30:06.081Z

JVM needs to do cleanup, exec is like a kill -9

kokada 2021-06-16T14:30:14.081200Z

This would probably break something

kokada 2021-06-16T14:30:47.081400Z

(Not saying that this doesn't in Python, it is just that Python programs generally have a good behavior on kill -9)

kokada 2021-06-16T14:31:33.081600Z

But maybe it is just that the Java folks don't want to be too tightly coupled to Unix either

kokada 2021-06-16T14:32:02.081800Z

Both setenv and exec are kind of Unix-specific (environments exist in Windows, but their behavior is different)

borkdude 2021-06-16T14:32:05.082Z

if Java needs clean up, how is this different for bb?

kokada 2021-06-16T14:33:00.082200Z

I just figured that native programs don't need as much cleanup as a VM as big as Java's

kokada 2021-06-16T14:33:29.082400Z

But this is just an assumption, maybe my second reasoning about Unix specific calls makes more sense

borkdude 2021-06-16T14:33:48.082600Z

I will leave the issue open to collect more info

kokada 2021-06-16T14:34:30.082800Z

Anyway, I still find it bizarre that getenv is cached in Java

kokada 2021-06-16T14:34:57.083Z

This seems wrong to me for some reason

kokada 2021-06-16T14:35:29.083200Z

It is not like getenv is slow

borkdude 2021-06-16T14:35:44.083400Z

maybe getting the entire environment map is slow

kokada 2021-06-16T14:35:46.083600Z

Maybe it is slow in some specific *nix?

kokada 2021-06-16T14:35:52.083800Z

And this is why it is cached?

kokada 2021-06-16T14:36:13.084Z

> maybe getting the entire environment map is slow
Yeah, this is the part that doesn't make sense to me. AFAIK, getenv in Linux is fast

borkdude 2021-06-16T14:36:15.084200Z

don't know, who is the developer from 96 to ask this?

1 😆
kokada 2021-06-16T14:36:50.084500Z

The only thing I can think is like, getenv being slow in Solaris or HP-UX or whatever

borkdude 2021-06-16T14:46:41.085800Z

@cldwalker Good start! Perhaps explain what shell is since not all people might be familiar with bb.edn's tasks setup. The shell function comes from babashka.tasks which is based on babashka.process/process

1 👍
kokada 2021-06-16T14:57:18.085900Z

BTW, I found a bug report about this issue of System.getenv: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8173654 Interesting that it is marked as fixed (but we either hit another issue or something else, since this issue was resolved in 2017)

borkdude 2021-06-16T14:59:00.086100Z

we didn't set the env var via jni

kokada 2021-06-16T14:59:53.086300Z

Yep, but the issue is similar in that System.getenv is returning the old value (and the explanation is what the developer from GraalVM said)

kokada 2021-06-16T15:00:01.086500Z

Here is a more complete history of the issue: https://bugs.openjdk.java.net/browse/JDK-8173654

kokada 2021-06-16T15:00:03.086700Z

Good reading BTW

kokada 2021-06-16T15:00:36.087Z

> But independent of that, the caching of the environment on first use (and its immutability except when creating a subprocess) was a deliberate design decision back in the 5.0 days. So no JDK bug here. OTOH ... I don't think the caching behavior was ever specified, and it might be useful to users to know the rules.

borkdude 2021-06-16T15:01:07.087200Z

hmm weird that we came across it in graalvm then, if this is supposed to be fixed

kokada 2021-06-16T15:01:40.087400Z

It is still caching though, it is very clear by the issue discussion

borkdude 2021-06-16T15:01:54.087600Z

can you summarize it for me? I'm doing other stuff meanwhile :)

kokada 2021-06-16T15:02:03.087800Z

So I think they just fixed whatever part of the code broke this specifically for JNI

borkdude 2021-06-16T15:02:06.088Z

(adding higher order function arity linting to clj-kondo)

borkdude 2021-06-16T15:02:26.088200Z

oh so perhaps they could also fix it for the graalvm specific interop

borkdude 2021-06-16T15:02:43.088400Z

btw, I think changing the dir may have pretty weird side effects on relative classpaths

kokada 2021-06-16T15:07:00.088600Z

> can you summarize it for me? I'm doing other stuff meanwhile 🙂
Sure:
- Like you said, this issue was a regression with calling setenv in JNI. It used to work before JDK 8u60 and stopped working after that version.
- Martin Buchholz says that there is cache-on-first-access for System.getenv (actually, this seems to come from ProcessBuilder code that System.getenv reuses).
- The cache is an explicit design decision; however, it is not documented.
- Also, changing environments using JNI is unsupported and may crash the JVM (I think this is highly unlikely unless you change some environment variable that the JVM itself uses, but well).
- The issue is marked as fixed without explanation, and I can only assume they fixed whatever caused the regression in 8u60, but this wasn't the expected behavior anyway, since the cache is an explicit design decision.
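
Python happens to show a comparable split between a cached environment mapping and the real process environment, which may help as an analogy. This sketch assumes a POSIX shell is available and says nothing about the JVM's or GraalVM's internals; the function name is made up.

```python
import os


def snapshot_vs_process_env(name, value):
    """Set a variable behind os.environ's back and report who can see it.

    Only use trusted name/value strings here: they are interpolated into a shell command.
    """
    os.putenv(name, value)  # changes the real process environment only
    seen_by_snapshot = name in os.environ  # os.environ was captured at startup
    # os.system starts a shell child that inherits the real environment.
    seen_by_child = os.system(f'test "${name}" = "{value}"') == 0
    return seen_by_snapshot, seen_by_child
```

For a variable that was not set before, the cached mapping does not see the change, while a child process does. That is the same kind of surprise a native setenv would cause next to a cached System.getenv.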

kokada 2021-06-16T15:07:55.088800Z

> oh so perhaps they could also fix it for the graalvm specific interop
Maybe it would be a good idea to open a similar issue in the GraalVM issue tracker and see what the GraalVM devs think

borkdude 2021-06-16T15:07:56.089Z

ok, so all in all, if you try to implement this, you're pretty much on your own

kokada 2021-06-16T15:08:10.089200Z

Pretty much

borkdude 2021-06-16T15:10:14.089400Z

@thiagokokada what you could hack is an intermediate C-style function that will call setenv and then exec, to get around the setenv problem

borkdude 2021-06-16T15:10:42.089600Z

I mean, I think our original setenv would work for exec, for passing through env variables?

borkdude 2021-06-16T15:11:07.089800Z

that's an assumption

borkdude 2021-06-16T15:11:17.090Z

but you could offer this in a combined API, like exec + env vars

borkdude 2021-06-16T15:11:24.090200Z

and expose it only there
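
For comparison, Python's stdlib already offers this combined shape as `os.execvpe`, which takes the environment for the new program as an argument. A minimal sketch, with helper names chosen for this example only:

```python
import os


def merged_env(extra_env, base=None):
    """Return a copy of the base environment (default: os.environ) plus extra_env."""
    env = dict(os.environ if base is None else base)
    env.update(extra_env)
    return env


def exec_with_env(command, extra_env):
    """Replace this process with command, passing the extra variables along.

    The current process's own environment is never mutated, so there is no
    setenv/getenv caching question to answer.
    """
    os.execvpe(command[0], command, merged_env(extra_env))
```

Because the variables only exist in the environment handed to the new program, a design like this sidesteps the question of what getenv returns in the calling process.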

kokada 2021-06-16T15:11:47.090400Z

Or maybe having babashka.os.setenv documented with "if you use this function and expect it to take effect in the current process, please use babashka.os.getenv instead of System/getenv"

borkdude 2021-06-16T15:12:00.090700Z

but you could also hack a bash script that sets envs and then does exec, and exec into that bash script from bb

kokada 2021-06-16T15:12:37.090900Z

> I mean, I think our original setenv would work for exec, for passing through env variables?
Yeah, it should work. The only issue I see with setenv is with System/getenv, which we could work around with a wrapper around getenv from C

borkdude 2021-06-16T15:12:50.091200Z

yeah. there was also the Windows incompatibility with setenv/getenv, the Windows c lib calls this differently

borkdude 2021-06-16T15:13:00.091400Z

all in all, it's a bit of a pain to maintain

kokada 2021-06-16T15:13:19.091600Z

Maybe we can see how Python implements this :thinking_face: ?

borkdude 2021-06-16T15:13:35.091800Z

yeah, please do look it up

kokada 2021-06-16T15:27:15.092Z

Yeah, kinda a pain to maintain (lots of code):
- This is the easy part, POSIX: https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Modules/posixmodule.c#L10944-L10963
- And this is the ugly part, Win32: https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Modules/posixmodule.c#L10856-L10908
But basically it is a bunch of #if

kokada 2021-06-16T15:28:37.092300Z

But in the end, as far as I understand the code, you either call setenv() or _wputenv()

borkdude 2021-06-16T15:29:40.092600Z

so even the public API is different in Python?

kokada 2021-06-16T15:30:07.092800Z

No, they offer the same public API

kokada 2021-06-16T15:30:50.093Z

os_putenv_impl is the actual implementation, where on POSIX it compiles to call setenv(), while on Windows it compiles to _wputenv()

kokada 2021-06-16T15:31:51.093200Z

Since setenv() takes three arguments, it calls setenv(env_var, value, 1), while on Windows they do _wputenv(env_var + "=" + value), AFAIK
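
A rough Python sketch of why the single-string form needs extra checks. It is loosely modeled on the behavior described in this thread and is not CPython's actual code; the function name and the exact set of checks are assumptions.

```python
def windows_putenv_argument(name, value):
    """Build the single "NAME=value" string that a _wputenv-style call expects."""
    # Once name and value are concatenated, an "=" inside the name would be
    # ambiguous, so it has to be rejected up front (assumed rule: a leading "="
    # is tolerated, anything later is not).
    if not name or "=" in name[1:]:
        raise ValueError("illegal environment variable name")
    if "\0" in name or "\0" in value:
        raise ValueError("embedded null character")
    return f"{name}={value}"
```

With the three-argument setenv(name, value, overwrite) form, name and value stay separate, so this concatenation step does not arise.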

borkdude 2021-06-16T15:31:54.093400Z

yep, when I saw that I was like: eeeeeh, I'm having second thoughts

kokada 2021-06-16T15:32:44.093800Z

(The call to _wputenv() ends up doing a bunch of validation because of this concat, which is why the code is so big)

kokada 2021-06-16T15:33:34.094Z

TL;DR: Win32 sucks 🤷

kokada 2021-06-16T15:36:26.094200Z

I can give it a try if you want @borkdude. I mean, I am probably the only person interested in this right now 😆 Since you already did the hard work of figuring out how to call C code in the setenv branch, I think now it is mostly writing C+Java code

kokada 2021-06-16T15:36:35.094400Z

No promises though, but should be a nice weekend project

borkdude 2021-06-16T15:38:32.094800Z

OK, I merged the master branch into the set-env branch. No promises that if you make it work, that I will merge the branch, but feel free to try it :)

kokada 2021-06-16T15:39:51.095100Z

Yeah, please review the code and draw your own conclusions. I mean, it is a pretty niche case

Bob B 2021-06-16T17:54:52.099400Z

I want to ask if this is a well-known thing before opening an issue/continuing a discussion; I've done a cursory search through the issues and the book... running "one-liners" (passing forms on the command line without -e) on Windows will throw if the form contains "illegal" path characters, e.g. bb "(zero? 1)" will throw because of the '?'

borkdude 2021-06-16T18:11:01.099800Z

@highpressurecarsalesm This may just be a shell-specific thing? Which shell is this, powershell or cmd.exe?

borkdude 2021-06-16T18:17:05.100Z

@highpressurecarsalesm Ah yes, I see the issue

borkdude 2021-06-16T18:17:46.100200Z

For now just use explicit -e

bb -e "(some? 1)"
true

borkdude 2021-06-16T18:18:01.100400Z

but I think it's good to fix

Bob B 2021-06-16T18:18:50.100600Z

I'll open an issue so it's sort of 'written down' if that's ok, and then go from there

borkdude 2021-06-16T18:20:48.100800Z

yep, I like that approach

borkdude 2021-06-16T18:20:51.101Z

thanks!

2021-06-16T18:55:27.101800Z

Would it be possible to create a babashka pod from a clojure library that depends on Java libraries, for instance javax.xml.stream?

borkdude 2021-06-16T18:55:51.102Z

absolutely, as long as the libraries are compatible with graalvm native-image (if you want to create a pod with fast startup)

dabrazhe 2021-06-16T21:01:05.102300Z

You guys are likely right. I had 11K lines in the Calva output, and once I removed them the performance was back up. :) Will give it a try and get back