babashka

https://github.com/babashka/babashka. Also see #sci, #nbb and #babashka-circleci-builds.
Ben Sless 2021-05-09T06:57:29.095500Z

Am I doing something wrong here or is there a bug?

bb repl
Babashka v0.4.0 REPL.
Use :repl/quit or :repl/exit to quit the REPL.
Clojure rocks, Bash reaches.

user=> (require '[babashka.process :refer [process sh]])
nil
user=&gt; (sh ["youtube-dl" "--dump-json" "--ignore-errors" "--skip-download" "<https://www.youtube.com/user/Misophistful>"])
Segmentation fault (core dumped)
Even more minimal exaxmple:
bb repl
Babashka v0.4.0 REPL.
Use :repl/quit or :repl/exit to quit the REPL.
Clojure rocks, Bash reaches.

user=> (require '[babashka.process :refer [process sh]])
nil
user=> (sh ["youtube-dl"])
Segmentation fault (core dumped)

borkdude 2021-05-09T07:18:45.096900Z

@ben.sless are you using the static Linux binary perhaps?

borkdude 2021-05-09T07:19:47.097600Z

If that's the case, try the normal one

Ben Sless 2021-05-09T07:19:48.097800Z

yes

Ben Sless 2021-05-09T07:19:53.098Z

will do

borkdude 2021-05-09T07:20:23.098500Z

^ @thiagokokada @rahul080327

Ben Sless 2021-05-09T07:21:15.098900Z

works

๐Ÿ‘ 1
lispyclouds 2021-05-09T07:22:44.099300Z

I've linked you to the thread where we are discussing the issues

lispyclouds 2021-05-09T07:56:28.106100Z

@ben.sless what OS and kernel versions are you using? I will try to dig a bit deeper

Ben Sless 2021-05-09T08:03:16.106300Z

@rahul080327

OS: Pop 20.10 groovy
Kernel: x86_64 Linux 5.11.0-7614-generic

lispyclouds 2021-05-09T08:04:24.106500Z

Thanks!

๐Ÿ‘ 1
borkdude 2021-05-09T09:28:43.108Z

Maybe we should enable musl then? https://github.com/oracle/graal/issues/571#issuecomment-618508346

lispyclouds 2021-05-09T09:37:26.108300Z

ah

> By default, --static will statically link your native-image against your host libc (glibc)
https://github.com/oracle/graal/issues/2691#issuecomment-670010352

lispyclouds 2021-05-09T09:37:41.108700Z

this could explain the segfaults

lispyclouds 2021-05-09T09:38:06.108900Z

i would advocate for using musl too

borkdude 2021-05-09T09:38:38.109100Z

ok PR welcome for whoever wants to figure out how to do this :)

lispyclouds 2021-05-09T09:39:14.109300Z

will try this

borkdude 2021-05-09T09:39:39.109500Z

Maybe try with pod-babashka-aws. It just got a static build an hour ago

borkdude 2021-05-09T09:39:44.109700Z

but the tests don't pass there

borkdude 2021-05-09T09:39:54.109900Z

so it would be a good reason to start doing that

kokada 2021-05-09T11:16:11.110100Z

Yeah, this makes sense if we are linking with glibc

kokada 2021-05-09T11:16:58.110300Z

I think even when linking statically, glibc still tries to access some system files at runtime

kokada 2021-05-09T11:17:05.110500Z

Something related to nss

kokada 2021-05-09T11:17:17.110700Z

And some private symbols

kokada 2021-05-09T11:17:36.110900Z

And yeah, a glibc version mismatch can cause segmentation faults

kokada 2021-05-09T11:17:53.111100Z

musl should work much better for static binaries

borkdude 2021-05-09T18:27:19.118Z

@ben.sless Any chance you could test this (musl-linked) static binary for linux with the same program that didn't work for you this morning? https://github.com/babashka/babashka/files/6448129/babashka-0.4.1-SNAPSHOT-linux-amd64-static-musl.tar.gz

🎉 1
borkdude 2021-05-10T07:10:39.159900Z

Hmm. The static image was compiled to work with Alpine; that was the only reason it was there. If the musl build has issues that the other static image doesn't have, I'm inclined to roll back the changes until we sort it out. Cc @rahul080327

borkdude 2021-05-10T07:11:37.160800Z

I won't have a build that is flaky; that's a constant source of headaches

borkdude 2021-05-10T07:14:15.161700Z

What kind of issue did you see with data generators, also a stack overflow?

borkdude 2021-05-10T08:09:17.161900Z

I think I found the issue. The static binary overflows its stack much sooner than the non-static one. E.g. try:

./bb -e "(defn foo [x] (prn x) (foo (inc x))) (foo 0)"
The non-static one gets to a depth of about 20k, while the static one only gets to about 186 or so

borkdude 2021-05-10T08:15:14.162100Z

> musl allocates memory of very small size for thread stacks. musl documentation says it's only 80KB (now increased to 128KB) for default, which is rather small compared to 2-10MB of glibc. And that's been said to be the major reason for stack-consuming executable to crash (segmentation fault) in musl libc environment.

borkdude 2021-05-10T08:15:21.162300Z

https://github.com/yaegashi/muslstack

lispyclouds 2021-05-10T08:54:34.162700Z

Very interesting find! My opinion here would be that we could make it explicit that we are using musl and point out the caveats. If this is the reason for the flakiness, I think we can go ahead with musl with the warning stickers. The ability to run in most places should outweigh the stack size issues. musl has quite widespread production usage and I think this would be known to people.

lispyclouds 2021-05-10T08:56:04.162900Z

We could try patching the binary, but again I get shivers when we change the defaults of a well-used lib, especially when it comes to memory regions

lispyclouds 2021-05-10T08:58:43.163100Z

Things like Rust, Zig and others rely solely on musl for static linking; I would say that's a good reason to stick with it

borkdude 2021-05-10T09:06:36.164300Z

can't we just up the stack size limit?

borkdude 2021-05-10T09:17:37.168900Z

@rahul080327 Patching the binary with muslstack didn't work btw. Can't we set a higher stack limit during compilation?

borkdude 2021-05-10T09:31:45.174100Z

This seems to be a similar bug: https://bugs.webkit.org/show_bug.cgi?id=187485

borkdude 2021-05-10T09:34:19.174300Z

Is there a way to pass stack-size=2097152 to the compilation?

Ben Sless 2021-05-10T10:08:40.174500Z

This option needs to be passed to the native-image compiler?

borkdude 2021-05-10T10:11:09.174700Z

I am trying now with: "-H:CCompilerOption=-Wl,-z,stack-size=2097152" So far it seems to work... more testing now

borkdude 2021-05-10T10:21:53.174900Z

@thiagokokada Perhaps you can also try this option locally. I'm running the build a few times over to see if it works repeatedly. https://app.circleci.com/pipelines/github/babashka/babashka?branch=stack-size

borkdude 2021-05-10T10:38:01.175100Z

Doesn't seem to work. The recursion depth is also more limited still with this binary, so maybe it was just luck that it worked a couple of times.

borkdude 2021-05-10T11:15:32.175300Z

Discussing with a GraalVM dev now

kokada 2021-05-10T13:51:09.176300Z

> What kind of issue did you see with data generators, also a stackoverflow?
Yeah, but I only saw this issue once. The issue with clj-http-lite seems to occur more frequently (and it also does not seem related to the stack, since it occurs randomly; or maybe it is, I will try it later with the recent stack changes).

> Very interesting find! My opinion here would be that we could make it explicit that we are using musl and point out the caveats. If this is the reason for the flakiness, I think we can go ahead with musl with the warning stickers. The ability to run in most places should outweigh the stack size issues. musl has quite widespread production usage and I think this would be known to people.
I concur with @rahul080327 here.

borkdude 2021-05-10T13:52:15.176500Z

I've gone deep into the weeds with a GraalVM dev already. It's related to what musl does with the stack size. We should resolve this before publishing the new static bb; I find this unacceptable.

borkdude 2021-05-10T13:52:33.176700Z

You can find some details here: https://github.com/oracle/graal/issues/3398

kokada 2021-05-10T13:58:40.177100Z

I am building Babashka with "-H:CCompilerOption=-Wl,-z,stack-size=2097152" just to run the tests and see what we get

borkdude 2021-05-10T14:05:46.177500Z

@thiagokokada I already tried this and it doesn't have any effect

borkdude 2021-05-10T14:06:01.177700Z

You can see this when using the java repro from the issue

borkdude 2021-05-10T14:16:51.178300Z

So there is a workaround, it seems.

borkdude 2021-05-10T14:37:22.178500Z

@thiagokokada @rahul080327 This seems to be a workaround that works: https://github.com/babashka/babashka/commit/3bc0f3ffad557bd5e89a1fb1337ce5fc1d8716b5

borkdude 2021-05-10T14:39:52.178700Z

The workaround is to start bb in a thread with a better default for the stack size. The rest of the threads should get their stack size from the stack-size argument. It does incur some extra startup time, so I will make that workaround conditional based on some env var

borkdude 2021-05-10T14:55:03.178900Z

We don't even have to set the stack size in Java; as long as we start everything in a thread and pass a better default for the stack size, things should work again
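
A minimal sketch of that idea in Clojure, for illustration only (the linked commit is the authoritative change; the 16 MiB value and the -main wrapper below are made-up examples):

(defn run-with-bigger-stack
  "Runs f on a fresh thread created via the
   Thread(ThreadGroup, Runnable, String, long stackSize) constructor."
  [f]
  (let [stack-size (* 16 1024 1024) ;; example value, not bb's actual default
        t (Thread. nil f "main-with-big-stack" stack-size)]
    (.start t)
    (.join t)))

;; hypothetical usage at program entry:
;; (run-with-bigger-stack #(apply -main args))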

kokada 2021-05-10T15:12:42.179200Z

> It does incur some extra startup time so I will make that workaround conditional based on some env var
So we will have something like BABASHKA_THREAD_STACK_SIZE that can be set by the user if they want it? Is that the idea? LGTM, by the way

borkdude 2021-05-10T15:13:01.179400Z

Let's continue in #babashka-sci-dev

๐Ÿ‘ 1
borkdude 2021-05-10T15:13:06.179600Z

Thread is getting long

Ben Sless 2021-05-09T18:55:22.120400Z

Looks like it works now

kokada 2021-05-09T19:22:52.123400Z

@rahul080327 Seems like musl is really the way to go 🙂

lispyclouds 2021-05-09T19:23:20.123600Z

:bananadance:

lispyclouds 2021-05-09T19:23:52.123800Z

Awesome work @thiagokokada, great pairing session!

๐Ÿ‘ 1
kokada 2021-05-09T19:25:02.124200Z

Thanks to you too @rahul080327

borkdude 2021-05-09T19:39:29.126300Z

Would it make sense that iff you run the babashka static binary, then you also download static pods?

borkdude 2021-05-09T19:41:35.126500Z

or would it be best to control this via an explicit env var:

BABASHKA_PODS_STATIC=true

lispyclouds 2021-05-09T19:41:50.126700Z

If the pods are guaranteed to be on the same machine, which I think they are, I would say it makes sense to make it the default

lispyclouds 2021-05-09T19:42:08.126900Z

Most users may not think of that env var, I think?

borkdude 2021-05-09T19:42:48.127100Z

in this case we must capture the fact that babashka has been compiled statically (not sure if graalvm already provides a system property for this that you can read at compile time)

lispyclouds 2021-05-09T19:43:44.127300Z

I'm not sure of it

borkdude 2021-05-09T19:45:37.127500Z

This is easily done using (def static? (System/getenv "BABASHKA_STATIC")) in the relevant pods library namespace and using that to download the right pod
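
For illustration, the pods library could use that to pick which artifact to download; a rough sketch only (the suffix names below are made up, not the real manifest values):

;; true when this bb was built statically, per the env var mentioned above
(def static? (some? (System/getenv "BABASHKA_STATIC")))

;; hypothetical helper: choose which pod artifact variant to download
(defn pod-artifact-suffix []
  (if static?
    "linux-amd64-static" ;; made-up name for the musl/static build
    "linux-amd64"))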

borkdude 2021-05-09T20:15:00.127800Z

Meanwhile, @thiagokokada are you going to do a similar PR to babashka for the static one?

borkdude 2021-05-09T20:15:51.128Z

Another question: we can "prefer" a static pod if bb is itself static, but what if there is no static linux pod available but there is a dynamic one: should we select that one, or just crash with "no such pod here"?

kokada 2021-05-09T20:15:51.128200Z

Sorry, didn't understand the question

borkdude 2021-05-09T20:16:16.128400Z

@thiagokokada My impression was that you were going to update bb static itself with musl too?

borkdude 2021-05-09T20:16:27.128600Z

or did I misunderstand?

kokada 2021-05-09T20:17:03.128900Z

I thought we would use the @rahul080327 approach instead of mine, so I was waiting for his PR

borkdude 2021-05-09T20:17:43.129100Z

ok, that PR is merged now. but that was for the aws pod.

kokada 2021-05-09T20:17:51.129300Z

But I can do it too if @rahul080327 is not interested in porting the approach from pod-babashka-aws

lispyclouds 2021-05-09T20:17:52.129500Z

you can go ahead with it @thiagokokada if you want 🙂

๐Ÿ‘ 1
kokada 2021-05-09T20:18:53.129800Z

About the question of whether we should try to download dynamic pods in case the static one is unavailable: no, I don't think so

kokada 2021-05-09T20:19:08.130Z

Too much magic may make things more confusing

borkdude 2021-05-09T20:19:34.130200Z

maybe automatically selecting a static or dynamic one is also too much magic - that is why I proposed an env var to control this

kokada 2021-05-09T20:19:48.130400Z

Yeah, maybe

kokada 2021-05-09T20:20:02.130600Z

But I don't like to control those kinds of things with env vars

kokada 2021-05-09T20:20:14.130800Z

Could we have opts passed to load-pod instead :thinking_face:?

borkdude 2021-05-09T20:20:36.131Z

That can work, but then you'd have to change the code if you deploy your code from a mac to the cloud for example

kokada 2021-05-09T20:20:52.131200Z

Like:

(pods/load-pod 'some-pod "0.0.0" {:static true})

kokada 2021-05-09T20:21:01.131400Z

Or maybe

kokada 2021-05-09T20:21:23.131600Z

(pods/load-pod 'some-pod "0.0.0" {:linux-opts {:static true}})

kokada 2021-05-09T20:21:39.131900Z

This would make it clear that it only applies to Linux (so macOS is unaffected)

kokada 2021-05-09T20:21:58.132100Z

It also has the advantage of making something like "static with dynamic fallback" possible

kokada 2021-05-09T20:22:26.132300Z

They just need to catch the exception when loading with {:static true} fails
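
For illustration, assuming load-pod grew the opts map discussed above (it is only a proposal at this point), the fallback could look roughly like:

(require '[babashka.pods :as pods])

;; hypothetical: prefer the static pod, fall back to the dynamic one
(try
  (pods/load-pod 'some-pod "0.0.0" {:static true})
  (catch Exception _
    (pods/load-pod 'some-pod "0.0.0" {:static false})))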

borkdude 2021-05-09T20:39:56.132700Z

It would be much simpler if we could just always use the static pod on linux

borkdude 2021-05-09T20:47:35.132900Z

Maybe let's test this out a bit for a few months and if the static one works out ok, we can just switch over all linux pods to it

borkdude 2021-05-09T21:01:11.133100Z

Made a new release for the pod-babashka-aws pod

kokada 2021-05-09T21:01:49.133300Z

https://github.com/babashka/babashka/pull/828 :reviewplease:

borkdude 2021-05-09T21:03:01.133700Z

why did you make this change? ARG BABASHKA_XMX="-J-Xmx4500m"

kokada 2021-05-09T21:03:18.133900Z

> It would be much simpler if we could just always use the static pod on linux
Maybe we could go even further and eventually simply remove the dynamically compiled version of Babashka. If someone wants to use Babashka with dynamic linking, they can always depend on their distro's packages.

borkdude 2021-05-09T21:04:24.134200Z

How would a dynamically linked bb magically appear in "their distro's packages" if we don't provide it?

kokada 2021-05-09T21:05:33.134400Z

For example, NixOS's Babashka package is dynamically linked. I was imagining the distros themselves packaging Babashka.

borkdude 2021-05-09T21:06:05.134600Z

I am all for it; the only reason I would not do it is if it wouldn't work for someone

kokada 2021-05-09T21:06:48.134900Z

> I am all for it; the only reason I would not do it is if it wouldn't work for someone
Yeah, I think we can work on this static version now and ask more people to test, and if everything looks good we could eventually remove the non-static version.

borkdude 2021-05-09T21:06:50.135100Z

I can secretly use the static aws pod in the manifest and see if someone starts yelling?

borkdude 2021-05-09T21:07:23.135300Z

Seems like a good starting point since it's still a "young" pod

kokada 2021-05-09T21:07:34.135500Z

> I can secretly use the static aws pod in the manifest and see if someone starts yelling?
Well, LGTM. I don't know if anyone has any objection to this.

borkdude 2021-05-09T21:09:57.135800Z

let's do it, when it breaks certain environments, we will switch back and find another solution

borkdude 2021-05-09T21:11:21.136700Z

updated the 0.0.6 manifest

kokada 2021-05-09T21:26:25.139600Z

https://github.com/babashka/babashka/pull/828 The CI passed

dorab 2021-05-09T21:27:08.139900Z

What is the recommended way to have a bb task that consists of a Unix pipeline? I tried {:tasks {clean (-> (shell "find . -type f -name '*~' -print") (shell "xargs rm -f"))}} but bb clean gives me a java.util.concurrent.ExecutionException.

borkdude 2021-05-09T21:30:37.140Z

Merged 🙏

borkdude 2021-05-09T21:32:46.140800Z

@dorab shell doesn't support this with -&gt;, but babashka.process/process does.

borkdude 2021-05-09T21:33:16.141400Z

I think shell could support this though (made an issue: https://github.com/babashka/babashka/issues/829)

borkdude 2021-05-09T21:35:02.141700Z

What is also possible:

(let [out (:out (shell {:out :string} "find . -type f -name '*~' -print"))]
  (shell {:in out} "xargs rm -f"))

borkdude 2021-05-09T21:35:43.142200Z

although I think this can all be done using babashka.fs glob and delete as well

borkdude 2021-05-09T21:36:16.142600Z

does *~ mean you want to find all files ending with a tilde, or does it expand into something else?

borkdude 2021-05-09T21:37:19.142900Z

{:requires ([babashka.fs :as fs])
 :task (run! fs/delete
             (fs/glob "." "*~"))}

kokada 2021-05-09T21:51:11.143300Z

Thanks @borkdude. Just did another PR to make sure that we are using the correct zlib sources (avoiding possible security issues): https://github.com/babashka/babashka/pull/830

borkdude 2021-05-09T21:51:58.143600Z

Very good. We should start adopting this for pods as well.

borkdude 2021-05-09T21:53:04.143800Z

(I mean, after downloading a pod we should verify it against a sha in the manifest)
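
For illustration, verifying a downloaded pod against a sha256 from such a manifest could look something like this sketch (the helper names below are made up):

(require '[clojure.java.io :as io])
(import 'java.security.MessageDigest)

(defn sha256-hex
  "Hex-encoded SHA-256 of a file's contents."
  [file]
  (let [bytes (with-open [in (io/input-stream file)]
                (.readAllBytes in))
        digest (.digest (MessageDigest/getInstance "SHA-256") bytes)]
    (apply str (map #(format "%02x" %) digest))))

;; hypothetical check right after downloading, before running the pod
(defn verify-pod! [pod-file expected-sha]
  (when-not (= expected-sha (sha256-hex pod-file))
    (throw (ex-info "pod sha256 mismatch" {:file (str pod-file)}))))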

๐Ÿ‘ 1
borkdude 2021-05-09T21:53:47.144100Z

I tested the musl bb on my VPS and had no problems

borkdude 2021-05-09T21:55:17.144300Z

Hmm, I just enabled the musl static build on master

borkdude 2021-05-09T21:55:19.144500Z

and it fails

borkdude 2021-05-09T21:55:35.144700Z

I'm logging off for tonight, but feel free to take a look and I'll be back tomorrow

borkdude 2021-05-09T22:05:00.144900Z

which graalvm version did you use to compile locally?

kokada 2021-05-09T22:05:28.145100Z

> Hmm, I just enabled the musl static build on master
Strange, it seems to work fine on my machine

kokada 2021-05-09T22:05:41.145500Z

> which graalvm version did you use to compile locally?
I used the version included in the Dockerfile

kokada 2021-05-09T22:05:58.145700Z

Seems to be version 21.0.0

borkdude 2021-05-09T22:07:22.145900Z

hmm ok. your version works fine on my VPS with the clj-http-lite library

borkdude 2021-05-09T22:07:33.146100Z

but it fails on CI with some mysterious stackoverflow

kokada 2021-05-09T22:09:25.146400Z

In that previous version I built musl from scratch, while this one uses the musl-tools prebuilt from Debian

kokada 2021-05-09T22:09:50.146600Z

But I ran the tests above on my machine with the changes that were merged on master

borkdude 2021-05-09T22:10:02.146800Z

Did you run lein test or script/test?

borkdude 2021-05-09T22:10:47.147Z

on CI there are more tests, notably script/run_lib_tests

borkdude 2021-05-09T22:11:04.147200Z

and to test the native, you have to set BABASHKA_TEST_ENV=native

kokada 2021-05-09T22:11:04.147400Z

For the result above I ran BABASHKA_TEST_ENV="native" ./script/run_lib_tests > result

borkdude 2021-05-09T22:11:16.147600Z

that should work yeah

kokada 2021-05-09T22:11:34.147800Z

I am building it locally again and will run the lib tests again

kokada 2021-05-09T22:11:45.148Z

Just to make sure that I am using the correct versions of everything

kokada 2021-05-09T22:17:50.148200Z

Huh... Yeah, now that I built it again it seems to segfault

kokada 2021-05-09T22:17:55.148400Z

I will try my approach again

kokada 2021-05-09T22:18:03.148600Z

(Building musl from scratch)

borkdude 2021-05-09T22:25:14.149200Z

Any chance you were using 21.1 the first time around? I just merged into my 21.1 branch and there it seems to work fine...

borkdude 2021-05-09T22:28:25.149400Z

(GraalVM version 21.1.0 JDK11 to be specific)

kokada 2021-05-09T22:32:14.149600Z

Nope because I don't even have GraalVM installed in my system

kokada 2021-05-09T22:32:25.149800Z

So if it is building, it should be from Dockerfile

borkdude 2021-05-09T22:35:47.150Z

I see. Note that you can bump up the BABASHKA_XMX value if you want to speed up your local compile a bit, btw

borkdude 2021-05-09T22:36:40.150200Z

Can you perhaps also try the graal21.1 branch (pull recent changes from Github)?

kokada 2021-05-09T22:39:08.150600Z

You mean trying musl from source with graal21.1?

kokada 2021-05-09T22:39:31.150800Z

Right now I am building Babashka with musl from source, and will run the tests

borkdude 2021-05-09T22:39:33.151Z

No, there is a graal21.1 branch which is the same as master but with 21.1 instead of 21.0

borkdude 2021-05-09T22:39:41.151200Z

ah ok

borkdude 2021-05-09T22:39:52.151400Z

if that works, maybe that's the better approach then

kokada 2021-05-09T22:51:40.151600Z

Nope, I also got an exception. So maybe the difference is the musl version :thinking_face:?

kokada 2021-05-09T22:52:50.151800Z

Yep, I think the issue is the musl version

kokada 2021-05-09T22:54:21.152Z

Let me try one more thing

kokada 2021-05-09T23:04:30.152200Z

Well, downgrading to whatever musl version comes from Debian fixed the clj-http.lite.client-test tests but broke clojure.data.generators-test :man-shrugging:

kokada 2021-05-09T23:05:03.152500Z

I think the 1.2.0 that I used first (compiling from source) passed all tests

kokada 2021-05-09T23:05:12.152700Z

But maybe it is just luck

kokada 2021-05-09T23:05:32.152900Z

Let me try the GraalVM 21.1

kokada 2021-05-09T23:14:34.153100Z

Yeah, with GraalVM 21.1 it seems to work

kokada 2021-05-09T23:17:01.153300Z

So, a summary:
- musl 1.22 + GraalVM 21.0: broke clj-http.lite.client-test
- musl 1.1.21 (I think; this is the musl from Debian Buster, but I am not sure that this is the version we use) + GraalVM 21.0: broke clojure.data.generators-test
- musl 1.22 + GraalVM 21.1: works fine with all the libs we have tests for

kokada 2021-05-09T23:17:43.153500Z

Actually, forget about it

kokada 2021-05-09T23:17:49.153700Z

This test (`clj-http.lite.client-test`) seems flaky

kokada 2021-05-09T23:19:27.154200Z

So yeah, very bizarre :man-shrugging:

kokada 2021-05-09T23:19:49.154400Z

Maybe report this issue to GraalVM itself :thinking_face:?

kokada 2021-05-09T23:21:20.154800Z

Yeah, I got the same flaky behavior even with Babashka compiled from master

kokada 2021-05-09T23:21:35.155Z

So if I run the tests once it works, again it segfaults, again it works, again it segfaults...

kokada 2021-05-09T23:32:14.155200Z

@borkdude All I can say for now is that the tests are really really flaky with those static binaries. We could add a retry and run it a few times, but it doesn't seem to be a good sign considering that this behavior doesn't happen with the old glibc binaries CC @rahul080327

kokada 2021-05-09T23:33:50.155400Z

I do think we should report this to GraalVM itself since it seems to be an issue with them, but I don't know how we could report this (we probably need some way to reproduce this with a smaller example that can trigger this behavior)

kokada 2021-05-09T23:36:06.155600Z

My suggestion is maybe releasing this static version with musl separately from the other static binary, at least until we can figure out why these tests are flaky. But I will leave the decision to @borkdude

kokada 2021-05-09T23:47:00.155800Z

Just to leave this here: I also tried to build with Debian Buster (we actually seem to use Debian Stretch, since the GCC used by the image is 6.3 instead of 8.3), because Native Image builds everything with GCC in the end and 6.3 is quite an old GCC version. I ended up using openjdk-17-lein-2.9.6-buster, which also uses updated versions of lein and OpenJDK, since I didn't find a way to bump only the Debian version. But no dice, same behavior.