babashka

https://github.com/babashka/babashka. Also see #sci, #nbb and #babashka-circleci-builds .
yubrshen 2020-10-20T01:29:53.003700Z

I'd like to use babashka to process a file line by line, reading from standard input, but I could not produce the processed lines on standard output. I've simplified my code to the following to figure out the problem:

(ns convert.drop-bart-and-uppercase)

(defn clean-location [x] x)

(defn clean [lines]
  (->> lines
       (map clean-location)))

(clean *in*)

and I use the following to execute it:

cat samples.dat | bb -i -o -f ../convert/src/convert/drop_bart_and_uppercase.clj

and here is samples.dat:

Time=Thu Oct 1 15:27:15 PDT 2020, Value=75.7, Location=L16-tempmon.xxx.yyy, Device_Type=tempSensor, Value_Type=Temperature
Time=Thu Oct 1 15:28:12 PDT 2020, Value=91.4, Location=a40-tc-ups, Device_Type=UPS, Value_Type=Temperature

To my understanding, it should output the same two lines unchanged. But I see nothing. Please help me figure out my problem. Thanks!

------------

I found that the following command works as expected:

< samples.dat bb -io '(load-file "/home/yshen/data/temperature-data-archive/convert/src/convert/drop_bart_and_uppercase.clj") (convert.drop-bart-and-uppercase/clean *input*)'

but I find it too clumsy to load the file and then call the function.

borkdude 2020-10-20T06:41:52.006Z

@yubrshen In the first piece of code you use *in* (not *input*) which is not a seq of lines, but just the stdin stream from Clojure.

yubrshen 2020-10-20T14:03:34.024500Z

I would like to learn the idiomatic way to process each line of input with Babashka.

borkdude 2020-10-20T14:06:29.026600Z

@yubrshen You can use *input* but this is honestly more for one-liners on the command line. For scripts you might want to use:

$ ls | bb -e "(first (line-seq (io/reader *in*)))"
"CHANGELOG.md"

borkdude 2020-10-20T14:07:30.028Z

io/reader is coming from clojure.java.io

borkdude 2020-10-20T14:10:34.030200Z

ok :)

yubrshen 2020-10-20T17:05:38.030500Z

Finally, this is what works for my need.

yubrshen 2020-10-20T17:10:31.030700Z

< samples.dat bb -i -o '(->> *input* (map (fn [line] (clojure.string/replace-first line #"Location=([^.,]+)[^,]+" #(str "Location=" (clojure.string/upper-case (last %1)))))))'

I can use user/*input* inside my script file to access stdin as a list of lines, but I have not figured out how to output lines to stdout from inside my script. The above one-liner works, but it's getting hard to maintain. Is there an equivalent mechanism that lets babashka output lines to stdout from a script?

yubrshen 2020-10-20T17:20:19.030900Z

I can improve the readability, but only by stepping outside the Clojure ecosystem:

yubrshen 2020-10-20T17:20:42.031100Z

#!/usr/bin/env bash
< $1 bb -i -o '(->> *input* (map (fn [line] (clojure.string/replace-first line #"Location=([^.,]+)[^,]+" #(str "Location=" (clojure.string/upper-case (last %1)))))))'

yubrshen 2020-10-20T17:21:41.031300Z

Is there a better approach that keeps my code mostly in the Clojure development environment?

borkdude 2020-10-20T18:05:31.031700Z

@yubrshen I'm not sure what you mean. You can put this code in a file and that should work? https://clojurians.slack.com/archives/CLX41ASCS/p1603213831030700?thread_ts=1603176112.006000&cid=CLX41ASCS

borkdude 2020-10-20T18:07:45.032100Z

Without babashka in/output flags:

(ns my-script
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

(defn lines []
  (line-seq (io/reader *in*)))

(->> (lines)
     (map
      (fn [line]
        (str/replace-first line #"Location=([^.,]+)[^,]+"
                           #(str "Location=" (str/upper-case (last %1))))))
     (run! println))

borkdude 2020-10-20T18:08:54.032500Z

This also works with Clojure on the JVM

yubrshen 2020-10-20T19:06:59.032700Z

Yes, exactly, this is what I was looking for: a script that runs with both Clojure and Babashka. Thanks a million!
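Since the ns declaration was wanted for the sake of Clojure's test framework, one possible arrangement (a sketch, not something proposed in the thread) is to keep the line transformation as a pure function and do the I/O in a separate function. The names `upcase-location` and `process-lines!` are hypothetical. Note also one deliberate deviation from the regex above: the trailing `[^,]+` is written here as `[^,]*`, on the assumption that a location without a dot, such as `a40-tc-ups`, should keep its last character.

```clojure
(ns my-script
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

;; Hypothetical helper: a pure string -> string function, easy to unit test.
;; Assumption: [^,]* (not [^,]+) so that dot-less locations are kept whole.
(defn upcase-location [line]
  (str/replace-first line
                     #"Location=([^.,]+)[^,]*"
                     #(str "Location=" (str/upper-case (last %1)))))

;; Hypothetical helper: reads lines from `rdr` and prints each transformed
;; line to *out*. Anything clojure.java.io/reader accepts will do.
(defn process-lines! [rdr]
  (run! println (map upcase-location (line-seq (io/reader rdr)))))

;; In the script itself one would end with:
;; (process-lines! *in*)
```

With this split, a test can call `upcase-location` on a literal string, or pass a `java.io.StringReader` to `process-lines!` inside `with-out-str`, without touching stdin.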

borkdude 2020-10-21T07:37:32.045100Z

:thumbsup:

borkdude 2020-10-20T09:31:15.015100Z

How's this for passing options to the nifty $ macro?

user=> (def sw (java.io.StringWriter.))
#'user/sw
user=> (-> ($ ls -la Dockerfile) ^{:out sw} ($ cat) check :exit)
0
user=> (str sw)
"-rw-r--r--@ 1 borkdude  staff  729 Oct 15 17:25 Dockerfile\n"

2020-10-20T09:38:46.015200Z

I don't use metadata that much so I'm not sure how to read it 😬

borkdude 2020-10-20T09:40:43.015400Z

The metadata preceding the ($ ...) form are the options for that form

2020-10-20T09:45:37.015600Z

So in this case the metadata is attached to the return value of ($ ls -la Dockerfile) and used by ($ cat) ?

borkdude 2020-10-20T09:46:56.015900Z

no, the metadata is only attached to ($ cat), this is how metadata works.

borkdude 2020-10-20T09:47:33.016100Z

it is the same as writing (process '[cat] {:out sw})
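To see which form the options map ends up on, the expression can be read as data rather than evaluated. This is only a sketch in plain Clojure: it starts no process and does not use the `$` macro itself, it only shows where the reader attaches `^{:out sw}`.

```clojure
;; Read the form as data instead of evaluating it, so nothing is executed.
(def form
  (read-string "(-> ($ ls -la Dockerfile) ^{:out sw} ($ cat) check :exit)"))

;; Element 0 is the symbol ->, element 1 is ($ ls -la Dockerfile),
;; element 2 is ($ cat). The metadata map sits on element 2 only.
(def ls-form  (nth form 1))
(def cat-form (nth form 2))

(println cat-form)               ; the list ($ cat)
(println (:out (meta cat-form))) ; the symbol sw
(println (:out (meta ls-form)))  ; nil, this form carries no :out option
```

The metadata belongs to the form that follows it in the source text, which is why `->` threading the first process into `($ cat)` does not move the options anywhere else.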

2020-10-20T09:50:36.016300Z

I'm not sure if my clojure knowledge is helping me here or actually making it more complex. macroexpand-1 is not helping here (:exit (check ($ ($ ls -la Dockerfile) cat)))

borkdude 2020-10-20T09:52:02.016700Z

yes :)

2020-10-20T09:52:25.016900Z

ok so I think there are two people who will have no issue using this. beginners and more advanced clojure users

2020-10-20T09:52:40.017100Z

But maybe it's just a valuable lesson about clojure 🙂 Thank you

2020-10-20T09:53:16.017300Z

I have updated my mental model

borkdude 2020-10-20T13:45:22.021400Z

@yubrshen If you change *in* to *input* in your top program, that should maybe work. If you want to get lines from stdin yourself, you can use (clojure.string/split-lines (slurp *in*)) or (line-seq (clojure.java.io/reader *in*))

borkdude 2020-10-20T13:45:55.022Z

ah, I see. yes. *input* is only defined in the user namespace, so you have to use user/*input* in your top program

borkdude 2020-10-20T13:46:19.022200Z

or get rid of the ns declaration

yubrshen 2020-10-20T14:07:51.028500Z

@borkdude I see. Just user/*input*. Thanks! I may need to keep the ns declaration in order to use Clojure's test framework.

Dig 2020-10-20T19:59:04.032900Z

Love the $ macro, and I like the metadata use, it just took me a little bit of time to figure out how to adopt it for my use. I've run into another weird issue: when I use (check) it gets stuck if there is no error, but goes through if there is an error. I will try to reproduce it with a smaller use case and report later.

borkdude 2020-10-20T20:02:07.033100Z

hmm ok, thanks!

Dig 2020-10-20T20:03:07.033300Z

https://github.com/borkdude/babashka/issues/575#issuecomment-713105955 this is pretty cool

borkdude 2020-10-20T20:07:17.033600Z

Yeah! Please let me know about the bug. There's still time to fix before it goes into 0.2.3

Dig 2020-10-20T20:09:46.033800Z

Sorry, very busy this week, I will try to isolate it at some point. Just need to try it with something simpler than the aws command line. If I uncomment check above it gets stuck on success, but not on error.

borkdude 2020-10-20T20:10:07.034Z

what does stuck mean?

Dig 2020-10-20T20:10:25.034200Z

no output, like it is waiting for something

Dig 2020-10-20T20:10:31.034400Z

and I have ^C it

borkdude 2020-10-20T20:11:36.034600Z

ah, this explains it. yes, check will wait for the process to exit, else it can't inspect the exit code.

borkdude 2020-10-20T20:12:09.034800Z

so the process is maybe waiting for something?

borkdude 2020-10-20T20:12:31.035Z

check = deref + throw on non-zero
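A minimal sketch of that one-line description, assuming a process is anything derefable that yields a map with an `:exit` key once it has finished. `check-sketch` is a hypothetical name and this is not the library's actual implementation; a `delay` stands in for the process so that nothing external is run.

```clojure
;; Hypothetical stand-in for `check`: deref, then throw on a non-zero exit.
(defn check-sketch [proc]
  (let [{:keys [exit] :as result} @proc] ; deref blocks until a result exists
    (if (zero? exit)
      result
      (throw (ex-info "Process exited with a non-zero code" {:exit exit})))))

;; A delay plays the role of a finished process here.
(check-sketch (delay {:exit 0}))      ; returns {:exit 0}

(try
  (check-sketch (delay {:exit 1}))
  (catch clojure.lang.ExceptionInfo e
    (println "failed with" (ex-data e))))
```

Because the deref comes first, a process that never exits makes the call wait indefinitely, which matches the "stuck" behaviour described above.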

Dig 2020-10-20T20:18:42.035200Z

hmm, strange, it definitely exits w/o check

Dig 2020-10-20T20:18:57.035400Z

is there a way to dump stack, like SIGQUIT or something?

Dig 2020-10-20T20:20:00.035600Z

I've tried some https://www.graalvm.org/reference-manual/native-image/NativeImageHeapdump/ but no luck

borkdude 2020-10-20T20:31:11.035900Z

How big is the JSON it's trying to write to stdout?

borkdude 2020-10-20T20:32:40.036100Z

@i.slack Can you try with e.g.:

{:out (io/file "out.json")}
to see if the process is maybe waiting for stdout to be consumed?

Dig 2020-10-20T20:36:54.036300Z

when i added it to $ it writes out 148k file and exits

Dig 2020-10-20T20:37:34.036500Z

if i put #_ in front of it, it gets stuck again

Dig 2020-10-20T20:38:17.036700Z

some kind of buffering thing, try maybe with big .json file?

borkdude 2020-10-20T20:40:21.036900Z

ah so that may be it

borkdude 2020-10-20T20:44:42.037100Z

yeah, so:

(-> (process ["cat"] {:out (io/file "/tmp/foo.csv") :in (io/file "/Users/borkdude/Downloads/1mb-test_csv.csv")}) check)
works, but if I remove :out it has nowhere to write, so cat is going to wait until it can

borkdude 2020-10-20T20:45:45.037300Z

@i.slack A solution:

user=> (def csv (with-out-str (-> (process ["cat"] {:out *out* :in (io/file "/Users/borkdude/Downloads/1mb-test_csv.csv")}) check)))
#'user/csv
user=> (count csv)
1000448

borkdude 2020-10-20T20:48:21.037500Z

whereas

(def csv (with-out-str (-> (process ["cat" "foo"] {:out *out*}) check)))
would give an error

borkdude 2020-10-20T20:49:00.037700Z

I'll write a note about this in the docs

borkdude 2020-10-20T20:57:00.037900Z

This is probably also a good option:

(def sw (java.io.StringWriter.))
(-> (process ["cat"] {:in (slurp "https://datahub.io/datahq/1mb-test/r/1mb-test.csv") :out sw}) check)
(count (str sw)) ;; 1043005

borkdude 2020-10-20T20:58:27.038100Z

as long as it has a way to write the stream somewhere
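The same ordering concern can be illustrated with plain Java interop, independent of the process library discussed here. This is a sketch under assumptions: `run-and-capture` is a hypothetical helper, it assumes an `echo` executable on the PATH for the usage line, and it only drains stdout (a process that writes a lot to stderr could still block).

```clojure
;; Hypothetical helper: start a command, drain its stdout, then wait for exit.
(defn run-and-capture [cmd]
  (let [proc (.start (ProcessBuilder. ^java.util.List cmd))
        ;; Read everything the child writes first. If we waited for exit
        ;; before reading, a child producing more output than the OS pipe
        ;; buffer holds would block on write and never exit.
        out  (slurp (.getInputStream proc))
        exit (.waitFor proc)]
    {:exit exit :out out}))

(println (run-and-capture ["echo" "hello"]))
```

Draining before waiting is the point of the sketch: the child always has somewhere to write, so waiting for its exit code cannot deadlock on a full pipe.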

borkdude 2020-10-20T21:01:01.038300Z

maybe it would be convenient to have an :out :string for this use case

Dig 2020-10-20T21:01:32.038600Z

yep, that is the case, so it works w/ w-o-s

Dig 2020-10-20T21:02:09.038800Z

yes, :string is good idea, since it is a common case to check and :out slurp

borkdude 2020-10-20T21:41:28.039Z

@i.slack Are you testing with bb or directly on the JVM?

Dig 2020-10-20T21:41:57.039200Z

bb from builds

Dig 2020-10-20T21:42:29.039400Z

that is why i could not dump stack to see where it is stuck

borkdude 2020-10-20T21:42:34.039600Z

This should now work in the JVM lib:

(testing "output to string"
    (is (string? (-> (process ["ls"] {:out :string})
                     check
                     :out))))
I'll push it to bb master

Dig 2020-10-20T21:51:45.039800Z

c00l, i will test it on my use case once it is built

borkdude 2020-10-20T21:51:58.040Z

ok, just pushed it. should be a few minutes

borkdude 2020-10-20T22:13:50.040200Z

Should be there now. With this enhancement the following now also works:

user=> (count (-> (process ["cat"] {:in (slurp "https://datahub.io/datahq/1mb-test/r/1mb-test.csv") :out (io/file "/tmp/download.csv")}) check :out slurp))
1043005

borkdude 2020-10-20T22:14:07.040400Z

i.e. :out contains the same value as was put in