planck

Planck ClojureScript REPL
2018-01-07T14:28:29.000072Z

I’m trying to use Planck to process a huge file. I thought it might be cool to make it stream-based and run ./blah.cljs < file > output instead of reading the whole file, processing it, and then writing the result

2018-01-07T14:29:05.000096Z

I’m trying to decipher the planck.io docs but I don’t understand how to get the stdin and stdout streams

mfikes 2018-01-07T14:46:17.000037Z

@nooga If your processing is textual and line based, planck.core/read-line might be useful

mfikes 2018-01-07T14:47:37.000059Z

An interesting thing that may easily occur when processing an absolutely huge file this way is head-holding.

mfikes 2018-01-07T14:49:08.000007Z

My immediate thought on that issue is to try to build something that reduces on (iterate (fn [_] (planck.core/read-line)) nil)

2018-01-07T14:49:12.000010Z

I’ve got ~300MB of stanzas like: AA=12345678 BBA=12345678 CCC=12345678 and I basically need to make it so that they end up as 12345678 12345678 12345678 in separate lines
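
A minimal sketch of that per-line transformation, assuming each stanza sits on one line as whitespace-separated KEY=VALUE tokens (the exact layout is not shown in the chat). The names stanza-values and convert-line are hypothetical:

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: returns the text after each `=` in one line.
;; re-seq yields [whole-match group] vectors, so `second` picks the value.
(defn stanza-values [line]
  (map second (re-seq #"=(\S+)" line)))

;; Hypothetical helper: one stanza line in, space-separated values out.
(defn convert-line [line]
  (string/join " " (stanza-values line)))
```

Because convert-line works on a single line, it can be applied to lines one at a time as they are read, whichever reading strategy is used.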

mfikes 2018-01-07T14:50:20.000072Z

Ahh, that's cool, perhaps a partition transducer could help get the pairs of lines.

mfikes 2018-01-07T14:51:11.000033Z

Also, to really go the transducer route, you'd need the reducible iterate that is in ClojureScript head, which isn't yet in the shipping Planck. (It is easily built, though, via script/pilot in the Planck source tree.)

mfikes 2018-01-07T14:55:01.000040Z

@nooga The reason I mention holding head, is that if blah.cljs looked like

(require '[planck.core :refer [line-seq *in*]])

(run! println (partition 2 (line-seq *in*)))

Then it would print the pairs of lines, and arguably be a clean streaming solution. But it will still hold all lines in memory, if that's a concern.

2018-01-07T14:55:34.000070Z

it may be since these files are huuge 😉

mfikes 2018-01-07T14:55:55.000070Z

And in that case, the new iterate may be self-hosted's friend 🙂

2018-01-07T14:56:27.000033Z

I’m writing an OpenRISC emulator in Java to have Linux running inside the JVM, and my main method of debugging is comparing CPU state logs from my emu and OpenRISC QEMU

mfikes 2018-01-07T14:56:32.000067Z

300 MB should easily fit in RAM. The transducer approach is fun to mess around with though.

2018-01-07T14:56:52.000047Z

yeah, got 16GB of ram here but somehow this feels dirty 😄

2018-01-07T14:57:42.000021Z

I tried sed but it drove me crazy

mfikes 2018-01-07T14:59:10.000077Z

I agree. The only reason ClojureScript doesn't clear locals is because there hasn't been much demand for it. Maybe if self-hosted ClojureScript becomes popular, that could cause some demand. In the meantime, I've been exploring the "reducible" route, if that makes sense. In other words, you could transduce on the sequence produced by iterate without consuming RAM. The only dirty thing about that approach for this problem is that you'd need to write to stdout as a side effect of the reduction 😞

mfikes 2018-01-07T15:19:14.000032Z

@nooga I'm checking to see if this doesn't consume RAM:

(require '[planck.core :refer [read-line]])

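(transduce (comp (drop 1)            ; skip the initial nil seed of iterate
                 (take-while some?)  ; read-line returns nil at end of input
                 (partition-all 2))
  ;; completing adds the 1-arity completion step that transduce calls at the end
  (completing (fn [_ x] (println x)))
  nil
  (iterate (fn [_] (read-line)) nil))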

2018-01-07T15:24:54.000085Z

cool, I settled for a simple loop
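
The loop itself was not posted. A minimal sketch of what such a loop could look like, assuming the reader function returns nil at end of input (as planck.core/read-line does). The name process-lines and its parameters are hypothetical, and the reader is passed in so the loop can be tried without stdin:

```clojure
;; Hypothetical sketch; the actual loop was not shared in the chat.
;; read-fn returns the next line, or nil at end of input.
;; f transforms one line; out-fn writes the transformed result.
;; Returns the number of lines processed.
(defn process-lines [read-fn f out-fn]
  (loop [n 0]
    (if-some [line (read-fn)]
      (do (out-fn (f line))
          (recur (inc n)))
      n)))
```

In a Planck script, read-fn could be planck.core/read-line and out-fn could be println, so only one line is held at a time.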

2018-01-07T15:25:02.000023Z

and it did the job… slowly

mfikes 2018-01-07T15:30:51.000026Z

Cool. FWIW, Planck also has -s, -f, and -O simple as ways to try to make things run faster.

2018-01-07T15:31:11.000049Z

nice! didn’t know that

2018-01-07T15:31:39.000031Z

ah, I converted the files and tried to use them but now I see that they’re rubbish :F

2018-01-07T15:32:02.000019Z

debugging linux kernel on a CPU that you wrote is no fun

2018-01-07T15:32:03.000001Z

😄

mfikes 2018-01-07T15:32:18.000054Z

Hah

2018-01-07T15:36:05.000046Z

esp after writing mostly clojure and functional langs for last 3 years

mfikes 2018-01-07T15:43:52.000029Z

Well, FWIW, the transducer approach using iterate (with ClojureScript master) doesn't consume RAM

2018-01-07T15:44:27.000023Z

that’s awesome!

2018-01-07T15:44:42.000059Z

thanks for checking it out 🙂

mfikes 2018-01-07T21:26:15.000003Z

On Planck master line-seq is directly reducible. This allows reducing over gigantic files without consuming RAM, avoiding ClojureScript head-holding. This example is over a 1 GB file.

cljs.user=> (require '[planck.core :refer [line-seq]]
       #_=>  '[planck.io :as io]
       #_=>  '[clojure.string :as string])
nil
cljs.user=> (reduce
       #_=>  (fn [c line]
       #_=>   (cond-> c
       #_=>    (string/starts-with? line "a") inc))
       #_=>  0
       #_=>  (line-seq (io/reader "big.txt")))
134217728