Hi. I’m using file-seq to get all the files whose names match a few simple patterns (e.g. “*.ml”). Unfortunately the tree includes 64K items, so I need a version of file-seq that supports a filter function, so I can tell it to ignore files whose names do not match a list of patterns. Since file-seq calls Go’s
func Walk(root string, walkFn WalkFunc) error
that should be relatively easy, even for somebody who doesn’t know much Go. My question is whether there is a way to do that by providing a Go file in my app code, so I don’t have to mess with the Joker source code. Well, I guess the first question is whether this would speed things up significantly.
Well, I got curious and decided to do a little testing, and the custom-Go-code approach does seem to save a little time:
craig@pony:~/go$ xtime ./src/github.com/candid82/joker/joker -e '(doseq [f (joker.filepath/my-file-seq ".")] (println (:name f)))' | wc -l
u=1.14 s=0.29 r=0.95 cpu=150% kBresavg=0 kBresmax=44736 kBundata=0 kBunstack=0 kBtext=0 Bpagsiz=4096 kBavgtot=0
fsin=0 fsout=0 sockrcv=0 socksnt=0 pfmaj=0 pfmin=11339 vol=178 invol=2806 signals=320 swaps=0
rc=0 ./src/github.com/candid82/joker/joker -e (doseq [f (joker.filepath/my-file-seq ".")] (println (:name f)))
18290
craig@pony:~/go$ xtime joker -e '(doseq [f (joker.filepath/file-seq ".")] (or (joker.string/ends-with? (:name f) ".go") (println (:name f))))' | wc -l
u=1.29 s=0.32 r=1.15 cpu=140% kBresavg=0 kBresmax=55424 kBundata=0 kBunstack=0 kBtext=0 Bpagsiz=4096 kBavgtot=0
fsin=0 fsout=0 sockrcv=0 socksnt=0 pfmaj=2051 pfmin=11983 vol=611 invol=3866 signals=311 swaps=0
rc=0 joker -e (doseq [f (joker.filepath/file-seq ".")] (or (joker.string/ends-with? (:name f) ".go") (println (:name f))))
18290
craig@pony:~/go$ xtime find . \! -name "*.go" | wc -l
u=0.06 s=0.23 r=0.30 cpu=99% kBresavg=0 kBresmax=1008 kBundata=0 kBunstack=0 kBtext=0 Bpagsiz=4096 kBavgtot=0
fsin=0 fsout=0 sockrcv=0 socksnt=0 pfmaj=9 pfmin=377 vol=0 invol=120 signals=0 swaps=0
rc=0 find . ! -name *.go
18290
craig@pony:~/go$
So the custom approach takes about 1.14s of user CPU time, while the vanilla Joker approach takes 1.29s, versus find at 0.06s. In wall-clock terms (the r= figures) that is 0.95s versus 1.15s versus 0.30s.
Here are the two pertinent diffs (a third appears after building via ./run.sh due to autogeneration of Go code):
diff --git a/std/filepath.joke b/std/filepath.joke
index d6a860d1..4f3f9995 100644
--- a/std/filepath.joke
+++ b/std/filepath.joke
@@ -9,6 +9,13 @@
:go "fileSeq(root)"}
[^String root])
+(defn my-file-seq
+ "Returns a seq of maps with info about files or directories under root, except
+ for those with names ending in '.go'."
+ {:added "1.0"
+ :go "myFileSeq(root)"}
+ [^String root])
+
(defn ^String abs
"Returns an absolute representation of path. If the path is not absolute it will be
joined with the current working directory to turn it into an absolute path.
diff --git a/std/filepath/filepath_native.go b/std/filepath/filepath_native.go
index 0af7f449..416b75b8 100644
--- a/std/filepath/filepath_native.go
+++ b/std/filepath/filepath_native.go
@@ -3,6 +3,7 @@ package filepath
import (
"os"
"path/filepath"
+ "strings"
. "github.com/candid82/joker/core"
)
@@ -17,3 +18,17 @@ func fileSeq(root string) *Vector {
})
return res
}
+
+func myFileSeq(root string) *Vector {
+ res := EmptyVector()
+ filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
+ PanicOnErr(err)
+ if strings.HasSuffix(path, ".go") {
+ return nil
+ }
+ m := FileInfoMap(path, info)
+ res = res.Conjoin(m)
+ return nil
+ })
+ return res
+}
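The hard-coded ".go" suffix check above could be generalized to the "list of patterns" filter I asked about. Here is a minimal standalone sketch of that idea, not wired into Joker. The names matchesAny and filteredWalk are hypothetical helpers of mine, not part of Joker or the Go standard library, and the sketch assumes glob patterns in filepath.Match syntax applied to the base name only:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// matchesAny reports whether name matches at least one glob pattern
// (filepath.Match syntax, e.g. "*.ml"). A malformed pattern is treated
// as a non-match here; that is a simplification for this sketch.
func matchesAny(name string, patterns []string) bool {
	for _, p := range patterns {
		if ok, err := filepath.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

// filteredWalk (hypothetical) walks the tree under root and returns the
// paths of non-directory entries whose base name matches any pattern.
// Directories are always descended into, never reported.
func filteredWalk(root string, patterns []string) ([]string, error) {
	var paths []string
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err // stop on the first error instead of panicking
		}
		if info.IsDir() {
			return nil // keep walking into subdirectories
		}
		if matchesAny(info.Name(), patterns) {
			paths = append(paths, path)
		}
		return nil
	})
	return paths, err
}

func main() {
	// Example patterns only; substitute whatever list is needed.
	paths, err := filteredWalk(".", []string{"*.ml", "*.mli"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

To use this from Joker, the walk callback would presumably build FileInfoMap(path, info) values and Conjoin them onto a Vector, as myFileSeq does in the diff, instead of collecting plain strings. Note that it filters on the base name and skips directories, which differs from myFileSeq: that one checks the full path suffix and keeps directories.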