Not sure if this is allowed, but any pointers on http://codereview.stackexchange.com/questions/152944/clojure-flatten-a-nested-map would be great 🙂
@poooogles Does the input map always have the same structure?
or can it be arbitrarily recursive?
Running that function from SO with the provided example with criterium gives an execution mean time of 2.947716 µs
@sveri Arbitrary I'm afraid. I've got some criterium tests on the real structures which are much bigger that are nowhere near as fast.
FAIL in (test-normalise-performance) (idbevt_test.clj:51)
Check performance of our normalisation.
expected: (< (first (:sample-mean result)) target-time)
actual: (not (< 5.225852301972124E-4 2.5E-5))
x86_64 Mac OS X 10.9.5 4 cpu(s)
Java HotSpot(TM) 64-Bit Server VM 25.51-b03
Runtime arguments: -Dfile.encoding=UTF-8 -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -XX:-OmitStackTraceInFastThrow -Dclojure.compile.path=/Users/spegler/Documents/Clojure/clj-utils/target/base+system+user+dev/classes -Dclj-utils.version=0.3.3 -Dclojure.debug=false
Evaluation count : 91320 in 60 samples of 1522 calls.
Execution time sample mean : 810.484728 µs
Execution time mean : 811.323195 µs
Execution time sample std-deviation : 132.183916 µs
Execution time std-deviation : 134.245365 µs
Execution time lower quantile : 628.892823 µs ( 2.5%)
Execution time upper quantile : 1.069945 ms (97.5%)
Overhead used : 12.416037 ns
is with the "real" data, I just posted the most simple solution I could as I assumed optimisations would scale.
My first assumption would be that its the recursion that causes this. Would be interesting to find out how deeply nested they are in the most common case and setup test cases for these
You will need them later anyway for comparison if you improve the performance
I'll see if I can post a gist with some better examples.
@sveri https://gist.github.com/Poogles/4bce0b780354b536ae714fdc998b14e5
@poooogles Hm, this brings me to 21,695200 µs which is an order of magnitude more than before but still an order less than your ones.
Apart from that, is there maybe a way to tackle this by redesigning the data? Maybe if you know the structure of the data in advance you could pick out the keys?
Or when you build the map, build it in a different way? Or keep track of the map building process and cache the paths to the keys you are interested in?