code-reviews

Joe 2021-01-24T16:43:56.018800Z

Hi, I have a simple kd-tree / nn-search implementation https://github.com/RedPenguin101/kd-tree/blob/main/src/kdtree/kdtree.clj, and I'm confused about some of the behaviour.

(time (map search big-range))
  ;; 0.36ms

  (time (count (map search big-range)))
  ;; ~30000ms
I'd assume that the reason the first was so quick was something to do with laziness - but when I run it, it does output the result back to my REPL (in much less than 30 seconds), so it must actually be doing the work. Can someone explain what is going on here? I'd also appreciate any help on how I could speed it up. 30 seconds is not significantly faster than a brute force approach, and any feedback generally on the code.

2021-01-24T17:44:29.019Z

The time you see in the first case is only the time needed to build the lazy sequence, without realizing it. Something like (time (doall (map search big-range))) should report the time you're interested in

Joe 2021-01-24T17:52:06.019200Z

Ah I see, thank you! Unfortunately that means the algorithm is just slow 😞

2021-01-24T18:05:31.019400Z

Not necessarily 🙂 For one thing I'm not sure if you intended to build the tree anew in every search call:

(def tree (kd-tree big-points 0))
(def search #(nn-search tree % md))
will cut those 30secs in half approximately.

2021-01-24T18:26:49.019600Z

Apart from that, I'd try profiling the code -- e.g. with https://github.com/ptaoussanis/tufte -- and see where that time is spent. Just from a quick glance: is it possible that undraw as well as its use could be optimized to cause fewer/shallower recursive calculations?

Joe 2021-01-24T18:27:00.019900Z

Oh wow, no I definitely didn't intend to rebuild the tree on every call. I thought deffing search like this (def search #(nn-search (kd-tree big-points 0) % md)) would build the tree only once, is that not the case?

Joe 2021-01-24T18:31:13.020200Z

Well look at that, down to 13 seconds. Awesome, thanks! I'll give the profiler a go too

1👍
2021-01-24T18:31:59.020400Z

Yes, you'll want something like def or let inside your definition of search , otherwise that form will be evaulated on every call

Joe 2021-01-24T19:07:15.020700Z

The profiler worked great! Turns out nearly all the run time was in the distance calculation md, which was being called 2m times. Changing that to unchecked math changed it from 13s to < 1 second 😄

2🙌
2021-01-24T19:17:19.020900Z

Glad it helped!

1👍