i’m sure there’s a better and faster way to express this, will stare at it until something comes to mind 😄
hell, sometimes the performance seems noticeably slower, sometimes it seems precisely the same or even faster
perf is actually pretty acceptable when run through advanced compilation, nice
benchmarking stuff now; as you’d expect, that code runs pretty fast (35ms in 1000000 runs) when x1 = x2, but about 50x slower (1400 ms in 1000000 runs) when x1 != x2
so the codepath i need to focus on optimizing is
(for [x (if (< x1 x2)
          (range x1 (inc x2))
          (reverse (range x2 (inc x1))))]
  (-> data
      (nth x)
      (nth y1)))
hm - i’m actually not sure how to do this in a more performant way - the ranges and reverses don’t seem to be the issue afaict, it’s just the bit where we have to use `nth` to index into each column and then call `nth` on that column to find our value
@jrheard the `select*` codepath should call `next-fn` on each grid cell, rather than calling `next-fn` on a sequence of cells
that will be a lot more performant
o nice, i’ll try that
be sure to read the spec for `select*`: https://github.com/nathanmarz/specter/blob/master/src/clj/com/rpl/specter/protocols.cljc#L6
will do, thanks!
hm, that may have caused a speedup but not a drastic one
ended up with this:
(defnav grid-values [x1 y1 x2 y2]
  (select* [this structure next-fn]
    (assert (or (= x1 x2)
                (= y1 y2)))
    (if (not (cell-is-on-grid x1 y1))
      ; If your starting cell isn't on the grid, you get nothing.
      []
      (let [x2 (bound-between x2 0 (dec GRID-WIDTH))
            y2 (bound-between y2 0 (dec GRID-HEIGHT))]
        (if (= x1 x2)
          (let [column (nth structure x1)]
            (doseq [value (if (< y1 y2)
                            (subvec column y1 (inc y2))
                            (reverse (subvec column y2 (inc y1))))]
              (next-fn value)))
          (doseq [x (if (< x1 x2)
                      (range x1 (inc x2))
                      (reverse (range x2 (inc x1))))]
            (next-fn
              (-> structure
                  (nth x)
                  (nth y1)))))))))
i noticed i had to use `doseq` rather than `for` - presumably `next-fn` is side-effecty?
nbd, just interesting
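(that guess about `for` vs `doseq` checks out in general: `for` builds a lazy sequence, so side effects inside it don't run until something realizes the sequence, while `doseq` is eager. a minimal illustration, not from the app:)

```clojure
;; `for` is lazy: creating the sequence does NOT run the body,
;; so a side-effecting fn like next-fn would never get called
;; unless someone realizes the result.
(def hits (atom 0))

(def lazy-result (for [x [1 2 3]] (swap! hits inc)))
@hits ;; => 0 - nothing has been realized yet

;; `doseq` is eager and exists precisely for side effects.
(doseq [x [1 2 3]] (swap! hits inc))
@hits ;; => 3
```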
the only other perf thing i can think of is: rather than having `grid` be a 2d array, have it be a 1d array that’s primarily accessed via some function like `(get-cell grid x y)`, so most consumers don’t treat it like a 1d array, but certain places like `select*` can treat it like a 1d array and perhaps get a speedup
which, i think i just have to try and see what happens, and benchmark before and after
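(for concreteness, here's a sketch of what that 1d layout could look like, assuming a column-major layout; `get-cell` and the grid dimensions are hypothetical names for this illustration, not code from the app:)

```clojure
;; Hypothetical 1d-vector grid: columns laid out end to end,
;; so cell (x, y) lives at index x * GRID-HEIGHT + y.
(def GRID-WIDTH 4)
(def GRID-HEIGHT 3)

;; A 4x3 grid flattened into one vector of 12 values.
(def grid (vec (range (* GRID-WIDTH GRID-HEIGHT))))

(defn get-cell
  "Read cell (x, y) out of the flat vector with a single nth."
  [grid x y]
  (nth grid (+ (* x GRID-HEIGHT) y)))

(get-cell grid 0 0) ;; => 0
(get-cell grid 1 0) ;; => 3 (start of the second column)
```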
`doseq` is not very fast
neither is `reverse`
is there anything less drastic that you’d recommend i try instead, or is this
(doseq [x (if (< x1 x2)
            (range x1 (inc x2))
            (reverse (range x2 (inc x1))))]
  (next-fn
    (-> structure
        (nth x)
        (nth y1))))
unimprovable?
i’ve benchmarked the `reverse` codepath and it doesn’t seem noticeably slower; it seems to really be the two-`nth`s situation afaict
also, you're making two `nth` calls per element, when you could do one `nth` call per row
`reverse` is always getting a list of at most five items fwiw
hm
i see how i can do one `nth` call in the case where i’m just going down the Y axis - eg going from x=4,y=4 to x=4,y=8, and getting all those values
but i don’t see how i can do it if i’m going down the x axis, eg x=4,y=4 to x=8,y=4
i extremely agree with you that the two `nth` calls seem to be the culprit - is there some obvious better way of doing things that i’m not seeing?
calling `range` is also materializing an additional sequence that isn't necessary
the fastest way to do this is probably with just loop
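(a hedged sketch of what that `loop` could look like: walk x from x1 toward x2 with a manual step, so no `range` or `reverse` sequence is materialized. `walk-row` is a hypothetical name for this illustration; `structure`, `y1`, and `next-fn` follow the snippet above, with `next-fn` stubbed in the usage example just to show the shape:)

```clojure
;; Traverse one row of a column-major 2d vector, calling next-fn
;; on each cell, stepping +1 or -1 depending on direction - no
;; intermediate range/reverse sequences are allocated.
(defn walk-row [structure x1 x2 y1 next-fn]
  (let [step (if (< x1 x2) 1 -1)]
    (loop [x x1]
      (next-fn (-> structure (nth x) (nth y1)))
      (when (not= x x2)
        (recur (+ x step))))))

;; e.g. walking backwards across a 3-column "grid":
(let [acc (atom [])]
  (walk-row [[:a] [:b] [:c]] 2 0 0 #(swap! acc conj %))
  @acc) ;; => [:c :b :a]
```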
(fwiw, in my second-to-most-recent snippet, there’s an `if` with two paths - the `(= x1 x2)` path is pretty fast, and it still has the `range` and `reverse`)
loop’s a great idea! i forgot it existed 🙂
i’ll try loop in the morning, thanks nathan!
started benchmarking things (just sloppily using cljs’s built-in `simple-benchmark`, nothing fancy). i have a `pick-move` function that i’m calling 50 times, which causes this navigator codepath to be executed about a jillion times.
when i call `next-fn` a single time on the whole sequence, i see timings of about 9300ms for my benchmark.
when i call `next-fn` multiple times, once per each item of the sequence, i see the same timings.
this is probably because the sequence is always going to be at most five items long - i imagine that if calling `next-fn` once per item tends to give a speedup, perhaps the speedup is more obvious if you’re doing it on a ton of items as opposed to five?
(i should also note that i’m not using advanced compilation while gathering this data)
i switched my slow, two-`nth`-calls `doseq` with `range` and `reverse` to a handrolled `loop`, and that brought me down from 9300ms to 7200ms, which is a good start!
i still wish i could figure out a way to not have to do two `nth` calls - the only one i can think of is to switch from having a 2d vector to having a 1d vector, and measuring the before-and-after
at this point i’m mainly digging into this just to satisfy my curiosity; perf’s acceptable for my toy app, but i’d like to see if the 1d vector approach causes a huge speed boost, none at all, or somewhere in between. i’ll post the results later today when i’m done, and then i will stop spamming this channel 🙂
huh, basically zero speedup! my base benchmark timings are different for this one because i used a different board/hand combination to test the 2d vs 1d vector implementations; the 2d vector implementation had a benchmark time of 3100ms-3300ms, and so did the 1d vector implementation (presumably because vectors are implemented under the hood as trees, and so a 1d vector doesn’t guarantee you a contiguous block of memory)
so i guess this is as good as it’s gonna get; my thing still spends most of its time making those two `nth` calls, and i don’t see how that can be avoided if you’ve got a 2d vector of values where the first dimension holds the columns and you’re trying to select all the values from eg x=4,y=4 to x=8,y=4. it seems to me that you have to call `nth` two times in that situation, unless i’m missing something
but anyway this has been very interesting and educational for me but no longer has much to do with specter. my app’s specter-related perf seems to be as good as it’s going to get for this query, so again, thanks for all the help and for building this great tool!
yea, for up and down it's two `nth` calls per element, but for left and right it's less
right exactly
the one-`nth`-call-and-then-`subvec` dimension is super fast, the two-`nth`-calls-per-element one is 50x slower, and it seems like that’s just a fact of life
oh well!