@jkc how does that time compare with calling sort
on the two vectors concatted together?
@jkc this is parallelisable with reducers
@danielcompton: you mean like: (print (time (sort (shuffle (take 100000 (iterate inc 1)))))) this? ;; "Elapsed time: 106.398786 msecs” About roughly 9x faster
You should be creating the vector outside of your timing loop
yep, like that
ignore the ‘two vectors’ part, I wasn’t reading it correctly
@danielcompton: good point on the timing including the vector. I’ll have to look into reducers not familiar with them.
@jkc you could look at using transients, they should be a lot faster for this
You’re doing a lot of checking of empty?
, that might be a hotspot
Thanks for the advice! Do you know any good resources about transients? I’m not familiar with them.
@jkc take a look at the source for filterv
it provides you with a collection with the same semantics as a persistent collection, except that it isn’t persistent
you then call persistent!
when you’re done
the downside is you can’t share it while it’s transient
It’s essentially mutable state, so you need to keep it private within a function
And the functions have a ! on the end of their name. So you need to use conj!
when before you’d use conj
Ultimately, your best speed is going to come from dropping down to Java though
@danielcompton: Thanks for all the advice. Looks like I have some reading to do.
@paulspencerwilliams: Sorry no one’s gotten to your review yet. I took a look at it last night and had some small notes.
If you’re still interested.
@potetm I am still interested although will be away from computer for a few hours. If you're happy, provide them hear or tomorrow? Thank you :-)
Sure. I’ll just jot them down, and you can reply to them whenever :simple_smile:
The first thing I noticed was formatting. This is somewhat a personal preference, but lisp code has fairly regular vertical formatting, making it easy to read if you have lots of newlines.
For example, I would have formatted tree-to-seq
like so:
(defn tree-to-seq [m]
(if (map? m)
(flatten
(for [[k v] m]
(cons k (tree-to-seq v))))
m))
Same with the last line:
(apply println
(map
(fn [pair]
(let [[parent child] pair]
(pair-to-sql parent child)))
(partition 2 1 the-seq)))
One odd thing was passing read-data
into pathseq-to-tree
. pathseq-to-tree
operates on data and returns data. It’s simpler if it just takes data as opposed to creating this understanding with the caller about the nature of the function that’s passed in.
I think you know this and just forgot (or maybe it was hard to notice because of the formatting), but the last line says:
(fn [pair]
(let [[parent child] pair]
(pair-to-sql parent child)))
That could just be:
(fn [[parent child]]
(pair-to-sql parent child))
Or even (partial apply pair-to-sql)
depending on how concise you want to be.
Point being, the let
isn’t doing a lot of good there.Lastly, although (apply println …)
is perfectly fine, I generally prefer to do side-effecting across seqs using doseq
. I find it more explicit. This has the added benefit of not passing an undetermined number of arguments to a function, which, I believe, can get pretty expensive.
All of that being said, I think the core of the code looks very good! Clear manipulation of data, despite it being a complex problem.