code-reviews

danielcompton 2015-07-21T02:01:48.000241Z

@jkc how does that time compare with calling sort on the two vectors concatted together?

danielcompton 2015-07-21T02:04:08.000243Z

@jkc this is parallelisable with reducers

jkc 2015-07-21T02:04:15.000244Z

@danielcompton: you mean like: (print (time (sort (shuffle (take 100000 (iterate inc 1)))))) this? ;; "Elapsed time: 106.398786 msecs” About roughly 9x faster

danielcompton 2015-07-21T02:04:51.000245Z

You should be creating the vector outside of your timing loop

danielcompton 2015-07-21T02:05:05.000246Z

yep, like that

danielcompton 2015-07-21T02:05:19.000247Z

ignore the ‘two vectors’ part, I wasn’t reading it correctly

jkc 2015-07-21T02:05:54.000249Z

@danielcompton: good point on the timing including the vector. I’ll have to look into reducers not familiar with them.

danielcompton 2015-07-21T02:05:56.000250Z

@jkc you could look at using transients, they should be a lot faster for this

danielcompton 2015-07-21T02:06:32.000252Z

You’re doing a lot of checking of empty?, that might be a hotspot

jkc 2015-07-21T02:07:05.000253Z

Thanks for the advice! Do you know any good resources about transients? I’m not familiar with them.

danielcompton 2015-07-21T02:08:44.000254Z

@jkc take a look at the source for filterv

danielcompton 2015-07-21T02:09:15.000255Z

it provides you with a collection with the same semantics as a persistent collection, except that it isn’t persistent

danielcompton 2015-07-21T02:09:24.000256Z

you then call persistent! when you’re done

danielcompton 2015-07-21T02:09:59.000257Z

the downside is you can’t share it while it’s transient

danielcompton 2015-07-21T02:10:16.000258Z

It’s essentially mutable state, so you need to keep it private within a function

danielcompton 2015-07-21T02:10:57.000259Z

And the functions have a ! on the end of their name. So you need to use conj! when before you’d use conj

danielcompton 2015-07-21T02:11:12.000260Z

Ultimately, your best speed is going to come from dropping down to Java though

jkc 2015-07-21T02:12:03.000264Z

@danielcompton: Thanks for all the advice. Looks like I have some reading to do.

2015-07-21T14:52:33.000333Z

@paulspencerwilliams: Sorry no one’s gotten to your review yet. I took a look at it last night and had some small notes.

2015-07-21T14:52:53.000334Z

If you’re still interested.

2015-07-21T14:58:46.000335Z

@potetm I am still interested although will be away from computer for a few hours. If you're happy, provide them hear or tomorrow? Thank you :-)

2015-07-21T14:59:18.000336Z

Sure. I’ll just jot them down, and you can reply to them whenever :simple_smile:

2015-07-21T15:16:02.000341Z

The first thing I noticed was formatting. This is somewhat a personal preference, but lisp code has fairly regular vertical formatting, making it easy to read if you have lots of newlines.

2015-07-21T15:17:14.000342Z

For example, I would have formatted tree-to-seq like so:

(defn tree-to-seq [m]
  (if (map? m)
    (flatten
      (for [[k v] m]
        (cons k (tree-to-seq v))))
    m))

2015-07-21T15:17:57.000343Z

Same with the last line:

(apply println
       (map
         (fn [pair]
           (let [[parent child] pair]
             (pair-to-sql parent child)))
         (partition 2 1 the-seq)))

2015-07-21T15:21:39.000344Z

One odd thing was passing read-data into pathseq-to-tree. pathseq-to-tree operates on data and returns data. It’s simpler if it just takes data as opposed to creating this understanding with the caller about the nature of the function that’s passed in.

2015-07-21T15:24:59.000346Z

I think you know this and just forgot (or maybe it was hard to notice because of the formatting), but the last line says:

(fn [pair]
  (let [[parent child] pair]
    (pair-to-sql parent child)))
That could just be:
(fn [[parent child]]
  (pair-to-sql parent child))
Or even (partial apply pair-to-sql) depending on how concise you want to be. Point being, the let isn’t doing a lot of good there.

2015-07-21T15:31:17.000348Z

Lastly, although (apply println …) is perfectly fine, I generally prefer to do side-effecting across seqs using doseq. I find it more explicit. This has the added benefit of not passing an undetermined number of arguments to a function, which, I believe, can get pretty expensive.

2015-07-21T15:32:48.000349Z

All of that being said, I think the core of the code looks very good! Clear manipulation of data, despite it being a complex problem.