code-reviews

martinklepsch 2018-11-12T21:09:29.016900Z

@mfiano You're in the right place, just share your code and hopefully some people will read and review (also try #beginners if there's little reaction here)

2018-11-12T21:09:59.017100Z

Heya

2018-11-12T21:10:20.017600Z

I wrote a project that parses a few different binary audio file types to get metadata https://github.com/mfiano/mediacat

2018-11-12T21:11:06.018600Z

in mfiano.mediacat.parser ns, usage is (parse-file "/path/to/file") supported file types are *.ogg, *.flac, and *.mpc

2018-11-12T21:11:35.019Z

This is the first bit of Clojure I did, so be nice 🙂

2018-11-12T21:11:55.019300Z

Also noteworthy, the first time I touched Java, interop or not 🙂

2018-11-12T21:29:43.020300Z

@mfiano the biggest thing I see is many usages of loop, where eg. dotimes (for numeric bindings in series) or reduce (for doing some action for each entry of an input and building a result) could have been used

2018-11-12T21:30:14.020900Z

and then the usage of the humanize library by full name instead of requiring with an alias in the util namespace

2018-11-12T21:31:05.021700Z

in util/sequence* - I don't think (lazy-seq (loop ...)) makes any sense at all

2018-11-12T21:31:32.022200Z

I tend not to alias a namespace if I'm only using it once, but fair points

2018-11-12T21:32:11.022900Z

yeah, that's a style thing I guess - the risk there is that eg. it works accidentally when moved to a new ns even though you don't add a require

2018-11-12T21:32:35.023500Z

the lazy-seq / loop combo is the one thing I see so far that might actually be a correctness concern

2018-11-12T21:32:47.024Z

Can you explain that last thing about lazy-seq. A friend of mine wrote that for me

2018-11-12T21:33:00.024300Z

you can wrap any sequence in lazy-seq, but that won't make it lazily generate

2018-11-12T21:33:29.024900Z

loop is eager, so what that code says is "create this collection eagerly, and once you are done, wrap it in a lazy-seq abstraction"

2018-11-12T21:33:34.025100Z

which is weird

2018-11-12T21:34:05.025600Z

and might or might not indicate a logic error - it's probably not even that though

2018-11-12T21:34:51.026100Z

also, now that I see what's inside the loop, it's a classic case of a potential concat-bomb

2018-11-12T21:35:07.026500Z

(you'd need a large enough input before ever seeing the issue though)

2018-11-12T21:35:29.026900Z

I asked them to stop by here and talk about it

2018-11-12T21:35:32.027Z

but it can blow up the stack

2018-11-12T21:36:10.027900Z

all in all, for a first clojure project, the quality is very good

2018-11-12T21:37:10.028300Z

Which function was it that had the loop in the lazy seq?

2018-11-12T21:37:20.028600Z

sequence*

2018-11-12T21:37:27.029Z

Right.

2018-11-12T21:37:42.029300Z

It's fully intended to be eager for that portion.

2018-11-12T21:38:31.030500Z

so, the lazy-seq call around it doesn't do anything except making an empty lazy-seq instead of nil - in the loop body it's using lazy-cat on every branch

2018-11-12T21:38:59.031600Z

Yeah, this was after a refactor and it never occurred to me to just raise the loop form

2018-11-12T21:39:01.031700Z

so the lazy-seq on the outside isn't doing much but that isn't an error - just something that makes me suspicious

2018-11-12T21:39:04.031900Z

aha

2018-11-12T21:39:18.032400Z

The lazy-cat is already taking care of all the needed laziness

2018-11-12T21:39:28.032700Z

second issue is the eager nested calls of lazy-cat, see the blog post I linked above

2018-11-12T21:39:36.033Z

Yup, I've read it.

2018-11-12T21:39:41.033200Z

with a large enough input it will blow the stack

2018-11-12T21:40:26.034500Z

Yes. Although it's unlikely to occur in this case, because the part which will blow the stack is the side that's being catenated onto, which is the set of return results from calling a transducer, which means that it's unlikely to be large.

2018-11-12T21:40:29.034600Z

it's straightforward to replace the usage of loop/concat with a recursive call that doesn't have that danger

2018-11-12T21:40:52.034900Z

but you know your domain better than I do

2018-11-12T21:41:35.035600Z

I might try to put a little thought into it today to see if there is a better, more idiomatic way to handle this, perhaps with giving the function more intermediate values as a part of the argument list

2018-11-12T21:41:51.035900Z

actually yeah, that looks like an easy refactor

2018-11-12T21:43:18.037600Z

I'm forgetting the precise locations now, but I saw some usage of count as a local, this is a style issue but it tends to trip me up as a reader and that kind of thing tends to cause weird bugs when refactoring

2018-11-12T21:43:33.038400Z

shadowing of functions with local values in general that is

2018-11-12T21:43:37.038600Z

Yeah, that's fair. Shadowing bindings do tend to get weird

2018-11-12T21:44:01.039100Z

@noisesmith Thanks a lot for the criticism. This project was very tedious...some of these binary formats were under-specified or needlessly convoluted. lazy sequences helped a bunch with transforming the data efficiently. I learned a lot for sure.

2018-11-12T21:48:38.040Z

Also thanks to @suskeyhose and @seancorfield for answering all my questions along the way. You guys are awesome

2018-11-12T21:48:39.040100Z

(defn- sequence*
  [xf coll]
  (let [rf (xf (fn [xs v] (conj xs v)))
        f (fn f [coll xs]
            (if (seq xs)
              (lazy-seq (cons (first xs) (f coll xs)))
              (loop [coll coll]
                (when (seq coll)
                  (if-let [res (rf [] (first coll))]
                    (if (reduced? res)
                      (lazy-seq (cons (first @res) (f nil xs)))
                      (lazy-seq (cons (first res) (f (rest coll) (rest res)))))
                    (recur (rest coll)))))))]
    (f coll [])))

2018-11-12T21:49:08.040700Z

I dislike how slack doesn't allow code highlighting for specified languages

2018-11-12T21:49:22.041Z

It does, just not like that

2018-11-12T21:49:28.041200Z

ah

2018-11-12T21:49:51.041500Z

2018-11-12T21:50:14.042100Z

Anyway, this one shouldn't explode the stack, and has no redundant lazy-seq call

2018-11-12T21:50:59.042500Z

Oh, oops, there's actually one mistake there

2018-11-12T21:51:13.042900Z

the (f nil xs) should actually be (f nil (rest @res))

2018-11-12T21:51:50.043500Z

pardon my reading skills, but what differentiates sequence* from cat ?

2018-11-12T21:52:49.044500Z

sequence* isn't a trasducer, it's a trasducible context that's identical to sequence except that it is not chunked and does not require realizing at least two results or consuming the whole sequence.

2018-11-12T21:53:05.044700Z

I would not have guessed

2018-11-12T21:54:14.045400Z

What would make that more clear?

2018-11-12T21:54:46.046200Z

Probably me stabilizing the code to warrant docstrings for every function 🙂

2018-11-12T21:55:02.046600Z

I'm not sure - maybe local binding names that are meaningful instead of repeated calls to eg. first, rest on the same coll in the same scope

2018-11-12T21:55:05.046800Z

or comments even

2018-11-12T21:55:30.047200Z

Unfortunately it actually does have a bug now that it didn't before. gotta track that down.

2018-11-12T21:56:06.047800Z

@suskeyhose No rush, you can come back to it another time. I'm actually doing some CL code today and tomorrow.

2018-11-12T21:56:21.048100Z

Okay, sounds good.

2018-11-12T21:56:52.049100Z

Anyway, the original will likely never run into the problem of blowing the stack unless you decide to use partition-all with a size in the thousands.

2018-11-12T21:57:08.049500Z

of all the code I saw so far, (not claiming I read deeply...) sequence* would likely benefit from unit tests (which would likely also elucidate what it is meant to do)

2018-11-12T21:58:04.050100Z

it's the only code that made me question my own ability to read it correctly

2018-11-12T21:58:21.050600Z

that's definitely true, and in my own project I'd probably have built some around it. In this case I'm just helping Michael out with some helper functions and thinking functionally for his project.

2018-11-12T21:58:53.051300Z

I think @suskeyhose refactored that function like 4 times already. It's definitely pretty obtuse to me

2018-11-12T21:59:10.051600Z

This is true. It's trying something that is rather complex.

2018-11-12T22:00:35.052400Z

To be honest, sequence works just as well for this project with the input it is given, so I'm not sure chunking would be an issue here. I could be wrong though, I just know that it works with the builtin.

2018-11-12T22:02:15.053500Z

The problem with chunking is that you read in more data than necessary by at least one. Which can be a problem if you're reading a very large sequence and use a rather specific filter on it. So basically it damages performance in the case when computing the next value is expensive.

2018-11-12T22:02:29.053800Z

(and you only need a very specific number of values)

2018-11-12T22:03:15.054800Z

This is true, and also I haven't tested the builtin once we added the reified bit reader, which may possibly advance the stream too much.

2018-11-12T22:03:33.055200Z

Like for example that notes app that I made near the beginning of all this. It used sequence which made getting the first item from the sequence slow when the only other item which matched the filter was at the end.

2018-11-12T22:03:50.055500Z

yeah, it might, which is the other reason

2018-11-12T22:06:07.056700Z

Anyway, this project was fun. A few of the formats I parsed would have been pretty impossible without using lots of imperative code, or putting lots of binary data on the heap for processing. lazy sequencing helped a ton

2018-11-12T22:06:29.057200Z

I wouldn't think of using another language for parsing binary files after those few I encountered

2018-11-12T22:10:44.057800Z

I'd say that's impressive considering just how much better it could be if Java didn't throw so many walls at you for doing binary IO.

2018-11-12T22:11:34.058800Z

The Java library I'm using, bitio, really was a great help with a lot of that. I tried about 3 others, which were all lacking.

2018-11-12T22:11:35.058900Z

the whole "no unsigned types" thing is a bit odd

2018-11-12T22:12:08.059400Z

if they really wanted to be consistent with that they could have defined the integer value of Boolean to be -1 and 0

2018-11-12T22:12:29.059700Z

🙂

2018-11-12T22:13:29.060400Z

So I wasn't wasting tons of space for small bit reads

2018-11-12T22:13:56.060600Z

nice

2018-11-12T22:16:27.062100Z

If I recall correctly, the only mutable function I wrote was tr/reverse-bytes! for reversing a byte array to be read in little endian, since it would have needlessly consed when its only use is going to be discarded afterwards anyway.

2018-11-12T22:19:15.063100Z

I was concerned about performance there a bit, because ultimately I'd like a thread pool to parse an entire directory tree of files to be parsed and sent off to Datalog or something.

2018-11-12T22:25:04.063200Z

Here's the commented, better-named, and better-tested version which doesn't use lazy-cat. What do you think @noisesmith, @mfiano?

2018-11-12T22:26:33.063700Z

I'd probably have used a letfn

2018-11-12T22:27:13.064100Z

But looks good

2018-11-12T22:27:20.064300Z

doesn't work with reducing-function

2018-11-12T22:27:31.064600Z

since you have to call a function on the lambda

2018-11-12T22:27:38.064800Z

before it can be used.

2018-11-12T22:28:08.065300Z

Also, is that more understandable Michael?

2018-11-12T22:28:46.065600Z

I meant for pull-next

2018-11-12T22:29:02.065800Z

Yes very

2018-11-12T22:30:46.066200Z

Suggesting having a nested let and then letfn?

2018-11-12T22:31:35.066900Z

I guess that works. It's actually tabbed over less, so yeah

2018-11-12T22:31:40.067200Z

that's probably not a bad idea

2018-11-12T22:31:46.067400Z

I was just thinking of a way to prevent using the symbol pull-next a few times outside of itself