test-check

mattly 2016-07-28T00:00:05.000093Z

I found a few other places in the actual generation of things where I'm being wasteful

mattly 2016-07-28T00:00:46.000094Z

but cutting that out wasn't nearly as effective as getting rid of the distinct requirement

mattly 2016-07-28T00:03:01.000095Z

btw, thanks for all the work on test.chuck, checking and subsequence are invaluable tools in my toolbox these days

2016-07-28T00:03:46.000096Z

:)

2016-07-28T00:04:03.000097Z

I don't quite understand what you mean by "putting those values in now after-the-fact"

mattly 2016-07-28T00:04:09.000098Z

hm

mattly 2016-07-28T00:04:52.000099Z

I'll do an example

2016-07-28T00:05:49.000100Z

FYI distinct collections are generated by maintaining the elements in a transient set: https://github.com/clojure/test.check/blob/master/src/main/clojure/clojure/test/check/generators.cljc#L546
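roughly the idea, as plain-Clojure pseudocode (this ignores the generator-state plumbing in the real implementation, and gen-one here is just a stand-in thunk):

(defn gen-distinct [gen-one n]
  (loop [seen (transient #{})
         acc  []]
    (if (= (count acc) n)
      acc
      (let [v (gen-one)]               ; generate a candidate element
        (if (contains? seen v)         ; hash + lookup against everything so far
          (recur seen acc)             ; duplicate, so retry
          (recur (conj! seen v) (conj acc v)))))))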

mattly 2016-07-28T00:05:59.000102Z

oh interesting

2016-07-28T00:06:35.000103Z

so it's not inconceivable that the overhead of adding and looking up elements is slowing things down

2016-07-28T00:06:56.000104Z

if that's really your issue I have a hard time imagining how to make it faster

mattly 2016-07-28T00:07:16.000105Z

(gen/fmap
 (fn [things]
   ;; prefix each :name with its index so names come out distinct
   ;; without needing a distinct generator
   (map #(assoc %2 :name (str %1 "-" (:name %2))) (range) things))
 (gen/list (gen/hash-map :name gen/string-alphanumeric)))

mattly 2016-07-28T00:07:24.000106Z

yeah

mattly 2016-07-28T00:08:02.000107Z

I get that, and really for me and my use case it comes down to: am I using the generator for actual values I want to test, or just for random input?

mattly 2016-07-28T00:08:20.000108Z

and in the case of these names, it's just random input that needs to be distinct

2016-07-28T00:09:03.000109Z

I suppose you probably have to compute hash values for the data when you wouldn't otherwise

mattly 2016-07-28T00:09:05.000110Z

I won't get much value out of shrinking

mattly 2016-07-28T00:10:41.000111Z

where I get value from shrinking here is in the number and depth of branches, and the leaf values

mattly 2016-07-28T00:10:50.000112Z

but not the id

2016-07-28T00:12:30.000113Z

FYI gen/uuid is evenly distributed and doesn't shrink, so you could use that for uniqueness without needing to check
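e.g. something like this (the :id key is just for illustration):

(gen/hash-map
 :id   gen/uuid                  ; unique with overwhelming probability, and never shrinks
 :name gen/string-alphanumeric)  ; the name itself no longer has to be distinct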

mattly 2016-07-28T00:13:08.000114Z

yeah, that occurred to me as well

mattly 2016-07-28T00:13:50.000115Z

but this works better for my use-case

mattly 2016-07-28T00:13:59.000116Z

it's... complicated

mattly 2016-07-28T00:14:46.000117Z

I'm working on a system to let people do self-serve analytics on a data warehouse, but with a complicated permissions structure on top of it

mattly 2016-07-28T00:15:58.000118Z

from my experience doing similar things in the past, I know that scenario-based testing has its own set of gotchas

mattly 2016-07-28T00:16:59.000119Z

so I'm basically generating the dimension/fact graph and putting that into our data store, which, well, isn't what I would have chosen

mattly 2016-07-28T00:17:55.000120Z

property-based testing of it though has helped me catch a ton of bugs in the prototype I'm replacing that I don't think anyone would have ever thought to look for

mattly 2016-07-28T06:04:44.000121Z

having gotten rid of the distinct name requirements across my entire gen'd tree, and done some other tuning around the frequency of branch size and depth, I cut the runtime of my test suite down to 1/10th of what it was before
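for illustration only (not my actual generators), that kind of depth tuning in a recursive generator looks something like weighting leaves over branches:

(def node-gen
  ;; weight leaf nodes 4:1 over branch nodes so trees stay shallow;
  ;; gen/recursive-gen also reduces the size parameter at nested levels
  (gen/recursive-gen
   (fn [inner]
     (gen/frequency
      [[4 (gen/hash-map :name gen/string-alphanumeric)]
       [1 (gen/hash-map :name gen/string-alphanumeric
                        :children (gen/vector inner 1 3))]]))
   (gen/hash-map :name gen/string-alphanumeric)))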

mattly 2016-07-28T06:04:57.000122Z

and it still shrinks awesomely

mattly 2016-07-28T06:05:33.000123Z

I also added some flags to the trunk generator to turn off certain branches of the tree when they're not needed for a test
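roughly like this (simplified sketch; the keys and flag names are invented for illustration):

(defn trunk-gen [{:keys [facts? dimensions?]
                  :or   {facts? true dimensions? true}}]
  (gen/hash-map
   :name       gen/string-alphanumeric
   ;; gen/return skips the generation work entirely for disabled branches
   :facts      (if facts?
                 (gen/vector (gen/hash-map :value gen/nat))
                 (gen/return []))
   :dimensions (if dimensions?
                 (gen/vector gen/string-alphanumeric)
                 (gen/return []))))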

2016-07-28T14:37:34.000124Z

@mattly do you think parallelizing tests and/or generators would help out?

2016-07-28T14:44:57.000125Z

I've worked on parallelizing tests before, but it just occurred to me that slow generators could be parallelized even if the tests themselves can't

2016-07-28T14:45:53.000126Z

E.g., during the test run you'd have one or more background threads doing the generating while the main thread runs the actual tests
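something like this sketch (background-gen-seq is made up; note that values realized this way sit outside the quickcheck loop, so you'd lose shrinking for them):

(defn background-gen-seq
  ;; hypothetical helper: generate values on a daemon thread,
  ;; buffering up to n of them ahead of the consuming test thread
  [generator n]
  (let [q (java.util.concurrent.ArrayBlockingQueue. n)]
    (doto (Thread. #(doseq [v (gen/sample-seq generator)]
                      (.put q v)))
      (.setDaemon true)
      (.start))
    (repeatedly #(.take q))))

;; usage, with a hypothetical my-gen and run-test:
;; (doseq [v (take 100 (background-gen-seq my-gen 16))] (run-test v))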

mattly 2016-07-28T14:48:20.000127Z

I'm not sure, tbh

mattly 2016-07-28T14:49:29.000128Z

One thing I'm looking into now, after a deeper branch ended up with 100k+ nodes when it started to fail, is exponential scaling of the size of nodes

mattly 2016-07-28T14:50:51.000129Z

which would actually fit the shape of the data I'm trying to model well
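e.g. with gen/scale, something like this (the constants are arbitrary):

(defn exp-sized
  ;; grow collection sizes exponentially with test.check's size
  ;; parameter instead of linearly
  [g]
  (gen/scale #(long (Math/pow 2 (/ % 10))) g))

;; e.g. (exp-sized (gen/vector inner-gen)) in place of (gen/vector inner-gen)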

2016-07-28T16:13:40.000130Z

You're saying you like it being so large?

mattly 2016-07-28T17:16:52.000131Z

eh, well, that's the count of something akin to (gen/vector (gen/vector (gen/vector gen/int))) but flattened; and I've found a few bugs that have only manifested when the node count gets that large

mattly 2016-07-28T17:17:20.000132Z

shrinking, of course, will end up narrowing that down to like the 2 or 3 end nodes that cause the failure

mattly 2016-07-28T17:17:43.000133Z

and really it's more due to the nature of the data I'm working with / replicating, and the complex query sets I have to run on top of them

mattly 2016-07-28T17:21:04.000134Z

as I find specific cases like that I tend to break out specific tests for that data scenario