I found a few other places in the actual generation of things where I'm being wasteful
but cutting that out wasn't nearly as effective as getting rid of the distinctness requirement
btw, thanks for all the work on test.chuck, checking
and subsequence
are invaluable tools in my toolbox these days
:)
I don't quite understand what you mean by "putting those values in now after-the-fact"
hm
I'll do an example
FYI distinct collections are generated by maintaining the elements in a transient set: https://github.com/clojure/test.check/blob/master/src/main/clojure/clojure/test/check/generators.cljc#L546
oh interesting
so it's not inconceivable that the overhead of adding and looking up elements is slowing things down
if that's really your issue I have a hard time imagining how to make it faster
(gen/fmap (fn [things] (map #(assoc %2 :name (str %1 "-" (:name %2))) (range) things)) (gen/list (gen/hash-map :name gen/string-alphanumeric)))
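same idea written out a bit more readably (just a sketch; assumes gen is clojure.test.check.generators and named-things is a made-up var name):
;; generate the things without any distinctness constraint, then make
;; :name unique after the fact by prefixing each one with its index
(def named-things
  (gen/fmap
    (fn [things]
      (map-indexed
        (fn [i thing]
          (assoc thing :name (str i "-" (:name thing))))
        things))
    (gen/list (gen/hash-map :name gen/string-alphanumeric))))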
yeah
I get that, and really for my use case it comes down to: am I using the generator for actual values I want to test, or just random input?
and in the case of these names, it's just random input that needs to be distinct
I suppose you probably have to compute hash values for the data when you wouldn't otherwise
I won't get much value out of shrinking them
where I get value from shrinking here is in the number and depth of branches, and the leaf values
but not the id
FYI gen/uuid is evenly distributed and doesn't shrink, so you could use that for uniqueness without needing to check
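e.g. something roughly like this (a sketch, the field names are made up), tacking a random v4 uuid onto each element instead of enforcing distinctness:
(gen/list
  (gen/fmap
    (fn [[thing id]] (assoc thing :id id))
    (gen/tuple (gen/hash-map :name gen/string-alphanumeric)
               gen/uuid)))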
yeah, that occurred to me as well
but this works better for my use-case
it's... complicated
I'm working on a system to let people do self-serve analytics on a data warehouse, but with a complicated permissions structure on top of it
from my experience doing similar things in the past, i know that scenario-based testing has its own set of gotchas
so I'm basically generating the dimension/fact graph and putting that into our data store, which, well, isn't what I would have chosen
property-based testing of it though has helped me catch a ton of bugs in the prototype I'm replacing that I don't think anyone would have ever thought to look for
having gotten rid of the distinct name requirements across my entire gen'd tree, and doing some other things around the frequency of size and depth of branches, I cut the run time for my test suite down to 1/10th of what it was before
and it still shrinks awesomely
I also included in the trunk generator some flags to turn off certain branches of the tree if they're not needed for a test
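roughly like this (a sketch; the keys and flag names are made up, not my actual schema):
;; trunk generator takes flags so a test can skip branches it doesn't need
(defn trunk-gen
  [{:keys [dimensions? facts?] :or {dimensions? true facts? true}}]
  (gen/hash-map
    :name       gen/string-alphanumeric
    :dimensions (if dimensions?
                  (gen/vector (gen/hash-map :name gen/string-alphanumeric))
                  (gen/return []))
    :facts      (if facts?
                  (gen/vector (gen/hash-map :value gen/large-integer))
                  (gen/return []))))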
@mattly do you think parallelizing tests and/or generators would help out?
I've worked on parallelizing tests before, but it just occurred to me that slow generators could be parallelized even if the tests themselves can't
E.g., during the test run you have one or more background threads doing the generating while the main thread does the actual tests
I'm not sure, tbh
One thing I'm looking into now, after a deeper branch ended up with 100k+ nodes by the time it started to fail, is exponentially scaling the number of nodes with size
which would actually fit the shape of the data I'm trying to model well
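something along these lines (a sketch, assuming gen is clojure.test.check.generators; the growth curve is arbitrary):
;; grow the node count exponentially with the size parameter
(def nodes
  (gen/scale #(long (Math/pow 1.05 %))
             (gen/vector (gen/hash-map :name gen/string-alphanumeric))))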
You're saying you like it being so large?
eh, well, that's the count of something akin to (gen/vector (gen/vector (gen/vector gen/int)))
but flattened; and I've found a few bugs that have only manifested when the node size gets that large
shrinking, of course, will end up narrowing that down to like the 2 or 3 end nodes that cause the failure
and really it's more due to the nature of the data I'm working with / replicating, and the complex query sets I have to run on top of them
as I find specific cases like that I tend to break out specific tests for that data scenario