testing

Testing tools, testing philosophy & methodology...
seancorfield 2017-09-12T02:54:20.000121Z

@noisesmith @theeternalpulse @didibus Perhaps this would be a better place to discuss philosophy of testing and test coverage tools than #beginners ?

theeternalpulse 2017-09-12T02:57:48.000206Z

hehe, didn't want to go that far into the weeds but this should help others

seancorfield 2017-09-12T03:00:22.000083Z

I'll be honest, I didn't realize this channel existed -- and I'm very interested in testing (I maintain Expectations and I'm a big TDD fan), I just don't think it should happen in #beginners 🙂

theeternalpulse 2017-09-12T03:02:28.000140Z

I like Expectations. From my experience in this latest thing I did: I write the example usage first, then as I break it up into functions I have a `#_` (the discard reader macro) example underneath the skeleton of the individual bits with my expectation. Then I paste them into a test buffer afterwards and implement them.
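A rough sketch of that workflow, for anyone following along (the fn and values are made up):

```clojure
;; Skeleton of a fn with a discarded example expectation right under it.
(defn parse-price
  "Turn a price string like \"$1,234.50\" into a number."
  [s]
  ;; skeleton -- implementation comes later
  )

;; #_ discards the next form at read time, so this sits harmlessly next to
;; the skeleton until it's pasted into a test namespace (with `expect` from
;; the Expectations lib required) and made to pass.
#_(expect 1234.5M (parse-price "$1,234.50"))
```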

seancorfield 2017-09-12T03:11:38.000026Z

I use Atom/ProtoREPL so I can evaluate any form my cursor is in (closest enclosing or entire top-level form). I can also run any named test individually with a hot key (or run all tests in the current ns, or even all tests, again with hot keys).

seancorfield 2017-09-12T03:12:29.000033Z

So I write tests directly in files and eval them into the REPL without switching to another panel. Same with source code I write.

theeternalpulse 2017-09-12T03:18:36.000032Z

right, same with cider/emacs on my end.

theeternalpulse 2017-09-12T03:18:58.000101Z

I'm still figuring out my flow. Trying to ramp up my Clojure experience; JavaScript is demoralizing.

2017-09-12T04:17:52.000062Z

Didn't try Expectations yet

2017-09-12T04:18:21.000006Z

Any good study on the benefits of TDD out there?

2017-09-12T04:19:01.000152Z

All I heard of was an internal Microsoft multi-year report that concluded TDD was a waste of time: projects that used it turned out to have just as many defects and just cost more to produce.

2017-09-12T04:19:27.000034Z

But I heard this by word of mouth from an ex-Microsoft employee, so I can't corroborate it

2017-09-12T04:20:24.000107Z

It was pitched against other projects, so it's also unclear what it was compared to directly. I'd assume more moderate usage of tests.

2017-09-12T04:21:27.000092Z

Granted, I've also never really come across any analyses of the benefits of tests. Same for types, though for types I found 4 studies which indicate they're mostly useless.

2017-09-12T04:22:36.000141Z

To put some context and not be unjust to types (as I still love them): 2 studies looked at the impact of one programming language over another, with JavaScript on one end and Haskell on the other, and found that language choice had less than 5% impact on the defect rate.

2017-09-12T04:23:53.000274Z

And it's unclear if that can be attributed to types or to the quality of the programmers, since the scale went somewhat like: JavaScript -> Ruby -> Python -> C++ -> Java -> C# -> F# -> Scala -> OCaml -> Clojure -> Haskell

2017-09-12T04:24:23.000024Z

Which you can classify as fewer types to more types (with Clojure as an outlier).

2017-09-12T04:24:49.000041Z

Or you can classify as more beginners to more experts.

2017-09-12T04:25:34.000009Z

Or as more imperative to more functional (and within imperative, as fewer types to more types).

2017-09-12T04:26:51.000073Z

And another 2 studies focused on productivity

2017-09-12T04:28:53.000152Z

Where I think dynamic programming languages showed an average 40% increase in productivity, while having an equal defect rate. Yet the people who used the typed language all said they felt the types helped them be more productive. So the interesting thing was the psychological effect of the guards the types were giving people: it was reassuring, yet in the numbers it was slower and did not reduce defects.

2017-09-12T04:29:05.000109Z

Would love to see similar things about tests, TDD, etc.

vinai 2017-09-12T13:48:54.000555Z

@didibus This is a related paper which I read a few years back https://pdfs.semanticscholar.org/f997/b973e85f48a0a14907ab3c5ff2b852236ab0.pdf

vinai 2017-09-12T13:49:20.000370Z

There also was a study by IBM, can't find it.

vinai 2017-09-12T13:49:28.000001Z

Don't have time atm

seancorfield 2017-09-12T17:26:25.000410Z

@vinai Was this the IBM study? https://collaboration.csc.ncsu.edu/laurie/Papers/MAXIMILIEN_WILLIAMS.PDF

seancorfield 2017-09-12T17:27:47.000073Z

"Through the introduction of [TDD] the relatively inexperienced team realized about 50% reduction in FVT defect density when compared with an experienced team who used an ad-hoc testing approach for a similar product." /cc @didibus

jakemcc 2017-09-12T17:41:21.000302Z

@didibus I haven’t read through it, but https://people.engr.ncsu.edu/gjin2/Classes/591/Spring2017/case-tdd-b.pdf just popped up on a mailing list I’m part of. From the abstract: > Case studies were conducted with three development teams at Microsoft and one at IBM that have adopted TDD. The results of the case studies indicate that the pre-release defect density of the four products decreased between 40% and 90% relative to similar projects that did not use the TDD practice. Subjectively, the teams experienced a 15–35% increase in initial development time after adopting TDD.

seancorfield 2017-09-12T17:45:56.000108Z

Yeah, I'm pretty suspicious of a supposed MS internal study that spent three years to conclude TDD was a waste of time, based on everything I've read to the contrary 🙂

2017-09-12T18:24:27.000303Z

Well, it was a meta-study, and the "waste of time" was in terms of revenue

2017-09-12T18:24:34.000002Z

Not in actual defect rate

2017-09-12T18:25:20.000473Z

The person I was talking to was saying it's because of the type of defects associated with TDD: are they the expensive ones, or a bunch of cheap ones that get quickly fixed a few weeks after release?

2017-09-12T18:29:00.000565Z

What neither the static vs. dynamic type studies nor the TDD one I heard of looked at was the long-term maintenance cost. Personally, I've rarely had unit tests or types catch bugs that my end-to-end tests, integ tests, QA, or just the REPL wouldn't also catch before going to Prod. Sometimes they do, but it's very rare. But where I feel there is value added is in long-term maintenance. Adding a feature to a code base that does not have a lot of unit tests, or that is missing static types, does feel (no real data, just my feeling) like a much more challenging undertaking, and something where data corruption bugs and feature regressions can easily start to sneak in.

2017-09-12T18:31:43.000036Z

Thanks for all the studies, I'll give them a look.

2017-09-12T18:37:53.000140Z

Interesting how the IBM study defines TDD: > With TDD, all major public classes of the system have a corresponding unit test class to test the public interface, that is, the contract of that class [8] with other classes (e.g. parameters to method, semantics of method, pre- and post-conditions to method)

2017-09-12T18:41:18.000010Z

I think that IBM study is pretty good for at least demonstrating the value of having Agile integration, automated tests on builds, and reasonable unit test coverage, especially around features and bug regressions.

2017-09-12T18:42:48.000410Z

But it defines TDD as what I just consider standard testing practice. That is: integrate early, have automated tests run continuously, and unit/integ test most APIs.

2017-09-12T18:43:15.000451Z

Am I the only one who defines TDD as the practice of writing failing tests first, and the function afterwards?

2017-09-12T18:43:27.000273Z

All of the time

2017-09-12T18:44:10.000084Z

Ok, the other study is better: > With this practice, a software engineer cycles minute-by-minute between writing failing unit tests and writing implementation code to pass those tests.

2017-09-12T18:50:02.000789Z

Just so I don't look like too much of a devil's advocate with everyone else: this I agree with 100%, and everyone should work to build such test assets: > Additionally, since an important aspect of TDD is the creation of test assets—unit, functional, and integration tests.

2017-09-12T18:51:22.000266Z

I'm thinking at the micro level here mostly. Like, what level of testing is the perfect amount to hit the ideal balance of productivity vs. defects?

2017-09-12T18:55:17.000287Z

My feeling is that with Clojure, the amount of unit testing can be lowered, because REPL-driven development appears to me to give similar benefits to unit tests, but faster. Having one trivial happy case and a few happy/not-happy corner cases on public fns, to help document them and prevent feature regressions in the future, is probably still important. The time saved from writing fewer unit tests could go to writing more functional and integration tests instead. Granted, the REPL often also covers part of those, so they might not need to be as complete as in non-REPL-driven languages. I'm not sure of this, and have very little data, but I'm curious about it.
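Something like this is roughly what I mean by a minimal unit test on a public fn (parse-csv-line is just a made-up example):

```clojure
(ns my.app.parse-test
  (:require [clojure.test :refer [deftest is]]
            [my.app.parse :refer [parse-csv-line]]))

(deftest parse-csv-line-test
  ;; one trivial happy case, mostly as documentation
  (is (= ["a" "b" "c"] (parse-csv-line "a,b,c")))
  ;; a couple of not-so-happy corner cases pinned down against regressions
  (is (= [""] (parse-csv-line "")))
  (is (= ["a" "" "c"] (parse-csv-line "a,,c"))))
```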

2017-09-12T18:56:48.000156Z

My second feeling is that test-first isn't useful. It doesn't hurt, but I don't think that part of TDD is actually what drives most of its benefits. Those come more from the overall emphasis on testing your code, creating test assets, and automating the development pipeline.

2017-09-12T19:00:17.000058Z

My biggest gripe with test first, is that very rarely do I know what needs to be asserted when I start coding. I'm often exploring a problem space, playing with different code organisations and levels of granularity, and then I toy with the function and discover its proper behavior. So when I do test-first, I lose a lot of time rewriting my tests over and over to adapt to my new learnings and discoveries. So I prefer to add tests only once I'm done with that process.

2017-09-12T19:05:50.000439Z

so is this a top-down vs. bottom-up thing, where you prefer to work bottom-up?

seancorfield 2017-09-12T19:11:52.000065Z

"My biggest gripe with test first, is that very rarely do I know what needs to be asserted when I start coding" -- see, I find that very strange. After all, you normally start with a problem and so that is what should be asserted (or, in BDD style, what should be expected).

seancorfield 2017-09-12T19:14:15.000266Z

For a certain number of data points in your problem space, you should always know what the solution must produce -- the coding work is figuring out how to produce that -- so you can certainly (expect solution-1 (solver input-1)) in some vague form.

seancorfield 2017-09-12T19:15:44.000308Z

Or (expect predicate-2 (solver input-2)) ... so you can write several of those, representing known aspects of the problem space (and expected outcomes). And then start to sketch out solver itself. And you can decompose solver into subproblems, again with certain known expectations of behavior.
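In Expectations that could look something like this (solver and the data points are placeholders, of course):

```clojure
(ns my.app.solver-test
  (:require [expectations :refer [expect]]
            [my.app.solver :refer [solver]]))

;; a concrete data point whose answer is known up front
(expect 42 (solver {:x 6 :y 7}))

;; or just a predicate the result must satisfy
;; (Expectations applies a fn in the "expected" position to the actual value)
(expect pos? (solver {:x 1 :y 1}))
```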

2017-09-12T20:34:07.000036Z

@noisesmith Ya, you can put it that way I guess. For big projects, I'll have a high-level design first, but in the low and medium levels, I prefer bottom up.

2017-09-12T20:36:33.000269Z

@seancorfield > After all, you normally start with a problem
In the real world, for business problems, this has never been true for me. Rarely is there a clearly formalized problem to solve. Most customers don't know what they want, or how they want it. Also, at the BDD level I can see that sometimes being more true, but at the unit level?

seancorfield 2017-09-12T20:38:03.000577Z

@didibus I've been writing software for about 35 years. There has always been a problem statement that I'm trying to write a solution to. Therefore there is always expected behavior for the software I write.

seancorfield 2017-09-12T20:39:17.000035Z

I've worked across a broad range of industries, both in Europe and America. I don't know how you can even operate as a software developer if you don't start with a problem statement 😜

2017-09-12T20:39:21.000604Z

Interesting, what fields do you work in?

2017-09-12T20:39:51.000622Z

Well, where do you get your problem statement from?

seancorfield 2017-09-12T20:40:20.000292Z

I've worked in insurance, telecoms, software tooling (QA tools, compilers), e-commerce, data organization, online dating...

2017-09-12T20:40:28.000057Z

Normally, I define it myself, from my learning of the domain space, and my playing around with possible improvements

2017-09-12T20:40:49.000009Z

I've never managed to get the business to offer a defined problem

2017-09-12T20:41:24.000519Z

When they do, it's so vague I can't call it a spec in any way.

2017-09-12T20:43:25.000002Z

You have way more experience than me though, I've only worked 2 jobs

2017-09-12T20:43:27.000053Z

😛

2017-09-12T20:56:34.000107Z

As an example, say I get told that we have to provide an export and import feature. It rarely gets any more specific than that. So the edge cases are left up to my team to find and solve, and to choose which ones are worth dealing with (or not) and how.

seancorfield 2017-09-12T22:48:32.000172Z

(sorry, got distracted by a production release at work)

seancorfield 2017-09-12T22:49:31.000273Z

@didibus Regardless of who writes the spec, you start with the specification of a problem, even if you write it yourself -- and that specification can always be expressed as a series of tests at various levels. That's pretty much "by definition".

seancorfield 2017-09-12T22:51:57.000294Z

Some specifications can be both "nearly English" and also "executable" if you're a fan of Cucumber, for example https://cucumber.io/docs/reference

seancorfield 2017-09-12T22:53:23.000235Z

(I personally don't like Gherkin / Cucumber but the underlying Given/When/Then approach is a good starting point for figuring out what your tests should cover at a high level)

seancorfield 2017-09-12T22:56:15.000137Z

Given you have an empty file, When you import it, Then the system should be unchanged ... or ... it should be rejected (with ... error message)

seancorfield 2017-09-12T22:59:12.000218Z

"import" will naturally lead to the specification of "what is a valid import (file) format" so you can break that down into a number of levels of specification of the format and the fields and that lets you write a number of tests that expect valid and invalid formats for fields and for the file as a whole.

2017-09-12T23:15:28.000211Z

Ya, I see what you mean now. It's just really not my style. I think this does relate to the top-down vs. bottom-up approach. Also, most specs of complete, useful systems are massive; the code in the end is the true spec. Spelling it all out before the fact I don't think is really possible, or it would just take a lot of initial effort.

2017-09-12T23:16:55.000164Z

For example, how do I know I have a file? What kind? What format? What does it mean to import it? What's the spec for import?

2017-09-12T23:17:45.000019Z

Like answering all that seems like such a slow process, and then, what if you made the wrong choice, and you realize this later? Change your spec, refactor your tests?

seancorfield 2017-09-12T23:18:57.000114Z

I'm not suggesting specifying everything up front -- we all know that doesn't work. You can write tests for each "question" as it comes up and decide what the output should be at each stage. TDD (and BDD) says you start with a failing test and make it pass -- it doesn't say you start with all your tests... 🙂

seancorfield 2017-09-12T23:20:06.000127Z

The point is you must answer those questions and you should write down your decisions -- somewhere other than just encoding them in your source file! -- and tests (or specs) are a great way to record those decisions and make sure future changes don't break things.

2017-09-12T23:21:21.000030Z

Very true. I like BDD, the idea; the frameworks like Cucumber I'm not a fan of, but the loose English is great. Understand the use case from the interaction points. It's precise about what people will care about, but loose enough in-between that I can quickly iterate many designs/implementations for it, until the best one emerges, at which point I can put some automated unit tests on it to prevent future regressions. That's normally how I operate.

seancorfield 2017-09-12T23:21:33.000103Z

Now, of course it takes practice not to over-specify systems and produce fragile tests, but if you're just changing code without documenting your changes (esp. of acceptable input formats in the case of "import") then you're a poor excuse for a software developer since no one will be able to figure out what your code does without reading the source code (and folks who asked for "import" don't want to do that, right? 🙂 )

2017-09-12T23:23:01.000093Z

Ya, I guess, but in my experience you always need to read the source code anyway, because the spec and tests are always slightly off. It's like an uncanny valley: it's meant to be quicker than reading the source, but nothing beats the pure source.

seancorfield 2017-09-12T23:24:23.000305Z

(all this said, there are definitely pieces of code I write without creating tests first -- for example where the "given" is too painful/complex to duplicate in code but the "when"/"then" is straightforward -- but I try hard to keep my REPL experiments at least in comment forms these days for ease of evaluation (into the running REPL) and those often become additional tests)
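e.g. something along these lines (the fn is just an illustration):

```clojure
(defn normalize-order [order]
  (update order :total bigdec))

(comment
  ;; REPL scratch forms -- evaluated ad hoc from the editor,
  ;; never run as part of the build
  (normalize-order {:id 1 :total 19.99})
  (normalize-order {:id 2 :total 0})
  ;; once the behaviour settles, these often get promoted into real tests
  )
```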

2017-09-12T23:25:57.000165Z

Right, maybe I should give it a better chance. Do you do it more so for pure code or for integration code?

seancorfield 2017-09-12T23:26:25.000328Z

I think it's really good discipline to force yourself to do strict TDD/BDD for a while on each project -- it often highlights all sorts of edge cases you might not have otherwise considered -- and figuring out invariant properties for generative testing is a particularly good mental exercise.

seancorfield 2017-09-12T23:27:16.000008Z

If you do TDD, you'll find yourself wanting to separate side-effects from pure code more often -- leading to more reusable code that is easier to reason about (since testing stuff with side-effects can be painful).

2017-09-12T23:27:25.000344Z

I love generative testing. Properties I find more useful than tests: they hold for all inputs, tell you a lot in a small number of words, and they find a ton of bugs.
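e.g. with test.check, a property like this says a lot in a few lines (my-sort is just a stand-in for whatever fn is under test):

```clojure
(ns my.app.sort-test
  (:require [clojure.test.check :as tc]
            [clojure.test.check.generators :as gen]
            [clojure.test.check.properties :as prop]))

(defn my-sort [xs] (sort xs))   ; stand-in for the fn under test

;; holds for *all* generated inputs: output is ordered and is a
;; permutation of the input
(def sort-property
  (prop/for-all [xs (gen/vector gen/int)]
    (let [ys (my-sort xs)]
      (and (= (frequencies xs) (frequencies ys))
           (or (empty? ys) (apply <= ys))))))

(tc/quick-check 100 sort-property)
```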

seancorfield 2017-09-12T23:28:53.000141Z

But, yeah, there are going to be times when you expect the database to have specific content in it after certain operations, "given" a particular database setup. It's ideal to separate out the actual database but it's not always entirely practical. Or whatever side-effecty thing you need to do.

seancorfield 2017-09-12T23:29:54.000015Z

But you can certainly argue that tests with side-effects aren't "unit" tests -- although that has nothing to do with TDD/BDD in my mind.

2017-09-12T23:31:52.000157Z

I think this sums up my perspective very well: http://david.heinemeierhansson.com/2014/tdd-is-dead-long-live-testing.html

2017-09-12T23:33:03.000182Z

And this quote from Kent Beck: > So there’s a variable that I didn’t know existed at that time, which is really important for the trade-off about when automated testing is valuable. It is the half-life of the line of code. If you’re in exploration mode and you’re just trying to figure out what a program might do, and most of your experiments are going to be failures and be deleted in a matter of hours or perhaps days, then most of the benefits of TDD don’t kick in, and it slows down the experimentation—a latency between “I wonder” and “I see.” You want that time to be as short as possible. If tests help you make that time shorter, fine, but often, they make the latency longer. And if the latency matters and the half-life of the line of code is short, then you shouldn’t write tests.

2017-09-12T23:34:09.000216Z

In my case, I sometimes have experiments with shelf lives of minutes or seconds too. My test writing always starts after that "experimentation" phase.

2017-09-12T23:34:56.000234Z

maybe the problem with my code base was that nobody ever left the experimentation phase, and we shipped an experiment to production

😂 1

2017-09-12T23:46:27.000057Z

😞

seancorfield 2017-09-12T23:46:35.000327Z

I'd be curious to know how much "test code" vs "production code" various Clojure shops have... here's our code "lines of code" totals:

Clojure source 211 files 48731 total loc,
Clojure tests 133 files 15092 total loc

seancorfield 2017-09-12T23:47:15.000202Z

(that's just raw lines of code -- and the "tests" incorporate "unit"-level all the way up to automated browser-based UAT stuff)

2017-09-12T23:48:46.000203Z

I don't think I can easily get those metrics, also, we're mixed Java, so it wouldn't show the full picture.

2017-09-12T23:49:15.000042Z

BTW, the pdf linked in David's post is quite an interesting read too: http://rbcs-us.com/documents/Why-Most-Unit-Testing-is-Waste.pdf

2017-09-12T23:49:50.000219Z

I like this especially: > Turn unit tests into assertions. Use them to feed your fault-tolerance architecture on high-availability systems. This solves the problem of maintaining a lot of extra software modules that assess execution and check for correct behavior; that’s one half of a unit test. The other half is the driver that executes the code: count on your stress tests, integration tests, and system tests to do that.

2017-09-12T23:50:12.000046Z

Clojure specs fall into that mindset, I think.
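Rough sketch of what I mean (names made up): specs doubling as runtime assertions rather than separate test drivers:

```clojure
(ns my.app.orders
  (:require [clojure.spec.alpha :as s]))

(s/def ::id pos-int?)
(s/def ::total (s/and number? #(not (neg? %))))
(s/def ::order (s/keys :req-un [::id ::total]))

;; turn the assertions on (could be driven by an env flag instead)
(s/check-asserts true)

(defn total-with-tax [order]
  ;; s/assert returns the value when valid, throws otherwise -- the
  ;; "check for correct behaviour" half of a unit test, driven by whatever
  ;; integration/stress tests (or production traffic) exercise it
  (let [order (s/assert ::order order)]
    (* 1.2 (:total order))))
```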

seancorfield 2017-09-12T23:56:51.000225Z

I remember reading that piece by DHH when it appeared -- and several times since -- and there's a lot of "warning bells" in there about how he was doing both "unit testing" and TDD in my mind, so "of course" he found it problematic. I seem to recall several people in the Agile community responded to his post somewhat disparagingly (and it's not like DHH hasn't posted all sorts of against-the-flow pieces...).

seancorfield 2017-09-12T23:57:40.000168Z

This is very telling: > It just hasn't been a useful way of dealing with the testing of Rails applications. It speaks far more of the problems with Rails than with TDD...