tl;dr: Tests are overhyped because we once wrote tests that became irrelevant when the thing we were testing changed
Really? I mean, tests might be overhyped, but so far it looks like the author draws incredibly general conclusions from some isolated incident. Why did the tests become irrelevant? Is their scoring algorithm now doing something completely different? Do they now have different use cases? Were they hard-coding scores when the actual thing that matters was, say, the ordering of the things they score?
> Only test what needs to be tested
well, thanks for the helpful advice :) Care to share what, in your opinion, needs to be tested?
OP here. My conclusion wasn't drawn from an isolated incident; I shared the incident to highlight what I imagine is a common pitfall in testing.
I used the phrase "only test what needs to be tested" intentionally. How should I know what you need to test? But if you accept that you should only test what needs to be tested, then there is an implication that some (I'd venture to say most) of what you write doesn't need to be tested. And that's a liberating concept. You aren't obligated to test everything, but you should test what really matters.
What needs to be tested is probably directly related to the size and stability of the company (assuming size and stability are positively correlated). I'd venture that young start-ups have almost no need to test anything. That isn't meant to be inflammatory, but I look at testing too early the same way I look at optimizing too early: there's no reason to shard a database until you absolutely have to, and I don't think you need to test something until you know it's mission critical.
For example, Stripe offers a service where the failure of key components would jeopardize their entire business. Their code base is obligated, by its nature, to have more stringent tests. But the startup in the garage next to yours that is still trying to find its MVP and will probably pivot five times in the next two weeks? Save yourself a lot of grief and don't worry about the tests. Once you know what you are going to be, once you know what you can't afford to lose, wrap those portions up with some solid tests.
> I used the phrase "only test what needs to be tested" intentionally. How should I know what you need to test? But if you accept that you should only test what needs to be tested, then there is an implication that some (I'd venture to say most) of what you write doesn't need to be tested. And that's a liberating concept. You aren't obligated to test everything, but you should test what really matters.
In -my- anecdotal experience, this is the excuse developers use while they write bug-ridden code without tests. Maybe they save a little time at the early stage of the development cycle, but the real reason seems to be avoiding work they don't find to be as fun as hacking out code.
After that code is hacked out, we then waste a bunch of cycles across multiple teams finding and fixing the most obvious of their bugs.
After the release goes out, we spend even more time finding previously undiscovered bugs and fixing them -- which hits actual user satisfaction.
All because some lazy sod thought he was too smart to need tests, and that he could be trusted to decide when his code didn't require them.
Fuck that. I've been doing this too long and seen this happen too many times. If you're on my team, you write the damn tests.
Ok, so if you have people who are legitimately lazy, then they should be fired, full stop. However, there are probably other reasons they're not writing enough tests:
- They don't see the benefit because they are forced to write lots of tests that do not, and never would, provide value (stupid getter and setter tests being the canonical example).
- Writing tests is hard or slow, either due to things not being tooled out or because the code has not been architected to be testable.
- The tests cause lots of false positives, fail to catch real bugs, or end up being extremely brittle. The classic example here is religiously mocking out the database in the name of performance, which then fails to catch most bugs since it's, you know, not actually hitting a database (the sketch after this list illustrates that case).
- The suite is slow or causes deployments to slow down such that adding tests adds a "tax" to everything else people do.
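For what it's worth, here's a minimal sketch of that mocking point, assuming a hypothetical save_user function and store (both names invented for illustration): the mocked test passes happily while a real database would have rejected the write.

    # Hypothetical example: save_user and the store are made up for illustration.
    from unittest.mock import MagicMock

    def save_user(store, email):
        # Bug: never normalizes the email or checks for duplicates.
        return store.insert({"email": email})

    def test_save_user_with_mocked_db():
        store = MagicMock()
        store.insert.return_value = 42

        assert save_user(store, "Bob@Example.com") == 42
        store.insert.assert_called_once()
        # Green. But a real database with a unique, lower-cased email column
        # would reject "Bob@Example.com" when "bob@example.com" already exists.
        # The mock accepts anything, so the bug ships anyway.

The suite stays fast and the coverage number looks great, which is exactly why this failure mode is so seductive.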
I've had enough experience with all of the above to become completely skeptical of extensive testing, since unless you are a real pro you will inevitably fall down on at least one of these things.
We have a slightly different approach to the same problem: if people keep putting out crap code that breaks (or worse, introduces regression bugs), I will let them go. I have no time for people who do crap work.
You use tests to weed those people out. That's fine, if it works for you.
I'm anal about knowing everything that's happening in my code. I compare git commits to ticket numbers to keep abreast of how things are changing over time. A bug comes through on Goalee and I know which handful of files it will be in. Honestly, I'm quite uptight about this stuff. I catch most shortcuts personally before they go out, and have the appropriate conversation with the responsible party.
I've never seen untested code that wasn't crap, or didn't require just as much work (if not more) to ferret out all the bugs (including watching everyone like a hawk).
A huge percentage of the code you use every day doesn't have tests:
- Linux kernel? No tests.
- TCP/IP stack? No tests.
- Windows kernel (up till WinXP anyway)? No tests.
- Android? No tests.
- Quake engine? No tests.
Don't make stupidly broad statements like "I've never seen untested code that wasn't crap". Well, unless by "untested" you mean "code that was never even run or QA'd a little". I'm pretty sure you mean "code that didn't go through TDD" though, in which case you are very, very wrong.
Even if we make your assumption that he's specifically referring to TDD only, just because good code that didn't go through the more rigorous definitions of TDD exists doesn't mean he's seen it. And plenty of crap code ships.
I'd argue Linux has significantly more than 'no' tests: http://stackoverflow.com/questions/3177338/how-is-linux-kern... -- it's not TDD, and it's not centralized, but then again, neither is Linux development. And while you might convince me that pre-XP kernels went without unit testing, I'd point out we hardly call them bastions of stability and security. Try certifying a 360 or ModernUI title under the Microsoft banner and see if they'll let you get away without writing a single test. I'd wager anyone who successfully accomplishes this has spent far more effort arguing why it should be allowed than writing some tests would take.
And, given the sheer girth of their security processes (patching, QA certification, tracking) and toolset (through virtualization, fuzzing, unit testing, CI build servers), it would take much more effort to convince me that they do absolutely no unit testing whatsoever on their kernel in this day and age. Far more than you could reasonably give: I suspect such convincing would require breaking several NDAs and committing several felonies.
I'm curious as to whether you've actually worked on the Linux kernel, or the UNIX descendants, or TCP/IP stacks.
I have. The only reason they work at all is the enormous number of man-years spent banging on the poorly tested code after it's written. The number of regressions that pop up in core subsystems of the kernel is staggering. VM bugs still happen in commercially shipping systems like OS X because of small changes to extremely fragile and untested code.
Linux isn't a paragon of code quality, and neither are most network stacks. If anything, you're proving my point -- code written in that methodology has required enormous amounts of work to stabilize and to prevent/fix regressions going forward.
Case in point: a few years ago, I found a trivial signed/unsigned comparison bug in FreeBSD device node cloning that was triggered by adding/removing devices in a certain order; this caused a kernel panic and took two days to track down. The most trivial of unit tests would have caught this bug immediately, but instead it sat there languishing until someone wrote some network interface cloning code (me) that happened to trigger this bug far down below the section of the kernel I was working in.
This kind of thing happens all the time in the projects you list, and it's incredibly costly compared to using testing to avoid shipping the bugs in the first place.
That's the statement I was replying to - and yes, code with tests is going to be better. But to call all code without tests crap is directly saying that the Linux code is crap, that the BSD code is crap, that the TCP/IP code is crap, etc. I disagree completely - the code is awesome, has built billion-dollar industries, and has made most people's lives better. Could it be improved with more tests? Sure. Everything can be improved. Calling it 'crap', however, is insane.
I, personally, would be more than happy to have produced the 'crap' BSD code that has propelled Apple into one of the most valuable companies today.
> I disagree completely - the code is awesome and has built billion dollar industries and has made most peoples lives better.
It shipped. That doesn't prove your point. Code that is expensive to produce and ships is better than no code at all. That doesn't mean that this is the best way to produce code.
What exactly do you think is awesome about the code?
> I, personally, would be more than happy to have produced the 'crap' BSD code that has propelled Apple into one of the most valuable companies today.
I did produce some of that code.
Nothing that you're saying justifies why producing worse code, less efficiently, is better than producing good code, more efficiently. Your position assumes a false dichotomy where the choice is between shipping or not shipping.
The truth of the matter is that the legacy approach to software development used in those industries has more to do with historical ignorance of better practices, and the cultural and technical constraints that resulted, than with any deliberate engineering choice. In the era when most of that code was originally architected, it was also normal to write C programs with all globals, non-reentrant and non-thread-safe. Are you going to claim that this is also the best way to write code, just because it produced some mainstays of our industry?
> Nothing that you're saying justifies why producing worse code, less efficiently, is better than producing good code, more efficiently. Your position assumes a false dichotomy where the choice is between shipping or not shipping.
Well, you say it's a false dichotomy, but if TDD really does reliably produce better code and more efficiently than a non-TDD approach, how come hardly any of the most successful software projects seem to be built using TDD? It's been around a long time now, with no shortage of vocal advocates. If it's as good as they claim, why aren't there large numbers of successful TDD projects to cite as examples every time this discussion comes up?
It's telling that every time I have had this discussion, the person taking your position comes back with a question like that instead of just listing counterexamples.
Can you name a few major projects, preferably on the same kind of scale as the examples that have been mentioned in other posts in this discussion, that were built using TDD? Not just projects that use some form of automated testing, or projects that use unit tests, or projects that use a test-first approach to unit tests, or anything else related to but not implying TDD, but projects where they really follow the TDD process?
The development processes for major open source projects tend to be public by their nature, and there are also plenty of people who have written about their experiences working for major software companies on well-known closed source projects, so telling a few TDD success stories shouldn't be a difficult challenge at all.
> Can you name a few major projects, preferably on the same kind of scale as the examples that have been mentioned in other posts in this discussion, that were built using TDD?
Operating systems? No. Sorry. Operating systems, as a rule, predate modern practices by some 1-4 decades. Making a kernel like Linux, Darwin, or FreeBSD testable would be a task somewhere between "massive headache" and "really massive headache". I've done some in-kernel unit testing, but only on our own driver subsystems that could be abstracted from the difficult-to-test underlying monolithic code base.
Outside of kernels/operating systems, just Google. Unit/automated testing is prevalent in just about all modern OSS software ecosystems.
My apologies if I've misinterpreted your posts. I thought we were talking specifically about TDD, since that seemed to be the subject of the original article, but from the examples you gave, you seem to be talking about automated unit testing more generally now. In that case, I would certainly agree that there are successful projects using the approach.
However, I would also say that if you've never seen code untested in that way that wasn't crap then you're not looking hard enough. Some of the most impressively bug-free projects ever written used very different development processes with no unit testing at all. Space shuttle software, which is obviously about as mission critical as you can get, went in a very different direction[1]. Donald Knuth, the man who wrote TeX among other things, isn't much of a fan either[2].
There are also other means of assuring high code quality. Testing is just one of many tools, and even 100% coverage does not guarantee your code isn't crap; your tests might be crap as well. Testing can only prove that there is a bug, never that there isn't one. You can only hope that, if you tested the right things, your code likely works fine.
For example, static code analysis is sometimes superior to testing, because it can prove the absence of some broad classes of bugs.
And in code reviews we often find subtle bugs that would be extremely hard to write tests for (e.g. concurrency related bugs).
You can also decrease bug rates by writing clean, understandable code. Which is often related to hiring a few great programmers instead of a bunch of cheap ones.
I've heard this saying: "Test until fear turns to boredom." Then you need to make sure you get bored at a properly calibrated rate (i.e. not too soon given your business's criticality, not too late given your need to move quickly). For example, if root cause analysis keeps showing that expensive bugs would have been preventable via unit tests, then your calibration is probably tuned to getting bored too quickly.
This is a great line. I think it goes deeper, though: after doing this for a while, if it hurts, you are probably doing it wrong. Writing code with too few tests hurts, because you are scared shitless. Writing code with too many tests hurts, because you are performing a religious rite, not actually building something useful.
The trouble is that you have to do this for a while to get to the point where you can both feel and recognize pain, or the lack thereof.
That's an excellent saying. It's exactly the line I try to tread with testing. I don't know what to do with the angst of getting the tradeoff wrong though.
1) Test your APIs. A public API, especially one that is key to your business, should be near 100% coverage, and should be attacked to look for security/usability/load problems (see the sketch after this list).
2) Test enough during development to support later regression tests, and to make sure that the design is testable. This can usually be achieved with less than 20% coverage. But if you write production code that's so screwy it can't be regression tested, then you've got big problems.
3) Test any parts that scare you or confuse you or make you nervous. Use "test until you're more bored than scared" here.
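For what it's worth, a rough sketch of what (1) can look like in practice. The /v1/charges endpoint and the client test fixture are invented for illustration; a real suite would also hammer the API with malformed input, auth failures, and load.

    # Hypothetical API tests; the endpoint and client fixture are made up.
    def test_create_charge_happy_path(client):
        resp = client.post("/v1/charges", json={"amount": 1000, "currency": "usd"})
        assert resp.status_code == 201
        body = resp.json()
        assert body["amount"] == 1000
        assert body["status"] == "pending"

    def test_create_charge_rejects_negative_amount(client):
        # The "attack" half: hostile input must fail loudly, not 500 or corrupt state.
        resp = client.post("/v1/charges", json={"amount": -1, "currency": "usd"})
        assert resp.status_code == 400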
Terse, but extremely helpful and well formulated. I'll print these out for future reference ;-)
I'd add
4) Continuous integration testing doubling as operations monitoring. I know this doesn't "really" qualify as testing, but in my experience it's far easier to get bitten by components not working together than by a single broken component alone. The key here is monitoring and testing business parameters from end to end. It doesn't help you find out where the bug is, but it lets you know early on that there is one.
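A rough sketch of the sort of end-to-end check I mean, with a made-up metrics endpoint and threshold; the point is that it asserts on a business parameter against the integrated system, and it runs continuously rather than once per commit.

    # Hypothetical end-to-end check run from CI or cron; URL and threshold are invented.
    import json
    import sys
    import urllib.request

    METRICS_URL = "https://example.com/internal/metrics"  # made-up endpoint

    def orders_created_last_hour():
        with urllib.request.urlopen(METRICS_URL, timeout=10) as resp:
            return json.load(resp)["orders_last_hour"]

    if __name__ == "__main__":
        orders = orders_created_last_hour()
        if orders == 0:
            # Something between checkout, payments, and the database is broken.
            # This won't tell you which component, but it tells you early.
            sys.exit("ALERT: no orders created in the last hour")
        print(f"ok: {orders} orders in the last hour")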
>well, thanks for the helpful advice :) Care to share what, in your opinion, needs to be tested?
I am not the OP, but I think unit tests are most useful for code that has a lot of edge cases (such as string parsing) and for code which causes the most bugs (as seen in black box testing or integration tests).
Also, some code is just easier to develop if you create a harness that runs it directly, instead of having to work your way to the point in the program where it would execute. If you do this, you might as well turn it into a test.
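A small sketch of both points around a hypothetical parse_duration function (invented for illustration): the edge cases are exactly where a unit test earns its keep, and the test is essentially the development harness I would have written anyway.

    # Hypothetical parser: parse_duration("1h30m") -> seconds.
    import re

    def parse_duration(text):
        if not text:
            raise ValueError("empty duration")
        total, pos = 0, 0
        for m in re.finditer(r"(\d+)([hms])", text):
            if m.start() != pos:
                raise ValueError(f"bad duration: {text!r}")
            total += int(m.group(1)) * {"h": 3600, "m": 60, "s": 1}[m.group(2)]
            pos = m.end()
        if pos != len(text):
            raise ValueError(f"bad duration: {text!r}")
        return total

    def test_parse_duration_edge_cases():
        assert parse_duration("90s") == 90
        assert parse_duration("1h30m") == 5400
        for bad in ["", "h", "1x", "1h 30m"]:
            try:
                parse_duration(bad)
                assert False, f"expected ValueError for {bad!r}"
            except ValueError:
                pass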
* A test should accept any correct behaviour from the tested code. Anything which is not in the requirements, should not be enforced by the test.
* A test should not use the same logic as the code to find the "right" answer (the sketch after this list contrasts this with a better approach).
* A function whose semantics are likely to be changed in the next refactor should not be invoked from a test.
* Whenever a test fails, make a note of whether you got it to pass by changing the test or by changing the code. If it's the test more than 2/3 of the time, it's a bad test.
* If you can't write a good test, don't write a test at all. See if you can write a good test for the code that calls this code instead.
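A tiny sketch of that second point, using a made-up apply_discount function: the bad test reimplements the formula, so the code and the test can be wrong together, while the good one checks hand-computed cases and requirement-level properties.

    # Hypothetical function under test; names invented for illustration.
    def apply_discount(price, rate):
        return round(price * (1 - rate), 2)

    def test_discount_bad():
        # Bad: mirrors the implementation, so a shared mistake passes silently.
        for price, rate in [(100.0, 0.25), (19.99, 0.1)]:
            assert apply_discount(price, rate) == round(price * (1 - rate), 2)

    def test_discount_good():
        # Good: hand-computed expectations plus properties from the requirements.
        assert apply_discount(100.0, 0.25) == 75.0
        assert apply_discount(19.99, 0.10) == 17.99
        assert apply_discount(50.0, 0.0) == 50.0       # no discount, no change
        assert 0 <= apply_discount(80.0, 0.3) <= 80.0  # never exceeds the price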