The Value of Unit Testing Is Overrated @ David Ko's Learning Journey


Testing is overrated
http://railspikes.com/2008/7/11/testing-is-overrated
Posted by Luke Francl
on Friday, July 11

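In this article, the author argues that "the value of testing is overrated." Agile emphasizes that developers should write unit tests and practice TDD, but the author thinks this is not enough and that its benefits have been exaggerated: in practice, developers still spend a great deal of time debugging. So, as Steve McConnell (author of Code Complete) says, a variety of different techniques must be combined to ensure software quality, because each technique finds different bugs.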

1. Problems with developer testing
First, the author explains why unit testing by developers alone is not enough. He has observed several limitations of testing done by developers:

A. Testing is hard, and most developers are not very good at it
- Programmers tend to write “clean” tests that verify the code works, not “dirty” tests that test error conditions.
- Steve McConnell reports: “Immature testing organizations tend to have about five clean tests for every dirty test. Mature testing organizations tend to have five dirty tests for every clean test. This ratio is not reversed by reducing the clean tests; it’s done by creating 25 times as many dirty tests.” (Code Complete 2, p. 504)
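The following minimal Python sketch makes McConnell's clean/dirty distinction concrete. The function `parse_age` and its rules (a whole number from 0 to 150) are hypothetical and were invented for illustration. The point is the ratio: one clean test checks the happy path, and a batch of dirty cases tries to break the code.

```python
# Hypothetical function under test (not from the article):
# parse a user-supplied age string into an int in the range 0..150.
def parse_age(text):
    if not isinstance(text, str):
        raise TypeError("age must be a string")
    stripped = text.strip()
    if not stripped.isdecimal():   # rejects "", "-5", "12.5", "abc"
        raise ValueError(f"not a whole number: {text!r}")
    age = int(stripped)
    if age > 150:
        raise ValueError(f"age out of range: {age}")
    return age


def is_rejected(bad_input):
    """True if parse_age refuses the input with ValueError or TypeError."""
    try:
        parse_age(bad_input)
    except (ValueError, TypeError):
        return True
    return False


# The one "clean" test: proves the happy path works.
def test_clean_valid_age():
    assert parse_age("42") == 42


# "Dirty" cases: empty, blank, negative, fractional, non-numeric,
# just past the upper bound, absurdly large, and the wrong type.
DIRTY_INPUTS = ["", "   ", "-5", "12.5", "abc", "151", "9" * 50, None]


def test_dirty_inputs():
    for bad in DIRTY_INPUTS:
        assert is_rejected(bad), f"accepted bad input {bad!r}"


# Boundary values belong to dirty testing too: the edges must still pass.
def test_boundaries():
    assert parse_age("0") == 0
    assert parse_age("150") == 150


if __name__ == "__main__":
    test_clean_valid_age()
    test_dirty_inputs()
    test_boundaries()
    print(f"1 clean test, {len(DIRTY_INPUTS)} dirty cases: all passed")
```

Most real bugs hide in the dirty cases, which is why mature teams write far more of them.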

B. You can't test code that hasn't been written
- Robert L. Glass discusses this several times in his book Facts and Fallacies of Software Engineering.
- Missing requirements are the hardest errors to correct, because often times only the customer can detect them.
- Unit tests with total code coverage (and even code inspections) can easily fail to detect missing code.
- Therefore, these errors can slip into production (or your iteration release).
- Tests alone won’t solve this problem, but I have found that writing tests is often a good way to suss out missing requirements.
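The following minimal sketch shows how full coverage can coexist with missing code. The `shipping_fee` function and its three-rule spec are hypothetical. The two tests execute every line and both branches, yet rule 3 was never implemented. No coverage tool can flag it, because there is no code for it to measure.

```python
# Hypothetical spec, as the customer understands it:
#   1. Orders of 100 or more ship free.
#   2. Smaller orders pay a flat fee of 10.
#   3. International orders pay an extra 15.   <- never implemented
def shipping_fee(order_total):
    if order_total >= 100:
        return 0.0
    return 10.0


# Together these two tests execute every line and take both sides of the
# `if`, i.e. 100% statement and branch coverage of shipping_fee.
def test_free_shipping():
    assert shipping_fee(150) == 0.0


def test_flat_fee():
    assert shipping_fee(20) == 10.0


# Rule 3 is invisible to the tests and to any coverage tool: the function
# does not even take a destination argument, so there is no code to miss.


if __name__ == "__main__":
    test_free_shipping()
    test_flat_fee()
    print("all tests pass; rule 3 is still missing")
```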

C. Test cases can contain errors too
- Numerous studies have found that test cases are as likely to have errors as the code they’re testing (see Code Complete 2, p. 522).
- So who tests the tests? Only review of the tests can find deficiencies in the tests themselves.

D. Developer testing is not very effective at finding bugs
- To cap it all off, developer testing isn’t all that effective at finding defects.
- Defect-Detection Rates of Selected Techniques (Code Complete 2, p. 470)
Removal Step                Lowest Rate  Modal Rate  Highest Rate
Informal design reviews     25%          35%         40%
Formal design inspections   45%          55%         65%
Informal code reviews       20%          25%         35%
Modeling or prototyping     35%          65%         80%
Formal code inspections     45%          60%         70%
Unit test                   15%          30%         50%
System test                 25%          40%         55%

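2. Don't put all your eggs in one basket
The author therefore argues that you should not put all your eggs in one basket. Different defect-detection techniques find different kinds of problems, so you cannot rely on just one of them: unit testing, manual testing, usability testing and code review should all be used.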
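A rough way to see why combining techniques helps is to treat each technique's modal rate from the Code Complete table above as the chance it catches a given defect. The sketch below ASSUMES that the techniques catch defects independently of each other. Neither the article nor Code Complete claims this. Real techniques overlap on some defects and differ on others, so the numbers are illustrative only.

```python
from math import prod

# Modal defect-detection rates from the table above (Code Complete 2, p. 470).
MODAL_RATE = {
    "Informal design reviews": 0.35,
    "Formal design inspections": 0.55,
    "Informal code reviews": 0.25,
    "Modeling or prototyping": 0.65,
    "Formal code inspections": 0.60,
    "Unit test": 0.30,
    "System test": 0.40,
}


def combined_rate(techniques):
    """Chance a defect is caught by at least one technique,
    ASSUMING the techniques catch defects independently (a simplification)."""
    miss_all = prod(1 - MODAL_RATE[t] for t in techniques)
    return 1 - miss_all


if __name__ == "__main__":
    plans = [
        ["Unit test"],
        ["Unit test", "System test"],
        ["Unit test", "System test", "Formal code inspections"],
        ["Unit test", "System test", "Formal code inspections",
         "Informal design reviews"],
    ]
    for plan in plans:
        print(f"{combined_rate(plan):.0%}  {' + '.join(plan)}")
```

Under that assumption, the estimate is 30% for unit tests alone. It rises to roughly 58% with system tests added, 83% with formal code inspections, and 89% with informal design reviews. No single row of the table gets close to that.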

A. Manual testing
- As mentioned above, programmers tend to test the “clean” path through their code.
- A human tester can quickly make mincemeat of the developer’s fairy world.
- Good QA testers are worth their weight in gold.
- I once worked with a guy who was incredibly skilled at finding the most obscure bugs.
- He could describe exactly how to replicate the problem, and he would dig into the log files for a better error report, and to get an indication of the location of the defect.
- Joel Spolsky wrote a great article on the Top Five (Wrong) Reasons You Don’t Have Testers—and why you shouldn’t put developers on this task. We’re just not that good at it.
http://www.joelonsoftware.com/articles/fog0000000067.html

B. Code reviews
- Code reviews and formal code inspections are incredibly effective at finding defects (studies show they are more effective at finding defects than developer testing, and cheaper too), and the peer pressure of knowing your code will be scrutinized helps ensure higher quality right off the bat.
- I still remember my first code review. I was doing the ArsDigita Boot Camp, which was a two-week course on building web applications.
- At the end of the first week, we had to walk through our code in front of the group and face questions from the instructor.
- It was incredibly nerve-wracking! But I worked hard to make the code as good as I could.
- This stresses the importance of what Robert L. Glass calls the “sociological aspects” of peer review.
- Reviewing code is a delicate activity. Remember to review the code…not the author.

C. Usability tests
- Another huge problem with developer tests is that they won’t tell you if your software sucks.
- You can have 1500% test coverage and no known defects and your software can still be an unusable mess.
- Jeff Atwood calls this the ultimate unit test failure:

    I often get frustrated with the depth of our obsession over things like code coverage. Unit testing and code coverage are good things. But perfectly executed code coverage doesn’t mean users will use your program. Or that it’s even worth using in the first place. When users can’t figure out how to use your app, when users pass over your app in favor of something easier or simpler to use, that’s the ultimate unit test failure. That’s the problem you should be trying to solve.
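   (This passage exposes the weakness of coverage testing. High coverage does not mean your code is high quality: the code may do little error handling, some requirements may never have been implemented, or the coverage criterion you use may be too weak [for example, if you only look at function coverage or statement coverage].)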
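The note above about weak coverage criteria can be illustrated with a small, hypothetical example. A single test gives 100% statement coverage but only half the branch coverage, while a bug sits on the untested branch. The `member_price` function is invented for illustration. With coverage.py, running `coverage run --branch` and then `coverage report` should flag the partially covered branch. That tool is an outside suggestion, not part of the original article.

```python
def member_price(price_cents, is_member):
    """Hypothetical pricing rule: members pay 90%, everyone else pays 100%."""
    percent = None
    if is_member:
        percent = 90
    # Bug: the non-member path never sets `percent` (it should be 100),
    # so this line raises TypeError whenever is_member is False.
    return price_cents * percent // 100


def test_member_price():
    # This single test executes every line of member_price, so statement
    # coverage is 100%. But the `if` is never False, so only one of its two
    # branches is taken and the non-member bug is never exercised.
    assert member_price(1000, True) == 900


if __name__ == "__main__":
    test_member_price()
    print("test passes; member_price(1000, False) would still raise TypeError")
```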

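- Fortunately, usability tests are easy and cheap to run. (I personally have some reservations about this, though perhaps that is because I don't know how usability tests are actually run. If more experienced readers know, please share.)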
- Don’t Make Me Think is your Bible here (the chapters about usability testing are available online).
- For Tumblon, we’ve been conducting usability tests with screen recording software that costs $20.
- The problems we’ve found with usability tests have been amazing. It punctures your ego, while at the same time giving you the motivation to fix the problems.

So why is unit testing useful?

The author believes unit testing is useful because it makes us think about whether the code we write has problems and whether it can be improved.

The author also cites Michael Feathers's article, The Flawed Theory Behind Unit Testing, as support:
http://michaelfeathers.typepad.com/michael_feathers_blog/2008/06/the-flawed-theo.html

    One very common theory about unit testing is that quality comes from removing the errors that your tests catch. Superficially, this makes sense….It’s a nice theory, but it’s wrong….

    In the software industry, we’ve been chasing quality for years. The interesting thing is there are a number of things that work. Design by Contract works. Test Driven Development works. So do Clean Room, code inspections and the use of higher-level languages.

    All of these techniques have been shown to increase quality. And, if we look closely we can see why: all of them force us to reflect on our code.

    That’s the magic, and it’s why unit testing works also. When you write unit tests, TDD-style or after your development, you scrutinize, you think, and often you prevent problems without even encountering a test failure.

So: adopt practices that make you think about your code; and supplement them with other defect detection techniques.

So never do something just for the sake of doing it; think about how it helps you and why you are doing it.

If unit testing is not enough, why do we still have developers do only this kind of testing?

The author suggests the following possible reasons:
Most programmers can’t hire a QA person or conduct even a $50 usability test.
And perhaps most places don’t have a culture of code reviews.
But they can write tests. Unit tests! Specs! Mocks! Stubs! Integration tests! Fuzz tests!
In other words, these are the things developers can control, so that is all they keep doing.

Isn't that ironic? Yet it is what we all tend to do: we do what is easy, or what we are able to do, rather than what is right or important.

Finally, the author stresses once again:
No single technique is effective at detecting all defects.
We need manual testing, peer reviews, usability testing and developer testing (and that’s just the start) if we want to produce high-quality software.

Resources
* Robert L. Glass, Facts and Fallacies of Software Engineering.
* Steve McConnell, Code Complete, 2nd ed., Chapters 20-22.
* Steve Krug, Don’t Make Me Think.