Reasons to Repeat Tests @ David Ko's Learning Journey

Reasons to Repeat Tests
http://www.satisfice.com/repeatable.shtml
by James Bach

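Finding bugs is like clearing mines in a minefield: you can hardly guarantee that every mine has been found unless you have stepped on every inch of ground. Testing is the same. In practice you cannot exercise every execution path, and even if you did, you could not run every possible input. The most we can say is that we have checked most of the area.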

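So a large part of testing well is increasing coverage, that is, enlarging the sample space, rather than simply re-executing the same things over and over. Repetition alone will not uncover new problems.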
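To make the sample-space argument concrete, here is a toy sketch of my own (not from Bach's article). The function `buggy_abs` is hypothetical and has a single planted "mine" at `x == -7`. Running the same input a thousand times can never step on it, while drawing different inputs from the same range almost certainly does.

```python
import random

def buggy_abs(x: int) -> int:
    """Hypothetical code under test: correct everywhere except x == -7."""
    if x == -7:
        return -7  # the planted bug (the "mine")
    return abs(x)

def finds_bug(inputs) -> bool:
    """True if any input makes buggy_abs disagree with the built-in abs."""
    return any(buggy_abs(x) != abs(x) for x in inputs)

# Repeating one test 1,000 times covers exactly one point of the input space.
repeated = [3] * 1000

# Varying the input covers many points; the seed only makes the run reproducible.
rng = random.Random(42)
varied = [rng.randint(-20, 20) for _ in range(1000)]

print(finds_bug(repeated))  # False: the same input never reaches the mine
print(finds_bug(varied))
```

With 1,000 draws over 41 possible values, the chance of missing -7 is about (40/41)^1000, on the order of 10^-11. Real input spaces are vastly larger than 41 values, which is why widening coverage matters more than re-running the same path.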

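But is a completely identical repeated test even possible? Even when you execute the same test cases again, along the way you inevitably try something here and something there, so the actions, flow, and input data are never exactly the same. This is also why someone re-running test cases, their own or ones others have already run, can still find different bugs: people are not machines and do not perform every action identically each time. So in a regression test, most of the behavior is usually the same, while a small part differs.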

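This explains some familiar situations: a developer (RD) says "It doesn't happen on my machine" or "I already tested it before check-in," or someone blames QA for not testing even these simple combinations. It is all the same principle: you believe you followed the same flow, but something along the way differed, perhaps a step added or skipped, or input data that was not quite identical. So next time this happens, there is no need to blame each other; something simply was different.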

From a technical standpoint, the author gives the following reasons to repeat tests:
1. Recharge:
(1) if there is a substantial probability of a new problem or
(2) a recurring old problem that would be caught by a particular existing test, or
(3) if an old test is applied to a new code base.
(4) re-running a test to verify a fix, or
(5) repeating a test on successively earlier builds as you try to discover when a particular problem or behavior was introduced.
(6) running an old test on the same software that is running on a new O/S.

In other words, a tired old test can be "recharged" by changes to the technology under test.
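Item (5), repeating a test on successively earlier builds to find when a behavior was introduced, is essentially a binary search over build history (the idea that `git bisect` automates for commits). The sketch below assumes builds are ordered oldest to newest, the oldest passes, the newest fails, and the failure persists once introduced; `test_passes` is a hypothetical stand-in for "re-run the same test on this build".

```python
from typing import Callable, Sequence

def first_bad_build(builds: Sequence[str], test_passes: Callable[[str], bool]) -> str:
    """Binary-search for the first build on which the repeated test fails.

    Assumes builds are ordered oldest to newest, builds[0] passes,
    builds[-1] fails, and the failure persists once introduced.
    """
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] passes, builds[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if test_passes(builds[mid]):
            lo = mid
        else:
            hi = mid
    return builds[hi]

# Hypothetical history in which the regression appeared in build "1.5".
history = ["1.0", "1.1", "1.2", "1.3", "1.4", "1.5", "1.6", "1.7"]
broken_from = history.index("1.5")
print(first_bad_build(history, lambda b: history.index(b) < broken_from))  # 1.5
```

With eight builds this needs at most three re-runs of the same test, instead of checking builds one by one.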

2. Intermittence:
(1) if you suspect that the discovery of a bug is not guaranteed by one correct run of a test, perhaps due to important variables involved that you can't control in your tests.

Performing a test that is, to you, exactly the same as a test you've performed before, may result in discovery of a bug that was always there but not revealed until the uncontrolled variables line up in a certain way.
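One way to reason about how many repeats an intermittent bug calls for: if a single run exposes it with probability p, and runs are assumed independent (real flakiness is often correlated, so treat this as an optimistic model), then n runs catch it at least once with probability 1 - (1 - p)^n. Solving for n gives the smallest run count that reaches a chosen confidence. The function name `runs_needed` is my own.

```python
import math

def runs_needed(p: float, confidence: float) -> int:
    """Smallest n with 1 - (1 - p)**n >= confidence, assuming independent runs.

    p: chance that one run exposes the intermittent bug (0 < p < 1).
    confidence: desired chance of seeing the bug at least once (0 < confidence < 1).
    """
    if not (0 < p < 1 and 0 < confidence < 1):
        raise ValueError("p and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

# A bug that shows up once in 100 runs needs about 300 repeats for 95% confidence.
print(runs_needed(0.01, 0.95))  # 299
```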

3. Retry:
(1) if you aren't sure that the test was run correctly the other time(s) it was performed.
(2) A variant of this is having several testers follow the same instructions and check to see that they all get the same result.
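The "several testers, same instructions" variant of Retry can be summarized mechanically: collect each tester's observed result and flag anyone who disagrees with the majority, since a disagreement is a cue to re-check how the test was run. This is a hypothetical helper, not something from the article; ties go to the result seen first.

```python
from __future__ import annotations

from collections import Counter

def check_agreement(results: dict[str, str]) -> tuple[str, list[str]]:
    """Return the majority result and the (sorted) testers who disagreed with it.

    results maps tester name -> observed outcome of the same scripted test.
    """
    if not results:
        raise ValueError("need at least one result")
    # most_common orders equal counts by first appearance, so ties go to the first result.
    majority, _ = Counter(results.values()).most_common(1)[0]
    dissenters = sorted(t for t, r in results.items() if r != majority)
    return majority, dissenters

print(check_agreement({"alice": "pass", "bob": "pass", "carol": "fail"}))  # ('pass', ['carol'])
```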

4. Mutation:
(1) If you are changing an important part of the test while keeping another part constant.
(2) Even though you are repeating some elements of the test, the test as a whole is new, and may reveal new behavior.
(3) I mutate a test because although I have covered something before, I haven't yet covered it well enough.
(4) A common form of mutation is to operate the product the same way while using different data.
(5) The key difference between mutating a test and intermittence or retry is that with mutation the change is directly under your control.
(6) Mutation is intentional, intermittence results from incidental factors, and you retry a test because of accidental factors.
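"Operate the product the same way while using different data" maps naturally onto a data-driven test. The sketch below uses the standard `unittest` module and its `subTest` context manager; `apply_discount` is a hypothetical function under test, and the data rows are illustrative. The steps and checks stay fixed, while the data is varied deliberately, which is what separates mutation from intermittence or retry.

```python
import unittest

def apply_discount(price_cents: int, percent: int) -> int:
    """Hypothetical product code: price after a whole-percent discount, rounded down."""
    return price_cents * (100 - percent) // 100

class DiscountMutationTest(unittest.TestCase):
    # The data is varied on purpose, so the change is under the tester's control.
    CASES = [(1000, 0), (1000, 10), (999, 33), (1, 50), (0, 100), (12345, 100)]

    def test_same_steps_different_data(self):
        # The procedure (call the function, check the same invariants) stays constant.
        for price, percent in self.CASES:
            with self.subTest(price=price, percent=percent):
                result = apply_discount(price, percent)
                self.assertGreaterEqual(result, 0)
                self.assertLessEqual(result, price)
```

Run it with `python -m unittest`; `subTest` reports each failing data row separately rather than stopping at the first one.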

5. Benchmark:
(1) if the repeated tests comprise a performance standard that gets its value by comparison with previous executions of the same exact tests.
(2) When historical test data is used as an oracle, then you must take care that the tests you perform are comparable to the historical data.
(3) Holding tests constant may not be the only way to make results comparable, but it might be the best choice available.
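Holding the benchmark constant so results stay comparable can be sketched with Python's standard `timeit` and `statistics` modules. The workload, the 10% tolerance, and the use of the historical median as the baseline are my own illustrative choices, and the timings in the example calls are made-up numbers, not measurements. Taking the minimum of repeated timings is a common choice because background noise only ever adds time.

```python
from __future__ import annotations

import statistics
import timeit

def fixed_workload() -> int:
    """Hypothetical benchmark body; if it changes between runs, the history is no longer comparable."""
    return sum(i * i for i in range(10_000))

def measure(repeat: int = 5, number: int = 100) -> float:
    """Best-of-`repeat` time in seconds for `number` calls of the fixed workload."""
    return min(timeit.repeat(fixed_workload, repeat=repeat, number=number))

def is_regression(current: float, history: list[float], tolerance: float = 0.10) -> bool:
    """Flag a slowdown when `current` exceeds the historical median by more than `tolerance`."""
    baseline = statistics.median(history)
    return current > baseline * (1 + tolerance)

# Illustrative numbers only; real history would come from earlier runs of the SAME workload.
print(is_regression(0.120, [0.100, 0.102, 0.098]))  # True
print(is_regression(0.105, [0.100, 0.102, 0.098]))  # False
```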

From a business standpoint, the author gives the following reasons to repeat tests:
6. Inexpensive:
(1) if they have some value and are sufficiently inexpensive compared to the cost of new and different tests.
(2) These tests may not be enough to justify confidence in the product, however.

7. Importance:
(1) If a problem that could be discovered by those tests is likely to have substantially more importance than problems detectable by other tests.
(2) The distribution of the importance of product behavior is not necessarily uniform.
Sometimes a particular problem may be considered intolerable just because it's already impacted an important user once (a "never let it happen again" situation).
(3) This doesn't necessarily mean that you must run the same exact test, just something that is sufficiently similar to catch the problem (see Mutation).
(4) Be careful not to confuse the importance of a problem with the importance of a test.
(5) A test might be important for many reasons, even if the problems it detects are not critical ones.
(6) Also, don't make the mistake of spending so much effort on one test that looks for an important bug that you neglect other tests that might be just as good or better at finding that kind of problem.

8. Enough:
(1) if the tests you repeat represent the only tests that seem worth doing.
(2) However, we may introduce variation because we don't know which tests truly are worth doing, or we are unable to achieve enoughness via repeated tests.

9. Mandated:
(1) If, due to contract, management edict, or regulation, you are forced to run the same exact tests.
(2) However, even in these situations, it is often not necessary that the mandated tests be the only tests you perform. You may be able to run new tests without violating the mandate.

10. Indifference/Avoidance:
(1) If the "tests" are being run for some reason other than finding bugs, such as for training purposes, demo purposes (such as an acceptance test that you desperately hope will pass when the customer is watching), or to put the system into a certain state.
(2) If one of your goals in running a test is to avoid bugs, then the principal argument for variation disappears.