How Pairwise Testing Improves Test Efficiency

One goal of software testers is identifying serious defects before they escape into production. Even with software test automation techniques like service virtualization, it’s often not practical or even possible to test every combination of data and paths through the application. I caught up with Lee Copeland, author of A Practitioner’s Guide to Software Test Design, Talent Scout at Software Quality Engineering, and an expert on pairwise testing, to ask how pairwise testing can increase test automation efficiency.

Service Virtualization: What is the most fundamental problem in software testing?

Lee Copeland: When you consider the huge number of possible data combinations and the number of execution paths through a system, we have far more combinations than we could ever test. THE fundamental problem in testing is choosing a reasonably sized subset of all the possible test cases that will find a substantial percentage of the defects but can still be performed within the limited time and budget available.

All of the black box test design techniques, like equivalence class and boundary value testing, are minimization techniques that try to reduce the number of test cases we need to create and execute. However, even with those, there are often far too many test cases. So the question is, “How can we find serious defects most efficiently?” We could choose randomly, but this is not likely to be efficient. We could also be a little bit more organized and use a risk-based approach to identify risky test cases and start testing those. Another scheme is the pairwise approach.

What exactly is pairwise testing?

With pairwise testing we test all of the pair combinations rather than testing all of the combinations. For example, let’s say we had to test a website that is supposed to run on a number of different browsers and different versions of those browsers. In addition, it needs to run on different operating systems, which may be at different revision levels. There may also be plug-ins on those browsers at different revision levels. The number of combinations can be so large that it is not practical to test them all. The pairwise approach is to test just the pairs: every browser with every operating system at least once, then every browser with every plug-in at least once, and so on until all the pairs are tested.

While that sounds like a large number, it turns out that it isn’t. We have found we can run 10-20% of the possible test cases and still detect 70-90% of the defects. That is the attractiveness of pairwise testing. We run smaller subsets of test cases and still find a much larger than expected percentage of the defects.

Pairwise testing can find a much higher percentage of defects than its small number of test cases would suggest, but there are no guarantees. I have seen people get excited about pairwise testing without really evaluating it. I would recommend that testers evaluate it as a possible approach, give it a try, and see if it works for their organization. If it does, use it, but there is no underlying software physics that says it has to work in every situation.
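
To make the browser, operating system, and plug-in example concrete, here is a minimal Python sketch of one simple way to build a pairwise suite: greedily pick the test case that covers the most still-uncovered pairs until none remain. The parameter names and values are hypothetical, and this greedy loop is only an illustration; it is not how any particular tool mentioned in this interview works, and it does not promise the smallest possible suite.

```python
from itertools import combinations, product

# Hypothetical parameters for illustration; not taken from the interview.
PARAMETERS = {
    "browser": ["Chrome", "Firefox", "Safari"],
    "os": ["Windows", "macOS", "Linux"],
    "plugin": ["none", "v1", "v2"],
    "locale": ["en", "de"],
}


def pairs_of(test, names):
    """Return every (parameter, value) pair combination covered by one test."""
    return {((a, test[a]), (b, test[b])) for a, b in combinations(names, 2)}


def greedy_pairwise(parameters):
    """Build a suite in which every pair of values appears at least once."""
    names = list(parameters)
    # Exhaustive set of test cases: one per combination of all values.
    all_tests = [dict(zip(names, values))
                 for values in product(*parameters.values())]
    uncovered = set()
    for test in all_tests:
        uncovered |= pairs_of(test, names)

    suite = []
    while uncovered:
        # Pick the candidate that covers the most pairs not yet covered.
        best = max(all_tests,
                   key=lambda t: len(pairs_of(t, names) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best, names)
    return suite, all_tests


suite, all_tests = greedy_pairwise(PARAMETERS)
print(f"exhaustive: {len(all_tests)} tests, pairwise: {len(suite)} tests")
for test in suite:
    print(test)
```

Because the sketch enumerates every combination up front, it only suits small models; dedicated tools avoid that cost.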

How can a tester identify useful pair combinations to test?

There are a number of different ways to generate these pairs. The first scheme involves mapping the potential test cases onto orthogonal arrays, which are curious mathematical oddities. They are two-dimensional arrays of numbers with the property that, for any two columns, every pair of values occurs somewhere in the rows. If we can map the testing problem onto the array, what comes out is a set of test cases with this pairwise property. Orthogonal arrays have been around for hundreds of years, and have been used for testing purposes for the last fifty.
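
As a small illustration of that mapping, the sketch below uses the standard four-row orthogonal array for three two-level factors, checks the pairwise property on it, and then maps its numbers onto test values. The parameter names and values are hypothetical and chosen only for the example.

```python
from itertools import combinations, product

# The four-row orthogonal array for three factors with two levels each.
L4 = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]


def has_pairwise_property(array, levels):
    """True if every pair of columns contains every pair of level values."""
    wanted = set(product(range(levels), repeat=2))
    columns = range(len(array[0]))
    return all(
        {(row[i], row[j]) for row in array} == wanted
        for i, j in combinations(columns, 2)
    )


def map_to_tests(array, parameters):
    """Replace each number in the array with the matching parameter value."""
    names = list(parameters)
    return [
        {name: parameters[name][level] for name, level in zip(names, row)}
        for row in array
    ]


# Hypothetical two-valued parameters, one per column of the array.
parameters = {
    "browser": ["Chrome", "Firefox"],
    "os": ["Windows", "Linux"],
    "plugin": ["off", "on"],
}

assert has_pairwise_property(L4, levels=2)
tests = map_to_tests(L4, parameters)
for test in tests:
    print(test)
```

Here four rows stand in for the eight exhaustive combinations of three two-valued parameters. Real problems rarely fit an array this neatly, so some adaptation is usually needed when the number of parameters or values differs.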

What kinds of tools can help to generate these pairwise test cases?

One of my favorite free tools is PICT from Microsoft, which generates all of the pair combinations. James Bach also has a tool on his website Satisfice.com called ALLPAIRS that will generate the pairs for you. A commercial tool called rdExpert uses the orthogonal array scheme to generate the pairs.

What are the limits in these tools?

I would not say there are limits in the tools. The limits are in the pairwise methodology itself. First of all, pairwise testing is not going to be effective if you choose the wrong variables and their parameters. You have to select the variables and parameters that actually affect whether the system will pass or fail. If you pick text color or text font as a variable, for instance, that may not have anything to do with whether the website works or not. If you pick the wrong things, pairwise testing will not help.

Another problem with pairwise and black box testing techniques in general is that they don’t take risk into account. They treat all of the pairs as equal in value in terms of risk, chance of failure, and in loss to the organization. As testers, we know this is not the case. Some combinations will be more risky than others. They will be more prone to failure, or their failure will cause a bigger hole in the ground.

Since none of these techniques incorporates risk, the tester has to do this. An all-pairs method might not select what you know are high-risk combinations. Even though the tool or algorithm did not pick these, you as a tester have to add them to your test cases when your experience talking to users or customers tells you that these cases are high risk and absolutely need to be tested.
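
One way to picture that manual step is the sketch below: it takes a generated suite and appends any high-risk combination that no existing test case already exercises. The suite, the high-risk list, the default values, and the function names are all hypothetical; in practice the high-risk list comes from the tester's own knowledge, not from a tool.

```python
def matches(test, required):
    """True if the test case already exercises the required combination."""
    return all(test.get(name) == value for name, value in required.items())


def add_high_risk_cases(suite, high_risk, defaults):
    """Return the suite plus a test for each high-risk combination it lacks."""
    result = list(suite)
    for required in high_risk:
        if not any(matches(test, required) for test in result):
            # Fill unspecified parameters with default values (an assumption).
            result.append({**defaults, **required})
    return result


# Hypothetical output of a pairwise generator, shortened for the example.
generated = [
    {"browser": "Chrome", "os": "Windows", "plugin": "v1"},
    {"browser": "Safari", "os": "macOS", "plugin": "none"},
]

# Combinations the tester knows are risky, including a three-way one
# that pair coverage alone would not be guaranteed to include.
high_risk = [
    {"browser": "Safari", "os": "macOS", "plugin": "v2"},
    {"browser": "Chrome", "os": "Windows"},  # already covered above
]

defaults = {"browser": "Chrome", "os": "Windows", "plugin": "none"}

final_suite = add_high_risk_cases(generated, high_risk, defaults)
for test in final_suite:
    print(test)
```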

What are the most common misconceptions of pairwise testing?

I have seen people get so excited that they throw every pair into the pot, including pairs that have nothing to do with whether the system will pass or fail. The misconception would be that pairwise testing is the answer to all of your testing problems. But at the end of the day, it is a scheme for minimizing the number of test cases required to identify defects.

What are the best practices for implementing a pairwise testing program as part of the software development lifecycle?

The first thing is that you have to be in an environment where you have lots of combinations of data, program paths, or both. There are some systems that are not like that, such as pipeline systems used in processing timecard data for payroll. I have seen people force pairwise testing onto those applications, and that does not work. It is where we have so many combinations that we cannot test them all, or cannot do it in a reasonable amount of time, that you want to choose a subset, and pairwise helps with that.