How Service Virtualization can solve your Test Data Management problems


As software becomes more complex and distributed, and handles more customers and transactions over time, the data it generates grows exponentially every year. Some systems of record have grown so large and unwieldy (petabytes, or even zettabytes) that they can barely be managed. You have dozens of data sources in a wide variety of storage containers, and the data problem is only getting worse. The term Big Data was coined to describe the massive amount of unstructured data being captured in consumer and transaction activity online. Data is a big, hairy constraint for every enterprise development effort.

Have your teams ever struggled to set up just the right scenarios across multiple systems, only to “burn” them all with a single test cycle? Have you seen issues with regulatory or privacy rules about exactly how customer data is used within development and testing cycles? Or found it difficult to re-create scenarios in test systems for those unique types of edge conditions that happen in production?

The schematic below represents development teams attempting to deliver and test a health care web application. Notice how little of the data is “in-scope,” where the teams can extract it directly. Most of the data they need comes from systems that are “out-of-scope” and not under the team’s control.


Test Data Management problems create huge manual effort and delays, because it is so difficult to set up complete data scenarios from upstream users and out-of-scope downstream systems that resist “copying” into a local database.


One company had such a severe data problem that it set up a huge “midnight run” requiring 12 other teams to manipulate their own live systems manually, all inserting test data at the same time to accommodate a single test run. That’s a lot of overtime. One health care QA director reported: “We spend two hours running a test cycle, then we spend three full days resetting data across the systems.”

There’s an answer to this problem: TDM through Service Virtualization

The most obvious solution is the conventional practice of Test Data Management (TDM): extracting a subset of production data directly from all the involved systems into a local TDM database, and then importing that data into the nonproduction systems.
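To make the conventional approach concrete, here is a minimal sketch of a subset-and-mask extraction. It assumes a hypothetical “patients” table and uses in-memory SQLite purely for illustration; the table, columns, and masking rule are assumptions, not part of any particular TDM tool.

    import sqlite3
    import hashlib

    # A minimal sketch of the conventional TDM flow: pull a slice of
    # production rows into a local test database, masking sensitive
    # columns on the way. The "patients" table is hypothetical.

    def extract_subset(prod_conn, test_conn, limit=100):
        rows = prod_conn.execute(
            "SELECT id, name, ssn, balance FROM patients LIMIT ?", (limit,)
        ).fetchall()
        test_conn.execute(
            "CREATE TABLE IF NOT EXISTS patients "
            "(id INTEGER, name TEXT, ssn TEXT, balance REAL)"
        )
        for pid, name, ssn, balance in rows:
            masked = hashlib.sha256(ssn.encode()).hexdigest()[:9]  # mask the private ID
            test_conn.execute(
                "INSERT INTO patients VALUES (?, ?, ?, ?)",
                (pid, name, masked, balance),
            )
        test_conn.commit()

    # In practice prod_conn would point at each in-scope system of record;
    # here both sides are in-memory databases so the sketch runs end to end.
    prod, test = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    prod.execute("CREATE TABLE patients (id INTEGER, name TEXT, ssn TEXT, balance REAL)")
    prod.execute("INSERT INTO patients VALUES (1, 'Pat Doe', '123-45-6789', 0.0)")
    extract_subset(prod, test)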

However, for a number of reasons, the traditional approach to TDM isn’t working:

  •  Fragile data: Applications change often, requiring frequent, precisely timed extract, manipulate, and setup activities.
  •  “Burned” data: Live transactions often “burn” a carefully constructed set of test data upon use (your previously zero-balance customer now has a balance!), making the data unusable for that purpose again and requiring either re-import or very difficult, manual undoing of the changes made.
  •  Complexity: Heterogeneous sources — SQL, IMS, VSAM, Flat Files, XML, third-party service interfaces — vary widely, whereas most TDM solutions only deal with a subset of possible RDBMS data sources. Moreover, Big Data brings non-relational data sources to the mix.
  •  Security and regulations: Data security is a constant worry. Strict laws and industry standards govern the protection of private customer data (ID and bank account numbers, medical records, etc.) by development and test teams, as well as accountability standards for how that data is stored and shared.
  •  Labor- and cost-intensive: Many development shops report that 60 percent or more of test cycle time is spent exclusively on manual data configuration and maintenance activities.
  •  Difficult-to-reproduce scenarios: It’s hard to isolate and re-create specific input-and-response scenarios. Lack of realism limits the success of functional and performance testing.

Moreover, in a composite application world, most of the data you need for development exists in systems that are “out-of-scope” and not under your control, as in the example above.

Service Virtualization (SV) is a way to free your development teams from this burdensome constraint. Rather than trying to extract data directly from these out-of-scope sources, you can use SV to capture and simulate the behavior of those systems, responding with just enough appropriate data and dynamic behavior to “fool” your system under development into believing it is talking to the real thing. We call this virtual Test Data Management (or vTDM).
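As an illustration, here is a minimal sketch of what a virtual service looks like at its simplest: a small stand-in for an out-of-scope claims system that answers with canned responses. The endpoint and payload are hypothetical, and a real SV tool would record and parameterize these responses from live traffic rather than hard-coding them.

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # A minimal stand-in for an out-of-scope downstream system.
    # The paths and fields below are hypothetical examples.
    CANNED_RESPONSES = {
        "/claims/12345": {"claimId": "12345", "status": "APPROVED", "amount": 1200.00},
        "/claims/99999": {"claimId": "99999", "status": "DENIED", "amount": 0.00},
    }

    class VirtualClaimsService(BaseHTTPRequestHandler):
        def do_GET(self):
            body = CANNED_RESPONSES.get(self.path)
            self.send_response(200 if body else 404)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(json.dumps(body or {"error": "unknown claim"}).encode())

    if __name__ == "__main__":
        # Point the system under test at http://localhost:8080
        # instead of the real downstream system.
        HTTPServer(("localhost", 8080), VirtualClaimsService).serve_forever()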

Using vTDM instead of real TDM seems too simple, but it is actually the healthiest way for your development teams to get stable, relevant test data they can rely on in a lightweight form. SV makes gathering just the data needed from downstream systems much easier by automating the capture of relevant scenarios, intelligently interpreting the kinds of data seen, and masking and manipulating that data as part of a Virtual Service. VS-based data gives all your teams on-demand access to relevant datasets for systems under test, and that data can be expanded to cover an almost infinite range of valid data scenarios to support high-volume performance and regression testing needs.
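For example, a handful of captured scenarios can be expanded into a large, varied dataset for high-volume testing. The sketch below assumes a hypothetical member-record shape; a real Virtual Service would vary its responses in a similar way rather than serving a fixed list.

    import copy
    import random

    # A minimal sketch of expanding a few captured scenarios into a large,
    # varied dataset for performance testing. The record shape is hypothetical.
    CAPTURED = [
        {"memberId": "A-1001", "plan": "GOLD", "deductibleRemaining": 250.0},
        {"memberId": "A-1002", "plan": "SILVER", "deductibleRemaining": 0.0},
    ]

    def expand(records, count):
        """Generate `count` valid-looking variations from the captured records."""
        expanded = []
        for i in range(count):
            rec = copy.deepcopy(random.choice(records))
            rec["memberId"] = f"A-{2000 + i}"                              # unique synthetic IDs
            rec["deductibleRemaining"] = round(random.uniform(0, 500), 2)  # vary the data
            expanded.append(rec)
        return expanded

    dataset = expand(CAPTURED, 10_000)  # enough volume for performance and regression runs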

Excerpted by permission from Service Virtualization: Reality is Overrated, by John Michelsen and Jason English, published by Apress.