Researchers in Australia have been working with an old data analysis technique from genomics to create a new technique called Opaque Data Processing (ODP) to improve the ability to automatically create virtual services in the absence of explicit documentation or expert knowledge. We caught up with Miao Du, a PhD Candidate at Swinburne University of Technology in Australia, who pioneered research into ODP, and Steve Versteeg, Vice President of Research for CA Technologies in Australia.
Service Virtualization: What do you see as the primary limits and challenges around making accurate virtual services in the absence of expert knowledge, explicit documentation, or an understanding of the message structure?
Miao Du: Most current techniques for service virtualization rely on manually defining the interaction models. Someone has to know the message structure, how messages are exchanged between services, and how the services behave when they interact. This can be done accurately when expert knowledge about the service is available. Our research aims to remove that requirement, using techniques such as the Needleman-Wunsch algorithm, data clustering, entropy analysis, and multiple sequence alignment.
Before ODP, there were two approaches for creating virtual services. One is to have a system expert manually define the virtual service model using their knowledge, and there are commercial tools for this. The other approach is to record the interactions between the system under test and the service you want to virtualize, and use them to create a service image with a simple set of rules about how to play back the recorded responses with some modifications. That works well, but keeping the rules simple relies on the assumption that the system understands the message structure, so that it knows what to reason about. The new technique based on ODP is a good fit for rare services whose message structure the tool does not already understand.
In what ways do these limits affect the development and testing capabilities required for delivering defect-free software?
Steve Versteeg: Service virtualization has been a major breakthrough for development and testing. In the modern enterprise, there are hundreds if not thousands of services running and talking to each other. When you upgrade or replace systems, it is risky because you don’t know what the cascade effects will be. There was a recent case of a company that ran a tunnel toll system in Australia, and when they upgraded their billing system, there was a cascade of effects that had an impact on their safety systems and shut down the tunnel. Because it is so risky, companies are cautious about upgrading. Big companies will have an exact replica of their production environment, but those are expensive and difficult for the average developer to access.
Service virtualization makes a recording of the interactions between applications and other services they rely on, so you have a very good model that you can access all of the time. The application developer can use that as part of the application development process so they can test applications as though they were interacting with the real environment. It really is a massive breakthrough in terms of mitigating unexpected secondary effects.
But there have been cases where you want to virtualize a service for an application that has to talk to legacy or proprietary systems and the documentation is not available. It can either take a lot of time to reverse engineer the message structure, or the project can fail if the team decides that it is too hard.
What is opaque data processing and how can it address these challenges now?
Miao Du: In essence, the technique helps solve the problem of formulating a good response to an unknown request when nothing is known about the message structure. A sequence alignment algorithm from genomics, Needleman-Wunsch, can be adapted to solve part of the problem. It has been used in genomic sequencing since the 1970s. We realized that the technique could detect byte similarities without needing to know what the bytes mean.
ODP-like techniques are used in genome sequencing to understand the genome structure even when explicit information is not known about how genes should be aligned. They make it possible to break the larger problem of how a full genetic sequence should be aligned into a series of smaller problems involving lining up smaller genetic fragments. In the case of service virtualization, the algorithm is applied to recorded requests to find the ones most similar to an incoming request, in a way that eliminates the need for expert knowledge or documentation about the service.
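To make the matching step concrete, here is a minimal sketch of scoring byte strings with Needleman-Wunsch global alignment and returning the response recorded for the most similar request. It is an illustration, not the ODP implementation: the function names `nw_score` and `closest_response`, the scoring values (+1 match, -1 mismatch, -1 gap), and the sample messages are all assumptions, and the sketch replays the recorded response unchanged rather than modifying it.

```python
from typing import List, Tuple


def nw_score(a: bytes, b: bytes, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score of two byte strings.

    The scoring values are illustrative assumptions, not values from the research.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def closest_response(request: bytes, recorded: List[Tuple[bytes, bytes]]) -> bytes:
    """Return the response paired with the recorded request that aligns best."""
    _, best_response = max(recorded, key=lambda pair: nw_score(request, pair[0]))
    return best_response


# Hypothetical recorded (request, response) pairs.
recorded = [
    (b"GET user=42", b"OK user 42"),
    (b"DEL user=42", b"DELETED"),
]
reply = closest_response(b"GET user=17", recorded)
```

The comparison works on raw bytes only, so nothing in the code needs to know which part of a message is the operation and which part is the payload.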
We have adapted this approach to get better results in the service virtualization domain. Working with the weighting and entropies of different sections of the messages was one of the adaptations we made to improve the accuracy.
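One way to picture entropy-based weighting is sketched below. It computes the Shannon entropy of the byte values seen at each position across a set of recorded requests, then gives low-entropy positions (which rarely vary) more weight when comparing two messages. The weighting formula `1 / (1 + entropy)`, the function names, and the assumption that messages are already aligned to equal length are all illustrative choices, not details taken from the research.

```python
import math
from collections import Counter
from typing import List


def position_entropies(messages: List[bytes]) -> List[float]:
    """Shannon entropy, in bits, of the byte values observed at each position.

    Assumes the messages are already aligned and have equal length.
    """
    length = len(messages[0])
    if any(len(m) != length for m in messages):
        raise ValueError("messages must have equal length")
    total = len(messages)
    entropies = []
    for pos in range(length):
        counts = Counter(m[pos] for m in messages)
        entropies.append(sum(-(c / total) * math.log2(c / total) for c in counts.values()))
    return entropies


def weighted_similarity(a: bytes, b: bytes, entropies: List[float]) -> float:
    """Share of total weight on positions where a and b agree.

    The weight 1 / (1 + entropy) is a hypothetical choice for illustration.
    """
    weights = [1.0 / (1.0 + h) for h in entropies]
    matched = sum(w for x, y, w in zip(a, b, weights) if x == y)
    return matched / sum(weights)


# Hypothetical recorded requests: an operation name followed by a varying id.
recorded = [b"GET1", b"GET2", b"PUT3", b"PUT4"]
entropies = position_entropies(recorded)
same_op = weighted_similarity(b"GET1", b"GET2", entropies)
diff_op = weighted_similarity(b"GET1", b"GEX1", entropies)
```

In this example both candidates differ from `GET1` in exactly one byte, so an unweighted comparison would tie. With the weights, a difference in the highly variable last byte costs less than a difference in a byte that never varies in the recordings.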
What are the types of use cases and applications that will benefit the most from the use of opaque data processing combined with service virtualization?
Steve Versteeg: This is a big deal for legacy, proprietary, and domain-specific protocols, and for custom applications that are unusual or not supported out of the box by the tool you are using.
This could also be applied to improve service models in response to changes in the operations data. It should make it easier to adapt your service models over time. Since we don’t need the experts to create the model, we have less need for expert knowledge to maintain the models. You could potentially have much more continuous and adaptive virtual service models.
What is the status of using ODP techniques for service virtualization?
Steve Versteeg: It is included in CA Service Virtualization 8.0, which was just released. We are continuing to improve on it as well.
What kind of improvements are you making to ODP?
Steve Versteeg: We have recently refined the technique so that we are starting to reverse engineer the message structure and apply data clustering, both to improve the accuracy of the response we send back and to make the matching faster and less memory-intensive.
We are working with statistical techniques to improve the classification of traces. This can help to accelerate the matching time.
Another improvement is to summarize the interaction groups using another bioinformatics technique called Multiple Sequence Alignment. This technique involves aligning all of the messages in a group and then, at each position, taking the most frequently occurring byte to represent the group, so that an interaction can be matched faster.
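The summarizing step can be sketched as follows: given a group of messages that have already been aligned to the same length, build one representative message from the most frequent byte at each position. The alignment itself (inserting gaps so that positions line up) is assumed to have been done beforehand, and the function name `consensus` and the sample messages are hypothetical.

```python
from collections import Counter
from typing import List


def consensus(aligned: List[bytes]) -> bytes:
    """Most frequent byte at each position of a group of aligned messages.

    Assumes the messages were already aligned to equal length.
    """
    length = len(aligned[0])
    if any(len(m) != length for m in aligned):
        raise ValueError("aligned messages must have equal length")
    return bytes(
        Counter(m[pos] for m in aligned).most_common(1)[0][0]
        for pos in range(length)
    )


# Hypothetical group of aligned requests.
group = [b"GET id=1", b"GET id=2", b"GET id=1", b"PUT id=1"]
summary = consensus(group)
```

An incoming request can then be compared against one summary per group instead of against every recorded message, which is where the speed-up would come from.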
Another limitation in ODP is that our reasoning engine is currently stateless and cannot take into account the history of the service. Creating virtual services with state is an area we plan to explore in the future.
We think there are another hundred things we can do to improve it as well. I think that what Miao Du has come up with is a real breakthrough and we are optimistic we can improve on this technique.