Non-functional test oracle problem for continuous delivery

Our paper “Verdict Machinery: On the Need to Automatically Make Sense of Test Results” has been accepted at ISSTA 2016. It is based on Mikael’s and Emre’s Master’s thesis as well as on a great collaboration with Ericsson AB (see author list below).

Personally, I found this a particularly challenging, yet rewarding piece of work, since it brings together aspects of non-functional system testing with process topics such as continuous delivery and quality requirements engineering. Clearly articulating the problem was therefore a challenge in itself. But let me give an example:


Example of performance goal and performance over a number of deliveries

The figure shows a fictitious example of a performance goal (e.g. the number of simultaneous users supported, on the y-axis). The actual performance (red) changes with each of the 12 deliveries in the example (x-axis). Traditionally, a performance test would fail on delivery number 10, since performance is now below the performance goal (blue). However, this might be unfair and economically wrong.

  • It is unfair, since delivery 5 was actually far more problematic, and had delivery 10 been delivered a bit earlier, it would have been accepted.
  • It might be economically wrong, since this delivery probably included good value for the customer, and, anticipating a performance-improving change (delivery 12), a customer might have accepted a temporary breach of the performance goal in exchange for it.

In this fictitious case, a good verdict machinery should have rejected delivery 5 because of its unusually high performance degradation. Regardless of the concrete mechanism, without an automatic verdict, a large agile organization would need to agree on a case-by-case basis whether a delivery can be accepted or not. This would significantly lengthen the time-to-market of all deliveries.
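
To make the contrast concrete, here is a minimal Python sketch that compares the traditional threshold verdict with one possible degradation-based verdict. This is only an illustration and not the mechanism proposed in the paper: the function names, the 15% drop limit, and the performance numbers are all assumptions made up for this post, chosen to roughly mimic the shape of the example above.

```python
from typing import List


def rejected_by_threshold(performance: List[float], goal: float) -> List[int]:
    """Traditional rule: reject every delivery whose performance is below the goal.

    Returns 1-based delivery numbers.
    """
    return [i + 1 for i, p in enumerate(performance) if p < goal]


def rejected_by_degradation(performance: List[float], max_drop: float) -> List[int]:
    """Hypothetical alternative: reject a delivery if performance dropped by more
    than `max_drop` (a fraction, e.g. 0.15) relative to the previous delivery.

    The first delivery has no predecessor and is never rejected by this rule.
    Returns 1-based delivery numbers.
    """
    rejected = []
    for i in range(1, len(performance)):
        prev, curr = performance[i - 1], performance[i]
        relative_drop = (prev - curr) / prev
        if relative_drop > max_drop:
            rejected.append(i + 1)
    return rejected


# Invented values for 12 deliveries (NOT read off the figure).
performance = [150, 152, 155, 158, 120, 118, 115, 112, 108, 98, 99, 130]
goal = 100        # assumed performance goal
max_drop = 0.15   # assumed tolerance: at most 15% drop per delivery

print("Threshold rule rejects:", rejected_by_threshold(performance, goal))
print("Degradation rule rejects:", rejected_by_degradation(performance, max_drop))
```

With these invented numbers, the threshold rule flags the late deliveries that dip just below the goal, while the degradation rule flags only delivery 5, where the large drop actually happened. A realistic verdict system would likely need to combine both views and account for measurement noise, which this sketch ignores.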

Title: Verdict Machinery: On the Need to Automatically Make Sense of Test Results

Abstract: Along with technological developments and increasing competition there is a major incentive for companies to produce and market high quality products before their competitors. In order to conquer a bigger portion of the market share, companies have to ensure the quality of the product in a shorter time frame. To accomplish this task companies try to automate their test processes as much as possible. It is critical to investigate and understand the problems that occur during different stages of test automation processes. In this paper we report on a case study on automatic analysis of non-functional test results. We discuss challenges in the face of continuous integration and deployment and provide improvement suggestions based on interviews at a large company in Sweden. The key contributions of this work are filling the knowledge gap in research about performance regression test analysis automation and providing warning signs and a road map for the industry.

Keywords: Non-Functional Testing Oracle; Verdict System; Performance regression test analysis; Automation

Reference: Fagerström, M.; Ismail, E. E.; Liebel, G.; Guliani, R.; Larsson, F.; Nordling, K.; Knauss, E. & Pelliccione, P.: Verdict Machinery: On the need to automatically make sense of test results. In: Proceedings of International Symposium on Software Testing and Analysis (ISSTA ’16), Saarbrücken, Germany, 2016

Pre-print: FIL+2016-ISSTA-Verdict_amchine-camera-ready.pdf

