Thursday 6 December 2012

Managing Performance Troubleshooting Projects



Software performance troubleshooting can be defined and interpreted in several ways - for the sake of this discussion it refers to dealing with performance related problems after the system has gone into production. A related term is performance tuning, but I consider that to be improvement over existing performance metrics. Performance troubleshooting projects tend to be guerrilla assignments where one has to act quickly and with precision. My experience is that managing such projects requires a slightly different mindset compared to a normal performance testing project.

First things first: Why do systems have performance related issues once they are in production? There can be many causes: unexpectedly high surges of traffic, not enough thought given to non-functional requirements at the system architecture level, poorly specified non-functional requirements, incorrect assumptions about how the test environment scales to production, dependency on an external system, and so on. A logical question to ask is, weren't all these things accounted for during the normal performance testing phase before going to production? That's a fair point, but it needs to be understood that performance testing itself is performed under extremely tight timelines. Much like functional testing, it is impossible to cover each and every conceivable scenario in performance testing either. Furthermore, no matter how much one desires it, the testing and production environments are very seldom identical - this alone can be a source of many issues. The fact is that production systems, with their live data and real time traffic, are so complex that simulating them with 100% accuracy in the test environment is almost impossible - hence the difficulty in predicting production issues.

In these cases, prevention is obviously better than cure - but as I have tried to argue above, some things cannot be prevented. When performance testing for such projects has been completed (to whatever extent possible), it is critically important that potential production issues are highlighted as risks in the final document. This will go a long way towards understanding the situation when it's time to perform troubleshooting tests. The documented risks should be the starting point of the troubleshooting exercise. The tests to run should be few and need to be planned very carefully so that the problem is identified as quickly as possible. The tests should be run in the production environment and with live data. This in itself is a very tricky proposition - what should the load on the environment be? If it's too much, the system slows down and affects live users in adverse ways, potentially losing business; if it's too little, reproducing the actual problem might be difficult. Can certain components be switched off so as to isolate the problem - if so, for how long? If additional data flows are needed, where do they come from? Then there are legal/contractual issues to consider, such as possible testing of any external systems (if allowed, then who does it?). All these are judgement calls and need to be made accurately and quickly (often in real time), both by the performance test specialist in charge and by the project team.
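To make that trade-off concrete, here is a minimal sketch (in Python, using only the standard library) of what a deliberately throttled probe against a live endpoint might look like; the URL, request count and pacing are purely illustrative assumptions, not recommendations for any real system:

    # probe.py - fire a small, strictly rate-limited number of requests at a
    # production endpoint and record response times, so that live users are
    # disturbed as little as possible.
    import statistics
    import time
    import urllib.request

    TARGET_URL = "https://example.com/health"   # hypothetical endpoint
    REQUESTS = 30                               # keep the sample deliberately small
    PAUSE_SECONDS = 2.0                         # at most one request every 2 seconds

    timings = []
    for i in range(REQUESTS):
        start = time.monotonic()
        try:
            with urllib.request.urlopen(TARGET_URL, timeout=10) as response:
                response.read()
            timings.append(time.monotonic() - start)
        except Exception as exc:                # log failures, do not retry aggressively
            print(f"request {i} failed: {exc}")
        time.sleep(PAUSE_SECONDS)               # throttle to protect live traffic

    if timings:
        print(f"samples: {len(timings)}")
        print(f"median : {statistics.median(timings):.3f}s")
        print(f"worst  : {max(timings):.3f}s")

The point of such a script is not sophistication but restraint: a small, well-paced sample that gives the specialist some real numbers without adding meaningful load on top of live traffic.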

Performance troubleshooting is performed by experienced specialists who have done it before. The time frame is obviously very short - the business loses money every minute that the problem persists. To avoid clutter, there is usually just one person who handles the hands-on work as well as the performance test management duties. The project team treats these projects with a great deal of urgency - tasks to support the performance test specialist are expedited, and there is a general willingness on their part to let the specialist take the lead and guide the effort forward. This places added responsibility on the specialist to achieve the desired goal within the given set of constraints.

Performance troubleshooting is an expensive exercise since it happens very late in the product cycle. Unfortunately, systems that have had production issues once are at risk of being afflicted with them again - particularly if the remedies to the first issue involve changes to shared code repositories. It is therefore very important that performance test managers, when they do have the opportunity, insist that pre-production performance testing be as thorough as possible. If the exhibited performance is not satisfactory, or if enough testing has not been performed (for whatever reason), the clear recommendation should be not to risk going live, or at the very least the risks of doing so need to be communicated in writing and fully understood.

Thursday 20 September 2012

Which Comes First - Performance Testing Or Test Automation?


At the very outset of this post, let me say that I am not a test automation specialist at all. I have done test automation for a couple of products, but performance testing is much more my forte. I have, however, been on projects where both of these exercises were performed on a particular release. My aim here is to lay out a road map for how to go about doing both.

First up, some views about test automation. My opinion is that it is not testing at all - it's development. The tests that are automated are the functional tests - test automation just performs the same steps through an automated process. To that end, test automation does not tell us anything about the system that functional testing does not. Sure, it tells us the same things a lot quicker and with greater efficiency, and that saves a lot of precious time for other activities - but nothing more. Unlike performance testing, which is highly recommended and could cost management a lot if not done, test automation is not critical to the health of the system. In my experience, it has been somewhat rare for projects to do both performance testing and test automation - usually it's one or the other (performance testing tends to win out) - but the discussion here assumes both are performed.

In the first release, there will not be a regression library of automated tests to run, so test automation needs to proceed hand in hand with functional testing. Test cases that are candidates for automation need to be identified very early on. As functional testing progresses, regression testing of already tested functionality should be done through the test automation scripts. The way this helps performance testing is that the system is likely to be in a stable state for scripting, given that functional testing has reduced the functional errors to a certain degree. Furthermore, the scripts for some automated test cases can be exported to the performance testing tool (this is possible at least in HP's suite) - not every automated test case will be tested for performance, so one has to choose. By the time the second release comes around, there is a library of automated regression test cases that should be run, and expanded to cover the new release's functionality, as often as needed. Performance testing stands to benefit in the same way as before.
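As an illustration of the idea rather than of any particular tool, the sketch below wraps a single scripted functional flow in a handful of concurrent virtual users and times each iteration; the base URL and the login/search steps are hypothetical stand-ins for whatever the automated test actually does:

    # reuse_flow.py - run one automated functional flow under a few concurrent
    # virtual users and time each iteration.
    import threading
    import time
    import urllib.request

    BASE_URL = "https://example.com"    # hypothetical application under test
    VIRTUAL_USERS = 5                   # illustrative concurrency, not a real target
    ITERATIONS_PER_USER = 3

    results = []
    lock = threading.Lock()

    def functional_flow():
        # Stand-in for the recorded functional steps (e.g. login, then search).
        with urllib.request.urlopen(BASE_URL + "/login", timeout=10) as r:
            r.read()
        with urllib.request.urlopen(BASE_URL + "/search?q=test", timeout=10) as r:
            r.read()

    def virtual_user(user_id):
        for _ in range(ITERATIONS_PER_USER):
            start = time.monotonic()
            functional_flow()
            elapsed = time.monotonic() - start
            with lock:
                results.append((user_id, elapsed))

    threads = [threading.Thread(target=virtual_user, args=(u,)) for u in range(VIRTUAL_USERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    for user_id, elapsed in results:
        print(f"user {user_id}: flow took {elapsed:.3f}s")

A commercial tool does far more (pacing, correlation, monitoring), but the underlying shift is the same: the functional flow stays intact, and only the concurrency and measurement around it change.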

One of the pitfalls to avoid when following this road map is the temptation to reduce the scope of performance testing (or do away with it entirely) because test automation execution reports tell us that all is fine and dandy with the system. This can happen when pressure to release the product on schedule increases. It must be remembered that no matter how well functional testing went (in terms of removing defects) and how quickly regression testing was completed thanks to test automation, performance testing is about testing the non-functional attributes of the system. Its aims are entirely different. Functional testing and test automation results provide information about what works in the system for a single user - they say nothing about how well it would work under load and/or when system resources are stressed.

A lot of other issues, such as the choice of test cases to automate and which of those to use for performance testing, are out of scope for this discussion and will be left for another day. In general, test automation and performance testing offer value in different ways - it is up to the test manager and the performance test manager to figure out how to get the most out of each, as well as how one benefits from the other.


Thursday 23 August 2012

No Perfectionism, Please!!!

Like all variants of software testing, performance testing too is ultimately about better product quality. If you've taken on performance testing projects, you know the feeling when you first get your hands on the system. You want it to quickly do whatever it is supposed to do (sometimes - initially at least - with little regard for the behind-the-scenes complexities associated with different user actions). Ironically, you start to feel better when the system isn't so fast after all - the sinister intent here being that you then get to try out a million things to see how they impact performance.

All this lends itself to an attitude heading towards perfectionism - and that is a mistake bordering on blasphemy in magnitude. Anyone worth their salt in software testing will tell you that no amount of testing is enough (enough, meaning that any further testing will not improve quality). This is a noble thought to have, but we (testers) live in a world of shrinking product cycle times, budget constraints and developers (those evil people who take up 80% of the product resources). Performance testing, lo and behold, is an even more distant afterthought. System testers have to test the system for functionality, bugs need to be fixed and retested (which opens up another Pandora's box of issues). Finally, performance testers are called in to do their thing with the "go-live" date firmly stamped on their screens.

In such circumstances the perfectionist will have the hardest time earning his bread. It is critically important to make the best use of the little time and few resources available to performance testing. Non-functional requirements need to be clearly defined by the business team and communicated (by God, when has this actually happened in reality?). Performance test scenarios need to be realistic and should be aimed at finding out as much as possible about the system. If initial tests indicate performance problems, then executing the exact same test with more users in order to increase the load on the system makes little sense. As performance test managers, we need to make a judgement call about what can realistically be achieved in the time available - scope, scenarios and expectations then need to be adjusted accordingly.

More often than not, you will see that the system does not conform to the agreed set of non-functional requirements. This is fine and should be communicated back to the business, along with the facts about what was discovered about system performance (business, by the way, are not likely to take this very well). As a performance test specialist/manager, doing this serves the project team better than simply executing some standard tests at production loads and showing them the results - those tests have value only after lower loads have shown no significant problems with performance.
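One way to picture that progression is a simple stepped-load plan that refuses to escalate once the agreed threshold is breached. In the Python sketch below, the load steps, the threshold and the measure_response_time stub are all illustrative assumptions standing in for real test runs:

    # step_load.py - escalate load in stages and stop as soon as a stage
    # breaches the agreed response-time threshold, rather than jumping
    # straight to the production load.
    LOAD_STEPS = [10, 25, 50, 100]      # virtual users per stage (illustrative)
    THRESHOLD_SECONDS = 2.0             # agreed non-functional requirement (illustrative)

    def measure_response_time(users):
        # Stand-in for actually running a test at this load and returning
        # the observed 90th-percentile response time in seconds.
        return 0.5 + users * 0.01

    for users in LOAD_STEPS:
        observed = measure_response_time(users)
        print(f"{users} users -> {observed:.2f}s (limit {THRESHOLD_SECONDS:.2f}s)")
        if observed > THRESHOLD_SECONDS:
            print(f"threshold breached at {users} users; report findings before escalating further")
            break
    else:
        print("all planned load steps passed; a production-level test is worth running")

The value lies in the stopping rule: each step either earns the right to go higher or produces a concrete finding to take back to the business.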

At the very heart of the matter, as testers (of whatever variety), we are there to point out problems in the system. The more problems we can find, the more we contribute towards the quality of the end product. However, it is not possible to find every problem - this is an uncomfortable truth, but a truth nonetheless. Moreover, testers are not there to fix problems - the developers do that, and what is or is not important enough to be fixed is entirely out of our hands. What we as performance testers can and must do is relay as much information as possible about system performance and outline the risks (if any) that we feel the system will run into if it were to go to production with this level of performance. Having a perfectionist's attitude makes that pill hard to swallow.