Root Cause Analysis (RCA) is an approach used in software quality to identify the root causes of bugs or issues and address them instead of treating the symptoms. In this article, Mush Honda explains that RCA can be applied to end user feedback as well as software defects during software testing and provides some tips on how to apply RCA.
Author: Mush Honda, KMS Technology, http://www.kms-technology.com/
When a software defect is identified in the production or live environment, there’s always a rush to fix it and get things working as expected again. People are so focused on resolving the issue that they often forget about finding the cause. By applying Root Cause Analysis (RCA) to software defects, you can go beyond simply treating the symptoms and uncover the root of your problem. If you fail to address the root cause, there’s a good chance that you’ll get stuck in a recurring pattern of defects, fixing symptoms, but never finding the source.
A successful RCA will reveal why symptoms or issues are occurring, enabling you to formulate a mitigation strategy. It is an opportunity for the team to evolve, learn, and get better at delivering software. Fixing at the source can also alleviate other symptoms that you didn’t realize were linked and, ultimately, it results in an efficient delivery process with higher quality. But where and how should you apply it?
A tool for RCA: the fishbone or cause-and-effect diagram. Source: http://wikipedia.org/
Targeting business critical issues
When an issue that’s causing a workflow blockage or impacting revenue generation arises, you need a system in place to deal with it swiftly and effectively. You can apply a service level agreement (SLA) approach to RCA assessment. Whenever a defect of a certain severity is identified, you commit to an assessment and mitigation plan to deal with the cause, as well as fix the symptoms that have been identified in the short term.
You need to work on fixing the symptoms in parallel with your root cause analysis to minimize the negative impact. Commit to aggressive turnaround times, and aim to complete fixes and assessments within a few hours (certainly within 24 hours) of a high severity business-critical defect popping up. Having a distributed workforce with an offshore team can be really helpful here, as your teams can work round-the-clock.
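The SLA commitment described above can be sketched in a few lines of Python. This is only an illustration: the severity names, the `SLA_HOURS` table, and both function names are hypothetical. Only the 24-hour ceiling for high-severity defects comes from the article; the 4-hour figure is an assumption standing in for "a few hours".

```python
from datetime import datetime, timedelta

# Hypothetical turnaround targets in hours. Only the 24-hour ceiling for
# high-severity defects is taken from the article; 4 hours is an assumption.
SLA_HOURS = {"critical": 4, "high": 24}


def rca_due_by(severity, reported_at):
    """Return the deadline for the RCA assessment, or None if no SLA applies."""
    hours = SLA_HOURS.get(severity)
    if hours is None:
        return None
    return reported_at + timedelta(hours=hours)


def is_breached(severity, reported_at, now):
    """Return True when the committed turnaround time has already passed."""
    due = rca_due_by(severity, reported_at)
    return due is not None and now > due


reported = datetime(2024, 1, 8, 9, 0)
print(rca_due_by("high", reported))  # 2024-01-09 09:00:00
```

Severities without an entry in the table return no deadline, which mirrors the idea that the commitment applies only to defects above an agreed severity.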
Capturing data and planning a fix
The software, template, or format you use to capture pertinent data should be whatever works best for your team collectively. It could be an Excel spreadsheet, a Word document, or a specific tool. At a bare minimum you’ll need to think about the following:
- Problem statement – What was the symptom? How did the issue come to your attention? What are the business and financial impacts? What precisely occurred?
- Root cause – What was the root cause of the issue? How was it missed during software testing? Is there a gap between test and development where changes are not being assessed accurately? Can you identify a gap in your testing efforts?
- Mitigation plan – What are you doing to resolve the problem? How is the symptom being fixed? Could you improve communication or processes to prevent an issue like this from arising later? Have you done due diligence in your assessment to ensure that the fix doesn’t cause another issue?
- General trends – Make sure that your issue is categorized, so that problems can be grouped and analyzed over time to show general trends. What is the most frequent type of issue that crops up? What could be done, regarding overall strategy, to minimize those issues? Remember that everyone should be working collectively to identify ideas that will improve the way you deliver software.
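If your team prefers a small script to a spreadsheet, the four items above might be captured along these lines. This is a minimal sketch, not a prescribed format: the `RcaRecord` class, its field names, the category labels, and the sample records are all invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RcaRecord:
    """One RCA entry; the field names are hypothetical."""
    problem_statement: str
    root_cause: str
    mitigation_plan: str
    category: str  # label used to group issues when looking for trends


def most_frequent_categories(records, top=3):
    """Return the most common issue categories with their counts."""
    return Counter(r.category for r in records).most_common(top)


# Invented sample data, for illustration only.
records = [
    RcaRecord("Checkout fails for saved cards", "Payment API change was not retested",
              "Add a regression test for saved cards", "test gap"),
    RcaRecord("Report totals are wrong", "Rounding rule was misread",
              "Review the requirement with the business team", "requirements"),
    RcaRecord("Login times out", "Session change was not retested",
              "Add a regression test for session expiry", "test gap"),
]

print(most_frequent_categories(records, top=1))  # [('test gap', 2)]
```

Keeping the category as a plain field is what makes the trend question cheap to answer later, whatever tool ends up holding the records.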
Stirring end user feedback into the mix
The importance of end user feedback is widely accepted, but it isn’t always included in the RCA approach. If you are collecting a lot of data about how users interact with your software, asking them to complete surveys and to report further details on the issues and errors they encounter, then you should stir that data into the mix as well.
Collate this user feedback to identify common issues. The IT/Customer support team can review the incoming data and ensure that the immediate symptoms are addressed, while the QA team gathers the feedback and assesses the issues, just as it would in a normal RCA. The idea is to provide useful input to the business team and stakeholders on how to prevent the same issues from cropping up repeatedly.
It is valuable feedback for the stakeholders to understand how their application is being used in the real world, but it can also prove extremely useful for testers, as it helps them understand the domain context in greater detail.
A holistic view
If the user feedback is grouped together with all the other RCAs based on defects, then you start to form a clear overview of what’s trending in terms of business impact. It is a route to better coverage, and it informs the future approach of the whole team, giving them valuable insights. By employing Root Cause Analysis this way, you can laser-focus on what will provide the greatest business benefits and boost the quality of your software for the end user.
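One way to picture this grouping is to merge the two sources and total an impact score per category. The sketch below assumes each item has already been reduced to a (category, impact) pair; the function name, the 1 to 5 impact scale, and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def rank_by_impact(defect_rcas, user_feedback):
    """Total the impact per category across both sources, highest first."""
    totals = defaultdict(int)
    for category, impact in chain(defect_rcas, user_feedback):
        totals[category] += impact
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Invented sample data: (category, impact on an assumed 1-5 scale).
defect_rcas = [("test gap", 5), ("requirements", 2)]
user_feedback = [("usability", 3), ("test gap", 2), ("usability", 1)]

print(rank_by_impact(defect_rcas, user_feedback))
# [('test gap', 7), ('usability', 4), ('requirements', 2)]
```

A category that appears in both defects and user feedback rises to the top, which is the kind of signal the combined view is meant to expose.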
About the Author
Mush Honda is Vice President of Testing for KMS Technology, a provider of IT services across the software development lifecycle with offices in Atlanta, GA and Ho Chi Minh City, Vietnam. He was previously a tester at Ernst & Young, Nexidia, Colibrium Partners and Connecture. KMS services include application management, testing, support, professional services and staff augmentation.
Hi, Mush!
Awesome article!
Root Cause Analysis is a critical part of bug fixing, indeed. Actually, it has to be a part of QA management.