Ranking bugs

I recently ran across a well-intentioned question on StackExchange’s beta Software Quality Assurance and Testing site about ranking defects, something to do with finding a good bug matrix to use.

This would have been my response, had StackExchange not decided my answer looked like spam:

It appears the option you found is very similar to the old idea of Risk Priority Numbers (RPNs). The RPN is an artifact of performing Failure Mode and Effects Analysis (FMEA). People have created several variations of it over the years and attempted to apply it to software development activities like testing. The problem I have with it in this context is its subjectivity and, as a result, its failure to faithfully convey the effect on the end user.
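For reference, the classic FMEA calculation simply multiplies three ordinal ratings, typically severity, likelihood of occurrence, and likelihood of detection, each scored on something like a 1-to-10 scale. A minimal sketch, with made-up defects and ratings:

```python
def rpn(severity, occurrence, detection):
    """Classic FMEA formula: RPN = severity x occurrence x detection."""
    return severity * occurrence * detection

# Two hypothetical defects, rated by whoever happens to be in the room.
crash_on_save = rpn(severity=8, occurrence=3, detection=2)    # 48
typo_in_dialog = rpn(severity=2, occurrence=9, detection=1)   # 18

print(crash_on_save, typo_in_dialog)  # 48 18
```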

Subjectivity #

What is severe or highly likely for one customer may have no impact on another. Of course there are exceptions … say, your software kills someone every time they start it up. Yes, we can probably give that a “value”. But you’ll likely find your value is only “true all the time” for the most and least severe or likely defects. Everything in between can end up being a guessing game; for the majority of the defects you find, the ranking will be subjective. You are simply ordering defects with ordinal numbers, based on some value system not created by your actual end users.

This worked better in the context it was originally designed for: military and other mission-critical systems developed by agencies like NASA, where tolerance for failure was already low. In that context, ranking matters less because defect removal is taken more seriously; a failure could cost billions of dollars, or cause the death of one or more people … or both.

But in the desktop, web, and mobile software business today, where developers generate more non-mission-critical bugs than they do features, that subjectivity hurts when you try to use the ranking to decide what to fix first and what not to fix at all. We have a tendency to get lost in that murky middle.

Another problem is the somewhat ridiculous math I have sometimes seen associated with it.

For example, if two runners compete in two races and runner A places 1st in both while runner B places 2nd in both, can you multiply those placements to give them more meaning? Runner A’s multiplied value is “1”, while runner B is awarded a “4”. Does that mean runner B is four times slower than runner A? It’s misleading, especially when runner B was 10 seconds behind in the first race and lost by a hair in the second. The “quality gap” between them should be closing, but the math tells a different story, doesn’t it? After the first race you have a reasonably accurate story: runner B is slower than runner A. After the second race, we are led to assume ridiculous notions: 1) runner A is getting faster and runner B is barely improving, or 2) runner B is actually getting slower. Neither is the true story in this case. Yet this is exactly how we rank defects.
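To make the distortion concrete, here is a small sketch using made-up finish times for those two races. The placements are genuine ordinals, but multiplying them manufactures a gap the underlying times don’t support:

```python
# Made-up finish times (in seconds) for the two races in the example above.
race_times = {
    "runner_a": [60.0, 61.0],   # 1st in both races
    "runner_b": [70.0, 61.1],   # 10 s behind, then loses by a hair
}

# Multiplying the ordinal placements gives A a "1" and B a "4",
# implying B is somehow four times worse.
rank_product_a = 1 * 1
rank_product_b = 2 * 2

# What actually happened: the gap between the runners collapsed.
gap_race_1 = race_times["runner_b"][0] - race_times["runner_a"][0]
gap_race_2 = race_times["runner_b"][1] - race_times["runner_a"][1]

print(rank_product_a, rank_product_b)              # 1 4
print(round(gap_race_1, 1), round(gap_race_2, 1))  # 10.0 0.1
```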

How it affects the end user #

The end value calculated in this process may somewhat accurately tell you that one bug is more important than another one, but it cannot tell you by how much. And there’s a bigger problem.

If you go this route, you will end up with columns full of aggregations of defects according to their “value”. And the ranges for those aggregations? They’re guesses more often than not.

This is a dangerous path. Victims of persecution and genocide are dehumanized by their tormentors. It’s a common tactic used to make the victims individually unimportant in the minds of those partaking in or witnessing (and allowing) such torment; without individual importance, it becomes easy to justify persecuting or eliminating the victims altogether. The same thing happens when you aggregate your defects into rankings. The goal is for the count of defects with high or low rankings to somehow tell us the overall quality of the product. But most bugs fall between those extremes. They lose their individual importance and are easily dismissed as unimportant to the customer and the business, because each has become “just another number.” Unless that number is one of the biggest or the smallest, it is very difficult to understand its value without considering the defect’s individual importance.
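A small sketch of what those aggregation columns tend to look like; the defect names, scores, and bucket cut-offs are invented for illustration:

```python
# The scores and bucket cut-offs below are made up; in practice the
# cut-offs are usually guesses too.
defect_scores = {
    "data loss on sync": 120,
    "crash when printing": 96,
    "search misses recent items": 45,
    "label truncated in German": 44,
    "typo on login screen": 6,
}

buckets = {"high": 0, "medium": 0, "low": 0}
for score in defect_scores.values():
    if score >= 100:
        buckets["high"] += 1
    elif score >= 30:
        buckets["medium"] += 1
    else:
        buckets["low"] += 1

print(buckets)  # {'high': 1, 'medium': 3, 'low': 1}
# The counts say nothing about which customer hits
# "search misses recent items" every single day.
```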

Tell a story instead #

In my experience, the most effective way to help the business understand the actual quality of a product is to break it down into its most important quality characteristics and speak to those. Talk to your stakeholders, marketing and sales staff, support staff, customers, whoever, and find out from them what the most important or top “N” quality characteristics of your product are. Have them choose from a good list like this one:

http://thetesteye.com/posters/TheTestEye_SoftwareQualityCharacteristics.pdf

Then tell a story for each important characteristic, giving examples of defects that impact that characteristic and, as a result, impact your customer. That is a story about what did fail, and possibly what might fail, in the context of how those failures matter to the customer and the business. I like using a multi-part testing story that leans on the ideas above, on Bach’s approach to writing a test report in his RST classes, and on typical military-style documentation.
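As a rough illustration, here is one way such a per-characteristic story might be organized; the characteristics, fields, and defect IDs are made up, and this is not Bach’s actual report format:

```python
# Illustrative structure only: the characteristic names, fields, and
# defect IDs are invented, not a prescribed template.
quality_story = [
    {
        "characteristic": "Reliability",
        "what_we_saw": "Sync silently drops edits when the connection "
                       "flaps; customers can lose work without noticing.",
        "example_defects": ["BUG-1042", "BUG-1107"],
        "what_might_fail": "The same code path handles offline mode, "
                           "which we have not exercised yet.",
    },
    {
        "characteristic": "Usability",
        "what_we_saw": "German translations truncate on the settings "
                       "screen, hiding two options entirely.",
        "example_defects": ["BUG-0988"],
        "what_might_fail": "Other long-word locales are untested.",
    },
]

for entry in quality_story:
    print(f"{entry['characteristic']}: {entry['what_we_saw']}")
```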

It’s not easy work. But it’s eye-opening to everyone in the business if done well. Numbers can be ignored, but a good quality story spreads like a virus.

