The number of people you need to observe to identify the primary problems with your website or app depends to a significant extent on its quality. If your app is of very low quality and your digital environment is immature, then very few people are required, typically no more than eight, to identify the key findings. (I define a key finding as a pattern that has been observed three or more times in a group of no more than 20 people.)

If you have a poorly managed digital environment, then observing large numbers of people can actually be counterproductive. This is because as you observe more people, you start finding more minor problems. I have found that if you want to see action taken, you should not present more than three key findings and recommendations. If you report 20 findings and recommendations, then many teams will simply throw up their hands or just politely ignore you because there’s too much to do.

If you observe six people and see a problem three times, then that problem is affecting 50% of your sample. If you observe 15 people and you see something three times, then that’s affecting 20% of your sample. If it’s 100 people, then observing the same problem three times represents only 3%. If you decide that a finding is three or more occurrences of the same thing, then you will have a considerably higher number of findings and recommendations with 100 people than with 15. Furthermore, as mentioned previously, lots of recommendations may make for a big report but they rarely lead to genuine progress and improvement, because it’s not enough to merely make a recommendation; you also need to understand the capacity and motivation of the people who have to make the change.
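The arithmetic behind those percentages is easy to check. The short sketch below is only an illustration: the function name `share_of_sample` and the constant `THRESHOLD` are my own labels for this example, not part of any tool.

```python
def share_of_sample(occurrences: int, sample_size: int) -> float:
    """Return the fraction of observed people who ran into a problem."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return occurrences / sample_size


# Hypothetical constant: the "three or more times" rule for a key finding.
THRESHOLD = 3

for people in (6, 15, 100):
    share = share_of_sample(THRESHOLD, people)
    print(f"{people:>3} people: {THRESHOLD} sightings = {share:.0%} of the sample")
```

The same fixed threshold of three sightings covers a shrinking share of the sample as the group grows, which is why a larger study tends to turn more minor problems into "findings".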

Anyway, in a typical environment, there are no more than three big things that, if fixed, would make things significantly better. Fix those things and you will discover that many other problems fade away.

What got me thinking about this was reading an informative article by Roy Ballantine, where he works through a statistical analysis of the probability of different types of problems occurring at different sample sizes. Roy showed that at 14 people, a problem that would affect 20% or more of the population has a 55% chance of occurring. However, a problem that only affects 5% of the population has only a 3% chance of occurring. At 30 people, a 20% problem has an almost 100% chance of occurring, while a 5% problem has an almost 20% chance of occurring.
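I have not reproduced Roy's method here, but the figures quoted above are consistent with a simple binomial model in which "occurring" means the problem is seen three or more times in the group. That reading is my assumption, and so are the function name `prob_seen_at_least` and the idea that each person hits the problem independently with the same probability. A minimal sketch under those assumptions:

```python
from math import comb


def prob_seen_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    n: number of people observed
    p: share of the population affected by the problem (assumed to be
       independent and identical for every person observed)
    k: how many sightings count as a finding
    """
    # Sum the probabilities of seeing the problem fewer than k times,
    # then take the complement.
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


for people in (14, 30):
    for prevalence in (0.20, 0.05):
        chance = prob_seen_at_least(3, people, prevalence)
        print(f"n={people:>2}, affects {prevalence:.0%}: {chance:.1%}")
```

Under this model the four results come out at roughly 55%, 3%, 96% and 19%, which lines up with the numbers quoted from the article.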

If you have a mature environment and you have solved the major 20% problems, then you will need many more people to identify the smaller, infrequent usability issues. However, if you haven’t solved the major issues for your customers, then observing lots and lots of people could flood you with data. This can lead to data paralysis and could even make things worse, because the fixes for minor problems can end up worsening the major ones.
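To get a feel for how many more people "many more" means, one rough approach is to search for the smallest group in which a problem is more likely than not to be seen three times. The sketch below assumes a binomial model with independent observations; the function names, the three-sighting rule as a default, and the 50% target are illustrative choices of mine, not figures from any study.

```python
from math import comb


def prob_seen_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), assuming independent observations."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


def min_people(prevalence: float, target: float, k: int = 3,
               max_people: int = 10_000) -> int:
    """Smallest group size where a problem affecting `prevalence` of the
    population is seen at least k times with probability >= target."""
    if not 0 < prevalence <= 1 or not 0 < target < 1:
        raise ValueError("prevalence must be in (0, 1] and target in (0, 1)")
    for n in range(k, max_people + 1):
        if prob_seen_at_least(k, n, prevalence) >= target:
            return n
    raise ValueError("target not reached within max_people")


for prevalence in (0.20, 0.05):
    people = min_people(prevalence, target=0.5)
    print(f"Problem affecting {prevalence:.0%}: about {people} people needed")
```

Lowering the prevalence from 20% to 5% pushes the required group size up sharply, which is the practical reason to leave the infrequent issues until the big ones are fixed.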