I think all UX experts agree: Usability testing is the quickest way to gain valuable knowledge for any project – whether it’s an app, a website or a physical product. You simply put someone in front of the application, give them a task and watch as they (hopefully) solve it. This article gives you a comprehensive overview of what you need to consider to establish usability tests successfully in your company. These tips are also relevant for everyone who conducts tests for clients in UX agencies.
When I ran my first tests 17 years ago, I still had to do a lot of convincing and first explain what usability even is. Today, however, every decision-maker knows that usability is central to the success of every product – and that usability tests are crucial to achieving it.
In my experience, anyone who has not done usability tests themselves holds one of two attitudes these days:
- It’s way too complicated for us.
- Everything is straightforward, everyone can do that.
Both attitudes are partly correct, partly wrong. You will find out why below.
Basics for usability tests
The history of usability – from punch cards to mainframes and PCs to wearables and the Internet of Things
The term usability has only been in use since the 1980s, but how people interact with machines was, of course, a topic long before that. At the latest with the advent of huge machines such as steam locomotives, aeroplanes and power plants – and several terrible accidents caused by operating errors – it became clear that people do not always use devices the way their inventors imagine.
Devices like this huge spinning machine from 1912 represented a high risk of injury for the workers – every move had to be perfect when operating them.
In the 1970s, when mainframes and punch cards were still in use, people around the world were already working on human-computer interaction (HCI for short). With the advent of the personal computer, usability became more and more important. It was no longer just a few well-trained technicians and engineers who dealt with computers, but an ever-growing group of users.
The first user tests took place as early as 1947, when the telephone keypad still in use today was developed at Bell Laboratories. Usability tests have been used more and more since the 1980s, and with every new technical device we surround ourselves with, they become more important. Virtual assistants, wearables and the Internet of Things will not change that – on the contrary.
You can find an excellent history of usability by my colleague Jeff Sauro: A brief history of usability.
Usability definition – what is usability anyway?
Almost everyone translates usability as “user-friendliness”. That’s not entirely correct; strictly speaking, it means “fitness for use”. But that sounds so cumbersome that hardly anyone uses the term – and the distinction is academic anyway, so that’s fine. If you want the terms defined precisely, Wikipedia will help.
Usability means that something can be used
- effectively,
- efficiently and
- satisfactorily.
That means: I can achieve my goal with the application (effective). I can do so with as little effort as possible (efficient). And both the interaction itself and the result feel good to me (satisfactory).
Scientific treatises can be written on the details of the definition, but this is not necessary for everyday life. To put it casually, usability is simply that which makes dealing with something easy and pleasant.
Differences between usability and user experience?
Distinguishing between usability and user experience (UX) is another matter. It works in theory, but hardly in practice. With usability, I’m interested in whether users reached their goal and were satisfied. With UX, we go one step further: We also examine what users did before interacting with our product, what they do afterwards, and whether they enjoyed the whole “experience”.
Usability tests today usually also cover UX aspects. Most people still simply call them usability tests.
The benefits of usability tests
Usability tests are about setting your own opinion aside and watching how other people use an application. You will quickly notice what works and what doesn’t. You can see which problems users have and what they understand immediately. You may also see approaches or uses that surprise you.
In any case, every user test will help you, because you move away from gut feeling towards an empirical foundation for improving the application. And you get away from discussions in the team based on assumptions about which solution is better, which arrangement is more logical or what is generally “better received”.
Limits of usability tests
For all the advantages usability tests have, there are still a few questions for which other methods serve you better.
For example, when it comes to small graphic details or wording. These usually have so little influence on usability that you would need a vast number of participants to detect any differences. A/B tests, in which you test two design variants live on the website, are the better choice here. They let you examine hundreds or thousands of users – an unrealistically high number for usability tests.
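To give a feel for the numbers, here is a minimal sketch of the arithmetic (my own illustration – the function name and the 10% vs. 12% conversion rates are assumed purely as an example). It uses the standard normal-approximation formula for comparing two proportions at 95% confidence and 80% power:

```python
# Rough sketch: how many participants per variant an A/B test needs to tell
# two conversion rates apart (normal approximation, 95% confidence, 80% power).
# The example rates (10% vs. 12%) are assumed purely for illustration.
def ab_sample_size(p1, p2, z_alpha=1.96, z_beta=0.84):
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

print(round(ab_sample_size(0.10, 0.12)))  # roughly 3,800 users per variant
```

Even for a clearly measurable difference you quickly end up with thousands of users per variant – far more than any moderated usability test can cover.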
Testing is also problematic if there is nothing yet that you can test. If, for example, there are only a few mockups of an application, i.e. graphic designs, you can try to build a click dummy from them. With a tool like Axure or InVision, you add hotspots to the graphics that jump to the next page on a mouse click or other user interaction. Complicated menus, product filters or interactive maps, however, can only be implemented this way if you invest a lot of time.
So maybe it’s better in such a case to have a focus group where users can discuss the designs.
Or you can use another highly recommended method, the five-second test. It works just as you would expect: You show a user a screen for five seconds and then ask them what they remember. Which elements they perceived. What they would click or tap. That works surprisingly well – and clearly shows which elements grab attention and which ones get overlooked.
The five-second test is not as unrealistic as you might think. If you look at your analytics, you will notice that a great many pages on any website are only viewed for two or three seconds before users click away again. A website or application screen must make clear very quickly what it offers. Otherwise, users will be gone in a moment.
Performing usability tests in practice
What is the best way to proceed if you want to set up usability tests? There are many best practices, but there is no ready-made template that you can copy for every test. The test objects, the questions and the resources available to you vary too much.
It’s best to start with a good plan:
Basic usability test considerations
Create a study concept/research plan
In the study concept or research plan, you record the following points:
- Project (what is it about in two or three sentences)
- Project participants / responsible persons (especially: who decides, who implements)
- Target group (who should deal with the application)
- Test object (which intermediate versions can we test when)
- Question/study goal (what exactly do we want to find out)
- Questions / working hypotheses (which assumptions / disagreements / uncertainties are there)
As with any document, don’t invest too much time in making it look beautiful. The study concept should provide just as much clarity as needed, nothing more. Tips for communicating within a team: Communicating insights
Carry out a usability test remotely or on-site?
It is also essential that you clarify as early as possible how exactly your usability tests should run. The classic is the test in a studio, in the so-called usability lab or usability laboratory. You have the proper equipment, the entire infrastructure is ready, and there are hardly any nasty surprises.
But not everyone has their own usability laboratory. An office where you are undisturbed during the test will also do. But then you have to plan enough time to set everything up and check it before you start.
The remote test is an alternative. Your test subjects sit at their own computer and you at yours. You communicate over the Internet via Skype, Hangouts or another video chat tool that allows screen sharing. You give instructions and observe – just like in a standard laboratory test.
Such tests are also called moderated remote tests because you and your test subjects take part at the same time. The alternative is asynchronous tests that run unmoderated. The test persons receive a guide that they work through independently. The software records their interaction with the test object, and a questionnaire captures their comments and ratings.
This saves you the work of moderating. But you shouldn’t underestimate how much work it is to prepare such a test well. It also takes a lot of time to evaluate the recordings and questionnaires. In my experience, unmoderated tests are only really faster if you have a large number of test subjects – the time saving becomes noticeable from around 20 test subjects.
And: a lot of information is generally lost in a remote test. Even with the best technology, you get a much better sense of how a person reacts, what they think of the test object and what problems they have when you sit right next to them. I always prefer a test on-site or in the lab.
Formative, summative or validating usability evaluation?
Now it’s back to science for a moment. If you read the literature on usability tests, you will often come across the term usability evaluation. That means nothing more than checking usability. And sooner or later you will stumble across the three terms formative, summative and validating. What do they mean, and do you need them?
Not necessarily in practice, but so that you know what my colleagues are talking about, I will explain them very briefly:
Exploratory (formative)
You do an exploratory or formative test early in the project. It’s about exploring something. For example, you want to know whether a classic navigation bar or a mega dropdown works better for your project. Or how people generally deal with configurators. In such a case, you usually test prototypes – or competing applications.
Such an exploratory test explores ways of giving the product a form – hence it is called a formative test.
Assessing (summative)
An assessing or summative test is much closer to the finished product than a formative test. Here you test prototypes or, more often, parts that are practically finished. The test subjects work through their tasks, and as the moderator you mostly just guide and observe.
Most of the tests we do in everyday life are summative tests.
Checking (validating)
Finally, the validation test or verification test runs towards the end of the project. You check whether the problems that you previously observed in tests have been resolved. And above all, whether the product works as it should.
You can work well with metrics in such tests. They give you an objective assessment of where you currently stand compared to previous versions or to the competition.
When there is no money: Hallway and guerrilla tests
The counterparts to the very formal, scientifically rigorous usability evaluation are the hallway test and the guerrilla test. The hallway is precisely where you find your test subjects for the hallway test: you grab a colleague who has as little as possible to do with your current project and who has no specialist knowledge of UX or of your project’s subject matter. You then carry out a quick test with them – you shouldn’t need more than ten minutes per session.
You do the same thing with the guerrilla test, except that you don’t find your test subjects in the office corridor but in a café, on the street or in the park. Such methods are quick and inexpensive, but they have their problems due to the lack of target-group-oriented selection of subjects and the hard-to-control environment. Still, a test with less suitable test subjects is better than no test at all.
Allocation of roles – who runs the tests?
A vital role that many underestimate is that of the moderator, i.e. the test leader.
As the moderator, you guide the test subjects through the test. It is your job to make them feel comfortable so that they behave as naturally as possible, work on the tasks and know what is expected of them in the test.
The great art is to control the test subjects gently but to influence them as little as possible. That sounds easy, but it takes a lot of experience to be good at it. Whole books have been written on the subject – this is highly recommended: The Moderator’s Survival Guide.
In addition to moderation, the second important task during usability tests is observation. In many projects, the observer and note-taker is a second person. This is good because, as the moderator, you can then concentrate fully on the subject. A small disadvantage is that the test persons often feel a little more insecure at the beginning – two people are sitting next to them, watching them closely.
If you are moderator and observer in one person, then I strongly advise you to record the sessions on video. In every test it happens that you are busy writing something down while the test person clicks on something you miss. With a video recording, you can always check later what exactly was going on.
Stakeholders – Who else is there for the tests?
In addition to the moderator and any observer, other people can also be present during the usability test: the stakeholders. These are the colleagues who work on the product and the decision-makers. The more of them there are, the better.
You can talk until you are blue in the face and report as vividly as possible about the tests – nothing convinces people that the problems you observed are real and serious as much as seeing them with their own eyes. It’s also good because you can then discuss, evaluate and sort the observations together after the tests. At the end, you can run a workshop in which you work out how to fix the individual problems.
So that the observers do not unsettle the test subjects, and so that they can talk during the tests, they have to sit in a different room. A usability lab often has large one-way mirrors: the observers in the adjoining room can see the test persons through them, while the test persons only see a large mirror. Alternatively, the image of the test object together with the subject and moderator is transmitted to the observation room via video.
Observers in the test room itself are out of the question. Only the test person, the moderator and possibly the note-taking observer sit in the test room. Stakeholders have no place here. The mere presence of several people is distracting. The test person may also suspect that these people work on the product and feel inhibited. And finally, it is almost impossible for those involved in the project to stay quiet during the user tests and to refrain from snorting, laughing and, above all, asking questions.
When should I test?
The answer to the question of when you should test comes reflexively from some colleagues: Test early, test often!
In general, your test results can influence the finished product more strongly if they are available early in the project. And very importantly: the earlier you discover a problem, the less effort it takes to fix it. If you only notice it after the application has been fully conceived, designed and programmed, you have to go back many steps to fix it.
Which methods are suitable for which phase?
As an example, here are the tests that I ideally carry out when relaunching a large website or launching a new app:
- Testing the existing website: This way we not only find the problems that users have with the site, we also see what is working well and which areas and functions we should keep.
- Testing selected competing sites: It’s not about copying. You want to know what users expect, what they like and what they don’t.
- Card sorting for the information architecture of the new navigation: Test subjects sort cards with terms into groups and structures. From this, for example, the main navigation can be developed.
- Tree testing of the new navigation: In tree testing, I let users search for things just by clicking through the navigation structure. This shows me whether they understand the terms and the way they are organised. (You can find an excellent explanation of the procedure here: Tree Testing for Websites)
- Paper prototype test of the essential pages: You present test subjects with page sketches (scribbles, wireframes or even printouts of finished designs) and ask them what they would click and what they would expect there. (More information about the method: paper prototyping)
- Click dummy test of early designs: Ideally you test with early programmed versions, but often the screen designs are finished well in advance; then you can use them to build click dummies and test with those.
- Standard usability test with the first programmed versions: Now comes the test that most people do – the one with the first version the programmers have finished. This is close to the end product, of course, but late in the process.
- Test with the beta version: Just before the website goes live or the app ships, you can do a final test to see whether everything fits and whether the corrections have introduced new usability problems. Better to notice that something is wrong now than after publication.
Of course, you cannot implement this maximum program in every project. But it shows you where you can test meaningfully. When in doubt, always opt for earlier tests.
And it is better to save on depth than on frequency – if your budget is tight, test with fewer test subjects per round but plan enough test rounds.
Subject recruitment
How many test subjects do I need for a usability test?
When asked “How many test subjects do you need?” I almost always hear only one of two answers:
- “It depends” and
- “Five”.
You have probably heard the magic number 5 in connection with usability tests. It was brought into the world by the best-known web usability expert, Jakob Nielsen (see Why You Only Need to Test with 5 Users). Nielsen is usually quoted as follows: You only need five users to find 85 per cent of usability problems.
But unfortunately, this is oversimplified, so the first answer is the correct one: it depends. There is one prerequisite for finding 85 per cent of the problems with five test subjects: the probability that a single user stumbles over each of these problems must be at least about a third. What does this mean?
It means that a problem has to be fairly common for you to find it with only five subjects. A third of all users must have this problem – that’s a lot. In a typical usability test, I find maybe 20 problems, but only two to four of them affect a third of all users or more. If I test again with five users, I will probably see these two to four common problems again, plus around ten more that I had not seen before. The more I repeat this, the less that is new I find.
You see: with every user test, you will also find some problems that occur only rarely. On the other hand, even serious problems can be missed if they do not occur that often. And even for the common problems, the chance of finding them is only about 85%.
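If you want to check the arithmetic behind Nielsen’s rule yourself, here is a minimal sketch (my own illustration, not from Nielsen’s article; the helper name is made up). It simply applies the formula 1 − (1 − p)^n: the chance that at least one of n test subjects runs into a problem that affects a share p of all users.

```python
# Chance that at least one of n test subjects encounters a problem
# that affects a share p of all users: 1 - (1 - p)^n
def chance_of_finding(p, n=5):
    return 1 - (1 - p) ** n

for p in (0.31, 0.10, 0.01):
    print(f"problem frequency {p:.0%}: chance of seeing it with 5 users = {chance_of_finding(p):.0%}")
# 31% -> 84%, 10% -> 41%, 1% -> 5%
```

With a problem frequency of about a third you land at the famous 85 per cent; a problem that only affects every tenth user is more likely to slip through than to show up.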
And finally, these numbers only apply if the users actually use the respective function in your test at all. That might sound trivial at first, but you can never test the entire application in a single usability test. Hardly any website is so small, hardly any app has so few functions and hardly any product has so few possible uses that you could examine the entire content and range of functions in one test.
Conclusion: five subjects per test round is a good starting point. But if you want to test a lot of functions or you have very different target groups, then you need more test persons. And you should plan several test rounds over the course of the project.
Selection of subjects
Ideal test subjects come from the target group. So if you’re testing an application for seniors, you need seniors in the usability lab. If you are testing a website for children, you invite children.
Unsuitable test subjects are everyone involved in the project, all employees of the company whose site or app you are testing, and friends and family. They are biased and know too much about the background. The whole point of a usability test is to see how real users handle the application.
If your target group is made up of several very different subgroups, then try to take this into account when selecting the test subjects. In my tests, I make sure that there is a balanced mix of genders, age groups and occupations.
You don’t necessarily need several users from each group per test round. But if you do several test rounds, you can ensure a good mix across the rounds.
Preparation for the usability test
Preparing the test sessions
The moderator and the observers are chosen, the test date is set, and the test subjects are recruited. Next, you need:
- Declaration of consent: A short (!) form for the test persons to sign, confirming that they agree to take part in the test and to being observed and, if applicable, recorded on video.
- Confidentiality statement (NDA): If you can leave this out, all the better. In some projects, however, the test persons must undertake not to post information about the top-secret new prototype online. Keep the statement short and in plain, understandable language – not legalese. Otherwise, you will waste a lot of time debating it and create a bad mood at the start.
- Incentive payments: The term incentive has become established for the reward or payment. Test subjects receive an allowance, usually €30 to €100. If you are testing a shopping website, however, it is wiser to give the test subjects a budget they can spend on the site. That makes the test more fun and more realistic.
The test concept/test script
Simply testing away without a plan is not a good idea. It is sure to produce something, but the time with the users is too valuable to leave the results to chance. It is essential that:
- The test situation is as natural as possible.
- The users behave as they would outside of the test.
- The results are representative of real usage.
- Above all, you see what users are doing.
The last point in particular is essential: inexperienced testers collect a lot of opinions from the test subjects. They comment on how they like the pictures, which colours they don’t like or what recently happened to them on another website. As a good moderator, however, you steer the users so that they work through the tasks you set them. That is how you find usability problems – not in casual conversation with the user.
The most important tool for steering the sessions is the test concept, also known as the test script or test guide. The test concept specifies:
- What do you want to know?
- What test hypotheses do you have?
- Which and how many participants are there?
- How long are the sessions?
- Which tasks should the test subjects solve?
- What questions do you ask the test subjects?
The important thing is: always start with a simple task. The subjects first have to get used to the situation. If you send them straight into a complex area that many people have problems with, their motivation will drop. The danger is that the test subjects blame themselves and lose the confidence to solve the remaining tasks.
How to best involve stakeholders/observers
If others observe the test, then that’s a good thing because they can then see with their own eyes what problems users have with their product.
But this has another significant advantage for you: You have a whole group of helpers. And when else do you have the opportunity to let board members, product managers and other bosses work for you?
Before the tests, you explain to the observers how they are going to work. You hand out observation sheets on which the observers should write down everything they notice. I find it even better to distribute a large pile of sticky notes or index cards – one observation per note.
After every single usability test, you collect the sticky notes and attach them to pinboards, for example. You can arrange them in thematic groups and throw away duplicate pieces of paper.
It is essential that the observers do not interfere with the test. If they are separated from the test subject only by a mirror, make it clear to them that they should not talk loudly, because the test persons could hear it and be irritated. Above all, there is one thing they must not do: laugh out loud. That is extremely uncomfortable for a test person – and also for you as the moderator, since you don’t know exactly what is going on next door.
Sometimes stakeholders want to be able to ask the test subjects questions themselves. That should never happen during the test itself, at most afterwards. And I think it’s better if the test subjects don’t meet the observers at all – that always feels strange to them. A good approach is for the observers to send their questions to the moderator, for example by e-mail, who then asks the test person at the end of the session.
Evaluation of the usability test
How do you best evaluate the results?
There is one essential tip for the evaluation: do it immediately. Look through your notes after every single session and, for example, clean up anything that is hard to read. Even with an excellent memory, the individual test subjects and their particular problems blur together after five or six sessions in a day.
And at the end of the day, it’s best to do at least a rough sift through all of your notes. The evaluation begins the next day at the latest.
If you have taken my earlier tip to heart and had as many stakeholders as possible observing, it’s best to hold the evaluation workshop together right after the last test. Yes, such test days are long and exhausting, but extremely productive. And practically all participants go home tired but very satisfied after such a day.
Documentation of the insights
Even if you ran an exemplary evaluation workshop, you usually cannot do without documentation. You have to pass on the knowledge you have gained to everyone who was not there. You also want to be able to look things up later, and nothing should be forgotten. You can find detailed tips here: Communicating insights.
Two ways of sorting the problems found are common:
- According to the order in which they appear during use
- By severity (from severe to mild problems)
Choose the first order if the report is mainly for people who will continue to work with it intensively.
The second order, from severe problems to mild ones, is better if you suspect that many people won’t read the report to the end – and if you want to make sure that the most severe problems are fixed first rather than the easiest ones.
Classification according to the seven dialogue principles
To give the evaluation structure and to make clear why your observations are a problem, you can classify them according to the seven dialogue principles.
These seven principles are also included as criteria in the DIN EN ISO 9241 standard. They describe the requirements for the interface between user and system (dialogue design), which should have the following properties:
- fault-tolerant
- appropriate to the task
- controllable
- self-descriptive
- customizable
- conducive to learning
- in line with user expectations
You can find details on the standard on Wikipedia.
Finally, the following two points in the usability test report are particularly important to me personally:
First: suggest a solution for every problem. If you are used to scientific work, you may have learned otherwise. But in practice, it is invaluable for everyone who has to continue working with your report if they immediately get an idea of how to get the problems under control. And you also show that you can not only talk cleverly but also have concrete ideas on how to do things better.
Second: don’t just point out the mistakes. Keep the psychology of the report’s readers in mind. A report quickly reads like a list of errors – and the readers of the report (graphic designers, programmers, product owners) are the ones responsible for those errors. Depending on their character, that is not easy to take. It is therefore a good idea not to write down only the problems, but also what worked well. This makes the report much more pleasant to read. And not to be neglected: you should also point out the areas that should not be changed during the corrections because they work well for the users. Otherwise, there is a risk that new usability problems arise while the old ones are being fixed.
Conclusion – usability testing pays off
Usability tests are great. They deliver practical, highly relevant results, are comparatively quick and easy to perform, and are fun every time.
If not everything in your company runs perfectly yet, don’t worry. Practically all companies I know have room for improvement in some areas. Little by little, you can work on getting better.
The crucial first step is to start testing at all. From there, things only get better. Because as you know: every test is better than no test at all.
And over time, your tests will become more professional. You will get better results, and testing will become more relaxed – even though usability tests remain quite exhausting every time, even for experienced moderators and observers.