Keeping Yourself out of the Story: Controlling Experimenter Effects

We take a look at some subtle yet pervasive experimenter effects, at ways they can bias the outcome of a design experiment, and at what we can do to control their influence.


Nothing is more guaranteed to shatter the illusion of reality that‘s building up nicely in that best seller you‘re reading, than the author suddenly appearing out of the blue and talking to you directly. Grandly known as ‘authorial intrusion’, it‘s the moment the author yanks the reader‘s attention out of the story, reminds us of the real world, and breaks the golden rule of fiction writing and journalism: keep yourself out of the story. It’s right up there with the film crew recording its own reflection in a passing shop window. 

Research has its own version of these unfortunate moments. They happen when the researcher blunders into view and ruins things by influencing the outcome of a study. We call these blunders Experimenter Effects.

Experimenter effects contaminate the research process, but we know their origin. They are almost always the result of the experimenter having prior expectations about the hypothesis of a study. 

Some classic examples remind us how insidious experimenter effects can be. Recall the case of Clever Hans the counting horse (not to be confused with Mr. Ed, the talking horse who, as actual TV evidence shows, really could talk). When given numbers to add, subtract, multiply or divide, Hans would tap out the correct answer with his hoof. Psychologist Oskar Pfungst showed that, rather than doing arithmetic, Hans was simply picking up involuntary ideomotor cues from his trainer when the correct number of hoof taps was reached. When asked a math problem his trainer did not know the answer to, Hans‘ performance collapsed. 

And it’s not just horses. In 1963, psychologists Lucian Cordaro and James Ison asked two groups of college students to observe and count head turns and body contractions made by planaria (flatworms). One group of students was led to believe the target behaviours would occur infrequently, while the other group was led to expect a high rate of head turns and contractions. The flatworms in each group were identical and (as is often the case with flatworms) had no expectations about anything. Sure enough, the group expecting a high rate of body movements reported a much higher count than the group expecting a lower rate-an outcome carried entirely by experimenter expectation.

Robert Rosenthal‘s extensive work on experimenter effects (an investigation spanning 30 years and including both animal and human behavioral research) reveals that 70% of experimenter effects influence outcomes in favor of the researcher‘s hypothesis. But-good news-science has a powerful antidote to experimenter effects: the Double Blind.

The Double Blind

In a double blind study, neither the participant nor, critically, the experimenter (nor anyone involved in moderating, observing, recording or analyzing the data) knows the research hypothesis, or knows which condition participants are assigned to, or which design is the ‘before’ and which is the ‘after’, which is ‘ours’ and which is ‘theirs’ etc. A double blind eliminates the most problematic experimenter effects and is effectively used in clinical trials and in debunking pseudoscientific claims

Experimenter effects are also common in UX and customer research, though I think I’ll switch to calling them ‘researcher effects’ as most UX methods are not really experiments in the conventional sense (that was an authorial intrusion by the way). Field studies, focus groups, interviews and usability tests are all susceptible to researcher effects because researchers have expectations, and because UX and marketing research create social situations, and because being truly objective is difficult. Unfortunately, conducting a double blind in UX research, while not impossible, is very challenging in practice and is seldom seen. A UX researcher typically works with the design team on a daily basis and has been instrumental in guiding design. In any evaluative situation, there’s no way the UX researcher can suddenly have no knowledge or prior expectation of study conditions. Bringing in an external UX researcher doesn‘t solve the problem either, because this person also must know the design or product very well in order to conduct an effective study. And remember, the double blind must extend to data recording, and to analyzing, interpreting and reporting the results. In most cases, running a double blind would turn a potentially easy and quick UX study into major covert operation.

The double blind may be the gold standard but, if we can’t employ it, how else might we keep the researcher out of the story? The first step is to raise awareness among project teams that researcher effects exist, and that they can have serious consequences. The second step, since we can’t totally eliminate these effects, is to find ways to control them. 

Here are some ways that researchers influence the outcomes of their own research, and some thoughts on how we might control biases in these situations.

Biases when interacting with participants

These are the ‘Clever Hans’ biases that stem from unintended communications with the participant before and during a study. They result from verbal and non-verbal cues and gestures that influence the participant’s thinking or behaviour during the experiment, and become damaging when they systematically ‘lean’ towards one particular outcome. 

I observed a study recently during which the moderator did a great job of remaining neutral when introducing competitor designs, but leaned forward and nodded whenever she spoke about the sponsor’s design. In reality there are an almost infinite number of biasing behaviours that can creep into a study, from the blatantly obvious, “We hope you’ll only have good things to say about our product.” (I’m not making this one up), to the almost imperceptible smile when the participant clicks the right button. Other influencing behaviours can include the researcher’s mood and demeanor, tone of voice, frowning, sighing, tensing up, relaxing, shuffling on their chair, raising their eyebrows, and pulling faces. Even note-taking can introduce a bias (oh dear, he just wrote something in his notebook, I must have done it wrong). And this is before we even start to think about the biasing effects of leading and loaded questions, reassurances such as, “You’re not the only one to have done that wrong today”, or paraphrasing the participant’s remarks with a little extra topspin, or allowing one’s own opinions to creep into the dialogue. 

We can‘t eliminate all of these biasing factors: they are far too pervasive, and we would end up behaving like automatons if we even tried to monitor ourselves down to this micro-behavioural level. So what is the solution? Assuming a double-blind test is not possible, here are some techniques that can help:

  • Standardize everything: the research protocol, the moderator script, the questions etc. Follow the same protocol in the same way for every participant in every condition. Present task scenarios on cards for the participant to read aloud. Stick to the script.
  • Have a second researcher monitor the first researcher. Monitor for ‘protocol drift’. 
  • Stay out of the participant’s line of sight. In summative usability testing, leave the room if you can during a task. 
  • Practice. Run mock studies that focus on controlling biasing behaviours. 
  • Video-record yourself administering a study or moderating an interview. Critically analyze your performance, and have a colleague help you note any systematic biasing behaviours and inadvertent signals you may be giving.

Biases when recording, interpreting and reporting findings

Researcher effects can contaminate a study in a number of ways, such as:

  • Systematic data recording errors
  • Participant actions or responses that are given undue weight
  • Jotting down what you thought the participant meant rather than what she actually said
  • Trying to interpret data on the fly and too early into a test
  • Making careless recording errors
  • Failing to pay attention
  • Struggling to keep up with the participant
  • Making copying and data-entry errors. 

Subsequent data interpretation can also fall foul of the confirmation bias. This is the tendency to prioritize evidence that fits well with what we already believe, while ignoring evidence that doesn’t. Even highly experienced researchers can fall foul of this cognitive trap. 

Biases towards a positive outcome can also influence report writing and research presentations. In the scientific community similar pressures to succeed exist, such that negative results are less likely to be submitted for publication and, if submitted, are less likely to be accepted than positive results. In consumer research, we’ve seen obviously negative findings given a ludicrously positive spin in a final research presentation with summary headlines such as: “5 of the 20 people liked the new concept”. Researchers doing this risk their credibility. Business stakes are far too high to risk being misled by a researcher contriving a ‘feel good’ effect. 

This may sound unusual, but you should not care what the outcome of your research is. You should only care that your research design and data are bulletproof and will stand up to the most rigorous scrutiny. Let the chips fall where they may. It’s not your job to guarantee a happy ending. 

Here are some checks we’ve found helpful:

  • Decide on the data logging procedure ahead of the study. Use a manageable number of data codes to categorize events. Practice using them and document them in a Test Plan. 
  • Record objective data where possible: for example, task completion rates and time on task. 
  • Agree any pass/fail criteria ahead of the study, not afterwards.
  • It may not be possible to have a “blind” experimenter, but it may be possible to have “blind” data loggers. Have at least two data loggers (or note-takers) so that you can check for inter-scorer reliability and compare notes. 
  • Record what the participants actually say, not what you think they mean.
  • Avoid trying to interpret the data during the study.
  • Double-check your data coding, data entry and any statistical analysis.
  • Ask a research colleague to read your final report, or presentation slides, and give critical feedback.

Sponsorship biases

Biases towards the sponsoring company or funding source are common. In the pharmaceutical industry, for example, companies facing the conflict of interest between doing good science, on the one hand, and increasing product sales, on the other, will often report only positive findings. Company-funded trials are four times more likely to find evidence in favour of a trial drug than similar studies funded by other sponsors.

In another example of sponsorship bias, MIT Professor, Justine Cassell, related her experience working with a particular company as a neutral observer. Following a series of focus groups the company’s research team concluded that what teenage girls wanted more than anything else was technologically enhanced nail polish. This was an astonishing stroke of luck for the company because technologically enhanced nail polish was exactly what they produced! However, as Cassell pointed out, in her own independent research with over 3,000 children (60% of whom were girls) in 139 countries, in which the children were asked what they would like to use technology for, not a single one of them said nail polish!

We recently witnessed a deliberate example of sponsor bias, motivated, I’m sure, by a misguided desire for a happy outcome, when we were invited to review a project in which a third party ‘independent’ research company was carrying out long-term in-home trials of a new product. We were nothing short of gobsmacked to discover that the owner of the research company had planted herself into the study as a test participant, and that for every item on every questionnaire over a 3-month period, she gave the product the highest possible positive rating. In her eagerness to please the client, this researcher had clearly lost sight of the purpose of the in-home research, which was to uncover problems so that they could be fixed before the product was launched. 

Internal company pressures, often stemming from the ever-escalating collective belief that the project cannot possibly fail (otherwise why are we still doing it?) can create a working environment intolerant of any negative outcomes. Much of this pressure comes from testing or carrying out research too late, and it can be defused by doing research and testing early and often, and by keeping the stakeholders closely involved so that they can make course corrections and mitigate risk before things reach the point of no return. 

Again, if a double blind can be employed, this can effectively put a ‘firewall’ between the funding source and the research team. No one involved in conducting and reporting the study would know who the sponsoring company was. But, as we have noted, in UX this is often not possible, and sometimes you just have to stand by your research data, bite the bullet and break the bad news, sponsor or no sponsor. However, you might be surprised at the reaction you get. I once had to present negative UX research data that I knew would deliver a death blow to a $28 million project. To my surprise, the relief in the room was palpable. It was the final piece of evidence that gave the company the confidence to pull the plug and stop wasting more money.

Why you need to fix your Bias blind spot

Bias works like a Trojan horse. It hides in our UX toolbox, slips past our defences, and operates from the inside. Everyone has a bias blind spot: we are less likely to detect bias in ourselves than in others. Carey Morewedge, associate professor of marketing at Boston University says: 

“People seem to have no idea how biased they are. Whether a good decision-maker or a bad one, everyone thinks that they are less biased than their peers. This susceptibility to the bias blind spot appears to be pervasive, and is unrelated to people’s intelligence, self-esteem, and actual ability to make unbiased judgments and decisions.”

Unlike in the academic science community, where research is rigorously peer reviewed and scoured for methodological flaws, even shot down if it cannot pass muster, most UX and customer research is seldom subjected to such scrutiny. Things are simply moving too fast. Headline findings are often all that most stakeholders see, and findings are largely taken on trust that the results are what they appear to be. Decisions are quickly made and the show moves on down the road. 

But false research outcomes can cost a company millions of dollars. For the researcher, keeping out of the story is not easy, but biasing effects can be minimized, and our strongest weapon against researcher bias may simply be the awareness that it exists.

 

Philip Hodgson, March 2016