Special Section on Replicability in Psychological Science: A Crisis of Confidence?

Perspectives on Psychological Science November 2012 7
web, Google Scholar

Abstract

Is there currently a crisis of confidence in psychological science reflecting an unprecedented level of doubt among practitioners about the reliability of research findings in the field? It would certainly appear that there is. These doubts emerged and grew as a series of unhappy events unfolded in 2011: the Diederik Stapel fraud case (see Stroebe, Postmes, & Spears, 2012, this issue), the publication in a major social psychology journal of an article purporting to show evidence of extrasensory perception (Bem, 2011) followed by widespread public mockery (see Galak, LeBoeuf, Nelson, & Simmons, in press; Wagenmakers, Wetzels, Borsboom, & van der Maas, 2011), reports by Wicherts and colleagues that psychologists are often unwilling or unable to share their published data for reanalysis (Wicherts, Bakker, & Molenaar, 2011; see also Wicherts, Borsboom, Kats, & Molenaar, 2006), and the publication of an important article in Psychological Science showing how easily researchers can, in the absence of any real effects, nonetheless obtain statistically significant differences through various questionable research practices (QRPs) such as exploring multiple dependent variables or covariates and only reporting these when they yield significant results (Simmons, Nelson, & Simonsohn, 2011).

Comment

I came across this special issue and thought that it is worth sharing here. The abstract above is the first paragraph of the editorial introduction (doi) by Harold Pashler and Eric-Jan Wagenmakers. The special issue even contains a republication of a famous blog post
(The 9 Circles of Scientific Hell)
on Neuroskeptic. The only thing I wonder is why it has taken psychologists until now to become aware of these problems (I realise that some people probably knew this all along, but why was the field as a whole not concerned)? Maybe it has something to do with an increase in quantitative methods through the rise of cognitive neuroscience. In any case, it is not really surprising that it is hard to replicate studies involving the human mind/brain, because it is hard to control for the precise state of mind of a person. Already the person supervising the experiment, or the experiment location may have an effect on how a participant behaves in the experiment (cf. e.g. Milgram experiments in the 1960s). Interestingly, already Richard Feynman used these problems to deny psychology the label “science” and rather called it together with all social sciences “pseudoscience“. Here is an excerpt from his 1974 Caltech Commencement Address titled “Cargo Cult Science: Some Remarks on Science, Pseudoscience, and Learning How to Not Fool Yourself” (in, e.g., Richard P. Feynman, The Pleasure of Finding Things Out, Penguin Books, 1999) which describes the core problems psychologists in the special issue are also concerned with:

Other kinds of errors are more characteristic of poor science. When I was at Cornell, I often talked to the people in the psychology department. One of the students told me she wanted to do an experiment that went something like this–it had been found by others that under certain circumstances, X, rats did something, A. She was curious as to whether, if she changed the circumstances to Y, they would still do A. So her proposal was to do the experiment under circumstances Y and see if they still did A.

I explained to her that it was necessary first to repeat in her laboratory the experiment of the other person–to do it under condition X to see if she could also get result A, and then change to Y and see if A changed. Then she would know that the real difference was the thing she thought she had under control.

She was very delighted with this new idea, and went to her professor. And his reply was, no, you cannot do that, because the experiment has already been done and you would be wasting time. This was in about 1947 or so, and it seems to have been the general policy then to not try to repeat psychological experiments, but only to change the conditions and see what happens.

Nowadays there’s a certain danger of the same thing happening, even in the famous (?) field of physics. I was shocked to hear of an experiment done at the big accelerator at the National Accelerator Laboratory, where a person used deuterium. In order to compare his heavy hydrogen results to what might happen with light hydrogen” he had to use data from someone else’s experiment on light hydrogen, which was done on different apparatus. When asked why, he said it was because he couldn’t get time on the program (because there’s so little time and it’s such expensive apparatus) to do the experiment with light hydrogen on this apparatus because there wouldn’t be any new result. And so the men in charge of programs at NAL are so anxious for new results, in order to get more money to keep the thing going for public relations purposes, they are destroying–possibly–the value of the experiments themselves, which is the whole purpose of the thing. It is often hard for the experimenters there to complete their work as their scientific integrity demands.

All experiments in psychology are not of this type, however. For example, there have been many experiments running rats through all kinds of mazes, and so on–with little clear result. But in 1937 a man named Young did a very interesting one. He had a long corridor with doors all along one side where the rats came in, and doors along the other side where the food was. He wanted to see if he could train the rats to go in at the third door down from wherever he started them off. No. The rats went immediately to the door where the food had been the time before.

The question was, how did the rats know, because the corridor was so beautifully built and so uniform, that this was the same door as before? Obviously there was something about the door that was different from the other doors. So he painted the doors very carefully, arranging the textures on the faces of the doors exactly the same. Still the rats could tell. Then he thought maybe the rats were smelling the food, so he used chemicals to change the smell after each run. Still the rats could tell. Then he realized the rats might be able to tell by seeing the lights and the arrangement in the laboratory like any commonsense person. So he covered the corridor, and still the rats could tell.

He finally found that they could tell by the way the floor sounded when they ran over it. And he could only fix that by putting his corridor in sand. So he covered one after another of all possible clues and finally was able to fool the rats so that they had to learn to go in the third door. If he relaxed any of his conditions, the rats could tell.

Now, from a scientific standpoint, that is an A-number-one experiment. That is the experiment that makes rat-running experiments sensible, because it uncovers the clues that the rat is really using–not what you think it’s using. And that is the experiment that tells exactly what conditions you have to use in order to be careful and control everything in an experiment with rat-running.

I looked into the subsequent history of this research. The next experiment, and the one after that, never referred to Mr. Young. They never used any of his criteria of putting the corridor on sand, or being very careful. They just went right on running rats in the same old way, and paid no attention to the great discoveries of Mr. Young, and his papers are not referred to, because he didn’t discover anything about the rats. In fact, he discovered all the things you have to do to discover something about rats. But not paying attention to experiments like that is a characteristic of cargo cult science.

Abstract

Comment

Leave a Reply Cancel reply