Thursday, August 18, 2016

The first rule of registered replication club... A Reflection on Replicating Strack, Martin & Stepper (1988)

"The first rule of registered replication club is: you do not talk about registered replication club."

Last year our lab signed up to be part of a Registered Replication Report (at Perspectives on Psychological Science) seeking to replicate a classic study from Fritz Strack and colleagues (Strack, Martin & Stepper, 1988). The original study found that participants who held a pen in their teeth (unconsciously facilitating a smile) gave higher funniness ratings to cartoons than participants who held a pen between their lips (unconsciously inhibiting a smile). Many variants of this study have been done, but never an exact replication. Since the original paper is often presented as good evidence for embodied views of cognition, and is cited over 1,000 times, it seemed like a good candidate for replication. Since we are an embodied cognition lab, and I discuss Strack et al.'s original study in my teaching, I thought it would be a good project for me and some students to be involved in.

Once we had signed up, we agreed to the core protocol developed by Wagenmakers, Beek, Dijkhoff and Gronau, which was also reviewed by Fritz Strack*. By this stage the main body of the journal article was essentially already written, along with the key predictions, the planned treatment of the data, and the analysis plan. There was also little room for deviation from the main protocol, apart from some labs translating the instructions and materials.

However, we also took the opportunity to include some supplementary tasks placed after the main experiment, so they could not interfere with the experimental manipulation or the key task, which was rating the funniness of four Far Side cartoons. Because we had a number of undergraduate students helping with the study, each student developed a short additional task or measure that could be easily related to the pen-in-teeth/pen-in-lips manipulation. I won't say more about those tasks here, as we still hope to write up those results.

Dan Simons, the handling editor for RRRs at Perspectives on Psychological Science, kept us completely up to date with any developments or minor tweaks to the protocol. For example, it turned out there was a huge shortage of the preferred pen type, and so a little extra flexibility had to be granted there.  There was also a large amount of secrecy surrounding the project (see opening quote) - we knew other labs were involved, but from this point until very recently we didn't know who they were, or how many there were, and we were reminded repeatedly not to discuss our data or findings with anyone outside our research groups.  Life inside registered replication club was simultaneously frustrating, since we couldn't share our findings for months, and kind of exciting to know we were part of a large "covert" research operation.

The infamous Stabilo 68
For us, running the study was only really possible with some additional funding provided by the APS, which covered our equipment costs (web cams to record participant performance, disinfectant wipes, hair clips, tissues for drool wiping, and lots and lots of Stabilo 68 black ink pens) and about half of our participant expenses. The remaining participants were recruited from our undergraduate participant pool, where students receive course credit. We originally planned to recruit over 200 participants, but due to unavoidable delays in starting testing and then the unexpected closure of Lancaster University in December last year thanks to Storm Desmond, we recruited 158 participants. Recruiting even this smaller sample was still a mammoth task, as each participant had to be tested individually. It often took several attempts for participants to understand how to hold the pen correctly in their teeth/lips, with testing sessions lasting about 30-45 minutes from participant arrival to departure.

Once all the data were collected, we had to enter them and code the responses. Because of the nature of the tasks, the entire study was pen-and-paper based. I'm so used to automatically extracting data from SuperLab or Qualtrics that this was a surprisingly painful task. Once the data were input, we then had to examine the video recordings to ensure each participant performed the task correctly. Participants rated four cartoons, and anyone who performed the pen manipulation incorrectly for more than one cartoon was excluded from the analysis. We double-coded each participant's performance, and a third rater then spot-checked for accuracy. We were now ready to conduct our analysis.

So what did we find? 

We had received a spreadsheet template for our data and analysis scripts that could be run in RStudio. When we ran our analysis, we found that a good chunk of participants (32) were excluded for not performing the pen manipulation correctly, but this did not dramatically alter the outcome of the analysis (final N = 126). Overall we found that people who held the pen in their teeth (M = 4.54, SD = 1.42) gave higher ratings for the cartoons than those who held the pen in their lips (M = 4.18, SD = 1.73), with a marginal p-value of .066. As well as the traditional t-test, the analysis script also output a Bayes Factor, to give an indication of whether the results were more likely under the null hypothesis (i.e., no difference between conditions) or the alternative hypothesis (i.e., a difference in the predicted direction). The Bayes Factor was 0.994; a value this close to 1 means the data are roughly equally likely under either hypothesis, so the result should be considered inconclusive.
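For readers who want a feel for what that kind of analysis looks like, here is a minimal sketch in R. To be clear, this is not the official RRR analysis script (which used its own priors and exclusion rules); the rating vectors below are simulated purely for illustration, and the 0-9 rating scale is my assumption.

# A minimal sketch (not the official RRR script) of the kind of comparison described above,
# assuming per-participant mean cartoon ratings in two numeric vectors.
library(BayesFactor)  # provides the default Bayes factor t-test

set.seed(1)
# Simulated ratings, clamped to an assumed 0-9 funniness scale, roughly matching our group means
teeth <- pmin(pmax(rnorm(63, mean = 4.54, sd = 1.42), 0), 9)  # pen held in teeth ("smile")
lips  <- pmin(pmax(rnorm(63, mean = 4.18, sd = 1.73), 0), 9)  # pen held in lips ("pout")

# Classical t-test, one-sided because ratings in the teeth condition are predicted to be higher
t.test(teeth, lips, alternative = "greater")

# Default-prior Bayes factor for the same comparison; the official script used its own
# prior settings, so treat this purely as an illustration
ttestBF(x = teeth, y = lips)

On simulated data like this, a Bayes factor near 1 would, like ours, indicate that the data barely favour either hypothesis.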

So our own lab's results didn't bring any great clarity to the picture at all. Even with a decent sample, we were left unsure if this was a real effect or not.

We then submitted our data to Dan Simons (around February 2016) so that the full omnibus analysis across all participating labs could be completed. And so the waiting began. Was our lab's result going to be typical of the other results? Or would we be an embarrassing outlier relative to our anonymous colleagues' efforts? We had to wait several months for the answer.

What did everyone else find?

And so about 4 weeks ago (mid-July) we finally received the almost-complete manuscript with the full results. The meta-analytic effect size was estimated to be 0.03, almost zero, and considerably smaller than the effect of 0.82 observed by Strack and colleagues. For only 2 of the 17 replication effects did the 95% confidence intervals overlap with the effect size from the original study. The calculated Bayes Factors further support this pattern, with 13 of 17 replications providing positive support for the null, and 12 of 17 supporting the null in a second analysis with different priors.
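For anyone curious how an overall estimate like that 0.03 is arrived at, here is a rough sketch of a standard random-effects meta-analysis using the metafor package in R. The lab-level effect sizes and variances below are invented placeholders, not the actual RRR data, and the published analysis may well have used a different meta-analytic model.

library(metafor)  # widely used R package for meta-analysis

# Hypothetical per-lab standardised mean differences (yi) and their sampling variances (vi)
labs <- data.frame(
  yi = c(0.20, -0.10, 0.05, 0.00, -0.15),
  vi = c(0.03,  0.04, 0.03, 0.05,  0.04)
)

# Random-effects model: pools the lab effects while allowing for between-lab variability
res <- rma(yi = yi, vi = vi, data = labs, method = "REML")
summary(res)  # pooled effect estimate with its 95% confidence interval
forest(res)   # forest plot showing each lab's effect around the pooled estimate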

How do I feel about the results?

My first reaction was slight surprise, but I didn't feel my world had been turned upside down. It was more that, well, this is the result, now let's get on with doing some more science. In advance of conducting the study, I had expected that the pattern from the original would replicate, but with a smaller effect size, as is often the case with replications (see Schooler, 2011, on the Decline Effect). Although I work in embodied cognition, I didn't feel dismayed by the results, but rather felt that we had done a good job of fairly testing what many consider a classic finding of embodied cognition research. I do think there may be moderating factors (e.g., participant age, condition difficulty), and that there may be scope for further research on this effect, but I am content that the true effect is much closer to zero than was previously observed.

The full paper is published today (18th August, 2016), along with a commentary from Fritz Strack. I haven't seen the commentary yet, so it will be interesting to see what he thinks. Writing with Wolfgang Stroebe, Strack has noted that direct replications are difficult because "identical operationalizations of variables in studies conducted at different times and with different subject populations might test different theoretical constructs" (Stroebe & Strack, 2014). From his perspective, it is more important to identify the correct theoretical constructs and underlying mechanisms, rather than making sure everything "looks" identical to the original. I don't know whether this argument will wash as an explanation for the difference between the original and replication findings.

Having been through this multi-site replication attempt, what have I learned, and, as important, would I do it again? Although I have previously published a pre-registered replication (Lynott et al., 2014), this one was on an even larger scale, and I tip my hat to Dan Simons for coordinating the effort and to EJ Wagenmakers, Laura Dijkhoff, Titia Beek and Quentin Gronau for doing excellent work in developing the protocol and providing such detailed instructions for the participating labs. I have no doubt that, for the lead lab, there is a huge amount of work involved in preparing for and implementing a Registered Replication Report. Even for a mere participating lab this was quite a bit of work, but I'm very glad that we contributed, and I hope that in the near future we'll be involved in more RRRs and in pre-registration more generally. Lastly, it's great to be finally able to talk about it all!

(And if anyone wants to buy some leftover Stabilo 68s, I can do you a good deal.)

*Correction 18/08/2016: Fritz Strack did not review the protocol. Rather, he nominated a colleague to review it.
Edit 19/08/2016 - Link added to in press paper
Edit 12/01/2017 - DOI added to Wagenmakers et al paper

Reference
Wagenmakers, E.-J., Beek, T., Dijkhoff, L., Gronau, Q. F., Acosta, A., Adams, R. B., Jr., Albohn, D. N., Allard, E. S., Benning, S. D., Blouin-Hudon, E.-M., Bulnes, L. C., Caldwell, T. L., Calin-Jageman, R., Capaldi, C. A., Carfagno, N., Chasten, K. T., Cleeremans, A., Connell, L., DeCicco, J. M., Dijkstra, K., Fischer, A. H., Foroni, F., Hess, U., Holmes, K. J., Jones, J. L. H., Klein, O., Koch, C., Korb, S., Lewinski, P., Liao, J. D., Lund, S., Lupiáñez, J., Lynott, D., Nance, C. N., Oosterwijk, S., Özdoğru, A. A., Pacheco-Unguetti, A. P., Pearson, B., Powis, C., Riding, S., Roberts, T.-A., Rumiati, R. I., Senden, M., Shea-Shumsky, N. B., Sobocko, K., Soto, J. A., Steiner, T. G., Talarico, J. M., van Allen, Z. M., Vandekerckhove, M., Wainwright, B., Wayand, J. F., Zeelenberg, R., Zetzer, E., Zwaan, R. A. (2016). Registered Replication Report: Strack, Martin, & Stepper (1988). Perspectives on Psychological Science. DOI: https://doi.org/10.1177/1745691616674458


Other references
Lynott, D., Corker, K. S., Wortman, J., Connell, L., Donnellan, M. B., Lucas, R. E., & O’Brien, K. (2014). Replication of “Experiencing physical warmth promotes interpersonal warmth” by Williams and Bargh (2008). Social Psychology, 45, 216-222. DOI: 10.1027/1864-9335/a000187

Schooler, J. W. (2011). Unpublished results hide the decline effect. Nature, 470, 437.

Strack, F., Martin, L. L., & Stepper, S. (1988). Inhibiting and facilitating conditions of the human smile: A nonobtrusive test of the facial feedback hypothesis. Journal of Personality and Social Psychology, 54(5), 768.

Stroebe, W., & Strack, F. (2014). The alleged crisis and the illusion of exact replication. Perspectives on Psychological Science, 9, 59-71.

Sunday, June 5, 2016

Why staying in the EU might be good for UK research

There is obviously a lot of debate at the moment about the pros and cons of staying in the EU. I'm not going to get into the broader debate, but I thought I'd highlight some data on European Research Council funding successes that suggest why Britain is better off in the EU in this instance.

Here, I take a look just at grants awarded to individuals in the form of European Research Council (ERC) Starting, Consolidator and Advanced grants. Funding per grant ranges from about €1.5 million to €2.5 million, depending on the scheme, and these grants are generally viewed as very prestigious. In 2015, for example, ERC grants worth roughly €398 million were awarded to the UK.

There are two reasons why the UK is better off in the EU in relation to funding under these grant schemes.

First of all, the UK does very well in all three grant schemes, and in fact has the largest number of grant successes of any European country.

In the 2015 allocations, the UK received the highest number of starting grants (61, compared to second-placed Germany's 53), the highest number of consolidator grants (67, compared to second-placed Germany's 45), and the highest number of advanced grants, with a whopping 69 awards compared to second-placed Germany's 43. So, in absolute terms, the UK attracts an awful lot of this funding, which means excellent research and researchers are being funded to do their work in the UK.

Secondly, the majority of grants won by the UK are not actually won by UK citizens, but by non-UK citizens who have come here to work, or who are using the grants to come to the UK to do research.

For starting grants, only 28% (17/61) were awarded to UK nationals - the figure below shows the distribution of grantees by country of host institution. For consolidator grants, 36% (24/67) went to UK nationals, although for advanced grants a majority (65%; 45/69) did go to UK nationals. Overall, more than half of all awards (56%) that came to the UK were awarded to non-UK nationals. In monetary terms, approximately €212 million of the €398 million awarded in 2015 was brought to the UK by non-UK citizens. So, yes, there is excellent research being done in the UK, but a large chunk of it is being done by people who are not originally from the UK. What's more, the largest proportion of non-UK "grantees" is from elsewhere in the European Union.

ERC Starting grant awards by country (2015)

There's no doubt that the UK does very well out of these funding competitions, and the success rate also speaks to the quality of research and researchers working in the UK. However, the data also highlight the importance of freedom of movement within the EU for scientists and researchers. Switzerland is outside the European Union but also does very well out of ERC research grants; the condition for its access to these funds (and any other Horizon 2020 funding) is freedom of movement for EU workers. If the UK were to leave the EU and impose additional immigration controls on EU workers, would the UK remain as attractive a place to work in the future? I'm not so sure.

Sources
Statistics on awards for starting, consolidator and advanced grants in 2015
https://erc.europa.eu/sites/default/files/document/file/erc_2015_stg_statistics.pdf
https://erc.europa.eu/sites/default/files/document/file/erc_2015_cog_statistics.pdf
https://erc.europa.eu/sites/default/files/document/file/erc_2015_adg_statistics.pdf