In randomized controlled trials, which – I think – represent the highest-quality science we do, a key part of the methodology is to identify what the study’s primary outcome is before recruitment to the trial starts. So, for example, the researchers running a trial of a novel anti-depressant medication might report that participants’ scores on the Beck Depression Inventory (BDI) will be their primary outcome. They will measure a range of other variables (e.g., quality of life, social functioning), but the key result of their trial will be whether participants who receive the novel anti-depressant score lower on the BDI at the end of the trial than participants who receive the control intervention. If, at the end of the trial, the researchers behave as if something other than BDI score was their primary outcome, this is universally accepted as very poor practice (and probably misconduct). This is because ‘outcome switching’ increases the likelihood of false positive findings: by switching your outcome variable of interest, you effectively break the rules of null hypothesis testing.

Until very recently, outside of RCTs, I think almost no researchers publicly recorded what their primary dependent/outcome variable was going to be before conducting a study (it may have been reported in an ethics application, but these are not typically made publicly available). This allowed them/us to analyse our data with a high level of flexibility. For example, let’s imagine that we have run a study looking at links between cannabis use and hallucinatory experiences (HEs), with HEs measured by the Cardiff Anomalous Perceptions Scale (CAPS). The CAPS consists of 32 items, with each item referring to a different unusual perceptual experience; participants report whether or not they have had that experience and, if they have, rate its frequency, its intrusiveness, and the distress it elicited. Now let’s imagine that we haven’t identified what our primary DV is prior to collecting the data. As a result, when analysing that data, we can make a very reasonable argument to score the CAPS by (1) totalling the number of unusual perceptual experiences a participant reported, (2) totalling the frequency scores, or (3) totalling the frequency, intrusiveness, and distress scores. And when thinking about which items we should take scores from, we can make a reasonable argument to defend using (1) all 32 items, (2) the ‘clinical psychosis’ subscale reported by Bell et al. (2006), (3) the ‘temporal lobe disturbance’ subscale reported by Bell et al. (2006), (4) the ‘chemosensation’ subscale reported by Bell et al. (2006), or (5) the ‘multimodal perceptual alterations’ subscale reported by Tamayo-Agudelo et al. (2015). If all 15 of these options (three ways of scoring the scale multiplied by five ways of picking which items are included as your measure of HEs) for calculating a CAPS score are available to us, and we test whether use of cannabis is associated with each of these 15 different measures of HEs, we move from a situation where we have a 5% chance (assuming that our alpha level is set at p < .05) of generating a false positive finding to a situation where we have more than a 50% chance of generating a false positive finding. That is, we are more likely than not to generate a false positive finding.
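If it helps to see where that figure comes from, here is a minimal sketch of the arithmetic. It assumes the 15 tests are independent, which is a simplification (the 15 CAPS scorings would in reality be correlated, so the exact probability would differ), but the basic point holds.

```python
# A minimal sketch of the family-wise false positive calculation.
# Assumes 15 independent tests, each run at alpha = .05, with the null
# hypothesis true for every test (an illustrative simplification).

alpha = 0.05   # per-test false positive rate
k = 15         # number of ways of scoring/selecting CAPS items

# Probability that at least one of the k tests comes out 'significant'
# purely by chance: 1 - (probability that none of them do).
familywise_error = 1 - (1 - alpha) ** k
print(f"P(at least one false positive) = {familywise_error:.2f}")  # about 0.54
```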

This level of ‘analytic flexibility’ likely explains a paradox in psychological research: while most studies in psychology are underpowered (i.e., they do not have sample sizes large enough to reliably detect effect sizes that are typical in psychology research), more than 90% of psychology studies report ‘positive’ findings (i.e., findings that appear to support the predictions made by the study’s authors). It also means that we can’t really be confident that the evidence base we have been generating is robust.
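To make ‘underpowered’ a little more concrete, here is a rough sketch of a power calculation for a simple two-group comparison. The figures used (30 participants per group, a true effect size of d = 0.4, which is fairly typical in psychology) are illustrative assumptions, not values from any particular study.

```python
# A rough illustration of statistical power for an independent-samples t-test.
# The sample size (n = 30 per group) and true effect size (d = 0.4) are
# illustrative assumptions.
import numpy as np
from scipy import stats

n_per_group = 30
d = 0.4
alpha = 0.05

df = 2 * n_per_group - 2
nc = d * np.sqrt(n_per_group / 2)            # noncentrality parameter
t_crit = stats.t.ppf(1 - alpha / 2, df)      # two-tailed critical value

# Power = probability that |t| exceeds the critical value when the true
# effect is d, i.e. the area of the noncentral t beyond +/- t_crit.
power = (1 - stats.nct.cdf(t_crit, df, nc)) + stats.nct.cdf(-t_crit, df, nc)
print(f"Power = {power:.2f}")  # roughly 0.34 - only about a one-in-three chance
                               # of detecting the effect, well below the
                               # conventional target of 0.8
```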

Pre-registration – where, before we collect any data, we explain in detail what our predictions are, how we plan to measure our key variables, how we plan to analyse the data, and how we plan to deal with things like outliers and missing data – aims to reduce this level of analytic flexibility, and so should reduce the likelihood that a study generates false positive findings. It isn’t a straitjacket: the pre-registration is simply a plan. You can make changes to that plan; you just need to explain what changes you made and why when writing up the study’s results. And it doesn’t need to be done for each and every study (e.g., an exploratory study doesn’t need to be pre-registered). However, when you are running a study and want to be confident (and want to reassure any peer reviewer) of the robustness of the analyses you have performed, pre-registration should be very helpful.

There are many different ways to pre-register a study, but this – https://www.psychologicalscience.org/observer/research-preregistration-101#.WR3GyFPyvOT – is a useful guide. More recently, this paper – https://osf.io/preprints/psyarxiv/ha29k – has been published on when and how to deviate from a pre-registration. There is some debate – see this paper: https://psyarxiv.com/wxn58/download?format=pdf – over how well pre-registration resolves the problems it aims to address, but my opinion is that the benefits of pre-registration outweigh any costs/limitations. Hopefully these resources will be useful if you decide to pre-register your predictions and/or analysis plan for a future study, but if you need any extra help, do feel free to email me – david.smailes@northumbria.ac.uk