Open data refers to the free, unrestricted sharing of the data collected for a study with anyone. For example, a researcher might post the data collected in their study on a repository such as Figshare or the Open Science Framework, where it could be accessed and re-used or re-analysed by anyone, without the researcher gate-keeping access.

There are several good reasons to openly share data in this way. First, it allows others to verify your work. That is, it allows others to reproduce your data analysis and check that they find the same effects that you reported. This would allow the research community to identify any errors that have led to incorrect conclusions. Open sharing of data also allows other researchers to run variations on your analyses (e.g., the same analyses but adjusting for different confounders, or the same analyses but excluding some potentially influential outliers). This would allow the research community to judge how robust the originally reported effects were (e.g., an effect might not be considered robust if it disappeared when the analysis was repeated with outliers excluded).
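To make the idea of a robustness check concrete, here is a minimal sketch in Python of what someone might do with an openly shared dataset: estimate a correlation using all participants, then repeat the analysis with extreme scores excluded. Everything in it is an assumption for illustration only: the scores are invented, the variable names are hypothetical, and the cut-off of 2 standard deviations is an arbitrary choice rather than a recommended rule.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_outliers(x, y, z_cutoff=2.0):
    """Keep only pairs where both values lie within z_cutoff SDs of their mean.

    The cut-off is an illustrative assumption, not a recommended rule.
    """
    mx, sx = mean(x), stdev(x)
    my, sy = mean(y), stdev(y)
    kept = [
        (a, b)
        for a, b in zip(x, y)
        if abs(a - mx) <= z_cutoff * sx and abs(b - my) <= z_cutoff * sy
    ]
    return [a for a, _ in kept], [b for _, b in kept]


# Invented scores for ten hypothetical participants (not real data).
iq = [95, 100, 105, 98, 102, 108, 97, 103, 99, 101]
hallucinations = [3, 2, 2, 3, 2, 1, 3, 2, 3, 20]

# Analysis as originally reported: all participants included.
r_all = pearson_r(iq, hallucinations)

# Robustness check: same analysis with extreme scores excluded.
iq_trim, hall_trim = drop_outliers(iq, hallucinations)
r_trimmed = pearson_r(iq_trim, hall_trim)

print(f"All participants (n={len(iq)}): r = {r_all:.2f}")
print(f"Outliers excluded (n={len(iq_trim)}): r = {r_trimmed:.2f}")
```

With these made-up numbers, a single extreme score is enough to change the estimate a great deal: the correlation is close to zero with all ten participants and strongly negative once that participant is excluded. A check like this is only possible for other researchers if the raw data are available to them.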

Second, open sharing of data reduces research waste. For example, imagine you are conducting a meta-analysis that involves estimating the association between IQ and frequency of hallucinations. Many journal articles may report that both of these variables were measured (e.g., so that groups could be matched on IQ scores) without reporting the strength of the association between them. If the authors of an article have not provided open data, then you will need to email them to try to access the data. Very often, these requests are unsuccessful (for a variety of reasons: authors may no longer be contactable, data may have been lost or destroyed, authors may simply not respond), or accessing the data takes a very long time. This ‘waste’ (in terms of time spent trying to track down the data) could have been avoided if the study data had been openly shared when the article was published.

Third, open sharing of data encourages more careful work. Sharing your data is a pretty intimidating thing to do, because it makes it much more likely that someone will spot an error in your work (it is still unlikely, but sharing the data makes it possible). As a result, I think this makes researchers perform their analyses and manage their data much more carefully (e.g., they will double- or triple-check that the statistics they have reported are correct). I have no evidence for this, but I am confident it is true; see this post by Dorothy Bishop as an example: https://blogs.lse.ac.uk/impactofsocialsciences/2014/05/29/data-sharing-exciting-but-scary/

Open sharing of data should, therefore, be something we aim to do. On this page (https://osf.io/kshp2/) you should find links to webinars on how to share data on the Open Science Framework. And at this link (https://www.benedekkurdi.com/files/Morehouse_AmPsychol_2024.pdf) you should find a paper that discusses effective ways of deidentifying data, which is obviously very important for ethical reasons.

While we are on ethics, there are clear ethical issues with open data sharing, and the safety and privacy of participants outweigh the benefits of open sharing that I outlined above. This is obviously going to be an issue quite often in hallucinations research, where participants may provide a great deal of sensitive data (e.g., about traumatic experiences). So if you feel that you will not be able to share participants’ data safely, then do not do so.

Related to this issue, there are some practical and logistical reasons that may prevent open data sharing. For example, in 2024 our local Health Service Research and Development office wouldn’t allow us to submit an ethics application to the Health Services Research Ethics Committee because we wanted to openly share the data we collected. We didn’t feel in a position to argue with the Research and Development office, and so data from that study isn’t openly available.

So, in sum, there are very good reasons why we should openly share our data. However, there will be instances where ethical and logistical factors prevent us from doing so.