By: Arnaud Delorme, Ph.D. – shared his first study containing raw EEG datasets from 16 participants in 2002 on his personal website (https://doi.org/10.18116/fseb-qg45)
Increasingly, journals are incentivizing authors to share their
data – and in some cases, it is becoming required for publication.
Given this, it is important to understand and evaluate the pros and
cons of sharing your data.
The pros are:
- Increased citations. Some researchers
might publish using your data and cite you. Even if researchers
do not publish using your data, they might run a rudimentary
analysis to compare with their own results and cite you. On average,
there is a 25% citation boost for linking
your data to your publications.
- Create new collaborations. Some
researchers might contact you to assist them in reanalyzing your
data for a specific question that might not have been relevant to
you at the time of publication – or they might want to submit a
grant application with you to do so.
- You might get another publication out of publishing
your data. For example, Scientific Data, a Nature Group journal, focuses
on the publication of data only: you explain how your data
is publicly released and provide a detailed description and
basic quality metrics.
- It will be easier for you or your collaborators to
reuse that data. Admittedly, the person most likely
to reanalyze that data in the future is probably you. If you ever
have to reanalyze that data, having it properly formatted and
readily available online could save you dozens of hours of headaches
digging through old hard drives and CDs (even archival tapes),
or re-contacting the student who acquired it and has since moved to another
job.
- You might increase your chances of getting
funding. This is especially true for agencies such as the
NIH (National Institutes of Health in the USA) that promote such
practices. A track record of releasing your data will show
that you mean business!
- You are benefiting science. There is
probably a reason why you became an academic scientist that goes
beyond personal gain. Perhaps it was to inspire new generations or
advance human knowledge. There is little doubt that sharing your
data is more aligned with these beliefs than not sharing your
data.
The cons are:
- Preparation time. It takes time to format
your data so it can be shared online. Admittedly, some researchers
simply dump their files with no documentation on NITRC, figshare, OSF, or
another repository that does not impose any data formatting.
However, for neuroimaging data, we would advocate the use of
repositories such as OpenNeuro that enforce formats
such as BIDS. There are now many software tools that can format your data as
BIDS in just a few clicks. For example, for EEG, which is my area
of research, there is an EEGLAB plugin that will
automatically convert raw imported data files to BIDS. Formatting your data as BIDS will ensure
that it contains all the documentation necessary for reuse and that
it is compatible with standard processing pipelines.
- Data, especially when you have a lot of it, is
power. We all know researchers who have acquired a
couple of dozen neuroimaging datasets and restrict access to
a handful of collaborators who might publish with them, so that they can
be a co-author. Releasing the data publicly would mean the loss of
potential publications and collaborations. If you are one of these
researchers and it works for you, then you should probably continue
to do that. However, a large number of researchers
cling to their data just in case such an opportunity comes up.
If you are one of those who hold on to their data just in
case, we would argue that the potential benefits of publicly
releasing your data outweigh the potential loss. First, you are
probably more likely to spur collaborations with your data online.
Second, if you do not format and release your data once you are
finished with the experiment, doing so a
couple of years down the road will be far
harder.
I hope I have convinced you to share your data, if not for the
public good, then for your own benefit.
Sharing data is just the first step. The second step is to share
your analysis pipeline, and hopefully I will get to write a future
blog post on this topic.
Quarterly Newsletter Article from October 6, 2020