open-discussion > RE: AAAS: Your Paper MUST include Data and Code
Mar 10, 2011 08:03 PM | Cinly Ooi
Originally posted by Pierre Bellec:
Well, I'd say that a lot of people will agree on the principles of reproducible research and will clearly see the benefits, but I am surprised no one has mentioned yet how challenging this is in practice for neuroimaging. First, there are huge issues with anonymization, especially for clinical data. Processing a dataset is one thing; releasing it publicly is another (faces in T1 scans have to be blurred, for example). Then you need to host tons of datasets securely online. In the same vein, coding an in-house algorithm is one thing; releasing it publicly is again completely different (you need to document!). For new algorithms, the production environment can be very hard to reproduce. Moreover, the analysis itself can be computationally challenging (I use supercomputers all the time; the vast majority of the neuroimaging community does not). Not to mention that a lot of research groups do not fully automate their data processing flow.
My point is that it is not enough to say "let's go reproducible/public/open source/...". We also need an infrastructure to do so. I bet that in the next couple of years we will have websites that allow datasets to be shared publicly and, with only a couple of clicks, processed on supercomputers with well-tested and maintained analysis pipelines. There are many current efforts in that direction (e.g. http://www.cbrain.mcgill.ca/). But at this stage this is science fiction, as far as I know.
On anonymization:
It is good practice to ensure all fMRI data are anonymized in the first place. The issue with T1 scans (and indeed any MR dataset) can be handled simply by requiring the receiving party to obtain approval from the relevant ethics committee. I cannot see how Science could say no to that requirement.
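To give a flavour, here is a minimal header-scrubbing sketch in Python, assuming pydicom is available. The tag list is illustrative only and must be extended to whatever your ethics committee requires; note it does nothing about faces in T1 volumes, which need a separate defacing tool:

```python
# Minimal DICOM de-identification sketch (not a complete anonymizer).
# Assumes pydicom is installed; extend the tag list per your ethics board.
import pydicom

IDENTIFYING_TAGS = [
    "PatientName", "PatientID", "PatientBirthDate",
    "PatientAddress", "ReferringPhysicianName", "InstitutionName",
]

def scrub(in_path, out_path, subject_code):
    ds = pydicom.dcmread(in_path)
    for tag in IDENTIFYING_TAGS:
        if tag in ds:
            ds.data_element(tag).value = ""
    ds.PatientID = subject_code  # replace identity with a study code
    ds.save_as(out_path)

# Hypothetical paths, for illustration only.
scrub("raw/scan0001.dcm", "anon/scan0001.dcm", "SUBJ-042")
```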
On algorithm:
As Matthew points out, releasing an algorithm has the enormous benefit of getting the algorithm tested; the feedback from that process actually improves it. If you find it hard to reproduce results for a new algorithm on a new production platform, experience says you have bugs. Surely you have a unit-test suite that can tell you where your bug is? ;-)
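For instance, a simple regression test with a stored reference result and a numerical tolerance will catch most platform-dependent breakage. The function and file names below are hypothetical placeholders, just to show the shape:

```python
# Regression test sketch: check that the algorithm reproduces a stored
# reference result within numerical tolerance on any platform.
import unittest
import numpy as np

def my_algorithm(data):
    # stand-in for the published algorithm under test
    return data.mean(axis=0)

class TestReproducibility(unittest.TestCase):
    def test_matches_reference(self):
        data = np.load("test_input.npy")          # hypothetical fixture
        reference = np.load("reference_output.npy")
        result = my_algorithm(data)
        # Exact equality is too strict across compilers and BLAS builds;
        # a tolerance separates numerical noise from real bugs.
        np.testing.assert_allclose(result, reference, rtol=1e-6)

if __name__ == "__main__":
    unittest.main()
```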
I do coding for a living, and I have received and examined so much program code from the neuroscience community that I can tell you this is not really a big problem. I see bad code and good code alike, and even the worst of it I have managed to decipher. Documentation? I never trusted it. Even on sourceforge.net I bet you will find code written worse than the worst you will find in this community, especially considering that the programmers there may actually come from the commercial software field.
On servers hosting datasets:
There is no need for a centralized server; a decentralized approach is probably better. In fact, the data need not be online at all: the fMRI Data Center did quite a good job of distributing data through the postal service.
Institutional libraries are also making digital space available for archiving.
Where there is a will, there is a way. Let's not get too bogged down by servers.
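One practical point when data travels by post or by mirror rather than from one blessed server: ship a checksum manifest alongside it, so the receiver can verify their copy bit-for-bit. A minimal sketch, with illustrative paths:

```python
# Build a checksum manifest for a dataset directory, so a copy received
# by post or from any mirror can be verified against the original.
import hashlib
from pathlib import Path

def write_manifest(dataset_dir, manifest_path):
    with open(manifest_path, "w") as out:
        for path in sorted(Path(dataset_dir).rglob("*")):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                out.write(f"{digest}  {path.relative_to(dataset_dir)}\n")

write_manifest("study_data/", "study_data.sha256")
# The receiver recomputes the digests and compares them to the manifest.
```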
On workflow:
You are only asked to make your workflow available for inspection. If you use supercomputing, then make the supercomputing scripts available. If your workflow is only partially automated, give us your scripts, plus documentation for the manual parts written so that we can reproduce them.
For the purpose of reproducing the results, we have to assume the receiving party is capable of either reproducing your environment or adapting your scripts to their own environment while keeping fidelity to your workflow at all times. IMHO, any workflow that cannot survive a change of environment has to be treated with suspicion.
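It also helps if the workflow records the environment it actually ran in next to its outputs, so the receiving party knows exactly what they are trying to reproduce. A sketch of what I mean, assuming a Python-driven workflow (the extra fields are placeholders):

```python
# Record the execution environment alongside the results, so a receiving
# party can see exactly what the workflow depended on when it ran.
import json
import platform
import sys

def record_environment(out_path, extra=None):
    info = {
        "python": sys.version,
        "platform": platform.platform(),
        "machine": platform.machine(),
        "argv": sys.argv,
    }
    if extra:
        info.update(extra)  # e.g. toolbox versions, input checksums
    with open(out_path, "w") as f:
        json.dump(info, f, indent=2)

record_environment("environment.json",
                   extra={"pipeline_version": "1.2.0"})  # hypothetical
```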
On infrastructure:
It will be some time before we have facilities like those in genome research, where you can send a search request to servers in Tokyo, Washington DC or elsewhere, and they process the request for you and return the data. Even then, to tell the truth, I don't think I want something like that.