RE: AAAS: Your Paper MUST include Data and Code
Mar 13, 2011 06:03 PM | Luis Ibanez
Originally posted by hongtu zhu:
I'm very glad that you pointed this out.
Current publishing practices show an extreme disregard for the importance of numerical parameters. Such a dismissive attitude is rooted in the fact that reviewers typically do not attempt to replicate the work (because they don't have access to the data or the code, may not have the time, and are not even encouraged to do so by the editors).
Such replication work is left "as an exercise" to graduate students who will read those papers after publication, and whose supervisors "assume" that the poor graduate student "just" has to write the code for the algorithm described in the paper (a task that in practice requires some divination skills combined with black magic, and months of work). All in all, this illustrates how disconnected the "management" of science has become from the "communities of practice" (or in plain words: heads of labs do not quite know what the actual lab workers do, or how they do it). We have too many Aristotles and not enough Galileos. Aristotle didn't need to run experiments; he could figure out the truth just by reasoning about it (just like most of today's reviewers). Galileo instead ran experiments and checked for himself. No surprise that Aristotle got most of physics wrong, and even got the number of teeth in women wrong.
For those who really attempt to replicate the work described in a paper, it becomes immediately obvious that those little numerical parameters are ESSENTIAL to the data processing, and unfortunately they are rarely disclosed in a usable form. This leads authors, reviewers, and readers to broad claims that "Method X" is better than "Method Y", glossing over the fact that "Method X" can display a full range of different behaviours when you start changing the numerical parameters (typically on the order of 10 to 50 parameters). So, in practice there is no such thing as "Method X"; there is only "Method X, with parameters {a,b,c,d...}, run on data M, N...".
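A toy illustration of this point (the functions and numbers below are invented, not from any paper discussed here): two hypothetical "methods" whose ranking flips depending on a single numerical parameter, so the bare claim "X beats Y" is meaningless without the parameter values.

```python
# Hypothetical "Method X": counts detections above a tunable threshold.
def method_x(data, threshold):
    return sum(1 for v in data if v > threshold)

# Hypothetical "Method Y": fixed behaviour, no tunable parameter.
def method_y(data):
    return sum(1 for v in data if v > 0.5)

data = [0.1, 0.4, 0.6, 0.7, 0.9]

# With threshold=0.3, "Method X" finds more detections than "Method Y"...
assert method_x(data, threshold=0.3) > method_y(data)
# ...but with threshold=0.8 it finds fewer. Same "Method X", opposite claim.
assert method_x(data, threshold=0.8) < method_y(data)
```

Neither comparison is wrong; they are simply comparisons of different pipelines that happen to share a name.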
As maintainers of ITK, we are in the business of working with the ITK community on taking published papers and converting them into usable code (tested, multiplatform, open source). In that process of translating "papers" into "real code", we have seen many skeletons in many closets, and many emperors without clothes. The lack of full specification of parameters and the absence of the "real recipe for use" is blatant in most papers. Curiously, this happens particularly when an operation is deemed too simple (despite being essential). For example, in one case it took us several months to realize that the authors had taken all of their input data and renormalized the intensities before feeding it into the processing pipeline described in the paper. A trivial step that made a huge difference in the results, but was not mentioned in the paper. Authors, in the effort of presenting their papers as "scientific", tend to overlook the pieces of the work that do not look "complicated" enough or "smart" enough; hence their attachment to anything that looks like a differential equation...
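For concreteness, the kind of "trivial" unreported step described above might look like this sketch (hypothetical code, not the original authors' actual preprocessing): a linear rescaling of input intensities to [0, 1] before the published pipeline ever runs.

```python
def renormalize_intensities(image):
    """Linearly rescale a list of pixel intensities to the [0, 1] range."""
    lo, hi = min(image), max(image)
    if hi == lo:                  # constant image: nothing to rescale
        return [0.0 for _ in image]
    return [(v - lo) / (hi - lo) for v in image]

raw = [120, 300, 480, 660]        # e.g. raw scanner intensities (invented)
normalized = renormalize_intensities(raw)
# The pipeline downstream now sees values in [0, 1], not raw intensities.
```

One line of text in the paper ("intensities were rescaled to [0, 1]") would have saved months of reverse engineering.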
It is great to see that progressive journals like PLoS ONE and OCR get it, and are requiring parameters to be included as part of the recipe for replication.
One approach that we used in the Insight Journal (http://www.insight-journal.org/) (IJ) was to include the scripts that are used to run the code, and to encode all the necessary parameters in those scripts. A second sociological trick that helped a lot is to not call these submissions "Papers" and instead call them "Technical Reports". That nuance helps trigger in the minds of authors the notion that the report is intended for real practitioners like themselves to be able to repeat the experiment. So, they need to describe the real recipe, not just the pieces that "look good" in a paper.
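A minimal sketch of such a run script (all names and parameter values here are illustrative, not from any actual IJ submission): every parameter needed to reproduce the experiment is spelled out in one place, and the exact command line is assembled from it rather than described in prose.

```python
# All tunable parameters of the (hypothetical) experiment, in one place.
PARAMETERS = {
    "iterations": 50,           # optimizer iterations
    "learning_rate": 0.25,      # gradient-descent step size
    "smoothing_sigma": 2.0,     # Gaussian kernel width, in voxels
}

def build_command(executable, input_file, output_file, params):
    """Assemble the full command line, so the exact invocation is recorded."""
    args = [executable, input_file, output_file]
    args += [str(params[k]) for k in sorted(params)]
    return " ".join(args)

cmd = build_command("MySegmentationFilter", "input.mha", "output.mha", PARAMETERS)
```

Anyone who reads the script knows the recipe; anyone who runs it repeats the experiment.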
We also got rid of the absurd practice of putting a limit on the number of pages in a paper. Page limitations are a medieval practice that we inherited from publishing on physical paper. They have no place in the Internet age. A PDF file with 10 pages costs the same as a PDF file with 100 pages, and color figures on the Internet cost the same as grayscale figures. In the IJ we let authors use as much space as they need to describe the recipe that allows us to replicate their work. In general the size of a PDF document is negligible compared to the size of the data and code that they are required to submit along with the paper.
Free of page limitations, and allowed and encouraged to submit source code and data, authors are finally empowered to share the full recipe that makes it possible for others to replicate their work.
Another big issue in population neuroimaging studies is the registration algorithm in all neuroimaging software, including SPM, FSL, and AFNI. We carried out a detailed evaluation of several registration algorithms. Unfortunately, statistical results are strongly influenced by how researchers tune the parameters of the registration algorithm. Our overall finding is that we need to report statistical results coupled with the tuning parameters.
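One lightweight way to follow that recommendation is to serialize the results together with the parameters that produced them, so the two can never be reported separately. This is a minimal sketch with an assumed record format and invented example values, not a format used by SPM, FSL, or AFNI:

```python
import json

def report(statistics, tuning_parameters):
    """Bundle statistical results with the tuning parameters that produced them."""
    return json.dumps(
        {"statistics": statistics, "tuning_parameters": tuning_parameters},
        sort_keys=True,
    )

# Example values are illustrative only.
record = report(
    statistics={"peak_t": 5.2, "cluster_size": 340},
    tuning_parameters={"smoothing_fwhm_mm": 8, "warp_regularization": 0.01},
)
```

A reader of `record` can see at a glance which registration settings the statistics depend on.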