Nov 1, 2010 01:11 PM | Richard Reynolds
compatibility testing
Hi all,
Nick, that's a great start on the test plan document!
...
I wonder a bit about the goal of dataset duplication. For a package
to be compliant, one would expect that it could read and understand
a valid dataset, and that the datasets it produces are valid, as well.
Is it really necessary that they be able to essentially duplicate an
input dataset?
For example, consider the input storage method. A compliant package
should be able to read datasets in ASCII, binary base64, or the gzip
version. But I would expect most packages to have a default output
format. Do we force everyone to have some option to write the output
using the same storage mode as the input?
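(For reference, the storage settings in question are just attributes on each
GIFTI DataArray element. Here is a quick sketch, Python standard library only,
of what a verifier might inspect; the function name is made up.)

    import xml.etree.ElementTree as ET

    def storage_attributes(path):
        # Return the storage-related attributes of each DataArray in a
        # GIFTI file.  Attribute names are those defined by the GIFTI spec.
        tree = ET.parse(path)
        return [{key: da.get(key)
                 for key in ('Encoding', 'Endian', 'DataType',
                             'ArrayIndexingOrder')}
                for da in tree.iter('DataArray')]

    # A package with fixed defaults might always report something like
    #   {'Encoding': 'GZipBase64Binary', 'Endian': 'LittleEndian',
    #    'DataType': 'NIFTI_TYPE_FLOAT32',
    #    'ArrayIndexingOrder': 'RowMajorOrder'}
    # regardless of how the input dataset was stored.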
The same question applies to data order (both row/column major and endian) and
data types as well. Does every package store 8, 16, 32-bit ints
and floats as read? I would expect many to promote to floats, for
example (though 32-bit ints do not "promote" to 32-bit floats without
losing precision). And I
cannot see most packages bothering to write in a non-native endian.
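(Just to illustrate the precision point with a quick numpy check, purely
as an example:)

    import numpy as np

    # 16-bit ints are represented exactly by a 32-bit float ...
    assert np.float32(np.int16(12345)) == 12345

    # ... but large 32-bit ints are not: float32 has a 24-bit mantissa.
    big = np.int32(16_777_217)            # 2**24 + 1
    print(int(np.float32(big)))           # 16777216 -- the value changed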
My opinion is no, I do not see the need for such a requirement. The
written dataset should be able to appropriately represent the input
one, but we should not force packages to write exactly as was read.
One conceptual note here is the purpose of writing a dataset. There
is little need to copy a dataset, since the 'cp' command does that
well. What is written is determined by the program being applied, and that
generally means altering a dataset for some purpose, or writing a new one.
Relaxing this restriction makes compliance testing harder, but it
seems fair to me.
We could consider allowing the following to vary:
- encoding
- data order
- endian
- data types (allow conversion to float? signed int? unsigned?)
Note that if we allow these things to vary, maybe the I/O library
needs the ability to account for them, i.e. have some option to
"promote to defaults". Then the datasets could be compared after
that (this is particularly necessary for data order and data type;
endian handling is already done).
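Something along these lines, say (a rough Python sketch using nibabel's
GIFTI reader rather than the C I/O library; file names and the tolerance
are made up):

    import numpy as np
    import nibabel as nib

    def promoted_arrays(path):
        # Load a GIFTI file and promote every data array to
        # native-endian float32, discarding encoding, endian and
        # data type differences.
        img = nib.load(path)
        return [da.data.astype(np.float32) for da in img.darrays]

    def datasets_match(path_a, path_b, tol=0.0):
        # Compare two datasets after "promoting to defaults".
        # Encoding (ASCII/base64/gzip) disappears at load time; the
        # astype() call removes data type and endian differences.
        # Data (index) order would still need separate handling.
        arrs_a = promoted_arrays(path_a)
        arrs_b = promoted_arrays(path_b)
        if len(arrs_a) != len(arrs_b):
            return False
        return all(a.shape == b.shape and np.allclose(a, b, atol=tol)
                   for a, b in zip(arrs_a, arrs_b))

    # e.g. datasets_match('input.gii', 'package_output.gii')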
Any thoughts on this?
I'll continue with thoughts on metadata separately.
- rick
Dec 14, 2010 09:12 PM | Nick Schmansky
RE: compatibility testing
Hi,
I agree with everything you say. My initial attempt started from a 'path of least resistance' approach, keeping the testing as simple as possible with the intent to make changes as problems were encountered. That did not last long: when running the test plan script for FreeSurfer, I hit exactly the problems you describe. FreeSurfer has its own output defaults (base64 gzipped data), and it doesn't track the format read as input (so it can't easily 'echo' that format on output).
So I will modify the test plan doc to describe the allowable variances you mention.
Nick
Dec 14, 2010 10:12 PM | Richard Reynolds
RE: compatibility testing
That sounds good, thanks!
I'll have to ponder some updates to gifti_tool to allow for certain fields to vary. The data type might be more work. I would not want to just blindly promote things to double (though that would be the easiest route). Perhaps promotion (or demotion) to float would be simple enough, without wasting too much memory.
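(Rough numbers for a made-up surface of 200k vertices, just to show the
footprint difference:)

    import numpy as np

    data = np.zeros((200_000, 3), dtype=np.int16)   # hypothetical input
    print(data.nbytes)                         # 1200000 bytes as int16
    print(data.astype(np.float32).nbytes)      # 2400000 bytes as float32
    print(data.astype(np.float64).nbytes)      # 4800000 bytes as float64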
- rick