Extending NIFTI Discussion

Extending the NIfTI format

This forum is intended for any ideas, comments and discussion related to extending the existing NIfTI format to create a more advanced format for neuroimaging data. It is for things beyond those changes recently proposed by the NIfTI committee for the creation of a NIfTI-2 format (an intentionally small update to the NIfTI-1 format to allow 64-bit storage and addressing for large images and matrices - see www.nitrc.org/forum/forum.php?forum_id=1941). This discussion of future formats is actively encouraged to interact and overlap with similar efforts on informatics and datasharing, such as INCF datasharing (www.incf.org/core/programs/datasharing).

RE: Extending NIFTI Discussion

thank you for opening this discussion. along with larger image sizes and aggregation of data across increasingly large studies, there is a need for:

a. serializable compression and the ability to operate on a single voxel/plane time-series or a single volume of a 4D-volume.
b. extended sequence meta-data derived from different pulse-sequences (for example, multiple echo-times and flip angles for flash scans) that are useful for reconstruction algorithms
c. an extensible schema for neuroimaging data formats that's consistent across conversion from manufacturers' own DICOM fields

one option that addresses (a,b) is HDF5 (http://www.hdfgroup.org/HDF5/). the MINC2 (link) format uses HDF5 and is a good option to look into as a starting point.

for c, the XCEDE and XNAT schemas are steps in that direction, but we haven't had much buy-in from the community at large. however, with increasing public distribution of large scale studies, it's worthwhile to start considering/extending some of these options seriously.

RE: Extending NIFTI Discussion

Originally posted by Satrajit Ghosh:

a. serializable compression and the ability to operate on a single voxel/plane time-series or a single volume of a 4D-volume.

one option that addresses (a,b) is HDF5 (http://www.hdfgroup.org/HDF5/). the MINC2 (link) format uses HDF5 and is a good option to look into as a starting point.

Indeed, I agree with Satra that as data gets bigger and computing power increases faster than bandwidth, storing huge data to disk is going to become more and more of a bottleneck. In other words, I/O on large data files will be very problematic soon.

Two methods are usually employed to mitigate these problems. First, making the data access local on the disk. This requires abandoning the hyper-cube representation for storage and saving the data as blocks, or chunks, that can be sliced locally on the disk in every direction.
Second, compression, if it is implemented with a seekable and stream-capable compression algorithm (not gzip), can render the I/O faster. Both approaches can be combined.

Optimal implementation of the above suggestions is a lot of work. However, the HDF library has tackled these problems efficiently, and it is already available in most scientific computing environments. As a matter of fact, even MathWorks, a company with huge resources, ditched their own I/O library to use HDF5 for their most recent data saving formats.
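To make the two ideas above concrete, here is a minimal sketch in plain Python, not HDF5 itself. The `ChunkedStore` class and its layout are hypothetical: a 4D image is split into small spatial blocks, each holding all time points, and every block is compressed on its own. Because each block is an independent compressed unit, the store is seekable at block granularity even though the codec here is ordinary deflate (`zlib`), and reading one voxel's time series touches a single block.

```python
import struct
import zlib
from itertools import product


class ChunkedStore:
    """Toy in-memory 4D store with independently compressed spatial blocks.

    Hypothetical layout for illustration only; values are stored as int16.
    """

    def __init__(self, shape, chunk):
        self.shape = shape    # (nx, ny, nz, nt)
        self.chunk = chunk    # (cx, cy, cz) spatial block size
        self.blocks = {}      # block index -> compressed bytes
        self.blocks_read = 0  # how many blocks were decompressed so far

    def write(self, get_value):
        """Fill the store from a function get_value(x, y, z, t) -> int."""
        nx, ny, nz, nt = self.shape
        cx, cy, cz = self.chunk
        for bx, by, bz in product(range(0, nx, cx), range(0, ny, cy), range(0, nz, cz)):
            values = []
            # Inside a block, time varies fastest, so a voxel's series is contiguous.
            for x in range(bx, min(bx + cx, nx)):
                for y in range(by, min(by + cy, ny)):
                    for z in range(bz, min(bz + cz, nz)):
                        values.extend(get_value(x, y, z, t) for t in range(nt))
            raw = struct.pack(f"<{len(values)}h", *values)
            self.blocks[(bx // cx, by // cy, bz // cz)] = zlib.compress(raw)

    def time_series(self, x, y, z):
        """Return the time series of one voxel, decompressing a single block."""
        nx, ny, nz, nt = self.shape
        cx, cy, cz = self.chunk
        key = (x // cx, y // cy, z // cz)
        raw = zlib.decompress(self.blocks[key])
        self.blocks_read += 1
        # Edge blocks may be smaller than the nominal block size.
        by, bz = key[1] * cy, key[2] * cz
        size_y, size_z = min(cy, ny - by), min(cz, nz - bz)
        voxel = ((x % cx) * size_y + (y % cy)) * size_z + (z % cz)
        return list(struct.unpack_from(f"<{nt}h", raw, voxel * nt * 2))


store = ChunkedStore(shape=(8, 8, 4, 5), chunk=(4, 4, 2))
store.write(lambda x, y, z, t: x + 10 * y + 100 * z + 1000 * t)
series = store.time_series(5, 6, 3)
print(series, "blocks decompressed:", store.blocks_read, "of", len(store.blocks))
```

A real library would keep the blocks on disk with an index of their byte offsets; the point of the sketch is only that one of the eight blocks needs to be decompressed to serve the request.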

RE: Extending NIFTI Discussion

one option that addresses (a,b) is HDF5 (http://www.hdfgroup.org/HDF5/). the MINC2 (link) format uses HDF5 and is a good option to look into as a starting point.

I've been beaten to the punch!?

What also might be worth considering is that this is not a new idea
(NifTi2 == MINC2), this was mooted many moons ago at the various NifTi DFWG meetings.

There are also other advantages to HDF5 (and thus MINC) that are worth considering.

* Don't underestimate the advantages/power of block structured file formats,
on the fly compression for one. (currently bzip in MINC2)

* 'Limitless' header/metadata that can be exported/read/written as
XML thanks to HDF5's XML tools.

* Immunity to byte swapping.

* HDF5/MINC2 has something called an apparent dimension order [1],
this means you can request a volume be loaded or indeed a hyperslab
from a volume in any dimension order you choose. This is achieved via block storage.

* Big files. I have histology reconstructions (from mice) that are approaching 1TB,
things still 'just work' with this sort of approach with little memory footprint and no swapping.

Granted installing HDF5 introduces some pain on some platforms,
(CentOS where be thy HDF5 RPM!?!?!) but for something like ubuntu/debian
this has been packaged for ages.

It might be instructive to look at some of Jason Lerch's excellent tutorials on
MINC2 to give an idea of what can be done with HDF5.

Hyperslabs: http://en.wikibooks.org/wiki/MINC/Tutori...

There are also a few more here:

http://en.wikibooks.org/wiki/MINC/Tutori...

a

[1] - Apparent dimension order: http://en.wikibooks.org/wiki/MINC/Refere...

RE: Extending NIFTI Discussion

Thanks, Mark, for starting this conversation.

One thing that I think would be very useful would be support for a sparse data representation within the data part of NIfTI files - even if it's just using simple (location, value) tuples. Many images are moderately sparse, particularly when they represent the result of some kind of segmentation, and some are very sparse (FSL-TBSS skeletons are a good example). Even if the data are stored in a sparse array for processing, the intermediate step of reading in the full image data can put unnecessary demands on system memory, all for the sake of reading in and then disregarding huge numbers of zeroes.

I'd be interested in others' thoughts on whether this would be useful, too.

All the best,
Jon
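As a rough illustration of the (location, value) idea in the post above, the sketch below stores only the nonzero voxels of an image as pairs of flat index and value. The function names and the choice of a flat, x-fastest index (the same ordering NIfTI uses for its data block) are assumptions made for the example, not part of any existing format.

```python
def to_sparse(dense, shape):
    """Keep only nonzero voxels as (flat_index, value) pairs."""
    return shape, [(i, v) for i, v in enumerate(dense) if v != 0]


def to_dense(sparse):
    """Rebuild the full flat voxel list from the sparse form."""
    shape, pairs = sparse
    size = 1
    for n in shape:
        size *= n
    dense = [0] * size
    for i, v in pairs:
        dense[i] = v
    return dense


def unravel(index, shape):
    """Convert a flat index to coordinates, first axis varying fastest."""
    coords = []
    for n in shape:
        coords.append(index % n)
        index //= n
    return tuple(coords)


shape = (4, 3, 2)
dense = [0] * 24
dense[5] = 7
dense[17] = 2

sparse = to_sparse(dense, shape)
print(sparse[1])                            # the stored pairs
print([unravel(i, shape) for i, _ in sparse[1]])
```

A reader that consumes the pairs directly never has to allocate the full array, which is the memory saving described above.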

RE: Extending NIFTI Discussion

the idea of supporting sparse data is really good. again, HDF5 supports this through what is called a chunked dataset (details).

in addition, for current inroads into cloud storage and computing, parallel hdf5 (details) can be a huge benefit for imaging algorithms that can run in parallel.

RE: Extending NIFTI Discussion

Originally posted by Andrew Janke:

one option that addresses (a,b) is HDF5 (http://www.hdfgroup.org/HDF5/). the MINC2 (link) format uses HDF5 and is a good option to look into as a starting point.

For those that don't know the NIfTI history, I would just like to note that a critical requirement for NIfTI1 and now NIfTI2 is the agreement to support it from all the major analysis packages. Perhaps in the future they will be prepared to support NIfTI3?=MINC#?, but I believe there is no willingness to do so to date. As a field we seem to be much better off than we were a decade ago, now with NIfTI2 (thanks to Mark Jenkinson and many members of an extended NIfTI Cmte.) and MINC2. If you want the HDF5 advantages for now, use MINC2; I imagine there will shortly be good converters that help deal with the various shortcomings being discussed in this forum.

Store Series Time

I'm not sure whether other people have the same problem. When scanning, we collect subject responses (stimulus output) on another computer, while the scanner collects physiological data for us, organized by date and time. This means that for a single scan we have data from multiple sources, which we have to bring together before we can do any analysis. Given a massive pile of scans, stimulus output and physiological data, we frequently have trouble matching the correct stimulus output and physiological data to the scans. The current system relies mainly on researchers meticulously documenting everything they have done and keeping the filename labelling consistent and correct. Given that scanning days are always hectic, all of this goes out of the window sometime during the day, if not at the beginning of it.

My backup strategy is to match data based on their time stamps. For example, if your DMDX file captures two subjects when you are expecting only one, you use the time information to work out which subject's data is the correct one to use, and move the misplaced one back to its original location.

Here is where we hit a problem. NIfTI does not capture the time the data was acquired. This means going back to the DICOM files and fishing for (0008,0031) Series Time. For us, it would be useful to capture this time as part of the NIfTI header.

Perhaps it is bad management practice on our part, but our researchers normally do not write down the time they start scanning for a particular task. Normally, the problem is actually deciding which of two stimulus output files collected 6 minutes apart is the one to use.

Therefore, if a future NIfTI captured the series time, it would be very useful for us. The series time need not be machine readable for our purposes.
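The time-stamp matching described above could look roughly like the sketch below. It assumes the Series Time has already been read out of the DICOM header as a TM string (`HHMMSS` with an optional fractional part), that each stimulus file's start time is known in seconds since midnight, and that the scanner and stimulus computer clocks agree reasonably well. The file names and function names are hypothetical.

```python
def parse_dicom_tm(tm):
    """Convert a DICOM TM string such as '143015.250000' to seconds since midnight."""
    main, _, frac = tm.strip().partition(".")
    hours = int(main[0:2])
    minutes = int(main[2:4]) if len(main) >= 4 else 0
    seconds = int(main[4:6]) if len(main) >= 6 else 0
    fraction = float("0." + frac) if frac else 0.0
    return hours * 3600 + minutes * 60 + seconds + fraction


def closest_file(series_time, candidates):
    """Pick the candidate whose start time is nearest to the series time.

    candidates maps file name -> start time in seconds since midnight.
    """
    target = parse_dicom_tm(series_time)
    return min(candidates, key=lambda name: abs(candidates[name] - target))


# Two hypothetical stimulus output files collected about 6 minutes apart.
candidates = {
    "stimulus_a.txt": 14 * 3600 + 24 * 60 + 10,  # 14:24:10
    "stimulus_b.txt": 14 * 3600 + 30 * 60 + 20,  # 14:30:20
}
match = closest_file("143015.250000", candidates)
print(match)
```

With files only a few minutes apart, a clock offset between the two computers can flip the answer, so in practice the offset would need to be checked before trusting the match.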

Store Series Time

Hi Cinly,

My initial (somewhat trite) response to this is that if the format you
are using doesn't meet your needs, you're probably using the wrong
format... :)

Perhaps you could stuff all the DICOM tags you need into a Nifti
Comment field? In our case when converting from DICOM to MINC we
coalesce all of the DICOM header and just stuff it all into the MINC
header. Much easier than trying to figure out what we might want later
on down the track.

a
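One way to carry DICOM tags along in a NIfTI-1 file, in the spirit of the suggestion above, is a header extension with the comment code. The sketch below packs and unpacks such an extension: an 8-byte prefix (`esize`, `ecode`) followed by the payload, padded so that `esize` is a multiple of 16. It assumes little-endian byte order and only builds the extension bytes; writing them into a file would also require setting the extension flag after the 348-byte header and adjusting `vox_offset`. The function names and the `tag=value` text layout are hypothetical.

```python
import struct

NIFTI_ECODE_COMMENT = 6  # extension code for free-text comments in nifti1.h


def pack_comment_extension(text):
    """Build one NIfTI-1 extension holding free text (little-endian assumed)."""
    payload = text.encode("utf-8")
    esize = 8 + len(payload)
    esize += (-esize) % 16  # total size must be a multiple of 16
    padded = payload.ljust(esize - 8, b"\0")
    return struct.pack("<ii", esize, NIFTI_ECODE_COMMENT) + padded


def unpack_extension(blob):
    """Return (ecode, text) for the first extension in blob."""
    esize, ecode = struct.unpack_from("<ii", blob, 0)
    return ecode, blob[8:esize].rstrip(b"\0").decode("utf-8")


# Hypothetical free-text layout: one "tag=value" pair per line.
text = "(0008,0031)=143015.250000"
blob = pack_comment_extension(text)
print(len(blob), unpack_extension(blob))
```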

RE: Extending NIFTI Discussion

Dear all,

There doesn't seem to have been much activity here for a while, but I thought it might be worth following up on my earlier post, as I have since had further discussions with colleagues to get an idea of what would be useful in a new file format. On my wish-list would be:

- A mechanism for efficient storage for sparse data (as mentioned previously). Many images are sparse enough that nothing more complex than an option for coordinate/value storage would be needed to improve efficiency significantly.
- The ability to arrange voxel ordering so that spatial dimensions appear last rather than first. This would make voxelwise analysis of multivariate data (fMRI time series, diffusion directions, etc.) much more efficient, particularly for compressed files.
- Simpler conventions for extended metadata and orientation encoding, and fewer data types. Even NIfTI-1 seems more complex than it needs to be, which makes implementation more painful. As far as I know, NIfTI extensions are relatively rarely used (but correct me if I'm wrong!).
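The effect of the voxel-ordering point in the list above can be seen by computing where one voxel's time series lands in the flat data block under each ordering. This is only a sketch with made-up image dimensions; `strides` and `series_offsets` are hypothetical helpers, and the first axis is taken to vary fastest, as in NIfTI.

```python
def strides(shape):
    """Element strides for a layout whose first axis varies fastest."""
    out, step = [], 1
    for n in shape:
        out.append(step)
        step *= n
    return out


def series_offsets(voxel, spatial_shape, nt, time_fastest):
    """Flat element offsets of one voxel's time points under a given ordering."""
    if time_fastest:
        shape = (nt,) + tuple(spatial_shape)   # spatial dimensions last
    else:
        shape = tuple(spatial_shape) + (nt,)   # spatial dimensions first
    s = strides(shape)
    offsets = []
    for t in range(nt):
        index = (t,) + tuple(voxel) if time_fastest else tuple(voxel) + (t,)
        offsets.append(sum(i * k for i, k in zip(index, s)))
    return offsets


voxel, spatial_shape, nt = (10, 20, 5), (64, 64, 30), 4
spatial_first = series_offsets(voxel, spatial_shape, nt, time_fastest=False)
time_first = series_offsets(voxel, spatial_shape, nt, time_fastest=True)
print(spatial_first)  # one element per volume, a whole volume apart
print(time_first)     # consecutive elements
```

With spatial dimensions first, the time points sit a whole volume apart, so a compressed file has to be decompressed almost in full to collect them; with time varying fastest they are adjacent and can be read in one short pass.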

I've drafted an illustrative file format proposal at https://github.com/jonclayden/miff, which caters for all of these things, to illustrate further what I mean. Metadata is free-text, with allowance for the use of DICOM tag codes for standard acquisition parameters (TE, TR, etc.).

Incidentally, is there any further news on NIfTI-2? Can we follow the state of things anywhere?

All the best,
Jon