NITRC: masked ICA (mICA) Toolbox: open-discussion

Reproducibility analysis mICA

Hi,

I have performed a masked ICA with MELODIC on the HCP dataset outside the mICA toolbox, and I would like some more information about the splithalf.py script, please.

Could you please tell me what input the script requires? I looked for this in the documentation, but it mostly describes how to use the GUI.

I have tried passing the 4D MELODIC IC file, without success. An example would be greatly appreciated!

Thanks very much,

Cheers,

Will Khan

RE: Reproducibility analysis mICA

Hi Will,

The Python scripts were not designed to be used outside the toolbox; sorry for the limited documentation.
splithalf.py only generates the folder structure for the subsequent reproducibility analysis. You can run it with

python $mICA_FOLDER/py/splithalf.py list_filename permutations out_prefix

where "list_filename" is a text file listing the 4D fMRI data you want to use in your mICA, "permutations" sets the number of samplings, and "out_prefix" sets the name of the output folder.
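The sampling that splithalf.py prepares can be pictured with a minimal Python sketch. This is not the toolbox's script: the helper name `split_half` is hypothetical, and the per-sample file names (`group1_input.txt`, `group2_input.txt`) are assumptions; only the `sample_0001` folder naming is taken from the layout described in this thread.

```python
import os
import random
import tempfile


def split_half(inputs, permutations, out_prefix, seed=0):
    """Hypothetical split-half sampler (not the toolbox's splithalf.py).

    For each permutation, shuffle the inputs, cut the list in half and write
    sample_NNNN/group1_input.txt and group2_input.txt (assumed file names),
    one input path per line.
    """
    rng = random.Random(seed)  # seeded so the same splits can be regenerated
    samples = []
    for k in range(1, permutations + 1):
        shuffled = list(inputs)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        groups = [sorted(shuffled[:half]), sorted(shuffled[half:])]
        sample_dir = os.path.join(out_prefix, "sample_%04d" % k)
        os.makedirs(sample_dir, exist_ok=True)
        for g, members in enumerate(groups, start=1):
            with open(os.path.join(sample_dir, "group%d_input.txt" % g), "w") as fh:
                fh.write("\n".join(members) + "\n")
        samples.append(groups)
    return samples


# Example: 10 inputs, 3 random splits into two groups of 5
files = ["sub%02d.nii.gz" % i for i in range(1, 11)]
with tempfile.TemporaryDirectory() as out:
    split_half(files, 3, out)
    print(sorted(os.listdir(out)))  # ['sample_0001', 'sample_0002', 'sample_0003']
```

The test-retest mode ("0" permutations) mentioned for the real script is not modelled here.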

The script that does all the real work (cross-correlation matrix -> Hungarian sorting -> mean reproducibility -> export of the results as a text table and a plot) is ic_corr.py. You invoke it like this:
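The Hungarian-sorting step can be illustrated with a small, dependency-free Python sketch: given a cross-correlation matrix between the components of the two halves (rows = half A, columns = half B), find the one-to-one pairing that maximizes the total absolute correlation, then average the matched values. How ic_corr.py builds the matrix (via fsl_cc) and whether it matches on absolute correlations are assumptions here; the helper names are hypothetical. In practice, `scipy.optimize.linear_sum_assignment` solves the same assignment problem.

```python
def hungarian_min(cost):
    """O(n^3) Hungarian algorithm for n <= m; returns the matched column per row."""
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)  # row / column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)    # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:  # augment along the alternating path found above
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    match = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def mean_reproducibility(corr):
    """Pair components by maximal |r| and return (matching, mean matched |r|)."""
    cost = [[-abs(r) for r in row] for row in corr]  # maximize |r| == minimize -|r|
    match = hungarian_min(cost)
    matched = [abs(corr[i][j]) for i, j in enumerate(match)]
    return match, sum(matched) / len(matched)


match, mean_r = mean_reproducibility([[0.1, 0.9], [0.8, 0.2]])
print(match)  # [1, 0]
```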

python $mICA_FOLDER/py/ic_corr.py in_prefix samples dims [just_read]

Here, "in_prefix" is the name of the folder whose sub-folders should be organized like this:

in_prefix/sample_0001/group1/dim1
...
in_prefix/sample_0001/group1/dimMAX
in_prefix/sample_0001/group2/dim1
...
in_prefix/sample_0001/group2/dimMAX
in_prefix/sample_0002/group1/dim1
...
...
in_prefix/sample_MAX/group2/dimMAX

"samples" refers to the number of samplings ("permutations" from above).
"dims" defines the range of ICA dimensionalities you want to include in the analysis; all of them must already have been computed. Input can be, e.g., "5-100".
"[just_read]": 0 is the default. Use 1 if you have previously calculated the cross-correlation files; the script will then skip fsl_cc.
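A small Python 3 sketch of how the dims argument might be expanded, and how the folder tree could be checked before running ic_corr.py. Both helpers are hypothetical: `parse_dims` accepts "5-100" as well as the single-value and comma-separated forms documented for mica's -dim option (whether ic_corr.py accepts the comma form is not confirmed), and `missing_dirs` takes the folder names literally from the schematic above (`sample_0001/group1/dim5`), which may not match the exact names mica writes.

```python
import os


def parse_dims(spec):
    """Expand '8', '20,30,40' or '5-100' into a sorted list of ints (hypothetical)."""
    dims = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            # extend() accepts a range; `dims + range(...)` would fail in Python 3
            dims.extend(range(lo, hi + 1))
        else:
            dims.append(int(part))
    return sorted(set(dims))


def expected_dirs(in_prefix, samples, dims):
    """Every sample/group/dim folder the layout above calls for (assumed naming)."""
    for s in range(1, samples + 1):
        for g in (1, 2):
            for d in dims:
                yield os.path.join(in_prefix, "sample_%04d" % s, "group%d" % g, "dim%d" % d)


def missing_dirs(in_prefix, samples, dims):
    """Expected folders that do not exist yet (all must exist before ic_corr.py runs)."""
    return [p for p in expected_dirs(in_prefix, samples, dims) if not os.path.isdir(p)]


# If only even dimensionalities were computed, "2-10" would still ask for dim3, dim5, ...
print(parse_dims("2-10"))  # [2, 3, 4, 5, 6, 7, 8, 9, 10]
```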

Hope this helps,

Florian

P.S.: If you are using MELODIC directly, I suggest re-running your mICAs with the toolbox. It is much easier and offers extra functionality.

RE: Reproducibility analysis mICA

Hi Florian,

Thanks so much for your response - That was very helpful!

Could you also please tell me if it is possible to use mICA entirely through the command line?

Some of the in-house clusters, which run Ubuntu, do not have the X11 dependencies/libraries that the GUI needs, so the command line would be preferable.

Cheers! :)

Will

RE: Reproducibility analysis mICA

Hello Will,

It is possible to use the mICA scripts from the command line. However, some input-correctness checks may be skipped, because they are implemented in the GUI. In addition, you might have to write a wrapper script to automate the analysis.

For preprocessing, group/single-subject mICA, and back-reconstruction, you can use the documented bash scripts in the bin folder.

For reproducibility:

Step 1: prepare the randomly split groups using splithalf.py
Usage: python splithalf.py list_filename permutations out_prefix
list_filename: path of a text file listing all preprocessed input data (nii files)
permutations: the number of permutations, as in the GUI (0 for test-retest)
out_prefix: path of the output folder

Step 2: use mica (in the bin directory) to perform mICA for each of the sampled groups.

Step 3: calculate the correlations using python ic_corr.py in_prefix samples dims [just_read]
in_prefix: path of the folder that contains the sample folders (out_prefix from step 1)
samples: number of samplings
dims: dimensionality range
[just_read]: 0 is the default. 1 skips fsl_cc and just reads any previously calculated cross-correlation files.

Let me know if you need more help.

Greetings,
Tawfik Moher Alsady

RE: Reproducibility analysis mICA

Hi Tawfik,

Thank you for your suggestions.

In the mICA/bin folder I found the mica script, which has the following options for performing ICA:

Perform masked-ICA

Usage: mica [options]
input: a text file containing a list of all preprocessed functional images (nifti)
Options:
-mask mask file (nifti)
-dim ICA decomposition dimensions. Either one value, comma separated values (e.g. 20,30,40), or a range (e.g. 2-100). If more than one value was entered, the analysis would be repeated for each parameter value.
-merge-stats merge MELODIC mixture model stats in one 4D nifit files
-parcel generate ICA-based parcellation
-Sdes design matrix across subject-domain
-Scon t-contrast matrix across subject-domain
-Tdes design matrix across time-domain
-Tcon t-contrast matrix across time-domain
-dr perform Dual-Regression. permutations value for randomize (usually 500).
-inversion perform direct Back-Reconstruction (Inversion).
-extra pass the all following command-line arguments to MELODIC

I ran the command in bash as follows:

./mica $working_dir/mica_testing_list $working_dir/results/testing_mica -mask $working_dir/masks/PCC_and_Precuneus_mask_thr20_Harvard.nii.gz -dim 8 -extra -a concat --sep_vn --nobet --bgthreshold=10 --tr=1 --report --mmthresh=0.5 --Oall

and I get the following error:

./mica: line 242: /Volumes/ImagingData_WillKhan/HCP_Data2018/results/testing_mica/mica_log.txt: No such file or directory
./mica: line 244: /Volumes/ImagingData_WillKhan/HCP_Data2018/results/testing_mica/mica_log.txt: No such file or directory
./mica: line 245: /Volumes/ImagingData_WillKhan/HCP_Data2018/results/testing_mica/mica_log.txt: No such file or directory
./mica: line 251: /Volumes/ImagingData_WillKhan/HCP_Data2018/results/testing_mica/mica_log.txt: No such file or directory

Could you please advise what I am doing wrong here? I thought I would test each script separately on a few HCP datasets before creating a master wrapper script to run everything.

Thank you!

Cheers,

Will

RE: Reproducibility analysis mICA

Hi Tawfik,

Apologies for the long post! I managed to get the py/splithalf.py and mica scripts to work in the order you specified, so please ignore my last post.

However, I am encountering some issues with the py/ic_corr.py script.

I ran the command according to its usage message:

Usage: ic_corr.r in_prefix samples dims [just_read]

in_prefix path of the folder that contains samples folders
samples number of samplings
dims dimensionality range
[just_read] 0 is default. 1 will not perform fsl_cc and just reads any previously calculated cross-correlation file

This is the command:
./ic_corr.py $working_dir/results/out_50 50 2-20 0

This is the error:
Traceback (most recent call last):
File "./ic_corr.py", line 73, in <module>
dims = dims + dim
TypeError: can only concatenate list (not "range") to list

I suspect this is due to the way the dimensionality is passed to the script. I have tried 2,20, 2-20, and 1,2,3,4,5 ..., and the same error persists.

Also, a silly question: does this script require ICA to have been run at all dimensionalities before performing Hungarian sorting?

Any advice you have would be greatly appreciated!

Thank you very much,

Cheers,

Will

RE: Reproducibility analysis mICA

Hi Will,

First, to your question: yes, you have to run mica for all subfolders generated by the split-half step (number of mica commands = 2 x number of dimensionalities). You can use a bash script to schedule them all with fsl_sub if your FSL setup supports parallel processing, or call mica for each folder one by one.

About ic_corr.py: I guess you are using Python 3. A small change is needed to make the script compatible with Python 3:
Line 72: change the following
dim=range(int(minmax[0]),int(minmax[1])+1)
to
dim=list(range(int(minmax[0]),int(minmax[1])+1))

Change lines 178 and 180:
xtics=range(0,len(dims))
else:
xtics=range(0,len(dims), int(math.ceil(len(dims)/15.0)))

to:
xtics=list(range(0,len(dims)))
else:
xtics=list(range(0,len(dims), int(math.ceil(len(dims)/15.0))))
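The reason for these changes: in Python 2, range() returned a list, so `dims = dims + dim` concatenated two lists. In Python 3, range() returns a lazy range object, and `list + range` raises the TypeError seen above. A minimal Python 3 demonstration (the function names are illustrative only):

```python
def concat_dims_py2_style(dims, lo, hi):
    """Mimics the original line 72/73 pattern: fails under Python 3."""
    dim = range(lo, hi + 1)
    return dims + dim  # TypeError: can only concatenate list (not "range") to list


def concat_dims_py3(dims, lo, hi):
    """The suggested fix: turn the range into a list before concatenating."""
    dim = list(range(lo, hi + 1))
    return dims + dim


try:
    concat_dims_py2_style([], 2, 5)
except TypeError as err:
    print("Python 3 error:", err)
print(concat_dims_py3([], 2, 5))  # [2, 3, 4, 5]
```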

I hope it works now. If it works I will post an updated version with those changes.

RE: Reproducibility analysis mICA

Hi Tawfik,

Thanks very much for your help with this - much appreciated.

Please find a point-by-point response to your previous email:

First to your question, yes you have to run mica for all subfolders generated by split-half (number of mica commands = 2 x number of dimensionalities). One can use a bash script to schedule them all using fsl_sub if your fsl setup uses parallel processing. Or call mica one by one for each folder.

What do you mean by run mica for all subfolders generated by split-half?

Doesn't the mica script in the bin directory already do this when you supply the folder generated by split-half as the input? Could you provide an example? Thanks!

For clarity, I have outlined below all the steps I ran, following the suggestions in your previous email:

STEP 1: Generate all the relevant subfolders using py/splithalf.py script:

./splithalf.py $working_dir/input_list 50 $working_dir/results/out_50

input_list is a text file of absolute paths to each .nii file like so:

/Volumes/ImagingData_WillKhan/HCP_Data2018/raw_data/100307/MNINonLinear/Results/rfMRI_REST1_LR/rfMRI_REST1_LR_hp2000_clean.nii.gz
/Volumes/ImagingData_WillKhan/HCP_Data2018/raw_data/100307/MNINonLinear/Results/rfMRI_REST1_RL/rfMRI_REST1_RL_hp2000_clean.nii.gz
/Volumes/ImagingData_WillKhan/HCP_Data2018/raw_data/100307/MNINonLinear/Results/rfMRI_REST2_LR/rfMRI_REST2_LR_hp2000_clean.nii.gz
/Volumes/ImagingData_WillKhan/HCP_Data2018/raw_data/100307/MNINonLinear/Results/rfMRI_REST2_RL/rfMRI_REST2_RL_hp2000_clean.nii.gz

STEP 2: Run group ICA on PCC mask using bin/mica script:

export MICADIR=/Users/wasimkhan/Documents/mICA_Toolbox

./mica $working_dir/input_list $working_dir/results/out_50 -mask $working_dir/masks/PCC_and_Precuneus_mask_thr20_Harvard.nii.gz -dim 2 -extra -a concat --sep_vn --nobet --bgthreshold=10 --tr=1 --report --mmthresh=0.5 --Oall -o $working_dir/results/mica_test_2comp

Note that this command was run five times, once each for 2, 4, 6, 8 and 10 components.

Therefore, in the $working_dir/results directory I have the following directory structure:

melodic_test_2comp....melodic_test_4comp....melodic_test_6comp and so on.

STEP 3: calculate a correlation using py/ic_corr.py script. Here I changed the relevant lines of code from your previous email and renamed the script py/ic_corr_updated.py.

Usage: ic_corr.r in_prefix samples dims [just_read]
in_prefix path of the folder that contains samples folders
samples number of samplings
dims dimensionality range
[just_read] 0 is default. 1 will not perform fsl_cc and just reads any previously calculated cross-correlation file

./ic_corr_updated.py $working_dir/results/out_50 50 2-10 0

in_prefix: I supply the folder generated by py/splithalf.py
samples: is this the number of permutations run in splithalf? I provide 50
dims: I supply 2-10, as I have already produced 5 different group ICAs (2, 4, 6, 8, 10) using the mica script.

I get this error:

File "./ic_corr_updated.py", line 102, in <module>
if int(proc1.stdout.read().rstrip('\n')) != 1 or int(proc2.stdout.read().rstrip('\n')) != 1 or int(n1.stdout.read().rstrip('\n')) != int(dim) or int(n2.stdout.read().rstrip('\n')) != int(dim):
TypeError: a bytes-like object is required, not 'str'

line 102 reads like so:

if int(proc1.stdout.read().rstrip('\n')) != 1 or int(proc2.stdout.read().rstrip('\n')) != 1 or int(n1.stdout.read().rstrip('\n'))!= int(dim) or int(n2.stdout.read().rstrip('\n')) != int(dim):
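This TypeError is another Python 3 difference: reading from a subprocess pipe returns bytes unless the pipe is opened in text mode, and `bytes.rstrip('\n')` with a str argument fails. The sketch below assumes that proc1, proc2, n1 and n2 are `subprocess.Popen` objects created with `stdout=PIPE` (the thread does not show how they are created). It shows two possible fixes: open the pipe in text mode, or decode the bytes before stripping. The helper names are hypothetical, and the command is a Python one-liner standing in for whatever command the script actually runs.

```python
import subprocess
import sys


def read_int(cmd):
    """Run cmd and return its stdout as an int (Python 3 safe)."""
    # universal_newlines=True (text=True on Python 3.7+) makes stdout a str, not bytes
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True) as proc:
        out = proc.stdout.read()
    return int(out.rstrip("\n"))


def read_int_from_bytes_pipe(proc):
    """If the Popen call stays unchanged, decode the bytes before stripping."""
    return int(proc.stdout.read().decode().rstrip("\n"))


# Stand-in command: a Python one-liner printing a number
print(read_int([sys.executable, "-c", "print(8)"]))  # 8
```

Applied to line 102, the second variant would mean adding `.decode()` after each `.read()` call.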

Could you please advise whether all the steps above are correct, or whether I have made a mistake somewhere?

Apologies for the rather long email - I wanted to be as detailed as possible so that it is easier in the debugging process :)

Thanks so much!

Cheers,

Will

RE: Reproducibility analysis mICA

Hi Will,

Question 1:
The split-half script generates 2*N sub-folders, where N is the number of repetitions. For each subfolder you have to call the mica command to calculate mICA for that random subgroup.
In your example, those subfolders should lie in $working_dir/results/out_50

Step 2 should look like this:

export MICADIR=/Users/wasimkhan/Documents/mICA_Toolbox

Then run the following command for each sample (1-50) and each group (1-2):

./mica $working_dir/results/out_50/sample_XX/groupXX_input.txt $working_dir/results/out_50/sample_XX/groupXX -mask $working_dir/masks/PCC_and_Precuneus_mask_thr20_Harvard.nii.gz -dim 2,4,6,8,10 -extra -a concat --sep_vn --nobet --bgthreshold=10 --tr=1 --report --mmthresh=0.5 --Oall
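Because the same command has to be issued 2 x 50 times, a short wrapper can generate the calls. The Python sketch below only builds the argument lists; each one could then be run with `subprocess.run` or written to a file for fsl_sub. The positional order (input list, then output folder, then options) follows Will's earlier mica commands in this thread; the `sample_%04d` zero-padding, the `group%d_input.txt` file names and the helper name `mica_commands` are assumptions.

```python
import os

# MELODIC options passed through mica's -extra, as used in this thread
MELODIC_EXTRA = ["-a", "concat", "--sep_vn", "--nobet", "--bgthreshold=10",
                 "--tr=1", "--report", "--mmthresh=0.5", "--Oall"]


def mica_commands(mica, out_prefix, samples, mask, dims="2,4,6,8,10"):
    """Build one mica argv per sample/group (hypothetical layout, see lead-in)."""
    cmds = []
    for s in range(1, samples + 1):
        sample_dir = os.path.join(out_prefix, "sample_%04d" % s)
        for g in (1, 2):
            group_list = os.path.join(sample_dir, "group%d_input.txt" % g)  # assumed name
            group_out = os.path.join(sample_dir, "group%d" % g)
            cmds.append([mica, group_list, group_out, "-mask", mask,
                         "-dim", dims, "-extra"] + MELODIC_EXTRA)
    return cmds


cmds = mica_commands("./mica", "results/out_50", 50,
                     "masks/PCC_and_Precuneus_mask_thr20_Harvard.nii.gz")
print(len(cmds))  # 100
```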

I expect that performing the steps as suggested here will allow py/ic_corr.py to calculate the reproducibility.

Let me know if it works.

Best,
Tawfik

RE: Reproducibility analysis mICA

Hi Tawfik,

Thanks very much for your response.

I did think about this yesterday, but I initially did not try it because it would be very computationally expensive, particularly with the HCP dataset.

For example, using the 50 repetitions generated by split-half would mean 2*50 sub-folders = 100 ICAs. I was planning to run ten dimensionalities between 2 and 20, which would mean running 1000 group ICAs. That is quite a lot!

I saw that in your recent HBM paper you ran 3-50 components on 50 HCP subjects using 50 split-half repetitions. I suspect you ran this on a colossal supercomputer; do you remember how long it took?

Also, would you suggest running this on a single repetition, i.e. 2*1 sub-folders x 10 dimensionalities = 20 ICAs? Do you think the reproducibility script would still work?
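The workload arithmetic in this post can be written out explicitly: each split-half repetition produces two groups, and each group needs one decomposition per dimensionality. Since mica's -dim option repeats the analysis for each listed value, one mica call per group can cover all dimensionalities, but the number of decompositions stays the same. The sketch assumes the ten dimensionalities are 2, 4, ..., 20; the thread only implies ten values in that range.

```python
def n_decompositions(repetitions, dims):
    """Group ICA decompositions needed: 2 groups x repetitions x dimensionalities."""
    return 2 * repetitions * len(dims)


dims = list(range(2, 21, 2))               # assumed: 2, 4, ..., 20 (ten values)
print(n_decompositions(50, dims))          # 1000
print(n_decompositions(1, dims))           # 20
print(n_decompositions(50, range(3, 51)))  # 4800 (3-50 components, 50 repetitions)
```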

Thanks very much,

Cheers!

Will