Apr 8, 2018 10:04 AM | noamagal
multiple imputation only for one subject
Hello!
I have a dataset of 99 subjects. One of my subjects has missing data in only one region of interest, the right frontal superior orbital cortex. I want to fill in his missing data, and my question is: should I use MI in this case, or is the missing value negligible? Should I use mean replacement instead?
Thank you in advance,
Noa
Apr 9, 2018 04:04 PM | Kenneth Vaden - Medical University of South Carolina
RE: multiple imputation only for one subject
Hi Noa,
The best way to deal with missing data is likely to depend on the specifics of the case. Practically speaking, if you are only missing a single ROI from a single subject in a sample that large (N=99 subjects), then I would not expect to see a difference based on ignoring that missing value. There are clearly situations where that is not a good option, though.
Multiple imputation can be advantageous compared to ignoring one missing value, or replacing it with a mean value. Ignoring a single missing value (i.e., analyzing only the data that were observed) will decrease your statistical sensitivity (albeit slightly), raising your likelihood of a Type II error. Replacing a value with the mean observed value increases the Type I error rate, because the variance of the data submitted to statistical tests is lowered by mean replacement. In other words, you can systematically increase the False Positive Rate by replacing a missing value with the mean. Note that the cause of missing data matters: if a subject is missing data because of a lesion/stroke or normalization errors aligning an ROI to their image data, then those data could be considered "missing not at random". In those cases, multiple imputation is more likely to produce inaccurate results because the missingness is determined by the data themselves (very low/nonexistent values).
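To make the variance point concrete, here is a minimal Python sketch. The ROI values are simulated rather than real data, and the variable names are made up for illustration. It fills one missing value with the observed mean and compares the sample variance before and after. Because the filled-in value sits exactly at the mean, the sum of squared deviations does not change while the sample size grows by one, so the variance shrinks by a factor of about 97/98 for this sample size.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

# Simulated (hypothetical) ROI values for the 98 subjects with observed data
observed = [random.gauss(0.0, 1.0) for _ in range(98)]

# Mean replacement: fill the 99th subject's missing value with the observed mean
mean_filled = observed + [statistics.mean(observed)]

# Sample variance (n - 1 denominator) before and after mean replacement
var_observed = statistics.variance(observed)
var_mean_filled = statistics.variance(mean_filled)

ratio = var_mean_filled / var_observed
print(f"variance, observed only:   {var_observed:.4f}")
print(f"variance, mean-replaced:   {var_mean_filled:.4f}")
print(f"ratio (mean-replaced / observed): {ratio:.4f}")
```

With a single missing value out of 99 the shrinkage is about 1%, which is consistent with the practical advice above that the choice is unlikely to matter much here. The effect grows as the proportion of mean-replaced values grows.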
I don't know enough about the missingness mechanism or data organization for your specific analysis to make a recommendation, although a biostatistician in your organization may be able to assist you. If you have access to statistical expertise, definitely discuss the details with them. We published a paper on multiple imputation of fMRI data for whole brain analyses in NeuroImage (info below), which contains many helpful references that apply to a range of data types and missingness scenarios. A common situation for ROI analyses would involve a single value from each ROI and subject, which may be more directly dealt with using the "MICE" R-package, rather than performing multiple imputation with voxel-level data (the purpose of our NITRC package).
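For an ROI-level analysis, multiple imputation produces several completed datasets, the same test is run on each, and the results are then combined. The combining step is usually done with Rubin's rules, which can be sketched in a few lines of Python. This is only an illustration of the pooling arithmetic, not the interface of the MICE package or of the NITRC package; the function name `pool_rubin` and the numbers below are hypothetical.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their squared standard errors
    using Rubin's rules. Returns (pooled_estimate, total_variance)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = statistics.mean(estimates)       # average of the m estimates
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # variance of estimates across imputations
    total = within + (1 + 1 / m) * between    # total variance of the pooled estimate
    return pooled, total


# Hypothetical effect estimates and squared standard errors from m = 5 imputed datasets
estimates = [0.42, 0.45, 0.40, 0.44, 0.43]
variances = [0.010, 0.011, 0.010, 0.012, 0.011]

pooled, total_var = pool_rubin(estimates, variances)
print(f"pooled estimate: {pooled:.4f}")
print(f"pooled standard error: {total_var ** 0.5:.4f}")
```

The between-imputation term is what separates multiple imputation from filling in a single value: the uncertainty about the missing value is carried into the final standard error instead of being ignored.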
Good luck!
Kenny Vaden
Vaden, KI, Gebregziabher, M, Kuchinsky, SE, Eckert, MA (2012). Multiple imputation of missing fMRI data in whole brain analysis. NeuroImage, 60(3), 1843-1855.