
Choosing number of clusters in CONN's HCA

Hi there,

I'm curious about two particular lines of code in the conn_displayroi function (lines 3871-3872):

[nill,idxmax]=max((1:size(Z,1))'/size(Z,1) - Z(:,3)/max(Z(:,3))); 

nclusters=size(Z,1)-idxmax+1;

Z is the linkage output behind the dendrogram, computed with complete-linkage clustering. The first two columns are the indices of the clusters joined at each step, the third column is the distance at which they are joined, the number of rows is the number of joins (or splits, reading the dendrogram top-down), and the rows are sorted by the third column from smallest to largest.

This code runs after HCA clustering, when CONN needs to find a cutoff in the dendrogram to make discrete clusters of ROIs. To me, it looks like a very clever way of balancing the number of clusters against the distance between clusters. I think it is essentially maximizing the difference between the normalized row index of Z (which maps directly onto the number of clusters) and the normalized distance metric, thus finding the point at which increasing the number of clusters starts to have diminishing returns. Is that a correct interpretation?
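To make my reading concrete, here is a minimal sketch of those two lines applied to a toy example. The Z matrix below is made up for illustration (6 ROIs, 5 joins) and is not taken from CONN; the variable names step and dist are mine, and I use ~ in place of the unused nill output.

```matlab
% Hypothetical linkage output for 6 ROIs (values invented for illustration).
% Columns: [cluster index, cluster index, join distance], sorted by distance.
Z = [1  2 0.10;
     3  4 0.15;
     5  6 0.20;
     7  8 0.80;
     9 10 1.00];

n    = size(Z,1);                 % number of joins (rows of Z)
step = (1:n)'/n;                  % normalized row index: 0.2, 0.4, ..., 1.0
dist = Z(:,3)/max(Z(:,3));        % normalized join distance

% Same criterion as the two conn_displayroi lines quoted above:
% pick the row where the normalized index most exceeds the normalized distance.
[~,idxmax] = max(step - dist);    % step - dist = [0.10 0.25 0.40 0 0], so idxmax = 3
nclusters  = n - idxmax + 1;      % 5 - 3 + 1 = 3 clusters
```

In this toy case the cut lands just before the large jump in distance (0.20 to 0.80), leaving the three tight pairs as separate clusters, which is what I would expect from an elbow-type criterion if my interpretation is right.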

My main question is: what is this strategy/algorithm called? Is there a reference for it? I would like to be able to put a name to the process by which CONN determines the number of clusters from HCA.

Thank you!

Best,

Tom from BU