Cluster Validation
Validation of the cluster analysis is extremely important because the procedure has a somewhat 'artsy' quality (as opposed to a strictly scientific one): many of the decisions along the way are judgment calls.
Validation at this point is an attempt to ensure that the cluster analysis generalizes to other cells (cases) in the future. The following are some different ways to do this.
Analyze Separate Samples of Cells
Here you would collect data on one group of cells (or cases of whatever you're clustering) and perform the cluster analysis. Then you would collect data from more cases and perform a cluster analysis on these as well. The profiles of the two cluster analyses could then be compared (see Profiling here).
Seeing as most of us aren't likely to do this, the next best thing is to randomly split the sample that you have into multiple groups (most likely two).
To see how to do this random sampling validation in SPSS click here.
The idea here is to see whether individual cases cluster together the same way in the smaller samples as they do when all of the cases are used in the cluster analysis.
Here you can see that in the first dendrogram cases 1, 7, 14 & 22 stay together, as do 23, 27 & 29, and 2, 12, 18, 3, 8, 5, 13, 9 & 11.
In the second dendrogram 4, 17, 10, 6, 19 & 20 stay together; however, 30 is added to this group as well. How much should this cause worry? Not very much, because as you may remember this cell has already been identified as being different from the other cells by being GFP -ve. It does reveal one weakness of this approach: cluster analysis (like most statistical procedures) does better with a larger sample. I think the result is still pretty good, because the other groups stay consistent as well (I'm not listing them). In fact, only 1 out of 30 cases switching groups is pretty good.
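If you'd rather script this check than click through SPSS, the split-and-recluster idea can be sketched in Python with SciPy. The data here are simulated stand-ins for the real cell measurements, and Ward linkage on Euclidean distances is an assumed choice of algorithm:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Simulated stand-in: 30 cases in three well-separated groups of 10.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 4)) for c in (0, 3, 6)])

def cluster_labels(data, k=3):
    """Ward linkage on Euclidean distances, cut into k clusters."""
    return fcluster(linkage(pdist(data), method="ward"), t=k, criterion="maxclust")

full = cluster_labels(X)

# Randomly split the cases into two halves and re-cluster each half.
idx = rng.permutation(len(X))
half1, half2 = idx[:15], idx[15:]
labels1 = cluster_labels(X[half1])
labels2 = cluster_labels(X[half2])

# Cases that clustered together in the full solution should still
# cluster together within each half (the cluster numbers may differ,
# so compare co-membership rather than raw labels).
def agree(sub_idx, sub_labels):
    same_full = full[sub_idx][:, None] == full[sub_idx][None, :]
    same_sub = sub_labels[:, None] == sub_labels[None, :]
    return (same_full == same_sub).mean()

print(f"co-membership agreement, half 1: {agree(half1, labels1):.2f}")
print(f"co-membership agreement, half 2: {agree(half2, labels2):.2f}")
```

A value near 1.0 means the co-membership structure is preserved; as in the dendrogram comparison above, one or two switching cases out of 30 is not a cause for alarm.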
This isn't the end though.
| Cluster | Variable | All Cells Mean | All Cells SD | Random Group1 Mean | Random Group1 SD | Random Group2 Mean | Random Group2 SD |
|---|---|---|---|---|---|---|---|
| 1 | APHW | 1.169999 | 0.063714 | 1.15126 | 0.054225 | 1.194983 | 0.078289 |
| 1 | AHP | -3.95691 | 0.663658 | -4.06826 | 0.672651 | -3.80845 | 0.764708 |
| 1 | Rn | 348.7486 | 29.24746 | 338.9068 | 37.50931 | 361.8709 | 1.950505 |
| 1 | Mtau | 59.45319 | 1.169999 | 59.213 | 2.223075 | 59.77344 | 5.525621 |
| 1 | Axis | 1.690207 | 0.694995 | 1.658925 | 0.887908 | 1.731917 | 0.511781 |
| 1 | AMPA | 73.86707 | 4.621209 | 75.27751 | 5.12071 | 71.98649 | 3.930735 |
| 1 | EPSC | -375.027 | 10.11532 | -373.195 | 12.1766 | -377.469 | 8.300786 |
| 2 | APHW | 1 | 0.041259 | 1.003677 | 0.034747 | 0.963541 | 0.094942 |
| 2 | AHP | -2.2686 | 0.689125 | -2.23252 | 0.696048 | -2.22001 | 0.728748 |
| 2 | Rn | 423.7855 | 27.23804 | 419.0135 | 31.55001 | 423.4358 | 26.70377 |
| 2 | Mtau | 58.03932 | 26.85176 | 57.24916 | 27.81466 | 53.64329 | 29.43713 |
| 2 | Axis | 0.852383 | 0.47475 | 0.680997 | 0.483726 | 1.170366 | 0.360181 |
| 2 | AMPA | 74.22653 | 5.641363 | 75.28617 | 6.28192 | 72.71064 | 4.179569 |
| 2 | EPSC | -299.893 | 9.826026 | -298.168 | 11.50212 | -307.562 | 14.78337 |
| 3 | APHW | 0.800001 | 0.059242 | 0.81761 | 0.086448 | 0.792325 | 0.051527 |
| 3 | AHP | -1.89115 | 0.911236 | -1.92848 | 0.402861 | -1.93501 | 1.340819 |
| 3 | Rn | 318.7753 | 34.37174 | 307.753 | 40.99336 | 312.1387 | 16.47762 |
| 3 | Mtau | 20.15375 | 0.805265 | 20.42457 | 1.190181 | 19.95015 | 0.663389 |
| 3 | Axis | 1.464665 | 0.696058 | 1.92403 | 0.598264 | 1.102363 | 0.710041 |
| 3 | AMPA | 76.94796 | 4.892007 | 80.92229 | 4.746457 | 74.91617 | 3.83655 |
| 3 | EPSC | -375 | 22.2961 | -382.256 | 29.14978 | -378.794 | 8.166388 |
Looking at the profiles of the cells from the different samples, you can see that there isn't much difference in the means. Of course you could (and should) carry out some statistical tests to confirm that they are not significantly different.
You can also plot this out, as in the descriptive profiles, to verify that there are no differences.
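For the statistical tests just mentioned, a two-sample t-test computed from summary statistics is enough. The means and SDs below are taken from the APHW row of the first cluster in the table; the per-group n of 5 is an assumption for illustration, since the table doesn't report cluster sizes:

```python
from scipy.stats import ttest_ind_from_stats

# Cluster 1, APHW: Random Group1 vs. Random Group2 (values from the
# table above; nobs=5 per group is a hypothetical sample size).
t, p = ttest_ind_from_stats(
    mean1=1.15126, std1=0.054225, nobs1=5,
    mean2=1.194983, std2=0.078289, nobs2=5,
)
print(f"t = {t:.2f}, p = {p:.3f}")
```

A non-significant p-value here is what you want: it says the profile of this cluster did not change meaningfully between the two random halves.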
Use Other Hierarchical Clustering Procedures
This is a bit of a tricky one, seeing as part of cluster analysis is picking the right clustering algorithm, and picking another one seems to go against that logic. However, the methods are often similar enough that if your data have been picked and adjusted correctly (see here for picking and adjusting data), the solutions should come out quite similar whenever the cluster differences are robust.
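A sketch of this check with SciPy, comparing Ward, average, and complete linkage; simulated data stand in for the real cell measurements:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
# Simulated stand-in: three well-separated groups of 10 cases.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 4)) for c in (0, 3, 6)])
D = pdist(X)

# Cut each hierarchical solution into three clusters.
labels = {m: fcluster(linkage(D, method=m), t=3, criterion="maxclust")
          for m in ("ward", "average", "complete")}

# Co-membership agreement between two partitions (1.0 = identical
# groupings, regardless of how the cluster numbers are assigned).
def agreement(a, b):
    return ((a[:, None] == a[None, :]) == (b[:, None] == b[None, :])).mean()

print(agreement(labels["ward"], labels["average"]))
print(agreement(labels["ward"], labels["complete"]))
```

If the cluster structure is robust, the different linkage methods should all produce essentially the same partition.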
Apply a Non-Hierarchical Cluster Procedure
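A common version of this check (assumed here, since the details aren't spelled out above) is k-means: seed it with the centroids of the hierarchical clusters and see whether the memberships match. A minimal Python sketch on simulated data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)
# Simulated stand-in: three well-separated groups of 10 cases.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 4)) for c in (0, 3, 6)])

# Hierarchical solution (Ward linkage), cut into three clusters.
hier = fcluster(linkage(pdist(X), method="ward"), t=3, criterion="maxclust")

# Seed k-means with the hierarchical cluster centroids.
seeds = np.vstack([X[hier == k].mean(axis=0) for k in (1, 2, 3)])
_, km = kmeans2(X, seeds, minit="matrix")  # km labels are 0-based

# If the clusters are robust, the two partitions should coincide.
match = (km == hier - 1).mean()
print(f"agreement: {match:.2f}")
```

High agreement between the hierarchical and non-hierarchical solutions is further evidence that the clusters are not an artifact of one particular algorithm.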
Predictive Validity
Predictive validity means using some variable that was not included in your original cluster analysis but has previously been established as varying among cases. If the clusters are meaningful, they should also differ on this external variable.
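As a sketch of this check: simulate cluster memberships and a hypothetical external variable (one with a real cluster effect built in), then test for differences across clusters with a one-way ANOVA:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(3)
# Hypothetical setup: 30 cases in three clusters of 10, plus an
# external variable (not used in the clustering) whose mean differs
# across the clusters.
clusters = np.repeat([1, 2, 3], 10)
external = rng.normal(loc=clusters * 2.0, scale=0.5)

# One-way ANOVA: does the external variable differ across clusters?
F, p = f_oneway(*(external[clusters == k] for k in (1, 2, 3)))
print(f"F = {F:.1f}, p = {p:.4f}")
```

A significant result here (unlike the profile comparison earlier, where you wanted no difference) supports the validity of the clusters: they predict something they were never shown.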