Cluster Validation
Validation of the cluster analysis is extremely important because the procedure has a somewhat 'artsy' quality (as opposed to a strictly scientific one): many of the decisions along the way are judgment calls.
Validation at this point is an attempt to ensure that the cluster analysis generalizes to other cells (cases) in the future. The following are some different ways to do this.
Analyze Separate Samples of Cells
Here you would collect data on one group of cells (or cases of whatever you're clustering) and perform the cluster analysis. Then you would collect data from more cases and perform a cluster analysis on these as well. The profiles of the two cluster analyses could then be compared (see Profiling here).
Seeing as most of us aren't likely to do this, the next best thing is to randomly split the sample that you have into multiple groups (most likely two).
To see how to do this random sampling validation in SPSS click here.
The idea here is to see whether individual cases cluster together the same way in the smaller samples as they do when all of the cases are used in the cluster analysis.
Here you can see that in the first dendrogram cases 1, 7, 14 & 22 stay together, as do 23, 27 & 29, and 2, 12, 18, 3, 8, 5, 13, 9 & 11.
In the second dendrogram 4, 17, 10, 6, 19 & 20 stay together; however, 30 is added to this group as well. How much should this cause worry? Not very much, because as you may remember this cell has already been identified as being different from the other cells by being GFP -ve. It does reveal one weakness of this approach: cluster analysis (like most statistical procedures) does better with a larger sample. I think the result is still pretty good, because the other groups stay consistent as well (I'm not listing them). In fact, only 1 out of 30 cases switching groups is pretty good.
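If you'd rather script this check than click through SPSS, the split-and-recluster idea can be sketched in Python with SciPy. The data here are simulated stand-ins for the real cell measurements, and Ward linkage on Euclidean distances is an assumed choice of algorithm:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Simulated stand-in: 30 cases in three well-separated groups of 10.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 4)) for c in (0, 3, 6)])

def cluster_labels(data, k=3):
    """Ward linkage on Euclidean distances, cut into k clusters."""
    return fcluster(linkage(pdist(data), method="ward"), t=k, criterion="maxclust")

full = cluster_labels(X)

# Randomly split the cases into two halves and re-cluster each half.
idx = rng.permutation(len(X))
half1, half2 = idx[:15], idx[15:]
labels1 = cluster_labels(X[half1])
labels2 = cluster_labels(X[half2])

# Cases that clustered together in the full solution should still
# cluster together within each half (the cluster numbers may differ,
# so compare co-membership rather than raw labels).
def agree(sub_idx, sub_labels):
    same_full = full[sub_idx][:, None] == full[sub_idx][None, :]
    same_sub = sub_labels[:, None] == sub_labels[None, :]
    return (same_full == same_sub).mean()

print(f"co-membership agreement, half 1: {agree(half1, labels1):.2f}")
print(f"co-membership agreement, half 2: {agree(half2, labels2):.2f}")
```

A value near 1.0 means the co-membership structure is preserved; as in the dendrogram comparison above, one or two switching cases out of 30 is not a cause for alarm.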
This isn't the end though.
| Cluster | Variable | All Cells Mean | All Cells SD | Random Group1 Mean | Random Group1 SD | Random Group2 Mean | Random Group2 SD |
|---|---|---|---|---|---|---|---|
| 1 | APHW | 1.169999 | 0.063714 | 1.15126 | 0.054225 | 1.194983 | 0.078289 |
| 1 | AHP | -3.95691 | 0.663658 | -4.06826 | 0.672651 | -3.80845 | 0.764708 |
| 1 | Rn | 348.7486 | 29.24746 | 338.9068 | 37.50931 | 361.8709 | 1.950505 |
| 1 | Mtau | 59.45319 | 1.169999 | 59.213 | 2.223075 | 59.77344 | 5.525621 |
| 1 | Axis | 1.690207 | 0.694995 | 1.658925 | 0.887908 | 1.731917 | 0.511781 |
| 1 | AMPA | 73.86707 | 4.621209 | 75.27751 | 5.12071 | 71.98649 | 3.930735 |
| 1 | EPSC | -375.027 | 10.11532 | -373.195 | 12.1766 | -377.469 | 8.300786 |
| 2 | APHW | 1 | 0.041259 | 1.003677 | 0.034747 | 0.963541 | 0.094942 |
| 2 | AHP | -2.2686 | 0.689125 | -2.23252 | 0.696048 | -2.22001 | 0.728748 |
| 2 | Rn | 423.7855 | 27.23804 | 419.0135 | 31.55001 | 423.4358 | 26.70377 |
| 2 | Mtau | 58.03932 | 26.85176 | 57.24916 | 27.81466 | 53.64329 | 29.43713 |
| 2 | Axis | 0.852383 | 0.47475 | 0.680997 | 0.483726 | 1.170366 | 0.360181 |
| 2 | AMPA | 74.22653 | 5.641363 | 75.28617 | 6.28192 | 72.71064 | 4.179569 |
| 2 | EPSC | -299.893 | 9.826026 | -298.168 | 11.50212 | -307.562 | 14.78337 |
| 3 | APHW | 0.800001 | 0.059242 | 0.81761 | 0.086448 | 0.792325 | 0.051527 |
| 3 | AHP | -1.89115 | 0.911236 | -1.92848 | 0.402861 | -1.93501 | 1.340819 |
| 3 | Rn | 318.7753 | 34.37174 | 307.753 | 40.99336 | 312.1387 | 16.47762 |
| 3 | Mtau | 20.15375 | 0.805265 | 20.42457 | 1.190181 | 19.95015 | 0.663389 |
| 3 | Axis | 1.464665 | 0.696058 | 1.92403 | 0.598264 | 1.102363 | 0.710041 |
| 3 | AMPA | 76.94796 | 4.892007 | 80.92229 | 4.746457 | 74.91617 | 3.83655 |
| 3 | EPSC | -375 | 22.2961 | -382.256 | 29.14978 | -378.794 | 8.166388 |
Looking at the profiles of the cells from the different samples, you can see that there isn't much difference in the means. Of course you could (and should) carry out some statistical tests to confirm that they are not significantly different.
You can also plot this out, as in the descriptive profiles, to verify that there are no differences.
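For the statistical tests just mentioned, a two-sample t-test computed from summary statistics is enough. The means and SDs below are taken from the APHW row of the first cluster in the table; the per-group n of 5 is an assumption for illustration, since the table doesn't report cluster sizes:

```python
from scipy.stats import ttest_ind_from_stats

# Cluster 1, APHW: Random Group1 vs. Random Group2 (values from the
# table above; nobs=5 per group is a hypothetical sample size).
t, p = ttest_ind_from_stats(
    mean1=1.15126, std1=0.054225, nobs1=5,
    mean2=1.194983, std2=0.078289, nobs2=5,
)
print(f"t = {t:.2f}, p = {p:.3f}")
```

A non-significant p-value here is what you want: it says the profile of this cluster did not change meaningfully between the two random halves.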
Use Other Hierarchical Clustering Procedures
This is a bit of a tricky one, seeing as part of cluster analysis is picking the right clustering algorithm, and picking another one seems to go against that logic. However, the methods are often similar enough that if your data have been picked and adjusted correctly (see here for picking and adjusting data), the solutions should come out quite similar whenever the cluster differences are robust.
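A sketch of this check with SciPy, comparing Ward, average, and complete linkage; simulated data stand in for the real cell measurements:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
# Simulated stand-in: three well-separated groups of 10 cases.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 4)) for c in (0, 3, 6)])
D = pdist(X)

# Cut each hierarchical solution into three clusters.
labels = {m: fcluster(linkage(D, method=m), t=3, criterion="maxclust")
          for m in ("ward", "average", "complete")}

# Co-membership agreement between two partitions (1.0 = identical
# groupings, regardless of how the cluster numbers are assigned).
def agreement(a, b):
    return ((a[:, None] == a[None, :]) == (b[:, None] == b[None, :])).mean()

print(agreement(labels["ward"], labels["average"]))
print(agreement(labels["ward"], labels["complete"]))
```

If the cluster structure is robust, the different linkage methods should all produce essentially the same partition.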
Apply a Non-Hierarchical Cluster Procedure
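A common version of this check (assumed here, since the details aren't spelled out above) is k-means: seed it with the centroids of the hierarchical clusters and see whether the memberships match. A minimal Python sketch on simulated data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2
from scipy.spatial.distance import pdist

rng = np.random.default_rng(2)
# Simulated stand-in: three well-separated groups of 10 cases.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(10, 4)) for c in (0, 3, 6)])

# Hierarchical solution (Ward linkage), cut into three clusters.
hier = fcluster(linkage(pdist(X), method="ward"), t=3, criterion="maxclust")

# Seed k-means with the hierarchical cluster centroids.
seeds = np.vstack([X[hier == k].mean(axis=0) for k in (1, 2, 3)])
_, km = kmeans2(X, seeds, minit="matrix")  # km labels are 0-based

# If the clusters are robust, the two partitions should coincide.
match = (km == hier - 1).mean()
print(f"agreement: {match:.2f}")
```

High agreement between the hierarchical and non-hierarchical solutions is further evidence that the clusters are not an artifact of one particular algorithm.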
Predictive Validity
Predictive validity means using some variable that was not included in your original cluster analysis but has previously been established as varying among cases. If the clusters are meaningful, they should also differ on this external variable.
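As a sketch of this check: simulate cluster memberships and a hypothetical external variable (one with a real cluster effect built in), then test for differences across clusters with a one-way ANOVA:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(3)
# Hypothetical setup: 30 cases in three clusters of 10, plus an
# external variable (not used in the clustering) whose mean differs
# across the clusters.
clusters = np.repeat([1, 2, 3], 10)
external = rng.normal(loc=clusters * 2.0, scale=0.5)

# One-way ANOVA: does the external variable differ across clusters?
F, p = f_oneway(*(external[clusters == k] for k in (1, 2, 3)))
print(f"F = {F:.1f}, p = {p:.4f}")
```

A significant result here (unlike the profile comparison earlier, where you wanted no difference) supports the validity of the clusters: they predict something they were never shown.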