SPIDER: CL KM (Classification

CL KM - Classification - K Means ||

(3/21/13)

PURPOSE: Performs K-means clustering on factors (or raw data) produced by CORAN or PCA. Produces class membership document file and class selection files.
See: Classification and Clustering Tutorial for further info. Example.

SEE ALSO

CL CLA	[Classification - Automatic]
CL HC	[Classification - Controlled]

USAGE

.OPERATION: CL KM [trb],[trw],[c],[h],[db]

.CORAN/PCA FILE (e.g. CORAN_01_IMC) FILE: coran_t_IMC
[Enter name of the raw image data sequential file (_SEQ), image factor coordinate file (_IMC), or pixel factor coordinate file (_PIX) file containing your data. These files are created by :'CA S'.

.NUMBER OF CLASSES: 40
[Enter number of classes required.]

.FACTOR NUMBERS: 1, 3, 4, 6
[Enter the factors to be included in the K-means clustering algorithm.]

.FACTOR WEIGHT: 1.5
[Enter a weight for each factor selected. If the answer zero is given at any point, all weights from the current factor onwards are set to one. This question is asked as many times as the number of factors specified, or is terminated by the answer zero.]

.FACTOR WEIGHT: 0.0
[This question is asked as many times as the number of remaining factors, or is terminated by the answer zero.]

.FOR RANDOM SEEDS GIVE NON-ZERO STARTING NUMBER: 1457
[Initial partition of objects is random. If the answer is zero, the partition is as follows: 1st object to first class, 2nd object to second class, ..., k-th object to k-th class, (k+1)-th object to first class, etc. For non-zero answer, the number is used to initialize a truly random assignment of objects. The purpose is to try different initial partitions for a given number of classes and choose the one with the best value of one of the criteria.]

.SELECTION DOC FILE TEMPLATE (e.g.: SEL***): SEL***
[Enter template for selection document files which list all the objects (usually images) belonging to the same class. One file will be created for each class.]

.CLASS MEMBERSHIP DOC FILE: MAP001
[Enter the document file name where the class membership for each object will be stored. File lists image number and class number.]

NOTES

At end of operation the optional register variables on the operation line receive the following values:
[trb]: Tr(B), trace of between-groups sum of squares matrix,
[trw]: Tr(W), trace of within-groups sum of squares matrix,
[c]: C = Tr(B)*Tr(W), Coleman criterion,
[h]: H = (Tr(B)/(k-1))/(Tr(W)/(nobj-k)), Harabasz criterion (k = number of groups; nobj = number of objects [images]),
[db]: DB, Davies-Bouldin criterion.
The local maximum on the plot of C or H versus number of groups indicates the 'best' partition.
A large change in value of Tr(W) also indicates a possible good partition.
The local minimum on the plot of DB versus number of groups indicates the 'best' partition.
Davies-Bouldin's is the most highly recommended criterion.
For description of the K-means algorithm and clustering criteria see:
Cluster Analysis Algorithms for Data Reduction and Classification of Objects. Helmuth Spath. (John Wiley & Sons, Ellis Horwood Ltd., 1980).
Algorithms for Clustering Data. A.K.Jain, R.C.Dubes. (Prentice Hall, 1988).
A cluster separation measure. D.L.Davies and D.Bouldin. (1979) IEEE Trans. Pattern Analysis and Machine Intelligence 1:224-227.
If there is only a single object in a class then the results may be nonsense. To overcome this remove the object (image) that occurs singly and rerun the analysis.

SUBROUTINES: SUBKMNS, SUBKMEANS, NEWKMEANS

CALLER: UTIL1