CA S - Correspondence Analysis - Solution

(3/21/13)

PURPOSE

Execute correspondence analysis (CORAN) or Principal Component Analysis (PCA) on image data.
See the Classification and Clustering Tutorial for further information.

SEE ALSO

CA NOISE [Correspondence Analysis - Create Eigenvalue doc file for noise]
CA SM [Correspondence Analysis - Show Map & Eigen Values]
CA SR [Correspondence Analysis - Reconstitute images]
CA SRA [Correspondence Analysis - Arbitrary image reconstitution]
CA SRD [Correspondence Analysis - Reconstitute Differential images]
CA SRE [Correspondence Analysis - Reconstitution of eigenimages]
CA SRI [Correspondence Analysis - Reconstitute Importance images]
CA VIS [Correspondence Analysis - Create Visual map]
SD C [Save Document - from CA S (_IMC) file]

USAGE

.OPERATION: CA S

.IMAGE FILE TEMPLATE : SEC***
[Enter a file name template identifying the image series to be analyzed.]

.FILE NUMBERS OR SELECTION DOC. FILE NAME: 1-40,45,50-70
[Enter numbers of image files in the series. If the images are supplied as a 3D stack, then the numbers are interpreted as slice numbers.]
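
As an illustration of how a template and a number list such as the ones above select a series, here is a minimal Python sketch. It is not SPIDER code. The helper names expand_file_numbers and apply_template are hypothetical, and the sketch assumes that the asterisks are trailing and that each one stands for one digit of the zero-padded file number.

```python
def expand_file_numbers(selection):
    """Expand a selection such as '1-40,45,50-70' into a list of integers."""
    numbers = []
    for part in selection.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range 'first-last' is taken to include both end points.
            first, last = (int(x) for x in part.split("-", 1))
            numbers.extend(range(first, last + 1))
        else:
            numbers.append(int(part))
    return numbers


def apply_template(template, number):
    """Fill trailing asterisks with the zero-padded number (assumed convention)."""
    width = template.count("*")
    stem = template.rstrip("*")
    return stem + str(number).zfill(width)


numbers = expand_file_numbers("1-40,45,50-70")
names = [apply_template("SEC***", n) for n in numbers]
print(len(numbers), names[0], names[-1])
```

With the values from the prompts above, the selection expands to 62 images, from SEC001 to SEC070.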

.MASK FILE: MAS002
[Enter the name of the file containing a mask. Only image pixels where mask pixels are greater than 0.5 are analyzed in CORAN or PCA. Enter '*' if no masking is desired.]
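
The masking rule can be pictured with a small Python sketch. It is only an illustration, not the SPIDER implementation: the image and mask are assumed to be flattened into lists of equal length, and the function name pixels_under_mask is hypothetical.

```python
def pixels_under_mask(image, mask, threshold=0.5):
    """Keep only the image values whose mask value is strictly above the threshold."""
    return [value for value, m in zip(image, mask) if m > threshold]


image = [10.0, 20.0, 30.0, 40.0]
mask = [1.0, 0.0, 0.5, 0.75]

selected = pixels_under_mask(image, mask)
npix = len(selected)  # corresponds to NPIX, the number of pixels under the mask
print(selected, npix)
```

Note that a mask value of exactly 0.5 does not pass, because the comparison is "greater than 0.5".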

.NUMBER OF FACTORS: 20
[Enter the number of factors to be used. Note that eigenvectors and eigenvalues are only computed up to this number.]

.CORAN, PCA, ITERATIVE PCA, OR SKIP ANALYSIS (C/P/I/S): C
[Choose the type of analysis:
C:   CORAN.
P:   Principal Component Analysis.
I:   Iterative Principal Component Analysis.
S:   Skip analysis and just create the _SEQ file.]

If CORAN is used the following question appears:

.ADDITIVE CONSTANT: 0.0
[CORAN cannot accept images containing negative values. If the images contain negative values, specify an additive constant, which is added to all within-mask pixels of all images.]
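
One way to choose the constant is to look for the most negative within-mask value over the whole series. The Python sketch below illustrates this under stated assumptions: images and mask are flattened lists, the helper names are hypothetical, and the smallest constant that removes all negative values is used. Whether a slightly larger constant is preferable in practice (for example to avoid pixels that are exactly zero) is not covered here.

```python
def minimal_additive_constant(images, mask, threshold=0.5):
    """Smallest non-negative constant that makes all within-mask pixels >= 0."""
    lowest = min(
        value
        for image in images
        for value, m in zip(image, mask)
        if m > threshold
    )
    return max(0.0, -lowest)


def add_constant(image, mask, constant, threshold=0.5):
    """Add the constant to within-mask pixels only; other pixels are untouched."""
    return [v + constant if m > threshold else v for v, m in zip(image, mask)]


images = [[-1.5, 2.0, 3.0], [0.5, -4.0, 1.0]]
mask = [1.0, 0.0, 1.0]

constant = minimal_additive_constant(images, mask)
shifted = [add_constant(image, mask, constant) for image in images]
print(constant, shifted)
```

In this example the value -4.0 lies outside the mask and is ignored, so the constant is determined by -1.5 alone.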

.OUTPUT FILE PREFIX: coran
[Enter the prefix used for the output files where data are to be stored. 'CA S' creates five or six files to store the results. In these files the variable definitions are:

   NUMIM = Number of images
   NPIX = Number of pixels under mask
   NFAC = Number of factors
   NSAM = Image x dimension
   NROW = Image y dimension
   PCA = 1 if PCA, 0 if CORAN
   FIM = Original image number
   FPIX = Pixel number
   ACTIV = Active image flag
   TRACE = Matrix trace
   FDUM = Unused value
   N = Number of values

The files are:

PREFIX_IMC:   Text file with image map coordinates.

   NUMIM, NFAC, NSAM, NROW, NUMIM, PCA
   IMAGE(1) COORDINATES (1..NFAC), WEIGHTP(1), DOR, FIM(1), ACTIVE
   IMAGE(2) COORDINATES (1..NFAC), WEIGHTP(2), DOR, FIM(2), ACTIVE
   ...
   IMAGE(NUMIM) COORDINATES (1..NFAC), WEIGHTP(NUMIM), DOR, FIM(NUMIM), ACTIVE
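
The record layout above can be read back with a short Python sketch. This is a hedged illustration only: it assumes that the values in the text file are separated by whitespace (the actual Fortran formatting may differ, for instance in field widths or line wrapping), and the function name parse_imc and the sample values are made up for the example.

```python
def parse_imc(text):
    """Split an _IMC-style text into a header and one record per image."""
    tokens = text.split()
    # Header: NUMIM, NFAC, NSAM, NROW, NUMIM, PCA
    header = [int(float(t)) for t in tokens[:6]]
    numim, nfac = header[0], header[1]
    # Each record: NFAC coordinates, then WEIGHTP, DOR, FIM, ACTIVE
    record_len = nfac + 4
    values = [float(t) for t in tokens[6:]]
    records = []
    for i in range(numim):
        rec = values[i * record_len:(i + 1) * record_len]
        records.append({
            "coordinates": rec[:nfac],
            "weight": rec[nfac],
            "dor": rec[nfac + 1],
            "fim": int(rec[nfac + 2]),
            "active": rec[nfac + 3],
        })
    return header, records


# Invented sample with NUMIM = 2 and NFAC = 2.
sample = """2 2 64 64 2 0
0.12 -0.30 0.5 1.1 1 1
-0.05 0.22 0.5 0.9 2 1
"""
header, records = parse_imc(sample)
print(header, records[1]["fim"], records[1]["coordinates"])
```

Because the sketch works on a flat token stream, it also tolerates records that wrap over several lines, as long as the values remain whitespace-separated.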

PREFIX_PIX:   Text file with pixel map coordinates.

   NPIX, NFAC, NSAM, NROW, NUMIM, PCA
   PIXEL(1) COORDINATES (1..NFAC), WEIGHTP(1), CO(1), FPIX, FDUM
   PIXEL(2) COORDINATES (1..NFAC), WEIGHTP(2), CO(2), FPIX, FDUM
   ...
   PIXEL(NPIX) COORDINATES (1..NFAC), WEIGHTP(NPIX), CO(NPIX), FPIX, FDUM

PREFIX_EIG:   Text file with eigenvalues.

   NFAC, TOTAL WEIGHT, TRACE, PCA, N
   EIGENVALUE(1), %, CUMULATIVE %
   EIGENVALUE(2), %, CUMULATIVE %
   ...
   EIGENVALUE(NFAC), %, CUMULATIVE %
   IF (PCA)
     IMAGE OR PIXEL AVERAGES (1..10)
     IMAGE OR PIXEL AVERAGES (11..20)
     ...
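
The relation between the three columns of each eigenvalue line can be sketched in Python as follows. The sketch assumes that the percentage is taken relative to TRACE, which this page does not state explicitly; the function name and the numbers are invented for illustration.

```python
def eigenvalue_table(eigenvalues, trace):
    """Return (eigenvalue, percent, cumulative percent) for each factor."""
    rows = []
    cumulative = 0.0
    for eigenvalue in eigenvalues:
        percent = 100.0 * eigenvalue / trace  # assumed: share of the matrix trace
        cumulative += percent
        rows.append((eigenvalue, percent, cumulative))
    return rows


rows = eigenvalue_table([4.0, 2.0, 1.0], trace=10.0)
for eigenvalue, percent, cumulative in rows:
    print(eigenvalue, percent, cumulative)
```

Since only NFAC eigenvalues are computed, the cumulative percentage of the last line can stay below 100.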

PREFIX_SEQ:   Unformatted sequential file having image values under the mask. This file decreases memory requirements.

   NUMIM, NPIX
   IMAGE(1) VALUES(1...NPIX), FIM(1)
   IMAGE(2) VALUES(1...NPIX), FIM(2)
   ...
   IMAGE(NUMIM) VALUES(1...NPIX), FIM(NUMIM)

PREFIX_SET:   Transposed direct access file having image values under the mask. This file decreases memory requirements. Only created if transposition occurs.

   PIXEL(1) VALUES(1...NUMIM)
   PIXEL(2) VALUES(1...NUMIM)
   ...
   PIXEL(NPIX) VALUES(1...NUMIM)

PREFIX_MAS:   Mask file in SPIDER image format.]

NOTES

  1. In general, it is advisable to request a large number of factors. The request may even exceed the number of pixels under the mask, because the system automatically limits the number of factors to the permitted number.

  2. WARNING: For very large problems (a covariance matrix size on the order of thousands), the methods used for CORAN and PCA are slow and inaccurate, and the system may fail on numerical accuracy or enter an endless loop. In these cases use Iterative PCA instead. The same strategy may be useful if you get the error message: *** ERROR: DIAGONALIZATION FAILURE when using CORAN.

SUBROUTINES

JPMSK1, SCORAN3, SPCA3, INCOR3, INCORT, GETCOO, GETCOOT, VPROP, FILELIST

CALLER

UTIL1