Methodology of 2D particle alignment

Approaches to 2D particle alignment can be subdivided into several categories. The main division is created by the availability of a reference image, and the secondary division by the degree of variability within the data set, i.e., in how many orientations the particle is observed to lie in a micrograph.

Types of alignment problems:

One reference image, or a small known number of them, is available or can be easily approximated, and particle orientations, i.e., the way the particle sits on a surface, are well defined (with possible small variations). This case will be referred to as Reference-based alignment.

An approximation of a reference image is known and there is only one particle orientation (with possible small variations). This case will be referred to as Refined Alignment with a reference.

Reference images are not known, but the data set can in principle be divided into a known number of homogeneous classes. This case will be referred to as Multireference classification alignment.

Reference images are not known, but the data set can in principle be divided into a known number of homogeneous classes. The particles can be centered. This case will be referred to as Rotationally invariant K-means alignment.

Reference images are not known, and there are no clear groupings in the data set. This case will be referred to as Reference-free alignment.

Reference-based alignment

We assume that a limited number of reference images are known or that a good approximation of them is available. We expect all the particles to be noisy versions of the reference, with possible small variations. In this case the alignment problem becomes a pattern matching problem. We have to place every particle in the orientation in which it best matches the reference image. In the case of multiple reference images, we also have to decide which reference is the most similar one. We must also try the mirrored orientation, since the particle may be flipped.
We use the cross-correlation coefficient to measure the similarity between a particle and a reference.
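
The matching step can be illustrated with a short Python sketch. It is not the code used by SPIDER: the function names (ccc, mirror, best_match) and the representation of images as lists of rows are assumptions made for illustration, and the search over rotations and shifts is omitted so that only the similarity measure, the choice of reference, and the mirror check are shown.

```python
import math

def ccc(a, b):
    """Cross-correlation coefficient of two equal-sized images (lists of rows)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant image carries no information to correlate
    return cov / math.sqrt(vx * vy)

def mirror(img):
    """Mirror an image about its vertical axis."""
    return [row[::-1] for row in img]

def best_match(particle, references):
    """Return (reference index, mirrored flag, coefficient) of the best match."""
    best = None
    for i, ref in enumerate(references):
        for flipped in (False, True):
            candidate = mirror(particle) if flipped else particle
            score = ccc(candidate, ref)
            if best is None or score > best[2]:
                best = (i, flipped, score)
    return best

# Hypothetical 3x3 test images: an "L" shape and a "plus" shape.
references = [
    [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
]
particle = mirror(references[0])  # a flipped copy of the first reference
index, flipped, score = best_match(particle, references)
```

In a full alignment the same comparison would be repeated over the tested rotations and shifts of each particle, and the parameters giving the highest coefficient would be kept.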

The ref-mult-ali.spi procedure implements reference-based alignment with multiple references. In this procedure alignment is done using 'AP SHC', in which the search for rotation is integrated with the search for translation, resulting in a highly accurate but somewhat slow alignment determination. The 'AP REF' operation could be used for a faster but less accurate alignment determination.

Advantages of reference-based alignment:

It is very fast and robust. Since all the reference images are known, every particle can be matched independently to all of them and the correct assignment can be based on a well-defined similarity measure (the correlation coefficient).
The best alignment is found in one pass through the reference images.
Results are easily verifiable. Since the reference images are known, it can be easily verified by visual inspection whether the aligned particles are in the proper orientation and how well they match the reference images.

Disadvantages of reference-based alignment:

It relies strongly on the assumption that the particles resemble the reference image. If this assumption is not true, the average of the aligned particles will (for noisy data) look like the reference, and it is difficult to decide whether this similarity is real or is caused by enhanced noise.
If exact reference images are not known, it is difficult and time-consuming to come up with a good approximation of the reference.

Refined Alignment with a reference

We assume that a set of particles from one motif is available. Particles are not identical, but they share the same motif (e.g., they are all oriented on the same side on a surface). A reference image may be available or can be calculated from the sample images. The refi-ref-ali.spi procedure begins by calculating the global average to approximate the reference, then aligns all the images using the 'AP SHC' operation, and calculates a new average to obtain an improved reference. These steps are iterated a prescribed number of times.
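
The iteration described above (average, align, re-average) can be sketched in Python. This is a toy model under stated assumptions, not a translation of refi-ref-ali.spi: one-dimensional circular signals stand in for images, an exhaustive search over integer circular shifts stands in for the rotation and shift search of 'AP SHC', and all function names are hypothetical.

```python
def shift(sig, k):
    """Circularly shift a 1D signal to the right by k samples."""
    k %= len(sig)
    return sig[-k:] + sig[:-k] if k else list(sig)

def best_shift(sig, ref):
    """Shift of sig that maximizes its (unnormalized) correlation with ref."""
    n = len(sig)
    scores = [sum(a * b for a, b in zip(shift(sig, k), ref)) for k in range(n)]
    return max(range(n), key=scores.__getitem__)

def average(signals):
    """Sample-by-sample average of a list of equal-length signals."""
    n = len(signals)
    return [sum(col) / n for col in zip(*signals)]

def refine(signals, iterations=3):
    """Iteratively align signals to their own average."""
    aligned = [list(s) for s in signals]
    reference = average(aligned)  # global average as the first reference
    for _ in range(iterations):
        aligned = [shift(s, best_shift(s, reference)) for s in aligned]
        reference = average(aligned)  # improved reference for the next pass
    return reference, aligned

# Hypothetical data: three noise-free copies of one motif at different shifts.
motif = [0, 0, 5, 1, 0, 0, 0, 0]
data = [shift(motif, k) for k in (0, 1, 2)]
reference, aligned = refine(data)
```

In this noise-free example all three copies end up in a common position after the first pass. The position itself is set by the initial global average, which illustrates why the result depends on how the first reference is formed.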

Advantages of refined Alignment with a reference:

This procedure is simple, fast, and robust. In case of a near-homogeneous data set one can obtain high-quality alignment.

Disadvantages of refined Alignment with a reference:

The result depends on the first approximation of the reference image. By changing the way the first reference image is created one can obtain different results and it is difficult to determine which one is correct/better.
If the first reference image is not a good approximation of the "true" average, or if the data set contains more than one orientation, the results may not be stable.

Multireference classification alignment

We assume that a very large data set is available. It comprises particles in a few distinct orientations. The data set is sufficiently large that at least some of the similar views occur in similar in-plane orientations, and so can be averaged. Thus, if we can approximately center the particles, the subsequent classification step should reveal some of the classes. These classes are used as reference images in the next multireference alignment step, classification is repeated, and new classes are formed. This procedure is iterated until stable classes are obtained.

Such a multireference classification alignment is sometimes called alignment through classification. This name reflects the idea that alignment is done separately within groups produced by the classification step.

The ref-mul-class-ali.spi procedure implements multireference alignment using the 'AP SH' operation to do the alignment. This operation employs an exhaustive search to find rotation and translation simultaneously. In principle it should be more accurate than 'AP REF', but it is much slower (particularly for a large number of classes). This program uses the additional procedure: centr.spi

Since multireference alignment is a general idea rather than a detailed algorithm, ref-mul-class-ali.spi constitutes a particular implementation. It should be considered a blueprint upon which one can build one's own procedure optimized for the particular data set.

It is assumed that all the windowed particles are normalized in the same way.
The following free parameters have to be decided:

- Radius for alignment and mask -- should correspond to the particle radius;
- Whether classification is done using all pixels within mask in the computation of Euclidean distance, or factors from Principal Component Analysis (PCA);
- If PCA is to be used, the number of factors has to be set;
- The number of groups into which the data set will be divided -- this determines the number of class averages that will be obtained;
- The number of times the procedure should be repeated.

The steps implemented in ref-mul-class-ali.spi:
1. All the particles are centered using centr.spi. In this procedure each particle is centered using its own rotational average as a reference, the particle is shifted, its new rotational average is formed and used as a reference, and so on, until no further shift is possible.
2. The particles are classified using k-means clustering. Depending on the flag set, either the raw particles or a preset number of factors from PCA are used for classification.
3. Class averages are calculated.
4. Class averages are centered using the 'CG PH' operation (phase approximation of the center of gravity).
5. Class averages are rotationally aligned using the 'AP RA' operation (reference-free rotational alignment).
6. All the particles are aligned using class averages as reference. Each particle is placed in the orientation of its most similar reference image. The alignment includes rotational alignment, shift alignment, and a check of mirrored orientation. Rotational alignment is done using the AP MD operation and is separated from the shift alignment. Shift is corrected using the most similar image (as determined by AP MD) as a reference.
7. Alignment parameters are combined with the alignment parameters obtained in the previous step and a new, aligned image series is formed.
8. Steps 2-7 are repeated a prescribed number of times.
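
Steps 2 and 3 (k-means classification followed by calculation of class averages) can be sketched in plain Python. This is an illustration only and not the code of ref-mul-class-ali.spi: each particle is assumed to be represented by a vector (either the pixel values within the mask or its PCA factors), and the function names, the seeding of the centers with the first k vectors, and the iteration cap are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(vectors, k, max_iterations=50):
    """Plain k-means; returns (class label per vector, class averages)."""
    centers = [list(v) for v in vectors[:k]]  # assumed seeding: first k inputs
    labels = None
    for _ in range(max_iterations):
        # Assign every vector to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each center as the average of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a class becomes empty
                centers[c] = mean_vector(members)
    return labels, centers

# Hypothetical two-dimensional "particles" forming two well-separated groups.
particles = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0]]
labels, class_averages = kmeans(particles, k=2)
```

The returned centers play the role of the class averages that serve as references in the next alignment step. The choice of k corresponds to the number of groups listed among the free parameters above.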

Advantages of multireference classification alignment:

It is quite powerful. It is possible to obtain stable groups for data with very low signal-to-noise ratio (SNR). It works for data sets containing a mixture of entirely different views (an often-encountered problem, in which side views are, say, rectangular, and top views are circular).
The approach is a general idea rather than a "black-box" program; thus, it can be easily modified to the requirements of a particular data set.
There are many parameters that can be adjusted to better control the results.
Results are easily verifiable. Since the class averages are formed it can be easily verified whether the aligned particles are in the proper orientation and how well they match the averages.

Disadvantages of multireference classification alignment:

A very large data set is needed. The program depends on the initial orientation of particles, i.e., at least some of the similar views occur in similar in-plane orientations, so that meaningful averages can be formed. Statistically, this can only happen in an adequately large data set. Moreover, these averages should have a sufficiently high SNR to jumpstart the alignment, so they should each contain a sufficient number of particles.
The result is somewhat unpredictable. It is impossible in practice to verify whether rare views were revealed as classes or remained misaligned and/or misclassified.
Since the approach is a general idea rather than a well-defined procedure, the result will differ depending on the particular implementation. Thus, results obtained by different users/groups are difficult to compare.
Even if the general framework is decided upon, the large number of crucial free parameters leaves the user with hard choices to make. The results will depend on the values chosen and will differ from one trial to another. The two most difficult choices are the number of clusters and the number of factors for PCA. Too few clusters will conceal rare views, while too many will result in large numbers of very similar averages, or else the procedure will fail because the SNR of the averages is too low.
The procedure is very slow.

Rotationally invariant K-means Alignment

We assume that the particles were centered and we can divide the data set into a specified number of orientation classes. In this case, operation 'AP CA' will perform classification and alignment. For each particle the rotation angle as well as the group assignment will be found. The procedure: rotkm-ali.spi demonstrates how to use 'AP CA' and how to calculate group averages.

Reference-free alignment

The rationale of the reference-free alignment is explained in the Introduction to Reference-Free Alignment. The procedure will seek such orientations of all the particles in the data set that all the possible pairs of images from this set are in the 'best' relative orientation as determined by the maximum of the CCF.
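
The criterion can be made concrete with a small Python sketch. It only evaluates a pairwise score for a given set of alignment parameters and does not reproduce the search strategy of the SPIDER operations. One-dimensional circular signals stand in for images, the inner product of two already-shifted signals stands in for the CCF value, and the function names are hypothetical.

```python
from itertools import combinations

def shift(sig, k):
    """Circularly shift a 1D signal to the right by k samples."""
    k %= len(sig)
    return sig[-k:] + sig[:-k] if k else list(sig)

def pairwise_score(signals, shifts):
    """Sum of inner products over all pairs after applying the given shifts."""
    moved = [shift(s, k) for s, k in zip(signals, shifts)]
    return sum(
        sum(a * b for a, b in zip(p, q))
        for p, q in combinations(moved, 2)
    )

# Hypothetical data: three copies of one motif at different positions.
motif = [0, 3, 1, 0, 0, 0]
data = [motif, shift(motif, 2), shift(motif, 4)]

unaligned = pairwise_score(data, [0, 0, 0])  # copies left where they are
aligned = pairwise_score(data, [0, 4, 2])    # shifts that undo the offsets
```

The set of shifts that brings all copies into register gives the higher total score, and no reference image enters the calculation at any point.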

The reference-free alignment procedures were designed for very noisy data, for particles in many different orientations, and in general for cases in which a reference image is unknown or in which its usage could result in a bias and incorrect results. There are three basic operations in SPIDER that implement this strategy:
'AP SA' is a shift alignment, 'AP RA' is a rotational alignment, and 'AP SR' is a combined shift and rotational alignment.

In addition, 'AP CA' performs both classification and rotational alignment for pre-centered data. Unlike the procedures described earlier, none of these operations checks mirrored orientations; thus, any mirror-related views will appear as two different orientations. All the alignment operations can be used either separately or as part of longer, more elaborate alignment schemes.

The procedure: ref-free-apra-ali.spi uses 'AP RA' to rotationally align an image series and applies parameters stored by the operation in a document file to rotate all the particles. Subsequently, aligned particles are subjected to PCA and classified using hierarchical classification.

The procedure: ref-free-apsara-ali.spi alternates between 'AP SA' and 'AP RA' to align an image series both translationally and rotationally.

The procedure: ref-free-apsr-ali.spi uses operation 'AP SR' to align an image series and applies parameters stored in a document file to rotate and shift all the particles. Subsequently, aligned particles are subjected to PCA and classified using Hierarchical Classification.

Another approach to alignment uses self-correlation functions. See 'PO', 'CC P', 'AC S', 'AC NS', 'AC MSS', 'EP TM', and 'CC MS' for information on useful operations.

Advantages of reference-free alignment:

The operation 'AP SR' is very fast and robust.
The method has very few free parameters -- essentially only the radius of the particle. The results do not depend significantly on these parameters, and there are no assumptions made about the reference, number of groups, and so on.

Disadvantages of reference-free alignment:

It is difficult to assess how well the particles were aligned. In most practical cases the program gives a nearly-optimum solution, but in some cases (particularly for mixtures of entirely different shapes, but also for very low SNR or very small data sets) it may fail. In these situations one should either use a combination of 'AP SA' and 'AP RA' (with more free parameters, and thus easier to control), or multireference alignment.

Source: align.html Last update: 21 Mar 2012