AUTOMATIC PARTICLE PICKING FROM ELECTRON MICROGRAPHS

Automatic particle picking from electron micrographs based on textural methods entails three basic steps:

In the PREPARATION step, the micrograph is preprocessed by (i) windowing it to dimensions divisible by 4, (ii) inverting the contrast to give bright particles on a dark background (not necessary in the case of negative stain pictures), (iii) reducing the micrograph size four-fold (which makes the program run faster), (iv) low-pass filtering the micrograph with a Gaussian function, and (v) performing a peak search to find peaks above a certain threshold.

In the TRAINING step, the processing involves: (i) creating small images (data windows) from the original unreduced micrograph, using the document file obtained from the peak search routine above; this is done by the SPIDER operation 'AT WN'; (ii) manually selecting these data windows and assigning them to three categories (1: particle, 2: noise, and 3: junk) using the CATEGORIZE option in WEB; and (iii) feature evaluation and discriminant analysis using the SPIDER operation 'AT SA', which creates a discriminant function that forms the hypothesis for future selections of particles.

In the SELECTION step, the SPIDER operation 'AT WN' is used again, this time with the discriminant function created in the TRAINING step, to window out only the genuine particles as decided by that function.

PRACTICAL CONSIDERATIONS

1. Before it is reduced four-fold, the input micrograph must have dimensions that are multiples of four, e.g., 3200 but not 3201. If the input micrograph has dimensions of 3201 or so, window it to dimensions that are multiples of four. The reason is that, when one reduces by a factor of four, the operation "IP" adds sixteen neighbouring pixels (a 4 x 4 block) into each output pixel so that the SNR is high.
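The windowing and four-fold reduction described above can be sketched in Python with NumPy. This is only an illustration, not the SPIDER implementation: the function names are hypothetical, and the sketch assumes that the reduction simply sums each 4 x 4 block, as the text suggests for "IP".

```python
import numpy as np


def crop_to_multiple(image, factor=4):
    """Window the image so that both dimensions are multiples of `factor`."""
    rows, cols = image.shape
    return image[: rows - rows % factor, : cols - cols % factor]


def reduce_by_sum(image, factor=4):
    """Reduce the image by `factor`, summing each factor x factor block.

    Assumption: block summation stands in for the SPIDER reduction.
    """
    rows, cols = image.shape
    blocks = image.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.sum(axis=(1, 3))


# Small stand-in for a micrograph whose dimensions are not multiples of four.
micrograph = np.arange(13 * 18, dtype=float).reshape(13, 18)
cropped = crop_to_multiple(micrograph)   # shape (12, 16)
reduced = reduce_by_sum(cropped)         # shape (3, 4)
print(cropped.shape, reduced.shape)
```

The reshape only works when both dimensions divide evenly by the factor, which is the practical reason for windowing first.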

2. For the low-pass Gaussian filtration, a starting value for the filter radius is given by the formula 1/(pi*size), where size is the approximate size of the particle in the reduced image. For example, in the case of the 70S ribosome the particle size is about 250 Angstroms.
If the pixel size is 5 Angstroms, then the number of pixels occupied by the particle is 250/5 = 50 pixels.
In the reduced image (reduced four-fold) the particle size is about 50/4 = 12.5 pixels. So: 1/(pi*size) = 1/(pi*12.5) = 0.025 (approximately).
A radius chosen this way typically filters too strongly, but it is a starting point. Eventually we used a filter radius of 0.05. It looks like this radius should work for all practical purposes.
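The arithmetic for the starting filter radius can be written as a short Python helper. The function name and its parameters are hypothetical and serve only to restate the formula 1/(pi*size) from the text.

```python
import math


def starting_filter_radius(particle_size_angstrom, pixel_size_angstrom, reduction=4):
    """Starting Gaussian filter radius, 1/(pi*size), for the reduced image."""
    size_pixels = particle_size_angstrom / pixel_size_angstrom  # 250/5 = 50
    size_reduced = size_pixels / reduction                      # 50/4 = 12.5
    return 1.0 / (math.pi * size_reduced)


radius = starting_filter_radius(250.0, 5.0)
print(round(radius, 3))  # 0.025
```

As noted above, this value is only a starting point; the radius finally used for the 70S ribosome was 0.05.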

For the Gaussian filtration, use the 'FQ NP' operation, which performs the filtration in Fourier space using mixed-radix Fourier transforms. One just needs to check that the dimensions of the image to be filtered are allowed for the mixed-radix F.T. (check 'FT MR' to determine which dimensions are not allowed).
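To illustrate what a Gaussian low-pass filtration in Fourier space does, here is a minimal NumPy sketch. It is a generic filter, not the 'FQ NP' code: the function name is hypothetical, and the transfer function exp(-f^2 / (2*radius^2)), with f in cycles per pixel, is an assumption that may differ from the one SPIDER uses.

```python
import numpy as np


def gaussian_lowpass(image, radius):
    """Low-pass filter an image with a Gaussian transfer function in Fourier space.

    Assumption: `radius` is in frequency units (cycles per pixel, 0 to 0.5).
    """
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    transfer = np.exp(-(fx**2 + fy**2) / (2.0 * radius**2))
    return np.fft.ifft2(np.fft.fft2(image) * transfer).real


rng = np.random.default_rng(0)
noisy = rng.normal(size=(64, 64))
smoothed = gaussian_lowpass(noisy, radius=0.05)
print(noisy.std(), smoothed.std())  # the filtered image varies less
```

A smaller radius suppresses more of the high frequencies, which is why the value computed from 1/(pi*size) can turn out to filter too strongly.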

3. The peak search routine is called 'AT PK'. It searches the filtered image for peaks over a specified neighborhood, i.e., it checks for peaks over a region about the size of the area occupied by the particle. This helps in avoiding selection of the same particle more than once, and also somewhat in avoiding the picking of two particles that are too close together. Since the filtered image is the reduced image, the approximate size of the particle is about 12.5 pixels (from 2). The neighborhood for the peak search must be an ODD NUMBER, so we use the nearest higher odd number, which is 13 pixels.

When the peak search is used to obtain peaks for the TRAINING part of the program (these peaks are eventually windowed out by the 'AT WN' operation to obtain data windows), it is convenient to use a low threshold value, something like 0.6 or 0.65. The threshold is the cut-off value: only peaks with values above it are selected. If one chooses a higher threshold like 0.75 or 0.8, there may not be enough noise data windows, and the discriminant program will crash if there are too few data windows in any one group. A low threshold ensures that enough particle, noise, and junk (clumps, stains, or blobs) data windows are available for analysis by the discriminant program.

In the second run, the SELECTION part of the program, it is convenient to use threshold values of something like 0.7, 0.75, or even 0.8. This way one can avoid the unnecessary "noise" windows.
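The idea of a peak search over an odd-sized neighborhood with a threshold can be sketched as follows. This is not the 'AT PK' algorithm: the function name is hypothetical, and the sketch treats the threshold as a plain cut-off on pixel values, whereas the scaling that 'AT PK' applies to its threshold is not described here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def find_peaks(image, neighborhood, threshold):
    """Return (row, col) of pixels that are the maximum of their
    neighborhood x neighborhood region and exceed the threshold."""
    if neighborhood % 2 == 0:
        raise ValueError("neighborhood must be an odd number")
    image = np.asarray(image, dtype=float)
    half = neighborhood // 2
    # Pad with -inf so that border pixels can still be compared.
    padded = np.pad(image, half, mode="constant", constant_values=-np.inf)
    windows = sliding_window_view(padded, (neighborhood, neighborhood))
    local_max = windows.max(axis=(2, 3))
    is_peak = (image == local_max) & (image > threshold)
    return [(int(r), int(c)) for r, c in np.argwhere(is_peak)]


filtered = np.zeros((20, 20))
filtered[5, 5] = 0.9     # strong peak
filtered[5, 8] = 0.7     # weaker peak inside the neighborhood of (5, 5)
filtered[15, 15] = 0.65  # isolated weak peak

training_peaks = find_peaks(filtered, neighborhood=13, threshold=0.6)
selection_peaks = find_peaks(filtered, neighborhood=13, threshold=0.8)
print(training_peaks)   # [(5, 5), (15, 15)]
print(selection_peaks)  # [(5, 5)]
```

The weaker peak at (5, 8) is never reported because a stronger one lies within its 13-pixel neighborhood, and raising the threshold from 0.6 to 0.8 drops the isolated weak peak, which mirrors the training versus selection settings described above.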

4. The 'AT SA' operation evaluates the features of the data windows belonging to the different categories, i.e., particles, noise, and junk. It passes them as feature vectors to the discriminant analysis program, which casts all the relevant information into a discriminant function that is used for future classifications. Currently, the program assumes that the categories assigned with 'CATEGORIZE' in WEB fall into the following groups: 1: particle, 2: noise, and 3: junk.

The 'AT SA' operation creates the discriminant function, called "DISCRIM", which is the one used for future classifications. It then reclassifies the input images assigned by the user, using the DISCRIM function that it has created, and reports how well the function performs. This lets you make any changes that you might want to make in your initial assignments.

5. 'AT WN' is used in the SELECTION part as well as the PREPARATION part of the program. In the preparation part, it simply takes all the peaks obtained from 'AT PK' (with a low threshold value) and windows them out to obtain data windows to be processed by 'AT SA'. 'AT WN' multiplies the coordinates from the document file by A MULTIPLICATION FACTOR (since the image is reduced by a factor of four) before windowing them out.
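The coordinate scaling can be illustrated with a small Python sketch. It is not the 'AT WN' code: the function name is hypothetical, and the choices to center each window on the scaled peak, to use (row, column) order, and to skip windows that would cross the micrograph border are assumptions made for the example.

```python
import numpy as np


def window_particles(micrograph, reduced_peaks, window_size, factor=4):
    """Cut windows from the unreduced micrograph at peaks found in the reduced image."""
    half = window_size // 2
    rows, cols = micrograph.shape
    windows = []
    for r, c in reduced_peaks:
        # Scale reduced-image coordinates back to the original micrograph.
        top = r * factor - half
        left = c * factor - half
        if top < 0 or left < 0 or top + window_size > rows or left + window_size > cols:
            continue  # assumption: windows crossing the border are skipped
        windows.append(micrograph[top:top + window_size, left:left + window_size])
    return windows


micrograph = np.arange(400 * 400, dtype=float).reshape(400, 400)
peaks = [(50, 50), (2, 2)]  # (row, col) in the four-fold reduced image
windows = window_particles(micrograph, peaks, window_size=75)
print(len(windows), windows[0].shape)  # 1 (75, 75)
```

The peak at (50, 50) in the reduced image maps to (200, 200) in the original micrograph, while the peak at (2, 2) is too close to the border for a 75-pixel window.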

In the selection part of the program, it asks for the document file created by the peak search routine (with a higher threshold value) and windows out only the genuine particles as decided by the discriminant function. Remember to do the peak search on the filtered image using a higher threshold like 0.7, 0.75, or even 0.8 (during the selection part) to exclude some "noise" windows at that stage itself.

Comments: The particle picking program seems to be quite efficient in excluding "noise" windows from the final selection. As for "junk", one needs to make a compromise. If the "junk" category consists mostly of stains or blobs (in the case of frozen-hydrated specimens), then the program is very efficient in removing these from the final selection for the micrographs. If it contains a more or less equal number of stains or blobs and particle aggregates or clumps, then the program tends to assign some of the clumps back into the particle category, and at times it misclassifies. Personally, I think the program does a wonderful job if one has only stains or blobs in the "junk" category, because one can decide later whether a data window contains a single particle or a clump: inevitably one has to do some manual screening after all the particles are windowed out, to make sure that the particles selected meet the requirements for a 3D reconstruction.

The protocol runs quickly. It takes more time only on the first run, when the discriminant function has to be created; subsequent runs, in which one only wants to window out particles, are fast.

***********************************************************************

SPIDER OPERATIONS RELATED TO AUTOMATIC PARTICLE PICKING ARE:

  'AT PK'
  'AT SA'
  'AT WN'
  'AT IT'
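
As described in the example procedure below, 'AT IT' keeps only the last category assigned to each image and writes the images out in ascending order with sequential keys. A rough Python sketch of that clean-up follows; the function name and the tuple layout are hypothetical and do not reflect the SPIDER document file format.

```python
def clean_category_assignments(assignments):
    """Keep the last category given to each image, sort by image number,
    and renumber the keys sequentially from 1.

    `assignments` is a list of (image_number, category) pairs in the
    order in which they were made.
    """
    last = {}
    for image_number, category in assignments:
        last[image_number] = category  # a later assignment overrides an earlier one
    return [(key, image, last[image])
            for key, image in enumerate(sorted(last), start=1)]


# Image 500 is categorized first and later reassigned from 1 to 3.
raw = [(500, 1), (12, 2), (500, 3), (7, 1)]
cleaned = clean_category_assignments(raw)
print(cleaned)  # [(1, 7, 1), (2, 12, 2), (3, 500, 3)]
```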

				EXAMPLES
*****************************************************************
;
; TRAINING PART OF THE PROGRAM IN SPIDER  
;
;
; INITIAL WINDOWING FOR MANUAL ASSIGNMENT OF CATEGORIES BY CATEGORIZE OPTION IN WEB
;
; mic001 is the input micrograph (from procedure job above) which has bright 
; particles on dark background. You answer yes (y) when it asks you if you
; want to create windows and then quit. 
; doc001 is the document file with peaks from above.
; 75,75 is the dimension of the data window that you want to have for the 
; windowed particles.
; init corresponds to the directory where you want to put your created windows, 
; which are named prt****
;
;
; Run pick-at.spi as indicated in:
;     
;     Three Dimensional Reconstruction of Single Particle Specimens using Reference Projections
;
; One needs to use CATEGORIZE option in WEB now which displays the windowed 
; images above as a montage and then you can assign categories to the windows:
; REMEMBER: 1: Particle 2: Noise and 3: Junk
; The CATEGORIZE option creates a document file which contains two columns after 
; the key: the first column contains the image number and the second column 
; contains the category.
; While assigning categories, one might assign a category to a particular image 
; but later decide to change it. In that event, one can simply go back and assign 
; a new category to that image, because the following operation 'AT IT' takes care 
; of such instances: it keeps on record only the last assignment given to the image. 
; Moreover, one might assign categories out of order, e.g., to image #500 first and 
; then to some other image, in a haphazard fashion. This is also sorted out by 
; 'AT IT', which writes out a document file listing all unique images selected by 
; CATEGORIZE in sequential ascending order, and which also puts the key 
; (column 0) in sequential order, which is not done by CATEGORIZE.
; 
; catg001 is the file created by CATEGORIZE option.
; 'AT IT' writes the output into file: sel001
AT IT
  catg001
  sel001
;
;
;
; FEATURE EVALUATION AND DISCRIMINANT ANALYSIS
;
; 'AT SA' operation requires the input data windows which have already been assigned 
; with categories. It also requires a category selection file obtained from 'AT IT'
; and then it writes the output results to a file, for which a name has to be 
; supplied. This results file basically gives you a lot of information regarding 
; how the program performs when it does a reclassification of the  already assigned
; data windows and so one can always add more or remove some windows to/from a 
; particular group. 'AT SA' puts the discriminant function in a file which can be used 
; for future purposes. 
; In the 'AT WN' operation this function file should be input. 
;
AT SA
  init/prt****
  5,5
  sel001
  analysis
  discrim

; Read the "analysis" file.  Sometimes the program fails; in that case
; the file will contain information that the determinant
; of the covariance matrix was negative.  In such a case one should
; try to change the "mini window size" and/or increase the number of
; particles in the training classes.
;
;
; Run pick-at.spi as indicated in 
;     mr1.html  
;   again for another micrograph and store the windows as: prt/prt****
;
;
AT WN
  prt/prt****
  1-1200
  5, 5
  discrim
  good001
; Document file good001 will contain the numbers of the particles considered good
;  (belonging to the first category).
;
; Ref: "Automatic Particle Picking from Electron Micrographs" by K.Ramani 
;       Lata, P. Penczek and J. Frank.

Source: autopartpcik.html     Updated: 11/03/00