Particle-picking procedure files

Unpackage the "normal" projection-matching procedures tar archive.
In the top-level directory created by unpackaging 'spiproject.tar', unpackage the supplemental verification procedures tar archive.
If using SPIRE:

Type:

spire &

Type something, anything, under Project title.
Enter data extension under data extension. Here, I will assume .dat, so adjust accordingly.
Under Directory for this project, make sure that the current directory is entered. Sometimes the data extension is appended, thus defining a new directory.
Under Configuration file, select verify.xml, using the Browse button if necessary.
Uncheck the button Create directories and load procedure files.

makeparams.spi

OUTPUT: params

Skip makefilelist.spi -- The Python script mkfilenums.py is more general.

Get reference volume

(optional) resizevol.spi -- interpolates reference-volume

Micrographs -- in Micrographs/ directory

Micrographs are assumed to have the file pattern mic****
To generate a SPIDER doc file containing a list of micrographs, type (substituting the appropriate data extension):

mkfilenums ../filenums.dat mic*.dat
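As a rough illustration of what goes into such a list, the sketch below pulls the numeric part out of file names that follow the mic**** pattern. It is not the actual mkfilenums.py; the function name `extract_file_numbers`, its arguments, and the regular expression are assumptions made for this example.

```python
import re

def extract_file_numbers(filenames, prefix="mic", extension="dat"):
    """Return the sorted micrograph numbers found in names like mic0012.dat.

    Hypothetical helper for illustration only, not part of mkfilenums.py.
    """
    pattern = re.compile(r"^%s(\d+)\.%s$" % (re.escape(prefix), re.escape(extension)))
    numbers = []
    for name in filenames:
        match = pattern.match(name)
        if match:  # skip files that do not fit the pattern, e.g. sm-mic0001.dat
            numbers.append(int(match.group(1)))
    return sorted(numbers)

if __name__ == "__main__":
    print(extract_file_numbers(["mic0003.dat", "mic0001.dat", "sm-mic0001.dat", "notes.txt"]))
```

Anchoring the pattern at the start of the name keeps the shrunken sm-mic**** copies out of the list.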

Shrink the micrographs

shrink.spi
PARAMETER: decimation factor (so that the micrograph fits on the screen at 1X)
INPUT: mic****
OUTPUT: sm-mic****
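If you are unsure what decimation factor to use, one way to estimate it is sketched below: take the smallest integer that makes both micrograph dimensions fit on the screen. The function name and the example dimensions are assumptions for illustration; shrink.spi itself simply takes the factor as a parameter.

```python
import math

def decimation_factor(mic_width, mic_height, screen_width, screen_height):
    """Smallest integer factor that makes the micrograph fit on the screen at 1X.

    Hypothetical helper; all sizes are in pixels.
    """
    needed = max(mic_width / screen_width, mic_height / screen_height)
    return max(1, math.ceil(needed))

if __name__ == "__main__":
    # An 8192 x 8192 micrograph on a 1920 x 1080 display
    print(decimation_factor(8192, 8192, 1920, 1080))  # prints 8
```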

(Optional) Screen the micrographs using montagefromdoc.py.

Run:

montagefromdoc

The first popup window will contain hopefully reasonable settings, or you can enter filenums and the micrograph file-pattern on the command line (in that order).

Alternatively, to keep all the micrographs, in the top-level directory, copy filenums to sel_micrograph.

CTF estimation -- in Power_Spectra/ directory

powdefocus.spi (slow)

INPUT: Micrographs/mic****
OUTPUT: Power_Spectra/pw_avg***, Power_Spectra/roo***

Screen the power spectra visually using montagefromdoc.py.

Run:
```
montagefromdoc   
```
The first popup window will contain hopefully reasonable settings. If not, the input doc file is sel_micrograph and the image file pattern is Power_Spectra/pw_avg****.
To enhance the contrast more than is possible in montagefromdoc.py, you may need to view a power spectrum in JWEB.
The output selection doc file, sel_micrograph, will overwrite the original; however, a backup copy will be saved.
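File patterns such as Power_Spectra/pw_avg**** stand for a zero-padded file number in place of the asterisks. The sketch below shows that substitution in Python. The function name `expand_template` is hypothetical, and raising an error when the number has more digits than there are asterisks is a choice made for this sketch, not necessarily what SPIDER does.

```python
def expand_template(template, number):
    """Replace the run of asterisks in a file template with a zero-padded number.

    Hypothetical helper for illustration only.
    """
    width = template.count("*")
    if width == 0:
        return template
    start = template.index("*")
    if template[start:start + width] != "*" * width:
        raise ValueError("asterisks must be contiguous: %r" % template)
    digits = str(number).zfill(width)
    if len(digits) > width:
        # Assumption of this sketch: refuse numbers that do not fit the template
        raise ValueError("%d does not fit in %d digits" % (number, width))
    return template[:start] + digits + template[start + width:]

if __name__ == "__main__":
    print(expand_template("Power_Spectra/pw_avg****", 12))  # prints Power_Spectra/pw_avg0012
```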

Screen the determined defocus values.

ctfmatch.py is a nice program to display the fitting. If removing bad micrographs, name the list of remaining good micrographs sel_micrograph.

Particle-picking -- in Particles/ directory

Get a noise file, either from a previous project or from noise.spi.
Window particles from micrographs (slow).

pick.spi

pick.spi calls pick_p.spi.
Selecting particles from micrographs in WEB is another option.
INPUT: ../batch/Micrographs/mic**** (change from raw****), ../reference_volume, noise
OUTPUT: win/winser_****, coords/sndc****

lfc_pick.spi

lfc_pick.spi calls pickparticle.spi.
INPUT: ../batch/Micrographs/mic**** (change from raw****), ../reference_volume, noise
OUTPUT: win/winser_****, coords/sndc****

boxer/e2boxer.py in EMAN/EMAN2

After picking particles with boxer, run eman2spider.spi.
- INPUT: ../batch/Micrographs/mic****.box (e2boxer coordinate file)
- OUTPUT: coords/sndc****
- This procedure file runs Python script emancoords2spiderdoc.py.
  To run a single micrograph, use the syntax (substituting the appropriate file names/numbers):
Then, window out the images using rewindow.spi.

INPUT: ../batch/Micrographs/mic****, coords/sndc****
OUTPUT: win/winser_****
This is also a useful procedure file if I want to use a different window size, or if I've deleted the winser files to free up disk space.

(Optional) Select particles from micrograph stacks.
lfc_pick.spi sorts particles from highest cross-correlation to lowest, so each micrograph will typically lead off with ice-condensation blobs and end with noise. You can exclude these bad images at the extremes by specifying the contiguous range that includes all of the good particles. This step will save time and disk space and will help the classification.
There are two ways to visualize the particles: using WEB or the Python utility montagefromdoc.py.

Using montagefromdoc.py -- this will require less preprocessing (e.g., filtershrink.spi) but Python display is slower than WEB.

Run:
```
montagefromdoc 
```
The first popup window will contain hopefully reasonable settings, such as input coords/sndc**** and output good/good****. Alternatively, you can enter the doc-file and particle-file names on the command line (in that order), e.g.,
```
montagefromdoc coords/sndc1234 win/winser_1234
```

Using WEB

Under Options/Image turn on filenames, and you may need to use a small font, set under Options/Font. Open the stacks win/winser_****.
There used to be procedure files filtershrink.spi and negmontagedocs.spi to filter and break the stacks into a bite-sized number of images, respectively, but I haven't updated them since lfc_pick started using stacks. Contact me if you would like to have these procedure files updated.

Skip: renumber.spi

Assign global particle number for each particle.

numberparticles.spi
PARAMETER: maximum particles per micrograph (it is probably safe to err on the side of too many, or 0 will keep all of them)
INPUT: win/sel_particle_**** (or good/good**** if you selected particles from micrograph stacks)
OUTPUT: coords/ngood**** coords/mic2global
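The numbering step can be pictured with the following sketch, which walks through the micrographs in order and hands out consecutive global numbers, optionally truncating each micrograph's list. This is not numberparticles.spi itself. The function name, the data layout, and the assumptions that global numbers start at 1 and follow micrograph order are mine.

```python
def number_particles(particles_by_mic, max_per_mic=0):
    """Assign consecutive global numbers to the particles kept from each micrograph.

    particles_by_mic: dict mapping micrograph number -> list of local particle numbers
    max_per_mic: keep at most this many per micrograph; 0 keeps all of them
    Returns a list of (global_number, micrograph, local_number) tuples.
    Hypothetical helper for illustration only.
    """
    table = []
    global_number = 0
    for mic in sorted(particles_by_mic):
        kept = particles_by_mic[mic]
        if max_per_mic > 0:
            kept = kept[:max_per_mic]
        for local in kept:
            global_number += 1  # assumed to start at 1
            table.append((global_number, mic, local))
    return table

if __name__ == "__main__":
    for row in number_particles({2: [1, 2, 3], 1: [5, 6]}, max_per_mic=2):
        print(row)
```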

Alignment -- in Alignment/ directory

sel_by_group.spi

INPUT: ../batch/Particles/good/ngood{****[mic]}
OUTPUT: sel_particles_***, sel_group

win2stk.spi

INPUT: ../batch/Particles/win/winser_***@, ../batch/Particles/good/ngood{****[mic]}
OUTPUT: data***

refproj.spi

INPUT: reference, ../Power_Spectra/order_defgrps
OUTPUT: refangles, prj_####@****

Align particles to reference projections

Alignment is CPU-intensive and benefits from parallelization. Whether you are running the alignment in parallel or in series, edit (but do not run) apsh-settings.spi.
- PARAMETERS:
- INPUT: projs/prj_***, data***
- OUTPUT: align_01_***, dala01_***
The following options will depend on whether you are using Publish and Subscribe on a cluster:
1. Run apsh-inseries.spi.
The following is a brief tree of procedure calls:
apsh-inseries apsh-settings apsh-main

Verify Particles -- in Reconstruction/ directory

Make selection doc for each reference view.

selectbyviewall.spi
SUBROUTINE: reversedoc_7col.spi
INPUT: align_01_***, ../batch/Alignment/sel_particles_***
OUTPUT: stack2particle***, select/sel***

Filter and (optionally) shrink particles.

filterbyview.spi
INPUT: dala01_***, align_01_***
OUTPUT: select/prj***/stkfilt
The goal with the filter parameters was to be able to ignore CTF effects. So, I chose the first CTF zero of the most-defocused micrograph as the Butterworth stop-band. If you're not sure what filter radii to use, try findctfminima.spi.
To test the filter parameters, try running this procedure file on just the particles in one reference view, by setting parameter [last-view] to 1.
If you're using old, X-Window WEB, which can't montage a doc file from stacks (NOTE: it can now), run filtershrinksh.spi, which instead writes ../batch/Particles/flt/flt******
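For a quick estimate of where the first CTF zero falls, the sketch below uses the simplest weak-phase approximation: spherical aberration and amplitude contrast are ignored, so the phase is pi * wavelength * defocus * k^2 and the first zero is where it reaches pi. This is a back-of-the-envelope calculation under those stated assumptions, not what findctfminima.spi computes. The function names and the example numbers are hypothetical.

```python
import math

def electron_wavelength(kilovolts):
    """Relativistic electron wavelength in Angstroms for an accelerating voltage in kV."""
    volts = kilovolts * 1000.0
    return 12.2643 / math.sqrt(volts + 0.97845e-6 * volts ** 2)

def first_ctf_zero(defocus_angstroms, kilovolts):
    """Spatial frequency (1/Angstrom) of the first CTF zero.

    Assumes underfocus given as a positive number, no spherical aberration,
    and no amplitude contrast.
    """
    return 1.0 / math.sqrt(electron_wavelength(kilovolts) * defocus_angstroms)

def to_cycles_per_pixel(freq_per_angstrom, pixel_size_angstroms):
    """Convert a frequency in 1/Angstrom to cycles per pixel."""
    return freq_per_angstrom * pixel_size_angstroms

if __name__ == "__main__":
    k = first_ctf_zero(defocus_angstroms=30000.0, kilovolts=200.0)
    print("first zero: %.4f 1/A, period %.1f A" % (k, 1.0 / k))
    print("in cycles/pixel at 2.82 A/pixel: %.4f" % to_cycles_per_pixel(k, 2.82))
```

Use the defocus of the most-defocused micrograph, since its first zero lies at the lowest spatial frequency.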

(Optional) Screen particles without classification.

You can optionally screen the particles without classification. This could be useful for small data sets that don't warrant classification, or if you otherwise want to see all particles before classification. Run:
```
 montagefromdoc 
```
The first popup window will hopefully have reasonable settings. If not, enter select/sortsel001.dat, select/prj001/stkfilt.dat, and select/prj001/goodsel.dat for the input selection filename, particle filename, and output selection file, respectively.
If you perform this step, skip ahead past combinegoodclasses.spi.
The particles are sorted by correlation coefficient, from highest to lowest. You can display the CCROT values by clicking the checkbutton under Display/Labels, but you'll probably need to resize the window.

Run correspondence analysis and separate particles into classes.

classify.spi
INPUT: select/sel***, select/prj***/stkfilt
OUTPUT: select/prj***/{docclass###, classavg###, listclasses}
If you're running X-Windows WEB (or for some other reason ran filtershrinksh.spi) instead run unstacked-classify.spi, which uses ../batch/Particles/flt/flt****** as an input.
You can select good class-averages for a reference-view as soon as the procedure file starts on a subsequent reference-view (as printed to the screen). As of 2004, you can probably sift through classes faster than SPIDER can calculate them.
During MSA, SPIDER may hang at a particular reference view. If this happens, switch to a different MSA method (e.g., from correspondence analysis to iterative PCA) and restart at that reference view.

If you would like to verify particles on a different computer, you should be able to copy the contents of the Reconstruction/select/ directory. There are a couple thousand files there total, including the particle stacks, in contrast to ~100,000 files in the case of unstacked particles.
There are four options for keeping particles, depending on how much control you want/need:

Option using Python/Tkinter:
Select classes and particles therein using verifybyview.py. More information found here.
Options using WEB:

First, select good class-averages in WEB using Categorize/Sequential montage (to show them in sequential order) or Categorize/Doc. file montage (using listclasses) to show them from worst cross-correlation to best.
Name the resulting list of good classes goodclasses and click on the good classes. WEB by default will write a separate document file for each reference view.
Check member particles -- in a separate WEB window -- using Montage from doc file, with the appropriate document file docclass{***class#}.
The Image file template in WEB should be "../batch/Particles/flt******"
Particles will be sorted from worst cross-correlation to best.
See here for an illustrated example.
There are three levels of control in verifying particles using WEB:

Simply click on the good classes as described above.
This will keep all of the particles in the good classes.
Manually click on the first good particle in each class.
I recommend doing this until you get a feel for which classes are bad.
See here for an illustrated example.

Click on the good classes as described above.
Instead of displaying class-montages with Montage from doc file, use Categorize/Doc. file montage.
Name the output file firstgoodparticle. There should be one file for each reference-view.
When prompted for the key, enter the class-number.
If there isn't a key for each good class, the next procedure file (combinegoodclasses.spi) will crash.
Click on the first good particle for each class that has particles you want to keep, e.g., the first one to keep all of them.

Manually click on the good particles in each class.
The only advantage over fully manual particle verification is that the particles are separated by view and aligned. This is useful if classification didn't do a good job on your particles.

Click on the good classes as described above.
Display the class-montages using Categorize/Doc. file montage.
Name the output file byhand{***class#}.
Click on each particle that you would like to keep.

If using WEB, for a given reference-view, you have to use the same method for all classes.
That is, if you use the "whole class" mode (option 1) for one class in a reference-view, you must use it for all classes.
The same goes for options 2 and 3. However, you can mix-and-match methods within different reference-views.
The next procedure file will write to the screen which method was used for each view.
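The "first good particle" option can be sketched as follows, together with a check for the missing-key situation described above. This is only an illustration of the logic, not the code in combinegoodclasses.spi. The function names and the plain lists and dictionaries standing in for doc files are assumptions.

```python
def classes_missing_key(good_classes, first_good_particle):
    """Return the good classes that have no entry in the firstgoodparticle mapping.

    good_classes: list of class numbers
    first_good_particle: dict mapping class number -> first good particle number
    Hypothetical helper for illustration only.
    """
    return [c for c in good_classes if c not in first_good_particle]

def keep_from_first_good(class_members, first_good):
    """Keep the first good particle and every particle listed after it.

    class_members is assumed to be ordered from worst to best cross-correlation.
    """
    if first_good not in class_members:
        raise ValueError("particle %r is not a member of this class" % (first_good,))
    return class_members[class_members.index(first_good):]

if __name__ == "__main__":
    print(classes_missing_key([1, 3, 4], {1: 17, 4: 8}))  # prints [3]
    print(keep_from_first_good([40, 17, 22, 5], 17))      # prints [17, 22, 5]
```

Running a check like `classes_missing_key` before the next step would flag a reference-view where a good class was never given a key.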

Combine particles from good classes.

combinegoodclasses.spi
SUBROUTINE: reversedoc_7col.spi
INPUT: select/prj***/{docclass###, goodclasses,select/sel***, firstgoodparticle (optional), byhand (optional)}
OUTPUT: select/prj***/goodsel (one for each reference-view), select/combinestats

(Optional) Re-screen the particles by view.

Screen particles using .montagefromdoc

In select/, I included a .montagefromdoc file with hopefully reasonable settings. If not, type:
```
montagefromdoc prj001/goodsel.dat prj001/stkfilt.dat
```
For the output filename, use prj001/notgood.
To salvage bad particles, do the converse, i.e., use prj001/badsel.dat and prj001/notbad.dat as the input and output.

recheck.spi

SUBROUTINE: reversedoc_7col.spi
INPUT: select/prj***/sortsel, select/prj***/goodsel, select/prj***/badsel, select/prj***/notbad, select/prj***/notgood
OUTPUT: select/prj***/goodselB, select/prj***/badselB
For the output doc files, I used a letter to distinguish from prior output. A number might cause problems for montagefromdoc.py.
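My reading of the re-screening bookkeeping is sketched below: particles listed in notgood move from the good list to the bad list, and particles listed in notbad move the other way. Treat this as an assumption about what recheck.spi does rather than a description of its code. The function name and the use of plain lists of particle numbers are hypothetical.

```python
def recheck(goodsel, badsel, notgood=(), notbad=()):
    """Return (goodselB, badselB) after applying the re-screening lists.

    notgood: particles in goodsel that were rejected on second inspection
    notbad:  particles in badsel that were salvaged on second inspection
    Hypothetical helper; the set arithmetic is an assumed interpretation.
    """
    demoted = set(notgood) & set(goodsel)   # only particles that were good can be demoted
    promoted = set(notbad) & set(badsel)    # only particles that were bad can be promoted
    good_b = sorted((set(goodsel) - demoted) | promoted)
    bad_b = sorted((set(badsel) - promoted) | demoted)
    return good_b, bad_b

if __name__ == "__main__":
    print(recheck([1, 2, 3], [4, 5], notgood=[2], notbad=[5]))  # prints ([1, 3, 5], [2, 4])
```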

(Optional) Average images by view

viewaverage.spi
INPUT: select/prj***/goodsel
If you ran recheck.spi, add the tiebreaker (e.g., B) to the filename.
OUTPUT: select/prj***/goodavg,select/prj***/goodvar (optional)
One advantage of this procedure file over average.spi is that it combines all defocus groups.
You can use verifybyview.py to link the averages to the retained particles. To do so, run verifybyview from Reconstruction/select/ (so as to not overwrite the settings in Reconstruction/). The default settings in .verifybyview will hopefully be reasonable. In principle, you could re-screen your particles if there are still bad ones.

Compute CCC histogram of particles

histgoodccc.spi
INPUT: select/prj***/goodsel, select/prj***/badsel
If you ran recheck.spi, add the tiebreaker (e.g., B) to the filename.
OUTPUT: combinedgood, histcccgood, histcccbad
Check histogram using fit.gnu, modifying file extension and fit, if needed. Type
```
gnuplot 
```
and then at the prompt, type (including the single quotes):
```
load 'fit.gnu'
```
Normally, the histogram should look Gaussian. If there is a tail or a second mode at the low-CCC end, there may be non-particles remaining. You can filter them out by using a fractional cutoff in a later step, or you can go back and more stringently go through the particles.
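If you want to look at the distribution outside of gnuplot, a histogram of the correlation values can be tallied in a few lines of Python. The sketch below is not histgoodccc.spi. The function name, the bin count, and the made-up example values are assumptions for illustration.

```python
def ccc_histogram(values, num_bins=10):
    """Return (bin_edges, counts) for a list of correlation coefficients.

    Hypothetical helper; bins are equal-width between the minimum and maximum value.
    """
    if not values:
        raise ValueError("no values to histogram")
    low, high = min(values), max(values)
    width = (high - low) / num_bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * num_bins
    for value in values:
        index = min(int((value - low) / width), num_bins - 1)  # maximum goes in the last bin
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts

if __name__ == "__main__":
    edges, counts = ccc_histogram([0.0, 0.1, 0.2, 0.9, 1.0], num_bins=2)
    for left, count in zip(edges, counts):
        print("%.2f %s" % (left, "#" * count))
```

A second bump at the low end of the printed bars corresponds to the low-CCC mode described above.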

Separate total good-particle list by defocus-group.

goodparticlesbydf.spi
PARAMETER: fractional cutoff (optional, use 0.0 to keep all)
INPUTS: combinedgood, stack2particle***
OUTPUT: df***/goodparticles, sel_group_cclim, sel_group_cclim_sorted
For large particle sets, this procedure file may crash, in which case try big-goodparticlesbydf.spi
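The sketch below shows one plausible meaning of the fractional cutoff: discard that fraction of the particles, taken from the low-CCC end, with 0.0 keeping everything. Whether goodparticlesbydf.spi defines the fraction exactly this way is an assumption on my part. The function name and the (particle number, CCC) tuples are hypothetical.

```python
def apply_fractional_cutoff(particles, fraction=0.0):
    """Discard the given fraction of particles with the lowest CCC.

    particles: list of (particle_number, ccc) tuples
    fraction: 0.0 keeps all; must be in [0, 1)
    Returns the kept particles sorted from highest to lowest CCC.
    Hypothetical helper; the definition of the fraction is assumed.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    ranked = sorted(particles, key=lambda p: p[1], reverse=True)
    num_discard = int(len(ranked) * fraction)
    return ranked[:len(ranked) - num_discard]

if __name__ == "__main__":
    data = [(1, 0.9), (2, 0.1), (3, 0.5), (4, 0.7)]
    print(apply_fractional_cutoff(data, 0.25))  # prints [(1, 0.9), (4, 0.7), (3, 0.5)]
```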

Generate alignment documents with only good particles

dfgoodapsh.spi
INPUT: ../batch/Alignment/align_01_***, df***/goodparticles
OUTPUT: ../batch/Alignment/good-align_01_***, sel_particles***
The sel_particles*** files are essentially equivalent to df***/goodparticles, lacking only the CCROT column.

Compute averages -- also in Reconstruction/ directory

select.spi (optional, necessary for display.spi or plotrefviews.spi) -- for each defocus-group, separates particles by reference-view

INPUT: Use good-align_01_*** instead of align_01_***
OUTPUT: df***/select/sel###, df***/how_many, how_many

Skip average.spi -- The equivalent was performed above by viewaverage.spi
Skip cchistogram.spi -- The equivalent was performed above by histgoodccc.spi
Skip ccthresh.spi -- The equivalent was performed above by histgoodccc.spi
Skip dftotals.spi -- The equivalent was performed above by goodparticlesbydf.spi
plotrefviews.spi, display.spi (optional) -- graphically show distribution of views

INPUT: df***/how_many
OUTPUT: gnuplot_view, display/cndis***

bestim.spi (optional) -- truncate seltotal files for overrepresented views

3D reconstruction -- still in Reconstruction/ directory

Compute 3D reconstruction
1. Run bps-inseries.spi.
Like alignment, 3D reconstruction is CPU-intensive and benefits from parallelization. Whether you are running the reconstruction in parallel or in series, edit (but do not run) bps-settings.spi.
PARAMETERS:

[bp-method] -- backprojection method: BP CG, BP 32F, or BP RP
[stk-opt] -- Stack options (e.g., 0 to read from original location, 1 to copy into RAM, or 2 to copy temporarily to disk).
[max-wait] -- Will wait up to 10 minutes by default before reading a new group's stack, in order to distribute disk I/O.

INPUT: dala01_***, ../batch/Alignment/align_01_***, sel_particles_***
OUTPUT: df***/{vol01_odd, vol01_even}, df***/docfscmasked, summary-bps

The following is a brief tree of procedure calls:

bps-inseries bps-settings bps-main bps-calcres bps-combine bps-calcres

slices.spi (optional)

INPUT: df***/vol01_odd
OUTPUT: slices/slice***

plotres.spi

INPUT: df***/docfscmasked (change from df***/fscdoc), docfscmasked (change from combires), resolution
OUTPUT: gnuplot_res
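To read a resolution off an FSC curve numerically, you can interpolate the frequency at which the curve first drops below a chosen threshold. The sketch below does this with linear interpolation. The 0.5 threshold is an assumption (this page does not say which criterion plotres.spi uses), and the function name and the parallel lists of frequencies and FSC values are hypothetical.

```python
def frequency_at_threshold(frequencies, fsc, threshold=0.5):
    """Return the spatial frequency where the FSC first drops below the threshold.

    frequencies, fsc: parallel lists ordered by increasing frequency
    Uses linear interpolation between the two bracketing points.
    Returns None if the curve never drops below the threshold.
    Hypothetical helper for illustration only.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            fraction = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            return frequencies[i - 1] + fraction * (frequencies[i] - frequencies[i - 1])
    return None

if __name__ == "__main__":
    crossing = frequency_at_threshold([0.1, 0.2, 0.3], [0.9, 0.6, 0.4])
    print(crossing)
```

If the frequencies are in cycles per pixel, dividing the pixel size in Angstroms by the crossing frequency gives the resolution in Angstroms.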

Generate a matched-filter profile

make_matched.spi
INPUT: vol01_odd, vol01_even
OUTPUT: docmatched_vol01
Information at spatial frequencies above the cutoff value is considered to be uncorrelated noise, and should be filtered out. The resolution curve itself is used to determine the shape of the filter. This concept is described in Huang and Penczek (2004) JSB 145: 29-40.

Apply the filter profile

matchedfilter.spi
INPUT: docmatched_vol01, vol01
OUTPUT: vol01_matched
Applying the same filter profile to different reconstructions can be used to filter them to the same resolution (provided the filter for the lowest-resolution reconstruction is used).

Prepare files for refinement

consecprepare.spi
INPUT: sel_group_cclim, sel_particles_***, data***, dala01_***
OUTPUT: good-sel_group_cclim, good-sel_particles_***, good-data***, good-dala01_*** (optional, see NOTE below)
NOTE: In the developmental version of the main, projection-matching procedure files, the aligned-image stacks (dala01_***) are no longer needed. The default is currently to write the aligned-images stacks, but to save disk space, this output can be skipped by setting the parameter [dala-yn] to 0.
Originally, this procedure file copied the outputs into Refinement/input. Now however, the subroutine prepare.pam in Refinement does this for you. Left as is, refinement would manipulate the entire stacks, including discarded particles. This could be wasteful, especially if particles sets are further split using classification, for example. So, with this procedure file, only the desired particles will be retained.
However, below, you will need to remember to change the input filenames in refine_settings.pam

Skip filt.spi -- The equivalent was performed above by make_matched.spi and matchedfilter.spi

Refinement -- in Refinement/ directory

Refinement documentation
Use the following as inputs in refine_settings.pam, as written above by consecprepare.spi
(Note the only change from the default is the prefix good- for these files) :

good-sel_group_cclim
good-sel_particles_***
good-data***
good-dala01_***
good-align_01_***

I recommend using the filtered volume, vol01_matched, instead of vol01, in order to minimize model bias.

Additional procedure files

I have some miscellaneous procedure files here.

Recent modifications:

2012-07-19 -- the following procedure files explicitly create necessary output directories:

eman2spider.spi
rewindow.spi
numberparticles.spi

2012-05-23 -- changed X## format registers to named registers for the following procedure files:

selectbyviewall.spi
rewindow.spi
recheck.spi
goodparticlesbydf.spi
eman2spider.spi
combinegoodclasses.spi
classify.spi

2012-05-17 -- bps-pubsub.spi and bps-inseries.spi -- replace bps-by-df.spi
2012-05-15 -- filtering reconstruction using matched filter
2012-05-15 -- apsh-pubsub.spi and apsh-inseries.spi -- replace apshgrp.spi
2012-05-01 -- numberparticles.spi -- replaces pnums.spi and listallparticles.spi
2012-04-12 -- filterbyview.spi -- added option to not filter
2012-04-06 -- eman2spider.spi -- added sel_particles*** output
2012-03-15 -- combinegoodclasses.spi -- summary doc file now has the format of old how_many files
2011-01-14 -- bps-by-df.spi -- computes masked and unmasked FSCs
2011-01-11 -- regroup.spi -- can quickly test sizes of new defocus groupings
2011-01-07 -- consecprepare.spi -- new refinement procedure files use different and fewer inputs
2010-09-02 -- emancoords2spiderdoc.py -- SPIDER and EMAN2 coordinate systems appear to be the same now
2009-07-10 -- documented various doc file formats
2009-06-03 -- changed format of Reconstruction/select/sel*** files, now same format as Reconstruction/select/prj***/sortsel, for versatility and generality. The following procedure files are affected:

selectbyviewall.spi
filterbyview.spi
classify.spi
combinegoodclasses.spi
reselect_byview.spi (in VerifyMisc/)

2009-05-14 -- histgoodccc.spi -- added output histogram of bad particles
2009-05-14 -- recheck.spi -- added optional step to add or subtract particles after re-screening
2009-04-14 -- dfgoodapsh.spi -- output now called sel_particles_***, can work without defocus groups
2009-04-07 -- included backup script backup.sh
2009-02-26 -- added quick-start guide
2009-02-26 -- added viewaverage.spi (optional) -- analogous to average.spi
2008-09-29 -- added mkfilenums.py, shrink.spi, and .montagefromdoc in Micrographs/
2008-07-02 -- added to SPIDER Techniques page
2008-02-01 -- selectbyviewall.spi -- added output stack2particle*** with global particle number, to compensate for a change to sel_particles. The following also changed accordingly:

histgoodccc.spi
big-goodparticlesbydf.spi
goodparticlesbydf.spi
dfgoodapsh.spi

2007-08-22 -- archive of tarballs linked to projection-matching tarballs
2007-08-22 -- changed hard-to-read italic passages to less-hard-to-read teal.
2007-01-31 -- significant re-write of procedure files in Verification section.
Changes are summarized here.
2007-01-23 -- started archive of tarballs
2006-08-29 -- verifybyview.py -- stores prior settings in text file .verifybyview
2006-07-27 -- added SPIRE configuration file to tarball
2006-06-27 -- created tarball of procedure files
2006-05-12 -- AP SH version is now the default
2005-03-03 -- posted documentation on Python/Tkinter interface
2005-01-21 -- changed extensions from .bat to .spi

Particle verification for single-particle, reference-based reconstruction using multivariate data analysis and classification
As described in J Struct Biol (2008) 161: 41-48.
Out of date, legacy protocol, with dead links?
