(Updated 2013 Jan 21)
Particle verification for single-particle,
reference-based reconstruction
using multivariate data analysis and classification
As described in J Struct Biol (2008) 161: 41-48.
Note: this is an out-of-date, legacy protocol and may contain dead links.
General notes:
- Changes from the "normal" flow will be shown in teal.
- Data filenames will be in bold, and menu options will be in italic.
- Data extension is assumed to be .dat. Adjust accordingly.
- Console commands will be in Courier.
Quick-start guide
Options limited for the sake of simplicity. For more details, see below.
In toplevel directory (e.g., myproject/):
- Unpackage projection-matching procedure files
- Unpackage verification procedure files
- Copy params file or Run: spider spi/dat @makeparams
- Copy reference volume and/or Run: spider spi/dat @resizevol
In Micrographs/:
- mkfilenums ../filenums.dat mic*.dat
- spider spi/dat @shrink
- montagefromdoc ../sel_micrograph.dat sm-mic*
In Power_Spectra/:
- spider spi/dat @powdefocus
- montagefromdoc ../sel_micrograph.dat Power_Spectra/pw_avg*
- ctfmatch Power_Spectra/ctf* &
- spider spi/dat @defavg
In Particles/:
- Copy noise file, or spider spi/dat @noise
- e2boxer.py
- spider spi/dat @eman2spider
- spider spi/dat @rewindow
- spider spi/dat @numberparticles
In Alignment/:
- spider spi/dat @sel_by_group
- spider spi/dat @win2stk
- spider spi/dat @refproj
- ./spider spi/dat @apsh-pubsub &
In Reconstruction/:
- spider spi/dat @selectbyviewall
- spider spi/dat @filterbyview
- spider spi/dat @classify
- verifybyview
- spider spi/dat @combinegoodclasses
- (optional) spider spi/dat @recheck
- (optional) spider spi/dat @viewaverage
- spider spi/dat @histgoodccc
- spider spi/dat @goodparticlesbydf
- spider spi/dat @dfgoodapsh
- (optional) spider spi/dat @select
- (optional) spider spi/dat @plotrefviews
- (optional) spider spi/dat @display
- (optional) spider spi/dat @bestim
- ./spider spi/dat @bps-pubsub &
- (optional) spider spi/dat @slices
- spider spi/dat @plotres
- spider spi/dat @make_matched
- spider spi/dat @matchedfilter
- spider spi/dat @consecprepare
In Refinement/:
- Edit refine_settings
- ./spider pam/dat @pub_refine &
- spider spi/dat @plotrefres
Getting started
spire &
Type something, anything, under Project title.
Enter data extension under data extension.
Here, I will assume .dat, so adjust accordingly.
Under Directory for this project,
make sure that the current directory is entered.
Sometimes the data extension is appended, thus defining a new directory.
Under Configuration file, select verify.xml,
using the Browse button if necessary.
Uncheck the button Create directories and load procedure files.
makeparams.spi
Skip makefilelist.spi -- The Python script mkfilenums.py is more general.
Get reference volume
(optional) resizevol.spi -- interpolates reference-volume
Micrographs -- in Micrographs/ directory
- Micrographs are assumed to have the file pattern mic****
- To generate a SPIDER doc file containing a list of micrographs, type
(substituting the appropriate data extension):
mkfilenums ../filenums.dat mic*.dat
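As a rough illustration of what such a file-number list contains, the sketch below pulls the numeric part out of filenames that follow the mic**** pattern. It is not the actual mkfilenums.py: the function name micrograph_numbers and the regular expression are assumptions made for this example, and writing the result out as a SPIDER doc file is left out.

```python
import re

def micrograph_numbers(filenames, prefix="mic", extension="dat"):
    """Return the sorted micrograph numbers found in names like mic0012.dat."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)\.{re.escape(extension)}$")
    numbers = []
    for name in filenames:
        match = pattern.match(name)
        if match:  # skip files that do not follow the pattern
            numbers.append(int(match.group(1)))
    return sorted(numbers)
```

Files such as sm-mic0003.dat are ignored here because the pattern is anchored at the start of the name.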
Shrink the micrographs
- shrink.spi
- PARAMETER: decimation factor (so that
the micrograph fits on the screen at 1X)
- INPUT: mic****
- OUTPUT: sm-mic****
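If you are unsure which decimation factor to use, the following sketch picks the smallest integer factor that lets the whole micrograph fit on the screen at 1X. The function name and the screen dimensions are hypothetical; shrink.spi itself simply takes the factor as a parameter.

```python
import math

def decimation_factor(nx, ny, screen_x, screen_y):
    """Smallest integer factor that makes an nx-by-ny image fit the screen."""
    factor = max(nx / screen_x, ny / screen_y)
    return max(1, math.ceil(factor))  # never enlarge the image
```

For example, a 4096 x 4096 micrograph on a 1280 x 1024 display would need a factor of 4.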
(Optional) Screen the micrographs using
montagefromdoc.py.
montagefromdoc
The first popup window will contain hopefully reasonable settings, or
you can enter filenums and the micrograph file-pattern on the command line
(in that order).
Alternatively, to keep all the micrographs, in the
top-level directory, copy filenums to sel_micrograph.
CTF estimation -- in Power_Spectra/ directory
- powdefocus.spi (slow)
- INPUT: Micrographs/mic****
- OUTPUT: Power_Spectra/pw_avg***, Power_Spectra/roo***
- Screen the power spectra visually using
montagefromdoc.py.
- Run:
montagefromdoc
The first popup window will contain hopefully reasonable settings.
If not, the input doc file is sel_micrograph and
the image file pattern is Power_Spectra/pw_avg****.
- To enhance the contrast more than is possible in
montagefromdoc.py, you may need to view a power spectrum in JWEB.
- The output selection doc file, sel_micrograph,
will overwrite the original; however, a backup copy will be saved.
- Screen the determined defocus values.
- ctfmatch.py
is a nice program to display the fitting.
If removing bad micrographs, name the list of remaining good micrographs
sel_micrograph.
Particle-picking -- in Particles/ directory
- Get noise file, either from previous project or from noise.spi
- Window particles from micrographs (slow).
- pick.spi
- pick.spi calls pick_p.spi.
Selecting particles from micrographs in WEB is another option.
- INPUT: ../batch/Micrographs/mic****
(change from raw****), ../reference_volume, noise
- OUTPUT: win/winser_****, coords/sndc****
- lfc_pick.spi
- lfc_pick.spi calls pickparticle.spi.
- INPUT: ../batch/Micrographs/mic**** (change from raw****), ../reference_volume, noise
- OUTPUT: win/winser_****, coords/sndc****
- boxer/e2boxer.py in EMAN/EMAN2
- Then, window out the images using
rewindow.spi
- INPUT: ../batch/Micrographs/mic****, coords/sndc****
- OUTPUT: win/winser_****
- This is also a useful procedure file if I want to use a different window size,
or if I've deleted the winser files to free up disk space.
(Optional) Select particles from micrograph stacks.
lfc_pick.spi sorts particles from highest cross-correlation to
lowest, and will typically lead off each micrograph with
ice-condensation blobs and end with noise. You can exclude these bad images at the
extremes by specifying the contiguous range that includes all of the good particles.
This step will save time and disk space and will help the classification.
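The idea of keeping one contiguous range can be sketched as follows. This is only an illustration, assuming the particles are already in the sorted order written by the picking step; the function name keep_contiguous_range is hypothetical.

```python
def keep_contiguous_range(sorted_particles, first_good, last_good):
    """Keep entries from first_good through last_good (1-based, inclusive)."""
    if not 1 <= first_good <= last_good <= len(sorted_particles):
        raise ValueError("range must lie within the sorted particle list")
    # Everything before first_good (e.g., ice blobs) and after last_good
    # (e.g., noise) is dropped.
    return sorted_particles[first_good - 1:last_good]
```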
There are two ways to visualize the particles: using WEB or the Python utility
montagefromdoc.py
- Using montagefromdoc.py --
this will require less preprocessing
(e.g., filtershrink.spi) but Python display is slower than WEB.
- Run:
montagefromdoc
The first popup window will contain hopefully reasonable settings,
such as input coords/sndc**** and output good/good****.
Alternatively, you can enter the doc-file and particle-file names
on the command line (in that order), e.g.,
montagefromdoc coords/sndc1234 win/winser_1234
- Using WEB
- Under Options/Image turn on filenames, and
you may need to use a small font, set under Options/Font.
Open the stacks win/winser_****.
- There used to be procedure files
filtershrink.spi and
negmontagedocs.spi
to filter and break the stacks into a bite-sized number of images,
respectively, but I haven't updated them since lfc_pick started using stacks.
Contact me if you would like to have these procedure files updated.
Skip: renumber.spi
Assign global particle number for each particle.
- numberparticles.spi
- PARAMETER: maximum particles per micrograph
(it is probably safe to err on the side of too many, or 0 will keep all of them)
- INPUT: win/sel_particle_****
(or good/good**** if you selected particles from micrograph stacks)
- OUTPUT: coords/ngood**** coords/mic2global
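Conceptually, global numbering builds a lookup table from (micrograph, particle-within-micrograph) to a single running number. The sketch below assumes that numbering starts at 1 and proceeds in micrograph order, which may differ from what numberparticles.spi actually does; the function and variable names are made up for the example.

```python
def number_particles(particles_by_micrograph, max_per_micrograph=0):
    """Map (micrograph, local particle number) -> global particle number.

    particles_by_micrograph: dict of micrograph number -> list of local numbers.
    max_per_micrograph: cap per micrograph; 0 keeps all of them.
    """
    mic2global = {}
    next_global = 1
    for mic in sorted(particles_by_micrograph):
        local = particles_by_micrograph[mic]
        if max_per_micrograph > 0:
            local = local[:max_per_micrograph]
        for part in local:
            mic2global[(mic, part)] = next_global
            next_global += 1
    return mic2global
```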
Alignment -- in Alignment/ directory
Verify Particles -- in Reconstruction/ directory
- Make selection doc for each reference view.
- Filter and (optionally) shrink particles.
- filterbyview.spi
- INPUT: dala01_***, align_01_***
- OUTPUT: select/prj***/stkfilt
- The goal with the filter parameters was to be able to ignore CTF effects.
So, I chose the first CTF zero of the most-defocused micrograph as the Butterworth stop-band.
If you're not sure what filter radii to use, try
findctfminima.spi
- To test the filter parameters,
try running this procedure file on just the particles in one reference view,
by setting parameter [last-view] to 1.
- If you're using old, X-Window WEB, which
can't montage a doc file from stacks (NOTE: it can now), run
filtershrinksh.spi,
which instead writes ../batch/Particles/flt/flt******
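For choosing the filter radius in filterbyview.spi, the position of the first CTF zero can be estimated with the textbook weak-phase approximation, in which the first zero falls at a spatial frequency of 1/sqrt(wavelength x defocus). The sketch below ignores spherical aberration and amplitude contrast, so treat the result as a starting point only; it is not a substitute for findctfminima.spi, and the function names are hypothetical.

```python
import math

def electron_wavelength(voltage_kv):
    """Relativistic electron wavelength in Angstroms for a voltage in kV."""
    volts = voltage_kv * 1000.0
    return 12.2643 / math.sqrt(volts + 0.97845e-6 * volts ** 2)

def first_ctf_zero(defocus_angstroms, voltage_kv):
    """Spatial frequency (1/A) of the first CTF zero.

    Assumes pure phase contrast with no spherical aberration.
    """
    wavelength = electron_wavelength(voltage_kv)
    return 1.0 / math.sqrt(wavelength * defocus_angstroms)

def first_zero_of_most_defocused(defocus_values, voltage_kv):
    """The largest defocus gives the lowest-frequency first zero."""
    return first_ctf_zero(max(defocus_values), voltage_kv)
```

If the filter expects radii in cycles per pixel rather than reciprocal Angstroms, multiply the result by the pixel size in Angstroms.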
- (Optional) Screen particles without classification.
- You can optionally screen the particles without classification.
This could be useful for small data sets that don't warrant classification,
or if you otherwise want to see all particles before classification. Run:
montagefromdoc
The first popup window will hopefully have reasonable settings.
If not, enter select/sortsel001.dat, select/prj001/stkfilt.dat, and
select/prj001/goodsel.dat for the input selection filename, particle filename,
and output selection file, respectively.
- If you perform this step, skip ahead past combinegoodclasses.spi
- The particles are sorted by correlation coefficient, from highest to lowest.
You can display the CCROT values by clicking the checkbutton under Display/Labels,
but you'll probably need to resize the window.
- Run correspondence analysis and separate particles into classes.
- classify.spi
- INPUT: select/sel***, select/prj***/stkfilt
- OUTPUT: select/prj***/{docclass###, classavg###,
listclasses}
- If you're running X-Windows WEB (or for some other reason ran
filtershrinksh.spi) instead run
unstacked-classify.spi,
which uses ../batch/Particles/flt/flt****** as an input.
- You can select good class-averages for a reference-view as
soon as the procedure file starts on a subsequent reference-view (as
printed to the screen). As of 2004, you can probably sift through
classes faster than SPIDER can calculate them.
- During MSA, SPIDER may hang at a particular reference view.
If this happens, switch to a different MSA method (e.g., from correspondence
analysis to iterative PCA) and restart at that reference view.
- If you would like to verify particles on a different computer,
you should be able to copy the contents of the Reconstruction/select/ directory.
There are a couple thousand files there total, including the particle stacks,
in contrast to ~100,000 files in the case of unstacked particles.
- There are four options for keeping particles,
depending on how much control you want/need:
- Option using Python/Tkinter:
Select classes and particles therein using verifybyview.py.
More information can be found here.
- Options using WEB:
- First, select good class-averages in WEB using Categorize/Sequential
montage (to show them in sequential order) or Categorize/Doc.
file montage (using listclasses) to show them
from worst cross-correlation to best.
- Name the resulting list of good classes goodclasses and
click on the good classes. WEB by default will write a separate
document file for each reference view.
- Check member particles -- in a
separate WEB window -- using Montage from doc file, with
the appropriate document file docclass{***class#}.
The Image file template in WEB should be
"../batch/Particles/flt******"
Particles will be sorted from worst cross-correlation to best.
See here for an illustrated example.
- There are three levels of control in verifying particles using WEB:
- Simply click on the good classes as described above.
This will keep all of the particles in the good classes.
- Manually click on the first good particle in each class.
I recommend doing this until you get a feel for which classes are bad.
See here for an illustrated example.
- Click on the good classes as described above.
- Instead of displaying class-montages with Montage
from doc file, use Categorize/Doc. file montage.
- Name the output file firstgoodparticle. There
should be one file for each reference-view.
- When prompted for the key, enter the class-number.
If there isn't a key for each good class, the next procedure
file (combinegoodclasses.spi) will crash.
- Click on the first good particle for each class that has
particles you want to keep, e.g., the first one to keep all of them.
- Manually click on the good particles in each class.
The only advantages over fully manual particle verification are
that the particles are separated by view and aligned.
This is useful if classification didn't do a good job on your particles.
- Click on the good classes as described above.
- Display the class-montages using Categorize/Doc. file montage.
- Name the output file byhand{***class#}.
- Click on each particle that you would like to keep.
If using WEB, for a given reference-view, you have to use the same method for all
classes.
That is, if you use the "whole class" mode (option 1) for one class in a
reference-view, you must use it for all classes.
The same goes for options 2 and 3.
However, you can mix-and-match methods within different reference-views.
The next procedure file will write to the screen which method was used for each view.
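The difference between the "whole class" and "first good particle" modes can be sketched as follows. This is a minimal illustration, assuming each class list is sorted from worst cross-correlation to best as described above; keep_from_first_good is a hypothetical name, not part of combinegoodclasses.spi.

```python
def keep_from_first_good(class_members, first_good=None):
    """Return the particles kept from one class.

    class_members: particle numbers sorted from worst to best correlation.
    first_good: particle clicked as the first good one; None keeps the whole class.
    """
    if first_good is None:
        return list(class_members)
    start = class_members.index(first_good)  # raises ValueError if absent
    return list(class_members[start:])
```

Clicking the very first particle in a class is therefore equivalent to keeping the whole class.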
- Combine particles from good classes.
- combinegoodclasses.spi
- SUBROUTINE:
reversedoc_7col.spi
- INPUT: select/prj***/{docclass###, goodclasses,select/sel***,
firstgoodparticle (optional), byhand (optional)}
- OUTPUT: select/prj***/goodsel (one for each reference-view),
select/combinestats
- (Optional) Re-screen the particles by view.
- Screen particles using .montagefromdoc
- In select/, I included a .montagefromdoc file with hopefully reasonable settings.
If not, type:
montagefromdoc prj001/goodsel.dat prj001/stkfilt.dat
For the output filename, use prj001/notgood.
- To salvage bad particles, do the converse, i.e.,
use prj001/badsel.dat and prj001/notbad.dat as the input and output.
- recheck.spi
- SUBROUTINE:
reversedoc_7col.spi
- INPUT: select/prj***/sortsel, select/prj***/goodsel,
select/prj***/badsel, select/prj***/notbad, select/prj***/notgood
- OUTPUT: select/prj***/goodselB, select/prj***/badselB
- For the output doc files, I used a letter to distinguish from prior output.
A number might cause problems for montagefromdoc.py.
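The bookkeeping behind the re-screening step amounts to set arithmetic on particle numbers. The sketch below assumes that notgood lists particles rejected from the good list and notbad lists particles salvaged from the bad list; the function name and the return format are illustrative, not what recheck.spi writes.

```python
def recheck(goodsel, badsel, notgood=(), notbad=()):
    """Return updated (good, bad) particle lists after re-screening.

    notgood: particles rejected while reviewing the good list.
    notbad:  particles salvaged while reviewing the bad list.
    """
    rejected, salvaged = set(notgood), set(notbad)
    good_b = sorted((set(goodsel) - rejected) | salvaged)
    bad_b = sorted((set(badsel) - salvaged) | rejected)
    return good_b, bad_b
```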
- (Optional) Average images by view
- viewaverage.spi
- INPUT: select/prj***/goodsel
If you ran recheck.spi, add the tiebreaker (e.g., B) to the filename.
- OUTPUT: select/prj***/goodavg, select/prj***/goodvar (optional)
- One advantage of this procedure file over average.spi is that it combines all defocus groups.
- You can use verifybyview.py to link the averages to the retained particles.
To do so, run verifybyview from Reconstruction/select/
(so as to not overwrite the settings in Reconstruction/).
The default settings in .verifybyview will hopefully be reasonable.
In principle, you could re-screen your particles if there are still bad ones.
- Compute CCC histogram of particles
- histgoodccc.spi
- INPUT: select/prj***/goodsel, select/prj***/badsel
If you ran recheck.spi, add the tiebreaker (e.g., B) to the filename.
- OUTPUT: combinedgood, histcccgood, histcccbad
- Check histogram using fit.gnu,
modifying file extension and fit, if needed. Type
gnuplot
and then at the prompt, type (including the single quotes):
load 'fit.gnu'
Normally, the histogram should look Gaussian. If there is a tail or a second mode
at the low-CCC end, there may be non-particles remaining. You can filter them out
by using a fractional cutoff in a later step, or you can go back and screen the
particles more stringently.
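If gnuplot is not at hand, a quick histogram of the correlation values can be tabulated in a few lines. This sketch uses equal-width bins over the range of the data; the bin count and the function name are arbitrary choices for illustration and do not reproduce the format of histcccgood.

```python
def ccc_histogram(ccc_values, num_bins=10):
    """Return (bin_edges, counts) for equal-width bins spanning the data."""
    low, high = min(ccc_values), max(ccc_values)
    width = (high - low) / num_bins or 1.0  # avoid zero width if all equal
    counts = [0] * num_bins
    for value in ccc_values:
        # The maximum value belongs in the last bin, not one past it.
        index = min(int((value - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts
```

A long run of non-empty bins at the low end, or a second peak there, is the kind of tail described above.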
- Separate total good-particle list by defocus-group.
- goodparticlesbydf.spi
- PARAMETER: fractional cutoff (optional, use 0.0 to keep all)
- INPUTS: combinedgood, stack2particle***
- OUTPUT: df***/goodparticles, sel_group_cclim, sel_group_cclim_sorted
- For large particle sets, this procedure file may crash, in which case try
big-goodparticlesbydf.spi
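One way to picture the fractional cutoff is shown below. The sketch assumes the cutoff means "discard this fraction of the particles with the lowest CCC"; the text above does not spell out how goodparticlesbydf.spi interprets the value, so check the procedure file before relying on this. All names are hypothetical.

```python
def good_particles_by_defocus(particles, fraction=0.0):
    """Drop the lowest-CCC fraction, then group the rest by defocus group.

    particles: list of (particle_number, defocus_group, ccc) tuples.
    fraction:  fraction of particles to discard (0.0 keeps all).
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    ranked = sorted(particles, key=lambda p: p[2], reverse=True)
    num_keep = len(ranked) - int(fraction * len(ranked))
    groups = {}
    for number, group, ccc in ranked[:num_keep]:
        groups.setdefault(group, []).append((number, ccc))
    return groups
```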
- Generate alignment documents with only good particles
- dfgoodapsh.spi
- INPUT: ../batch/Alignment/align_01_***, df***/goodparticles
- OUTPUT: ../batch/Alignment/good-align_01_***, sel_particles***
- The sel_particles*** files are essentially equivalent to
df***/goodparticles, lacking only the CCROT column.
Compute averages -- also in Reconstruction/ directory
- select.spi (optional, necessary for display.spi or plotrefviews.spi)
-- for each defocus-group, separates particles by reference-view
- INPUT: Use good-align_01_*** instead of align_01_***
- OUTPUT: df***/select/sel###, df***/how_many, how_many
- Skip average.spi -- The equivalent was performed above by viewaverage.spi
- Skip cchistogram.spi -- The equivalent was performed above by histgoodccc.spi
- Skip ccthresh.spi -- The equivalent was performed above by histgoodccc.spi
- Skip dftotals.spi -- The equivalent was performed above by goodparticlesbydf.spi
- plotrefviews.spi, display.spi (optional) -- graphically show distribution of views
- INPUT: df***/how_many
- OUTPUT: gnuplot_view, display/cndis***
- bestim.spi (optional) -- truncate seltotal files for overrepresented views
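The two bookkeeping steps in this section, separating particles by reference view and truncating over-represented views, can be sketched as follows. The sketch assumes that truncation keeps the particles with the highest correlation in each view, which is a guess at what bestim.spi does; function names and data layout are made up for the example.

```python
from collections import defaultdict

def select_by_view(assignments):
    """Group particles by reference view and count them.

    assignments: iterable of (particle_number, reference_view, ccc) tuples.
    Returns (per_view, how_many).
    """
    per_view = defaultdict(list)
    for particle, view, ccc in assignments:
        per_view[view].append((particle, ccc))
    how_many = {view: len(members) for view, members in per_view.items()}
    return dict(per_view), how_many

def truncate_overrepresented(per_view, max_per_view):
    """Keep at most max_per_view particles per view, preferring high CCC."""
    truncated = {}
    for view, members in per_view.items():
        best_first = sorted(members, key=lambda m: m[1], reverse=True)
        truncated[view] = best_first[:max_per_view]
    return truncated
```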
3D reconstruction -- still in Reconstruction/ directory
- Compute 3D reconstruction
- Run bps-inseries.spi.
Like alignment, 3D reconstruction is CPU-intensive and benefits from parallelization.
Whether you are running the reconstruction in parallel or in series, edit (but do not run)
bps-settings.spi.
- PARAMETERS:
- [bp-method] -- backprojection method: BP CG, BP 32F, or BP RP
- [stk-opt] -- Stack options (e.g., 0 to read from original location,
1 to copy into RAM, or 2 to copy temporarily to disk).
- [max-wait] -- Will wait up to 10 minutes by default before reading a new group's stack,
in order to distribute disk I/O.
- INPUT: dala01_***, ../batch/Alignment/align_01_***,
sel_particles_***
- OUTPUT: df***/{vol01_odd, vol01_even}, df***/docfscmasked, summary-bps
The following is a brief tree of procedure calls:
- slices.spi (optional)
- INPUT: df***/vol01_odd
- OUTPUT: slices/slice***
- plotres.spi
- INPUT: df***/docfscmasked (change from df***/fscdoc),
docfscmasked (change from combires), resolution
- OUTPUT: gnuplot_res
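Reading a resolution value off an FSC curve amounts to finding where the curve first drops below a chosen threshold. The sketch below interpolates linearly between the two bracketing samples; the 0.5 threshold is only a common convention used here as an assumed default, and the function is not part of plotres.spi.

```python
def resolution_at_cutoff(frequencies, fsc_values, cutoff=0.5):
    """Spatial frequency where the FSC first drops below cutoff, or None.

    Linear interpolation between the two samples that bracket the crossing.
    """
    points = list(zip(frequencies, fsc_values))
    for (f0, c0), (f1, c1) in zip(points, points[1:]):
        if c0 >= cutoff > c1:
            return f0 + (c0 - cutoff) * (f1 - f0) / (c0 - c1)
    return None  # the curve never crosses the threshold
```

The resolution in Angstroms is the reciprocal of the returned value if the frequencies are given in reciprocal Angstroms.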
- Generate a matched-filter profile
- make_matched.spi
- INPUT: vol01_odd, vol01_even
- OUTPUT: docmatched_vol01
- Information at spatial frequencies above the cutoff value is
considered to be uncorrelated noise, and should be filtered out.
The resolution curve itself is used to determine the shape of the filter.
This concept is described in Huang and Penczek (2004) JSB 145: 29-40.
- Apply the filter profile
- matchedfilter.spi
- INPUT: docmatched_vol01, vol01
- OUTPUT: vol01_matched
- Applying the same filter profile to different reconstructions
can be used to filter them to the same resolution
(provided the filter for the lowest-resolution reconstruction is used).
- Prepare files for refinement
- consecprepare.spi
- INPUT: sel_group_cclim, sel_particles_***,
data***, dala01_***
- OUTPUT: good-sel_group_cclim, good-sel_particles_***,
good-data***, good-dala01_*** (optional, see NOTE below)
- NOTE: In the developmental version of the
main, projection-matching procedure files, the aligned-image stacks
(dala01_***) are no longer needed.
The default is currently to write the aligned-images stacks, but
to save disk space, this output can be skipped by
setting the parameter [dala-yn] to 0.
- Originally, this procedure file copied the outputs into Refinement/input.
Now, however, the subroutine prepare.pam in Refinement does this for you.
Left as is, refinement would manipulate the entire stacks, including discarded particles.
This could be wasteful, especially if particle sets are further split using classification, for example.
So, with this procedure file, only the desired particles will be retained.
However, below, you will need to remember to change the input filenames in
refine_settings.pam
- Skip filt.spi -- The equivalent was performed above by make_matched.spi and matchedfilter.spi
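The core of what consecprepare.spi does, retaining only the desired particles, implies a mapping from old stack positions to new consecutive ones. A minimal sketch of that mapping is given below, assuming 1-based numbering in ascending order; the actual procedure file may order or number things differently, and the function name is hypothetical.

```python
def renumber_consecutively(kept_particles):
    """Map each retained stack slot to a new consecutive slot (1-based)."""
    ordered = sorted(kept_particles)
    return {old: new for new, old in enumerate(ordered, start=1)}
```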
Refinement -- in Refinement/ directory
- Refinement documentation
- Use the following as inputs in refine_settings.pam, as written above by consecprepare.spi
(note that the only change from the default is the prefix good- for these files):
- good-sel_group_cclim
- good-sel_particles_***
- good-data***
- good-dala01_***
- good-align_01_***
- I recommend using the filtered volume, vol01_matched,
instead of vol01, in order to minimize model bias.
Additional procedure files
I have some miscellaneous procedure files here.
Recent modifications:
- 2012-07-19 -- the following procedure files explicitly create necessary output directories:
- eman2spider.spi
- rewindow.spi
- numberparticles.spi
- 2012-05-23 -- changed X## format registers to named registers for the following procedure files:
- selectbyviewall.spi
- rewindow.spi
- recheck.spi
- goodparticlesbydf.spi
- eman2spider.spi
- combinegoodclasses.spi
- classify.spi
- 2012-05-17 -- bps-pubsub.spi and bps-inseries.spi -- replace bps-by-df.spi
- 2012-05-15 -- filtering reconstruction using matched filter
- 2012-05-15 -- apsh-pubsub.spi and apsh-inseries.spi -- replace apshgrp.spi
- 2012-05-01 -- numberparticles.spi -- replaces pnums.spi and listallparticles.spi
- 2012-04-12 -- filterbyview.spi -- added option to not filter
- 2012-04-06 -- eman2spider.spi -- added sel_particles*** output
- 2012-03-15 -- combinegoodclasses.spi -- summary doc file now has the format of old how_many files
- 2011-01-14 -- bps-by-df.spi -- computes masked and unmasked FSCs
- 2011-01-11 -- regroup.spi -- can quickly test sizes of new defocus groupings
- 2011-01-07 -- consecprepare.spi -- new refinement procedure files use different and fewer inputs
- 2010-09-02 -- emancoords2spiderdoc.py -- SPIDER and EMAN2 coordinate systems appear to be the same now
- 2009-07-10 -- documented various doc file formats
- 2009-06-03 -- changed format of Reconstruction/select/sel*** files,
now same format as Reconstruction/select/prj***/sortsel, for versatility and generality.
The following procedure files are affected:
- selectbyviewall.spi
- filterbyview.spi
- classify.spi
- combinegoodclasses.spi
- reselect_byview.spi (in VerifyMisc/)
- 2009-05-14 -- histgoodccc.spi -- added output histogram of bad particles
- 2009-05-14 -- recheck.spi -- added optional step to add or subtract particles after re-screening
- 2009-04-14 -- dfgoodapsh.spi -- output now called sel_particles_***, can work without defocus groups
- 2009-04-07 -- included backup script backup.sh
- 2009-02-26 -- added quick-start guide
- 2009-02-26 -- added viewaverage.spi (optional) -- analogous to average.spi
- 2008-09-29 -- added mkfilenums.py, shrink.spi, and .montagefromdoc in Micrographs/
- 2008-07-02 -- added to SPIDER Techniques page
- 2008-02-01 -- selectbyviewall.spi -- added output stack2particle***
with global particle number,
to compensate for a change to sel_particles. The following also
changed accordingly:
- histgoodccc.spi
- big-goodparticlesbydf.spi
- goodparticlesbydf.spi
- dfgoodapsh.spi
- 2007-08-22 -- archive of tarballs linked to projection-matching tarballs
- 2007-08-22 -- changed hard-to-read italic passages to less-hard-to-read teal.
- 2007-01-31 -- significant re-write of procedure files in Verification section.
Changes are summarized here.
- 2007-01-23 -- started archive of tarballs
- 2006-08-29 -- verifybyview.py -- stores prior settings in text file .verifybyview
- 2006-07-27 -- added SPIRE configuration file to tarball
- 2006-06-27 -- created tarball of procedure files
- 2006-05-12 -- AP SH version is now the default
- 2005-03-03 -- posted documentation on Python/Tkinter interface
- 2005-01-21 -- changed extensions from .bat to .spi