SPIDER: Random Info

Occasional Thoughts about SPIDER, etc.

20 January 2020 ArDean Leith

Support for MRC data files in SPIDER.

SPIDER can now load and create MRC format image and volume files. Just specify the full file name including the extension, e.g. 'file.mrc' or 'file.mrcs', anywhere that SPIDER requests a filename. For MRC stacked images you can use: 1@file.mrc. For prompts that request a template use: *@file.mrc. Further information and notes on the drawbacks of using MRC stacks are given here.

1 March 2018 ArDean Leith

The Future of SPIDER Software.

It is now obvious that most of the advance in resolution of cryo-EM reconstructions since 2013 came from the use of direct electron capture cameras and not from reconstruction software improvements. The report from the 'EM Databank Map Challenge', which I participated in, supports this claim.

However, softwares like Cryosparc, Relion, EMAN2, SPARX, Xmipp, Bsoft, and SIMPLE offer increased speed, convenient interfaces, and often database connectivity improvements not available in SPIDER. Due to age and design differences, these types of improvements are often impossible to add to SPIDER.

SPIDER continues to be used for complete single particle reconstructions from microscope frame alignment through refinement. It can yield reconstructions close in resolution to those from the above software even though it does not use 'maximum likelihood' methodology and some other advances. However, it will be far slower than GPU-accelerated Cryosparc or Relion. It also has a command line interface which some younger users find old-fashioned.

SPIDER contains software for reconstruction from tilt pair imagery, particle picking, classification capabilities, and a wide selection of general image processing operations which are not available in these other packages. These methods are valuable for tasks like creating particle masks and understanding particle heterogeneity. They are also valuable in teaching environments.

What is the future role of SPIDER? It is already obvious that funding is not available for upgrades and improvements. Does anyone out there care? Where do we go from here? Do you see any continued use of SPIDER in your laboratory?

15 Feb. 2018 ArDean Leith

GPU Usage

The feasibility of using GPUs and CUDA for speeding up reconstruction has changed significantly since I last described our efforts. Recent generations of GPUs from Nvidia and AMD load data at a much faster speed and a much larger memory size is available. It is no longer necessary to create a suite of approaches to parallelize different data set sizes and shapes. Implementations in Relion and more so in Cryosparc (using a different approach to alignment search) illustrate the speed-up that is possible. We should revisit GPU use within SPIDER, but I won't.

21 April 2014 ArDean Leith

The Future of EM Software.

Both Science and C&E News have acknowledged the current 'revolutionary advance' in cryo-electron microscopy single-particle reconstruction.

These advances in resolution of reconstructions use new direct electron capture cameras, and the publications that I have seen utilize Relion software for the reconstruction.

I am still uncertain how much of the improved resolution arises from the improved software. At issue are not only the reconstruction methodology but also the resolution metric.

If Relion is a significant source of the improvement then there arises a question of the future role of other softwares in reconstruction. Currently Relion is able to handle most of the reconstruction pathway except for particle selection (windowing) and initial reference model construction.

These other softwares include SPIDER, EMAN2, SPARX, Xmipp, IMAGIC, Bsoft, SIMPLE, and some others. These softwares still contain some capabilities not found in Relion, e.g.:

SPIDER contains software for reconstruction from tilt pair imagery, particle picking, classification capabilities, and a wide selection of general image processing operations.
EMAN2 contains widely used particle picking modules.
SPARX contains new classification capabilities.
Xmipp has an alternative maximum likelihood alignment capability and a fairly wide selection of general image processing modules.
SIMPLE and Xmipp (recently announced) contain modules for creating unbiased initial reference volumes.
Xmipp, Bsoft, and SIMPLE use similar conventions and data metafiles to Relion and thus are quite compatible with Relion.

With the exception of these capabilities what is the future function of these softwares? Will they survive Relion's ascent? How much future development should be done on them? What will be the impact on funding for software other than Relion?

EM Software development funding by NIH in the US is currently in a rather bad state. Both SPIDER and IMOD and its associated software have lost major or all of their funding. At NIH almost all software development grants, for widely different purposes, compete directly and also compete with funding for various biological databases. This lack of targeting leads to poor quality reviewing.

E.g. In the case of SPIDER one of three reviewers of our most recent grant application stated:

"the number of investigators employing SPR is limited and not expected to grow substantially".

It is difficult for me to see how a knowledgeable reviewer could come to such a conclusion in the midst of a 'revolutionary advance'.

There does not appear to be any viable non-grant mechanism for the continued maintenance of scientific "Free Open Source Software". Is it reasonable to hope that researchers will direct voluntary monetary donations to software developers as some have suggested? Can researchers even get such a contribution approved by their local grant administrators? Do their auditors OK such an unobligated contribution? There are additional problems with currency conversion. Certainly the red tape involved in both donating and accepting a donation conspires against this idea. Up until now most software development has existed as sort of a side-operation of previously fairly well funded EM labs, in our case a 'NIH research resource'. Such funding is increasingly at risk and long-term development and maintenance of software is disappearing.

This uncertainty in funding confounds discussion on the future of EM software. Where do we go from here? Do you see continued use of SPIDER and other softwares?

29 Nov. 2012 ArDean Leith

Why no one should use MRC image stacks (IMO).

A single particle reconstruction from cryo-EM images of non-symmetrical objects often requires 100,000 to 1,000,000 images. If such a large number of images are stored as individual files in most common Linux filesystems, accession / addition of images will cause thrashing of the filesystem and extremely slow access. This occurs not just in processes accessing the images but throughout all access to that file system.

To overcome this thrashing one can purchase an expensive parallel file storage system (e.g. from Panasas) or, more commonly, aggregate the images into 'stacks', or, less commonly, into a database. Most EM softwares support some sort of file based stack. Several different EM single particle reconstruction softwares support both MRC and SPIDER format files to various extents.

The MRC stack file format is an especially poor choice for your stacks. There is a single 1024 byte header for the whole stack, then individual images are concatenated into the stack without any image specific header.
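The layout just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SPIDER's reader: the function names are hypothetical, the header is assumed to be little-endian, the images are assumed to be stored one after another in the usual column/row/section order, and the size of any extended header is taken from the NSYMBT field (byte offset 92) as defined in the MRC2014 specification. The caller has to say whether the first image is numbered 0 or 1, because nothing in the file records that.

```python
import struct

MRC_HEADER_BYTES = 1024
# Bytes per pixel for the common MRC data modes
# (0: int8, 1: int16, 2: float32, 6: uint16).
BYTES_PER_PIXEL = {0: 1, 1: 2, 2: 4, 6: 2}


def read_stack_geometry(header):
    """Return (nx, ny, nz, mode, ext_bytes) from a 1024-byte MRC header.

    Assumes a little-endian header (an assumption of this sketch).
    """
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)
    (ext_bytes,) = struct.unpack_from("<i", header, 92)  # NSYMBT field
    return nx, ny, nz, mode, ext_bytes


def image_offset(header, image_number, first_number=1):
    """Byte offset of one image in the stack.

    first_number is the number given to the first image (0 or 1);
    the file itself does not say which convention applies.
    """
    nx, ny, nz, mode, ext_bytes = read_stack_geometry(header)
    slot = image_number - first_number
    if not 0 <= slot < nz:
        raise IndexError("image number outside the stack")
    image_bytes = nx * ny * BYTES_PER_PIXEL[mode]
    return MRC_HEADER_BYTES + ext_bytes + slot * image_bytes


# Example: a header describing ten 64x64 float32 images.
example_header = bytearray(MRC_HEADER_BYTES)
struct.pack_into("<4i", example_header, 0, 64, 64, 10, 2)
print(image_offset(example_header, 3))                  # 33792
print(image_offset(example_header, 3, first_number=0))  # 50176
```

The same image number lands on two different images depending on the numbering convention, and the only per-image information available is the offset itself.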

Problems

There is no indicator in the file whether it is a stack or a volume.

There are no volume stacks.

There are no indexed stacks for efficient storage of sparsely numbered images.

Different softwares assume that the first image in the MRC stack is numbered either as image:0 or image:1. When converting MRC stacks to other stacks (e.g. SPIDER) the numbers no longer are consistent since they may always start with image:1.

The MRC format allows for different XYZ pixel/voxel ordering. An ordering in which the stacked image index (Z) precedes the column or row value is a disaster if adding additional images to the stack. Every single pixel in the stack has to be moved.

There is no standard for how image data is stored within the data field of an MRC format file. For an image, some softwares consider the first pixel to be (1,1); in other software pixel (1,1) is on the last line (NY) of the image. There is no standard image origin or volume handedness, nor is there any indication of this within the file.

There is no image specific metadata available. As a result:

There is no image density range available. The range (and statistics if wanted) have to be recalculated on each usage.

There is no 'valid/used' image flag. If all images in the stack are not in use there must be a corresponding selection file of some sort to specify images. Some software does not make use of such a selection file. An operation which operates on a whole stack will often fail when it encounters an invalid/unused image.

There is no way to track image accession within the stack. A process/operation which fails part way through a stack can not be tracked from any info in the stack.
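The point above about XYZ ordering can be checked with a small Python sketch. The function names and the tiny stack dimensions are hypothetical, chosen only for illustration. It counts how many existing pixels change their position in the file when one image is appended, first for image-by-image storage and then for an ordering in which the stack index Z varies fastest.

```python
def flat_index_image_major(x, y, z, nx, ny, nz):
    """Position of pixel (x, y) of image z when images are stored one after another."""
    return x + nx * (y + ny * z)


def flat_index_z_fastest(x, y, z, nx, ny, nz):
    """Position of the same pixel when the stack index z varies fastest."""
    return z + nz * (x + nx * y)


def moved_pixels(index_fn, nx, ny, nz_old, nz_new):
    """Count existing pixels whose file position changes when the stack grows."""
    moved = 0
    for z in range(nz_old):
        for y in range(ny):
            for x in range(nx):
                before = index_fn(x, y, z, nx, ny, nz_old)
                after = index_fn(x, y, z, nx, ny, nz_new)
                if before != after:
                    moved += 1
    return moved


# Append one image to a stack of five 4x3 images (60 existing pixels).
print(moved_pixels(flat_index_image_major, 4, 3, 5, 6))  # 0
print(moved_pixels(flat_index_z_fastest, 4, 3, 5, 6))    # 55
```

With image-by-image storage the new image is simply written at the end. With Z fastest, everything except the values at the very first pixel position has to be rewritten.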

4 Sept. 2012 ArDean Leith

Interpolation and Improved Reconstruction Resolution

We recently introduced improved interpolation using FBS inside several SPIDER operations. We have shown that FBS gives significant improvements over the linear and quadratic interpolation used in SPIDER previously and is as good as the much slower gridded interpolation available in SPARX.

During refinement of a reference based reconstruction interpolation is used at four steps. These are: creation of reference images from an existing reference volume, application of existing alignment parameters to the experimental images, conversion of image rings to polar coordinates, and alignment of images prior to back projection into a volume.

When we modified our recommended procedure for refinement, recon-loop.spi, to use the FBS interpolation alternatives in SPIDER and tested the refinement step using actual cryo-EM data, we were perplexed to find a small but repeatable decline in reconstruction resolution of an overall refinement step.

We investigated this decline using a ribosome data set consisting of four sets of noisy experimental images taken at different defocus levels containing over 6000 images. The decrease in resolution is caused by the application of existing alignment rotation and translations to the experimental images, before these images are compared to the reference projections for determination of the best matching pairs. The 'RT SQ' operation uses quadratic interpolation which adds an asymmetric filter effect to the results. This filtration ended up cutting noise in the aligned experimental images so that they gave better choice of matching reference images. Poorer interpolation gave a better outcome! But this observation pointed to a method of improving the refinement step. We have added an option to denoise the experimental images prior to the reference comparison in the 'AP SHC' operation. We evaluated Fourier lowpass, averaged box convolution, median box convolution, mean shift denoising, and anisotropic diffusion denoising before settling on Fourier lowpass filter as giving the best resolution results.
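As a rough illustration of what a Fourier lowpass filter does, here is a Python sketch on a 1D signal. It is not the filter inside 'AP SHC': the function names are hypothetical, the sharp cutoff is an assumption made for simplicity, and a plain O(N^2) discrete Fourier transform is used so that the example needs nothing beyond the standard library. Images would be filtered the same way in two dimensions.

```python
import cmath
import math


def dft(values, inverse=False):
    """Plain O(N^2) discrete Fourier transform (readable, not fast)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for j, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * math.pi * k * j / n)
        out.append(total / n if inverse else total)
    return out


def fourier_lowpass(signal, cutoff):
    """Zero every Fourier coefficient whose frequency magnitude exceeds cutoff."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        freq = k if k <= n // 2 else k - n  # signed frequency index
        if abs(freq) > cutoff:
            spectrum[k] = 0j
    return [c.real for c in dft(spectrum, inverse=True)]


# A slow sine wave contaminated by a fast one, standing in for noise.
n = 32
clean = [math.sin(2 * math.pi * 2 * j / n) for j in range(n)]
noisy = [c + 0.5 * math.sin(2 * math.pi * 12 * j / n) for j, c in enumerate(clean)]
filtered = fourier_lowpass(noisy, cutoff=5)
worst_error = max(abs(f - c) for f, c in zip(filtered, clean))
print(worst_error < 1e-9)  # True
```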

We have modified our recommended refinement procedure to use FBS interpolation in: 'PJ 3F' for the creation of the reference projections, 'AP SHC' during application of existing alignment parameters to the experimental images, and in 'RT SF' for creating the view used for backprojection. We also used FBS interpolation during conversion of image rings to polar coordinates. These improvements, which are present in recon-loop.spi, gave a significant improvement in resolution over the course of a complete refinement series compared to our previous procedure.

29 Aug. 2012 ArDean Leith

Fourier-based Spline Interpolation

We have developed a 2D and 3D Fourier-based Spline Interpolation (FBS) algorithm in order to improve the performance of rescaling, rotation, and conversion from Cartesian to polar coordinates. To interpolate a two- or three-dimensional grid we use a particular sequential combination of, respectively, two or three 1D cubic interpolations with Fourier-derived coefficients. A 1D cubic interpolation is a third degree polynomial:

Y(X) = A0 + A1*X + A2*X^2 + A3*X^3

where polynomial coefficients A0, A1, A2, and A3 are calculated from the Fourier transform of the image:

A0 = Y(0)
A1 = Y'(0)
A2 = 3(Y(1) - Y(0)) - 2Y'(0) - Y'(1)
A3 = 2(Y(0) - Y(1)) + Y'(0) + Y'(1)

The derivatives at grid nodes were obtained using the well-known relation between the Fourier transform of the derivative and the Fourier transform itself:

F(∂f(x,y)/∂x) = i*2*pi*k*F(k,l)

where F(k,l) is a coefficient of the discrete Fourier transform series F(f(x,y)).

This allows us to calculate derivatives in any local point without a finite difference approximation involving the data from neighboring points.
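A minimal 1D sketch of the scheme above, in Python, may make the steps concrete. It is not the SPIDER implementation. The function names are hypothetical, the data are treated as periodic with a grid spacing of one pixel (so the derivative factor becomes i*2*pi*k/N for N samples), the Nyquist term is dropped for even N, and a plain O(N^2) transform stands in for an FFT. All of these are assumptions of the sketch.

```python
import cmath
import math


def dft(values, inverse=False):
    """Plain O(N^2) discrete Fourier transform (readable, not fast)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for j, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * math.pi * k * j / n)
        out.append(total / n if inverse else total)
    return out


def node_derivatives(samples):
    """Derivative at every grid node from the Fourier derivative relation."""
    n = len(samples)
    spectrum = dft(samples)
    scaled = []
    for k, coeff in enumerate(spectrum):
        freq = k if k < n / 2 else k - n   # signed frequency index
        if n % 2 == 0 and k == n // 2:
            freq = 0                       # drop the ambiguous Nyquist term
        scaled.append(1j * 2 * math.pi * freq / n * coeff)
    return [c.real for c in dft(scaled, inverse=True)]


def fbs_interpolate_1d(samples, x):
    """Cubic interpolation at position x using Fourier-derived derivatives."""
    n = len(samples)
    derivs = node_derivatives(samples)     # recomputed per call for brevity
    i = math.floor(x) % n
    t = x - math.floor(x)                  # offset within the interval, 0 <= t < 1
    y0, y1 = samples[i], samples[(i + 1) % n]
    d0, d1 = derivs[i], derivs[(i + 1) % n]
    a0 = y0
    a1 = d0
    a2 = 3 * (y1 - y0) - 2 * d0 - d1
    a3 = 2 * (y0 - y1) + d0 + d1
    return a0 + a1 * t + a2 * t ** 2 + a3 * t ** 3


# One period of a sine wave sampled at 16 points, evaluated between nodes.
n = 16
samples = [math.sin(2 * math.pi * j / n) for j in range(n)]
estimate = fbs_interpolate_1d(samples, 3.5)
exact = math.sin(2 * math.pi * 3.5 / n)
print(abs(estimate - exact) < 1e-3)  # True
```

For 2D or 3D data the same 1D step would be applied along each axis in turn, as described in the text.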

We compared FBS to other commonly used interpolation techniques: quadratic interpolation and convolution reverse gridding (RG). A rotation of images by FBS interpolation takes roughly 1.1-1.5 times as long as quadratic interpolation, but achieves dramatically better accuracy. The accuracy of FBS interpolation is similar to RG interpolation. However, FBS rotation is approximately 1.4-1.8 times faster than RG. The FBS algorithm combines the simplicity of polynomial interpolation with the ability to preserve high spatial frequencies. Currently it has been incorporated into several operations in the open source package SPIDER for single-particle reconstruction.

9 Mar. 2011 ArDean Leith

Optimization

Since CPU hardware speeds are stagnant or decreasing, there is increased interest in optimizing SPIDER's processing speed. Since SPIDER is a general purpose EM imaging package, this means different things to different users. Locally the biggest time demand for our single particle reconstructions is alignment of images with reference projections (SPIDER operations: 'AP SH' and 'AP REF'). In order to assess the effect of changes in compiler options I used the operation 'AP SHC', which is the latest highly 'tweaked' version of 'AP SH'. The usual data was a set of 375x375 pixel images and a comparison of 50 experimental images versus 550 references.

Compiler choice: We have access to both PGI and Intel Fortran compilers. I chose to use the PGI compilers because the Intel compiler produces poorly optimized executables for use on AMD Opteron hardware. The PGI compiled executables work well both on Intel and AMD hardware. The results reported here are using the current release of the PGI compiler (Release 11.1).
Optimization Level: Aggressive optimization with PGI -O3 gives a 3-4% speedup on the benchmark code. However, this optimization level can only be used with great care. Some SPIDER operations give erroneous results with this compilation. This is probably due to differences in the execution order of statements and is a problem with floating point data that can potentially have wide variations in absolute value of the numbers. Changing the order of arithmetic operations like subtract and divide can sometimes affect accuracy of the output. Thus use of -O3 can only be justified with careful testing. Code for operation 'AP SH' is mostly compiled at level -O3 now, following such extensive testing. Most non-alignment operations are compiled with level -O2.
Kieee Flag: Since SPIDER was ported to Linux from SGI I have always used the PGI flag -Kieee, which says to strictly use IEEE conventions inside mathematical operations. Originally I used this in order to get the same results from code compiled with PGI as from SGI code. PGI says this flag may slow operation, but I am surprised to find that it increases speed of my benchmark by as much as 8%. Since it is also presumed to be more accurate, including use of this flag is a no-brainer.
Inlining Subroutines: Inlining subroutines/functions is expected to increase speed. There is less overhead stacking current subroutine data when invoking a called function. However, in my benchmark it has a negative effect on speed, slowing operation by as much as 10%. Inlining is also dangerous, since it is tricky to ensure that the inlined code is kept in sync with the actual latest source, so inlining is not helpful.
Compiling for Large data: PGI compilers have flags -mcmodel=medium and -Mlarge_arrays which affect the ability of the executable to handle large static data and large dynamically allocated data (typical of some operations which import large files of data). Depending on how SPIDER is used (particularly if inline/incore files are defined) some sites require the ability to handle these large files. The executables distributed with SPIDER have usually been compiled with -mcmodel=medium for handling large static arrays. Benchmarking shows that this has an insignificant impact on executable speed.
Compiling Static vs Dynamic Executables: Statically compiled executables do not require the presence of certain PGI or system libraries at execution time. In return the executable is larger than a dynamically linked executable. SPIDER has usually been distributed with static executables. My benchmark shows no difference in speed for these two types of executables. Since static executables have far fewer installation problems over varied Linux distributions and ages I have always preferred this option.
Compiling for use with OpenMP: PGI compilers have the flag -mp for creation of code that utilizes OpenMP parallelization on suitable hardware. The executables distributed with SPIDER have been compiled with this flag for 20 years. Using all 12 cores of a dual-hexcore AMD Opteron gives 905% speedup over a single process on my benchmark.
Compiling For NUMA: AMD Opterons should support NUMA (Non-uniform memory architecture) execution when used on multi-processor hardware. PGI compilers have the flag -mp=numa that would utilize this capability when inside OpenMP. My benchmark shows no difference in speed for executables compiled with/without this flag on a dual-hexcore AMD Opteron compute node. Since use of this flag also requires dynamic executables it is not used in our distributed executables.
Compiling for use with SSE SIMD Vectorization: PGI and Intel compilers have flags, e.g. -fastsse, which allow optimization for use with SSE SIMD. This vectorization increases speed on suitable hardware. The executables distributed with SPIDER have been compiled with this flag for several years.
Compiling with Interprocedural analysis: PGI compilers have the flag -ipa, which allows optimization across procedural boundaries. This may increase speed. My benchmark shows no difference in speed for executables compiled with/without this flag. However, I am not certain that the compiler applies this analysis when source code is in different files, so it may not have been a complete test of this option.
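The remark under Optimization Level, that reordering arithmetic can change results when values differ widely in magnitude, is easy to reproduce. The Python sketch below is only an illustration of the floating point effect using IEEE double precision; it says nothing about which reorderings a particular compiler actually performs, and the variable names are hypothetical.

```python
# Two mathematically equivalent ways of evaluating big + small - big.
big = 1.0e16
small = 1.0

left_to_right = (big + small) - big  # small is lost when added to big
reordered = (big - big) + small      # the large values cancel first

print(left_to_right)  # 0.0
print(reordered)      # 1.0
```

An optimizer that is free to reorder such expressions can therefore change the output, which is why results need to be rechecked after raising the optimization level.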

Source: random.html Page updated: 20 Jan. 2020 ArDean Leith