BLAST (Basic Local Alignment Search Tool)

Chapter 12. Hardware and Software Optimizations

This chapter explores how to optimize BLAST searches for maximum throughput and will help you get the most out of your current and future hardware and software. The first rule of BLAST performance is to optimize your BLAST parameters. Incorrect settings can cause BLAST to run slowly, and you can often achieve surprising increases in speed by adjusting a parameter or two. Chapter 9 can help you choose the correct parameters for a particular experiment. If you're already running BLAST efficiently and want to get the most BLAST performance possible, read on.

12.1 The Persistence of Memory

Modern operating systems cache files. You may hear it referred to as RAM cache or disk cache, but we'll just call it cache. Once a file is read from the filesystem (e.g., hard disk), the file is kept in memory even after it is no longer used, assuming there's enough free RAM to do so. Why cache files? It's frequently the case that the same file is requested repeatedly. Retrieving from memory is much faster than from a disk, so keeping it in memory can save a lot of time. Caching can be very important in sequential BLAST searches if the database is located on a slow disk or across a network. While the first search may be limited by the speed at which the database can be read, subsequent searches can be much faster.

The advantage of caching is most appreciable for insensitive BLAST searches, such as BLASTN with a large word size. In more sensitive searches, retrieving sequences from the database becomes a smaller fraction of the total elapsed time. In Table 12-1, note how the speed increase from caching is a function of sensitivity (here, word size).

Table 12-1. How caching benefits insensitive searches

Program   Word size   Search 1   Search 2   Speed increase
BLASTN    W=12        12 sec     7 sec      1.71 x
BLASTN    W=10        33 sec     28 sec     1.18 x
BLASTN    W=8         57 sec     52 sec     1.10 x
BLASTN    W=6         243 sec    238 sec    1.02 x

BLAST itself doesn't take much memory, but having a lot of memory assists caching. Look at the amount of RAM in your current systems and the size of your BLAST databases. As a rule, your RAM should be at least 20 percent greater than the size of your largest database. If it isn't and you do a lot of insensitive searches, a simple memory upgrade may boost your throughput by 50 percent or more. However, if most of your searches are sensitive searches or involve small databases, adding RAM to all your machines may be less cost-effective than purchasing a few more servers.
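The 20 percent rule of thumb is easy to check mechanically. The following Python sketch (the function names are hypothetical) computes the minimum RAM for a set of database sizes and tests a machine against it, using integer arithmetic so the threshold is exact. How you measure database size, for example by adding up the sizes of the formatted database files, is left as an assumption.

```python
from typing import Sequence


def min_ram_bytes(db_sizes: Sequence[int], headroom_pct: int = 20) -> int:
    """Smallest RAM (bytes) that is headroom_pct percent larger than the largest database."""
    largest = max(db_sizes, default=0)
    # Ceiling division in integers avoids floating-point rounding at the threshold.
    return -(-(largest * (100 + headroom_pct)) // 100)


def ram_is_sufficient(ram_bytes: int, db_sizes: Sequence[int], headroom_pct: int = 20) -> bool:
    """True if ram_bytes satisfies the rule of thumb for every database in db_sizes."""
    return ram_bytes >= min_ram_bytes(db_sizes, headroom_pct)


if __name__ == "__main__":
    GIB = 1024 ** 3
    databases = [3 * GIB, 8 * GIB]  # hypothetical database sizes
    print(f"need at least {min_ram_bytes(databases) / GIB:.1f} GiB")
    print("16 GiB sufficient?", ram_is_sufficient(16 * GIB, databases))
```

Only the largest database matters here because the rule assumes one database is being searched at a time; a pipeline that alternates between several databases needs room for all of them at once.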

12.1.1 BLAST Pipelines and Caching

If you're running BLAST as part of a sequence analysis pipeline involving several BLAST searches and multiple databases, you may want to consider how caching will affect the execution of the pipeline. For example, look at the typical BLAST-based sequence analysis pipeline for ESTs depicted in Figure 12-1. The most obvious approach is to take each EST and pass it through each step. But is this the most efficient way?

Figure 12-1. EST annotation pipeline


It's common to design sequence analysis pipelines with the following structure:

for each sequence to analyze {
    for each BLAST search in the pipeline {
        execute BLAST search
    }
}

However, you can switch the inner and outer loops to achieve this structure:

for each BLAST search in the pipeline {
    for each sequence to analyze {
        execute BLAST search
    }
}

The problem with the first pipeline is that if the BLAST databases are large, they may not all be cached. Each BLAST database can bump out the previously cached file if you don't have enough RAM, and then you get no benefit from caching. The second structure keeps the same BLAST database in memory for all the sequences. Before you tear apart your current pipeline, however, remember that caching isn't going to help much with sensitive searches. If most of your searches are sensitive, it is a waste of effort to optimize the already fast parts of your pipeline. As in any tuning procedure, optimize the major bottlenecks first.
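To make the difference concrete, the following Python sketch counts how many searches must read their database from disk under each loop order. It models the file cache as a least-recently-used list of whole databases, which is a simplification (real operating systems cache pages, not whole files), and the database names and sizes used in the usage note are hypothetical.

```python
from collections import OrderedDict
from typing import Dict, Iterable, List, Sequence


def count_disk_reads(order: Iterable[str], db_sizes: Dict[str, int], capacity: int) -> int:
    """Count searches whose database is not already cached.

    order: database names, one per BLAST search, in execution order.
    The cache is modeled as an LRU list of whole databases (a simplification).
    """
    cache = OrderedDict()  # database name -> size, least recently used first
    used = 0
    reads = 0
    for db in order:
        if db in cache:
            cache.move_to_end(db)  # hit: now the most recently used
            continue
        reads += 1  # miss: the database is read from disk
        size = db_sizes[db]
        while cache and used + size > capacity:
            _, evicted = cache.popitem(last=False)  # bump out the oldest database
            used -= evicted
        if size <= capacity:  # a database larger than the cache is never kept
            cache[db] = size
            used += size
    return reads


def sequence_first(queries: Sequence[str], searches: Sequence[str]) -> List[str]:
    """First structure: every search for one sequence, then the next sequence."""
    return [db for _ in queries for db in searches]


def search_first(queries: Sequence[str], searches: Sequence[str]) -> List[str]:
    """Swapped loops: one search (database) for every sequence, then the next search."""
    return [db for db in searches for _ in queries]
```

With three databases of size 4 and room in the cache for only one (capacity 6), 100 sequences processed with the first structure cause a disk read on all 300 searches, while the swapped loops read each database once, for 3 reads in total. If the cache can hold all three databases, both orders need only 3 reads, which is why the change matters only when RAM is short.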

12.2 CPUs and Computer Architecture

The clock speed of a CPU isn't necessarily an accurate indicator of how fast it will run BLAST. There are other complicating factors such as the amount of L2 cache, the memory latency and the speed of the front-side bus. Unfortunately, there is no good rule to predict how fast BLAST will perform on a particular computer except for the obvious within-family predictions—for example, that a 1-GHz Pentium III will be faster than an 800-MHz Pentium III. The best you can do is to benchmark a bunch of systems or contact people who have already done so.

Two benchmarks are provided in Table 12-2. Before reading the description, please understand that you should use extreme caution whenever interpreting any benchmarks because the benchmarking protocol may be very different from your everyday tasks, and therefore may not reflect real-world performance. The best benchmark procedure should mimic your daily routine. In addition, if you use benchmarks to decide what hardware to purchase, you may be in for a nasty surprise, as other important considerations may override a simplistic interpretation of the "most BLAST for the buck." Total cost of ownership is a complicated equation that includes maintenance, support, facilities, cooling, and interfacing with legacy equipment and culture.

Table 12-2 shows the performance on various platforms when searching all members of a database against themselves. There are two databases, and both can be found at http://examples.oreilly.com/BLAST. The tests were performed using default parameters for NCBI-BLAST. The following command lines were used:

time blastall -p blastn -d ESTs -i ESTs > /dev/null

time blastall -p blastp -d globins -i globins > /dev/null

Table 12-2. Performance benchmarks of various systems

                                blastn test               blastp test
CPU; clock speed                Time (sec)   Giga-cycles  Time (sec)   Giga-cycles
Macintosh G4; 550 MHz           1011         556          1599         879
Sun UltraSPARC III; 750 MHz     835          626          1427         1070
Intel Pentium III; 1 GHz        649          649          1187         1187
Intel Pentium IV Xeon; 1.8 GHz  469          844          788          1418
AMD Athlon 1800+; 1.533 GHz     416          638          741          1136
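Every Giga-cycles entry in Table 12-2 equals the elapsed time multiplied by the clock speed in GHz (to rounding), which estimates the total number of CPU cycles a run consumed; a lower figure means the processor does more BLAST work per clock cycle. The following Python sketch recomputes the column from the times and clock speeds in the table.

```python
from typing import Dict, Tuple

# name: (clock speed in GHz, blastn seconds, blastp seconds), from Table 12-2
TABLE_12_2: Dict[str, Tuple[float, int, int]] = {
    "Macintosh G4": (0.550, 1011, 1599),
    "Sun UltraSPARC III": (0.750, 835, 1427),
    "Intel Pentium III": (1.0, 649, 1187),
    "Intel Pentium IV Xeon": (1.8, 469, 788),
    "AMD Athlon 1800+": (1.533, 416, 741),
}


def giga_cycles(seconds: float, clock_ghz: float) -> int:
    """Approximate billions of clock cycles consumed: elapsed time x clock rate."""
    return round(seconds * clock_ghz)


if __name__ == "__main__":
    for name, (ghz, blastn_s, blastp_s) in TABLE_12_2.items():
        print(f"{name:22s} blastn {giga_cycles(blastn_s, ghz):5d}  blastp {giga_cycles(blastp_s, ghz):5d}")
```

By this measure the Macintosh G4 used the fewest cycles on both tests and the Pentium IV Xeon the most, even though the Xeon finished faster than every system except the Athlon, which illustrates why clock speed alone is a poor predictor of BLAST performance.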

12.2.1 Multiprocessor Computers

One way to speed up BLAST is to employ multiprocessor computers. BLAST is a multithreaded application and can utilize the additional processors. Adding processors to a computer is sometimes cheaper than purchasing multiple machines because you don't have to duplicate all the other components. That said, once outside the commodity computer market, prices rise steeply, and a computer with 32 CPUs is likely to cost you much more than 16 dual-CPU computers. The improvement with multiple processors isn't completely linear, and it depends on the type of search.

If you want a single BLAST job to complete quickly, it's best to use as many CPUs as possible. On NCBI-BLAST, you can increase the number of processors with the -a option, and on WU-BLAST you use the -cpus option (see Chapter 13 and Chapter 14 for more information). However, for best aggregate performance, it is better to use only 1 CPU for each BLAST job and load up the machine with as many jobs as there are processors. If you are searching multiple databases, you may need a lot of RAM if you wish to keep them all cached.
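The one-job-per-processor strategy is straightforward to automate. The following Python sketch builds NCBI blastall command lines (-p, -d, and -i as in the benchmark commands earlier in this chapter, -a for the processor count, and -o to write the report to a file) and runs them through a pool sized to the number of processors. The query file names are hypothetical, and on a shared machine a DRM system would normally take over this job.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence


def blastall_command(program: str, database: str, query: str, output: str,
                     cpus: int = 1) -> List[str]:
    """NCBI blastall command line; -a sets the number of processors for this job."""
    return ["blastall", "-p", program, "-d", database, "-i", query,
            "-o", output, "-a", str(cpus)]


def run_jobs(commands: Sequence[List[str]], max_jobs: Optional[int] = None) -> List[int]:
    """Run independent BLAST jobs, keeping at most max_jobs running at once.

    Defaults to one job per processor. Threads suffice because each one only
    waits on an external process. Returns each job's exit status, in order.
    """
    workers = max_jobs or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


if __name__ == "__main__":
    queries = ["query1.fa", "query2.fa", "query3.fa"]  # hypothetical query files
    jobs = [blastall_command("blastn", "ESTs", q, q + ".report") for q in queries]
    print(run_jobs(jobs))
```

To finish one urgent search quickly instead, you would pass a single command built with cpus set to the number of processors.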

12.2.2 Operating Systems and Compilers

Even on the same hardware, BLAST may run faster under different operating systems. Due to the complex interactions between operating systems, compilers, and computer architecture, it is difficult to predict what the optimal combination will be. If you have the time and inclination, you might be able to eke out as much as an extra 5 percent in speed. However, choosing an operating system based entirely on BLAST performance may not be a wise choice, so this is probably the last thing to consider.

12.3 Compute Clusters

The price and performance of commodity computer hardware and the sophistication of modern free operating systems have made it very attractive to set up computer clusters rather than purchase a multiprocessor behemoth. Clusters don't have to be dedicated rack-mounted towers of blinking lights and buzzing drives; they can also be a mixture of desktop computers that use their idle time to run jobs. There are two fundamental kinds of clusters: Beowulf-style clusters and compute farms. Beowulf clusters act as a single computer, sharing memory and CPU cycles to cooperatively solve the same problem. Compute farms don't share memory or CPU cycles and solve separate but possibly related problems. The field of bioinformatics has algorithms that are appropriate for both kinds of clusters, but BLAST is really a job that is best suited to compute farms. There are two major reasons for this: (1) BLAST is more data-intensive than compute-intensive, and (2) large-scale BLAST searches consist of many small jobs that are easily parallelized on separate machines.

If you wish to build your own cluster, be prepared for quite a bit of work. There are plenty of considerations outside the normal window-shopping for the best price-performance ratio. For example, one of the most common problems is having sufficient power and cooling. It doesn't do much good to have a supercomputer that is constantly overheating and burning out its components. Total cost of ownership is a complicated equation, and you're better off not trying to solve this entirely on your own. Your best bet is to talk with people who build clusters for a living. Several companies will sell you prepackaged compute farms for running BLAST. For those who like getting their hands dirty, the bioclusters mailing list at http://bioinformatics.org has plenty of useful information in its archives and helpful members who will gladly give advice.

12.3.1 Remote Versus Local Databases

When designing a cluster, one of the most important decisions is where to put your BLAST databases. There are two general choices: (1) store the database on a file server and let the cluster access it remotely over a network, or (2) keep a local copy of the database on each computer. Both methods have their advantages and disadvantages, so there is no simple way to determine which is better.

12.3.1.1 Remote databases

It's simpler to manage the files on one computer than on multiple computers. This is particularly true if you update your BLAST databases on a frequent, perhaps daily basis. So this is one good reason to use remote databases. If you run your compute nodes diskless, it is really the only choice. The main concerns with this approach are network bandwidth and the speed of the file server. Most computers today have 100-Mbps network interfaces. This translates to 12.5 MBps. Fast computers performing insensitive searches (e.g., BLASTN) can actually exceed this transfer rate. In this case, the compute nodes will sit idle, waiting for data. But what happens when multiple computers are all connected to the same database server? Unfortunately, they must all share the same network bandwidth from the server, so if 10 compute nodes are connected to a database server, each one may get data at only 1.25 MBps. Not good. But remember that if the compute nodes have enough RAM and the databases aren't falling out of cache, subsequent searches will be very fast because they can read the database directly from memory.

One obvious improvement is to employ faster networking. Doing so increases the cost of each compute node a little and significantly increases the cost of network switches because gigabit network switches are still quite expensive. However, it is possible to use a hybrid solution in which the database server is connected to a hybrid network switch via a gigabit line and the compute nodes are connected to the switch via the more common 100-Mbps interface. This is much cheaper than using gigabit everywhere, and, because exceeding 12.5 MBps is rare, it doesn't hinder performance too much.
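The bandwidth arithmetic in the last two paragraphs can be captured in a small Python function. It is a worst-case model under stated assumptions: every node reads from the server at the same moment, the server link is shared equally, and protocol overhead, disk speed, and caching are ignored. The function name is hypothetical.

```python
def per_node_mbytes(nodes: int, server_mbps: float = 100, node_mbps: float = 100) -> float:
    """Worst-case read rate (MBps) for each node when all nodes pull from one server.

    Link speeds are in megabits per second; dividing by 8 converts to megabytes.
    Each node is limited by the smaller of its own link and an equal share of
    the server's link.
    """
    server_share = server_mbps / 8 / nodes
    node_limit = node_mbps / 8
    return min(server_share, node_limit)
```

With the numbers from the text, per_node_mbytes(10) gives 1.25 MBps, and per_node_mbytes(10, server_mbps=1000), the hybrid arrangement, gives 12.5 MBps, the full speed of each node's own 100-Mbps link. Past 10 nodes, even the gigabit server link becomes the limit again.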

When building file servers, people often neglect to put in enough RAM. For BLAST database servers, though, you really want as much RAM as possible. Caching applies on the file-server end, too, and if several computers request data from the file server, it's much better if it can be served from memory rather than from disk. If you're thinking of using a stand-alone network-attached storage (NAS) appliance as a BLAST database server, think again. Most don't have gigabit networking or enough RAM.

12.3.1.2 Local databases

Keeping local copies of your BLAST databases on each node of the cluster will make access to the data very fast. Most hard disks can read data at 20 to 30 MB per second or about double what you could get from common networking. If your network is slow, your cluster is large, or your searches are really insensitive, it's much better to have local copies of databases. The main concern with this approach is keeping the files synchronized and updated with respect to a master copy. This can be done via rsync or other means. However, if all the nodes update their databases at the same time across a thin pipe, this operation could take a long time, and the compute nodes may sit idle.
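A back-of-the-envelope Python sketch shows why simultaneous updates hurt. It assumes the master copy's network link is the only bottleneck, that every node copies the full database (rsync can often transfer much less), and that a node keeps searching until its own update starts; the function name and the numbers in the usage note are hypothetical. If every node updates at once, the nodes split the link and all of them wait until the last byte arrives; if they update one at a time, the total transfer takes just as long, but each node is idle only while its own copy is refreshed.

```python
def update_idle_seconds(db_mbytes: float, nodes: int, server_mbps: float = 100) -> dict:
    """Seconds each node sits idle while a database update is copied to it.

    Assumes the master's link (server_mbps, megabits per second) is the only
    bottleneck and that every node copies the full database.
    """
    link_mbytes = server_mbps / 8          # megabytes per second
    one_copy = db_mbytes / link_mbytes     # one node using the whole link
    return {
        "all_at_once": one_copy * nodes,   # nodes share the link and finish together
        "one_at_a_time": one_copy,         # each node gets the full link in turn
    }
```

For a 2,000-MB database, 10 nodes, and a 100-Mbps link, updating all nodes at once idles every node for 1,600 seconds, while staggering the updates idles each node for 160 seconds.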

A lesser concern is the disks themselves. They cost money and are a potential source of hardware failure (for this reason, some people advocate running the compute nodes diskless). When discussing disks, there's a great deal of debate over IDE versus SCSI. Drives using the IDE interface are generally slower and less reliable, but are much less expensive. Experts on both sides of the debate will argue convincingly that buying one type of drive makes more sense than buying the other. However, for optimal performance, you really should access the database from cache rather than disk, and therefore the disk shouldn't really matter. Those who choose IDE or SCSI aren't necessarily fools, but people who fail to put enough RAM in their boxes are.

12.4 Distributed Resource Management

If you're running a lot of BLAST jobs, one problem to consider is how to manage them to minimize idle time without overloading your computers. Being organized is the simplest way to schedule jobs. If you're the only user, you can use simple scripts to iterate over the various searches and keep your computer comfortably busy. The problem starts when you add multiple users. In a small group, it's possible for users to cooperate with one another without adding extra software. Sending email saying "hey, stay off blast-server5 until I say so" works surprisingly well. But if you have a large group or irresponsible users, you'll want some kind of distributed resource management (DRM) software.

There are a number of DRM software packages, both free and commercial. But even the free ones will cost you time to install and maintain, and users need training to use the system. Table 12-3 lists some of the most popular packages in the bioinformatics community. Condor is an established DRM that is downloadable for free. It is rare in that it supports Windows and Unix. LSF is a mature product with many bioinformatics users. It is expensive, but for large groups its robustness makes the cost justifiable. Parasol is purpose-built for the UCSC kilocluster and throws out some of the generalities for increased performance. PBS and ProPBS are popular DRMs, and if you're an academic user, you can get ProPBS for free. SGE is a relative newcomer but has a strong following, partly due to the fact that it's an open source project.

Table 12-3. DRM software

Product

Description (as advertised)

Condor

Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job-queuing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor; Condor then places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion.

http://www.cs.wisc.edu/condor

LSF

· Platform LSF 5 is built on a grid-enabled, robust architecture for open, scalable, and modular environments.

· Platform LSF 5 is engineered for enterprise deployment. It provides unlimited scalability with support for over 100 clusters, more than 200,000 CPUs, and 500,000 active jobs.

· With more than 250,000 licenses spanning 1,500 customer sites, Platform LSF 5 has industrial-strength reliability to process mission-critical jobs reliably and on time.

· A web-based interface puts the convenience and simplicity of global access to resources into the hands of your administrators and users.

· Platform LSF 5, with its open, plug-in architecture, seamlessly integrates with third-party applications and heterogeneous technology platforms.

http://www.platform.com

Parasol

Parasol provides a convenient way for multiple users to run large batches of jobs on computer clusters of up to thousands of CPUs. Parasol was developed initially by Jim Kent, and extended by other members of the Genome Bioinformatics Group at the University of California Santa Cruz. Parasol is currently a fairly minimal system, but what it does, it does well. It can start up 500 jobs per second. It restarts jobs in response to the inevitable systems failures that occur on large clusters. If some of your jobs die because of your program bugs, Parasol can also help manage restarting the crashed jobs after you fix your program.

http://www.soe.ucsc.edu/~donnak/eng/parasol.htm

PBS

The Portable Batch System (PBS) is a flexible batch queuing and workload management system originally developed by Veridian Systems for NASA. It operates on networked, multiplatform UNIX environments, including heterogeneous clusters of workstations, supercomputers, and massively parallel systems. Development of PBS is provided by the PBS Products Department of Veridian Systems.

http://www.openpbs.org

ProPBS

The PBS Pro Version 5.2 workload management solution is the professional version of the Portable Batch System. Built on the success of OpenPBS, PBS Pro goes well beyond it with the features and support you expect in a mission-critical commercial product, such as:

· Shrink-wrapped, easy-to-install binary distributions

· Support on every major version of Unix and Linux

· Enhanced fault tolerance and scalability

· Enhanced scheduling algorithms

· Computational grid support

· Direct support from the team that created PBS

· New, rewritten documentation

· Source code availability

http://www.propbs.com

SGE

The Grid Engine project is an open source community effort to facilitate the adoption of distributed computing solutions. Sponsored by Sun Microsystems and hosted by CollabNet, the Grid Engine project provides enabling distributed resource management software for wide-ranging requirements from compute farms to grid computing.

http://gridengine.sunsource.net

12.5 Software Tricks

In addition to choosing appropriate BLAST parameters and optimizing your hardware setup, you can use a few software tricks to increase your BLAST performance. Most of these tricks involve splitting or concatenating sequences into optimal-sized pieces because very large and very small sequences are inefficiently processed by BLAST.

12.5.1 Multiplexing/Query Packing

Input and output (I/O) can become a large fraction of the overall CPU load when the search parameters are insensitive, such as when running BLASTN. If you find yourself running a lot of BLASTN searches, you can pack multiple queries together and reduce the overhead of reading the database repeatedly. For example, let's say you have a collection of 100,000 ESTs from your favorite organism and you want to search them against all other ESTs in the public database. If you search them one at a time, you will perform 100,000 BLAST searches and therefore have to read the database 100,000 times. It should go without saying that caching is essential in such a task.

But what if you glue the sequences together in groups of 100? Well, you've just cut your database I/O down to 1 percent of what it used to be, which can be a significant savings. For ESTs and other sequences of this length, the speedup is typically tenfold. This technique is called multiplexing or query packing. It isn't as simple as it sounds because there must be a way to prevent alignments from bridging the sequences, the coordinates must be remapped, and the statistics need to be recalculated. MegaBLAST, part of the NCBI-BLAST distribution, is a specialized version of BLASTN that multiplexes queries and includes a variety of other optimizations. It's really fast, and anyone doing a lot of BLASTN searches should use this program. You can find more information about MegaBLAST in Chapter 9 and Chapter 13. Query packing can also be accomplished with a single, sophisticated Perl script (see MPBLAST at http://blast.wustl.edu).
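The coordinate bookkeeping behind query packing can be sketched in a few lines of Python. This is not how MegaBLAST works internally; it is a hypothetical illustration that joins queries with a run of N characters (the 50-base spacer length is arbitrary, and a spacer only makes bridging alignments less likely, not impossible) and maps a coordinate in the packed query back to the original sequence. Recalculating the statistics is not attempted.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence, Tuple

SPACER = "N" * 50  # hypothetical spacer; the length is an arbitrary choice


def pack_queries(seqs: Sequence[str]) -> Tuple[str, List[int]]:
    """Join sequences with spacers; return the packed query and each 0-based start offset."""
    starts: List[int] = []
    pos = 0
    for seq in seqs:
        starts.append(pos)
        pos += len(seq) + len(SPACER)
    return SPACER.join(seqs), starts


def unpack_position(pos: int, starts: Sequence[int],
                    lengths: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Map a 0-based packed coordinate to (query index, 0-based position in that query).

    Returns None if the position falls inside a spacer.
    """
    i = bisect_right(starts, pos) - 1  # last query starting at or before pos
    if i < 0 or pos - starts[i] >= lengths[i]:
        return None
    return i, pos - starts[i]
```

A complete implementation would also have to reject or split any alignment whose start and end map to different queries, and rescale the statistics to each query's own length.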

12.5.2 Query Chopping

Larger sequences require more memory to search and align. This can blow away your cached database, or worse, cause the computer to start swapping (using the disk for RAM). In addition, for a variety of reasons, larger query sequences are processed less efficiently. One way to solve this problem is to divide the query sequence into several segments, search them independently, and then merge the results back together. This is called query chopping and is effectively the opposite of query packing. The main difficulty with query chopping is dealing with alignments that cross the boundaries between segments.

Both NCBI-BLAST and WU-BLAST let you specify that only a subsequence of a large query sequence is to be searched (see the -L parameter in Chapter 13 and the nwstart and nwlen parameters in Chapter 14). Currently, this works a little better for WU-BLAST because alignments seeded in a restricted region can extend outside this region, so there's no need to stitch together the alignments between neighboring segments. The following Perl script searches chromosome-sized sequences in 100-kb segments using WU-BLAST. All coordinates and statistics are identical to those of a search with the entire chromosome. Note that complexity filters are currently applied to the whole sequence, so apply these filters ahead of time.

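#!/usr/bin/perl -w
use strict;

die "usage: $0 <wu-blast command line>\n" unless @ARGV >= 3;
my ($BLAST, $DB, $Q, @P) = @ARGV;

# segment coordinates refer to one sequence, so allow only one per file
die "ERROR ($0): single FASTA files only\n" if `grep -c ">" $Q` > 1;

# filters would be applied to the whole sequence on every pass
my $params = "@P";
die "ERROR ($0): filter ahead of time\n" if $params =~ /filter|wordmask/;

# count residues, ignoring the definition line and any whitespace
open(my $fasta, '<', $Q) or die "ERROR ($0): can't open $Q\n";
my $def = <$fasta>;
my $count = 0;
while (my $line = <$fasta>) {$line =~ s/\s+//g; $count += length $line}
close $fasta;

# search each 100-kb window, passing the user's other parameters along
my $segment = 100000;
for (my $i = 1; $i <= $count; $i += $segment) {
    my $len = $count - $i + 1;
    $len = $segment if $len > $segment;
    system("$BLAST $DB $Q $params nwstart=$i nwlen=$len") == 0
        or warn "WARNING ($0): segment at $i exited with status $?\n";
}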

12.5.3 Database Splitting

If you have a computer cluster and a lot of individual BLAST jobs to run, you can easily split the jobs among the nodes of your cluster. But what if you have a single, slow BLAST job that you want to spread out over several computers? If your sequence is very large, you can use query chopping as described earlier and assign each computer a separate segment. But what if your sequence isn't so large? A good solution is to have each computer search only part of the database. You'll need to do a little statistical manipulation to set the effective search space to the entire database, as well as some post-processing to merge all the reports together, but overall the process is pretty simple. The hard part is making sure the database is properly segmented on the various computers.
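Merging the per-node reports is mostly a matter of interleaving sorted hit lists. The following Python sketch assumes each node has already parsed its report into (E-value, subject identifier) pairs sorted by E-value, and that every node used the effective search space of the whole database, so E-values from different slices can be compared directly; the identifiers in the test data are placeholders.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Optional, Tuple

Hit = Tuple[float, str]  # (E-value, subject identifier)


def merge_slice_hits(*slices: Iterable[Hit], max_hits: Optional[int] = None) -> List[Hit]:
    """Merge per-node hit lists, each already sorted by ascending E-value.

    Assumes every node used the whole database's effective search space,
    so E-values from different slices are directly comparable.
    """
    merged = heapq.merge(*slices, key=lambda hit: hit[0])
    return list(islice(merged, max_hits))  # max_hits=None keeps every hit
```

heapq.merge consumes each list lazily, so merging is linear in the total number of hits, and max_hits can mimic the report-length limit of a single search.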

If you're using NCBI-BLAST, you can create database slices using alias databases as described previously. This allows a great deal more flexibility than physically splitting the databases into various parts. But remember that alias databases require that you use GI numbers in the FASTA identifier.

If you're using WU-BLAST, you can split the database dynamically. WU-BLAST has command-line parameters called dbrecmin and dbrecmax that describe the minimum and maximum database records. You can assign each node of the cluster a different subsection of the database by simply assigning dbrecmin and dbrecmax. For example, if your database contains 100 records and you have 10 nodes, node 1 gets records 1 to 10, node 2 gets records 11 to 20, etc. To benefit from caching, each node should be assigned the same database slice in every search.
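Assigning slices by hand gets tedious for large databases. The following Python sketch (hypothetical helper names) divides records 1 through N into contiguous, nearly equal ranges and formats them as the dbrecmin and dbrecmax parameters described above. Because the assignment is deterministic, a node receives the same slice every time, which is what caching needs. The query file name in the usage block is hypothetical.

```python
from typing import List, Tuple


def record_slices(num_records: int, num_nodes: int) -> List[Tuple[int, int]]:
    """Split records 1..num_records into contiguous (first, last) ranges, one per node."""
    base, extra = divmod(num_records, num_nodes)
    slices = []
    first = 1
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)  # spread any remainder over the first nodes
        if size == 0:
            break  # more nodes than records
        slices.append((first, first + size - 1))
        first += size
    return slices


def wu_blast_slice_params(first: int, last: int) -> str:
    """WU-BLAST parameters restricting a search to database records first..last."""
    return f"dbrecmin={first} dbrecmax={last}"


if __name__ == "__main__":
    for node, (lo, hi) in enumerate(record_slices(100, 10), start=1):
        print(f"node {node}: blastn ESTs query.fa {wu_blast_slice_params(lo, hi)}")
```

For 100 records on 10 nodes this reproduces the assignment in the text (1 to 10, 11 to 20, and so on); for 103 records, the first three nodes each take one extra record.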

12.5.4 Serial BLAST Searching

As discussed in Chapter 5, the best way to speed up BLAST searches is by making the seeding more stringent. The only problem is that low-scoring alignments may be lost. High scoring alignments, however, are relatively resistant to changes in seeding parameters. The serial strategy takes advantage of this property; it uses an insensitive search to identify database matches and then a sensitive search to generate the alignments. An intuitive way to think about this with genomic sequence is "if I can hit just one exon, I can get the whole gene." The procedure has three steps and can be carried out with a simple script:

1. Run BLAST with insensitive parameters.

2. Build a BLAST database from the matches.

3. Run BLAST with sensitive parameters on just the matches.

NCBI-BLAST doesn't currently offer a wide range of word sizes, so serial searching is best carried out with WU-BLAST. Example 12-1 shows a script that wraps up the entire procedure.

Example 12-1. A script for serial BLAST searching

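#!/usr/bin/perl -w
use strict;

die "usage: $0 <database> <query> <wordsize> <hitdist>\n" unless @ARGV == 4;
my ($DB, $Q, $W, $H) = @ARGV;
$H = $H ? "hitdist=$H" : "";   # a hitdist of 0 leaves the parameter out

# per-process scratch directory ($$ is the process ID), removed at exit
my $tmpdir = "/tmp/tt-blastx.$$";
END {system("rm -rf $tmpdir") if defined $tmpdir}
system("mkdir $tmpdir") == 0 or die "ERROR ($0): can't create $tmpdir\n";

# parameters shared by both searches: report all hits, mask low-complexity words
my $STD = "B=100000 V=100000 wordmask=seg";

# step 1: insensitive search (user-chosen W, very high T) to find matching proteins
system("blastx $DB $Q W=$W T=999 $H $STD > $tmpdir/search") == 0
    or die "ERROR ($0): first search failed\n";

# step 2: collect the names of the matching database sequences...
open(my $search, '<', "$tmpdir/search") or die "ERROR ($0): can't read report\n";
open(my $names, '>', "$tmpdir/names") or die "ERROR ($0): can't write names\n";
my $hits = 0;
while (<$search>) {
    if (/^>(\S+)/) {print $names "$1\n"; $hits++}
}
close $search;
close $names;
if ($hits == 0) {warn "$0: no matches in first search\n"; exit 0}

# ...and build the second-stage database from them
system("xdget -p -f $DB $tmpdir/names > $tmpdir/database") == 0 or die;
system("xdformat -p $tmpdir/database") == 0 or die;

# step 3: sensitive search (default W and T) against only the matches
system("blastx $tmpdir/database $Q $STD") == 0 or die;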

To demonstrate the performance of the serial strategy, the script in Example 12-1 was used to search a Caenorhabditis briggsae genomic fragment (c009500587.Contig4) against all C. elegans proteins (wormpep97). To minimize the effect of chance similarities, only alignments with at least 30 amino acids and 35 percent identity are analyzed. The search parameters, search speed, and number of HSPs found are displayed in Table 12-4. The first two rows correspond to standard, nonserial searches. Using the parameters recommended in Chapter 9 (row 2), BLASTX runs seven times faster than with the very sensitive WU-BLAST default parameters (row 1). This speed is paid for by a loss in sensitivity (number of HSPs). The serial searches (rows 3 and above) offer varying levels of speed and sensitivity. Only a few combinations of W and T are presented; there are many other useful combinations. Of particular interest is row 4, which has approximately the same sensitivity as row 1, but runs 18 times faster. Not bad for a short script. Because BLAST is under active development, perhaps you'll see serial searching become a standard part of BLAST software.

Table 12-4. Serial BLAST performance

#   First search            Second search   Speed   Elapsed time (sec)   HSPs
1   W=3 T=12                None            1 x     883.3                251
2   W=3 T=14 hitdist=40     None            7 x     121.4                186
3   W=3 T=999 hitdist=40    W=3 T=12        14 x    62.1                 230
4   W=4 T=999               W=3 T=12        18 x    49.1                 248
5   W=5 T=999               W=3 T=12        50 x    17.6                 219
6   W=4 T=999 hitdist=40    W=3 T=12        80 x    11.1                 137
7   W=5 T=999 hitdist=40    W=3 T=12        110 x   7.9                  116
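The Speed column of Table 12-4 appears to be the row-1 elapsed time divided by each row's elapsed time. The following Python sketch recomputes it from the table and adds the fraction of row-1 HSPs that each search recovers, a sensitivity measure introduced here for illustration rather than one used in the text.

```python
from typing import Dict, List, Tuple

# (row, elapsed seconds, HSPs) copied from Table 12-4; row 1 is the reference search
TABLE_12_4: List[Tuple[int, float, int]] = [
    (1, 883.3, 251), (2, 121.4, 186), (3, 62.1, 230), (4, 49.1, 248),
    (5, 17.6, 219), (6, 11.1, 137), (7, 7.9, 116),
]


def speed_and_sensitivity(rows: List[Tuple[int, float, int]]) -> Dict[int, Tuple[float, float]]:
    """Return {row: (speedup over row 1, fraction of row-1 HSPs found)}."""
    _, ref_time, ref_hsps = rows[0]
    return {row: (ref_time / t, hsps / ref_hsps) for row, t, hsps in rows}


if __name__ == "__main__":
    for row, (speed, frac) in speed_and_sensitivity(TABLE_12_4).items():
        print(f"row {row}: {speed:6.1f} x faster, {frac:6.1%} of HSPs")
```

Row 4 comes out about 18 times faster while keeping roughly 99 percent of the HSPs (248 of 251); row 7 works out to about 112 times faster, which the table rounds to 110.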

12.6 Optimized NCBI-BLAST

The source code for NCBI-BLAST is in the public domain, and anyone can modify it without restriction (ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools). It's therefore not surprising that there are a number of variants. The rest of this chapter discusses three of them.

12.6.1 Apple/Genentech BLAST

Macintosh G4 computers have an additional vector processing unit called Velocity Engine (also known as AltiVec) that applies a single instruction to several data elements in parallel. Apple Computer and Genentech collaborated to rewrite portions of NCBI-BLAST to take advantage of this unit. These modifications affect the seeding phase of BLASTN. The result, AG-BLAST, significantly outperforms NCBI-BLAST under certain conditions.

Table 12-5 shows an experiment in which a Caenorhabditis elegans transcript (F44B9.10) was searched against the Caenorhabditis briggsae genome using various word sizes but otherwise default parameters (the hardware is a 550-MHz PowerBook). For cross-species work, it's generally a good idea to employ word sizes slightly smaller than the default 11 to minimize the chance of missing meaningful similarities. Here, AG-BLAST has a significant speed advantage over NCBI-BLAST. AG-BLAST also runs faster at very large word sizes, which is useful if you are matching sequences that are expected to be identical or nearly identical (e.g., mapping ESTs to their own genome).

Table 12-5. Apple/Genentech BLAST

W    NCBI-BLAST (sec)   AG-BLAST (sec)   Speed increase
8    56.9               37.9             1.5 x
9    50.0               9.5              5.3 x
10   46.6               5.5              8.5 x
11   2.9                2.8              1.0 x
15   2.1                2.1              1.0 x
20   1.4                1.0              1.4 x
30   1.4                0.6              2.3 x
40   1.4                0.5              2.8 x

AG-BLAST does have a few disadvantages. First, the version may be slightly out of date with respect to NCBI-BLAST. The current version of AG-BLAST is based on 2.2.2, while NCBI-BLAST is up to Version 2.2.6. Not all changes are backward-compatible; for example, the latest preformatted databases require Version 2.2.5. Second, AG-BLAST doesn't work with multiple CPUs. You can execute more than one job at a time, but you can't use the -a option to increase the number of CPUs used by a single process. Finally, the minimum word size on AG-BLAST is 8, or one greater than the NCBI-BLAST minimum. See http://developer.apple.com/hardware/ve/acgresearch.html for more information.

12.6.2 Paracel-BLAST and BlastMachine

Paracel makes an NCBI-BLAST derivative called Paracel-BLAST and sells it with a prepackaged computer cluster called a BlastMachine. This product takes all the high-performance hardware and software tricks and puts them into a single, easy-to-use product. The hardware is a rack of Linux-Intel machines, and the DRM software is Platform LSF. Large query sequences are chopped, small ones are packed, and data is distributed so the search comes back as fast as possible. This is really convenient because it lets users concentrate on what they want to do and not how they have to do it. In the end, more science and less frustration is a good thing.

See http://www.paracel.com for more information.

12.6.3 TimeLogic Tera-BLAST

TimeLogic uses an entirely different approach to optimizing BLAST. The BLAST algorithm is soft-wired into a special kind of chip called a field programmable gate array (FPGA). Each FPGA executes the search very quickly and multiple FPGA boards reside in a single computer called a DeCypher accelerator. The end result is a specialized computer that is limited in what it can do, but what it does, it does astonishingly well. A single DeCypher accelerator running Tera-BLAST (the name for their NCBI-BLAST-derived algorithm) is the equivalent of about 100 general-purpose computers. Shockingly, it all fits in a standard server case. Such technology doesn't come cheaply. However, if you do a lot of BLAST searches (or use some of the other algorithms they provide), it may be far cheaper than a huge cluster, especially when you consider power consumption and maintenance.

One hidden cost in specialized systems such as a DeCypher accelerator is the time and effort required to integrate them with more general systems you may already have. If you have a stepwise sequence-analysis pipeline already worked out, it may be difficult to adapt it to Tera-BLAST. Tera-BLAST works most efficiently with big jobs, and to take advantage of this requires giving it a whole bunch of sequences at once. Thus, you might have to restructure your pipeline in much the same way as discussed earlier with respect to caching.

TimeLogic also offers a completely new variant of BLAST called Gene-BLAST. This algorithm strings together HSPs by dynamic programming (an affine Smith-Waterman with two levels of gap scoring schemes) to achieve a better model of exon-intron structure. Gene-BLAST works with both nucleotide- and protein-level alignments and appears to be a welcome new addition to the BLAST family. Unfortunately, the only way to run Gene-BLAST is on TimeLogic hardware. See http://www.timelogic.com for more details.