BLAST (Basic Local Alignment Search Tool)

Appendix E. blast2table.pl

It's often useful to have both the standard report and tabular data for the same search. This program converts standard WU-BLAST or NCBI-BLAST output to the NCBI tabular format (-m 8) described in Appendix A. It supports concatenated BLAST reports and is more efficient than most full-featured BLAST parsers. blast2table.pl can be used either on a file or in a pipe:

blast2table.pl my_blast_output > my_table_output

blastp nr query | blast2table.pl > table_from_wu-blast

The Unix tee program can create both the standard output and a table at the same time:

blastall -p blastp -d nr -i query | tee standard | blast2table.pl > table

blast2table.pl also has a few useful filtering options (see Table E-1). The following command displays only those with an alignment of over 50 percent identity and with a bit score greater than 20:

blast2table.pl -p 50 -b 20 blast_output

Table E-1. blast2table.pl options

Option

Description

-b

Minimum number of bits for each alignment

-e

Maximum E-value for each alignment or alignment group

-m

Minimum coordinate of the query sequence

-n

Maximum coordinate of the query sequence

-p

Minimum percent identity for each alignment

The complete program is listed next and may be downloaded as blast2table.pl from this book's web site.

#!/usr/bin/perl -w

use strict;

use Getopt::Std;

use vars qw($opt_p $opt_b $opt_e $opt_m $opt_n);

getopts('p:b:e:m:n:');

my $PERCENT = $opt_p ? $opt_p : 0;

my $BITS    = $opt_b ? $opt_b : 0;

my $EXPECT  = $opt_e ? $opt_e : 1e30;

my $START   = $opt_m ? $opt_m : 0;

my $END     = $opt_n ? $opt_n : 1e30;

my ($Query, $Sbjct);

my $HSP = "";

while (<>) {

    if    (/^Query=\s+(\S+)/) {outputHSP(  ); $Query = $1}

    elsif (/^>(\S+)/)   {outputHSP(  ); $Sbjct = $1}

    elsif (/^ Score = /) {

outputHSP(  );

my @stat = ($_);

while (<>) {

     last unless /\S/;

     push @stat, $_

}

my $stats = join("", @stat);

my ($bits) = $stats =~ /(\d\S+) bits/;

my ($expect) = $stats =~ /Expect\S* = ([\d\-\.e]+)/;

$expect = "1$expect" if $expect = ~/^e/;

my ($match, $total, $percent)

     = $stats =~ /Identities = (\d+)\/(\d+) \((\d+)%\)/;

my $mismatch = $total - $match;

        $HSP = {bits => $bits, expect => $expect, mismatch => $mismatch,

     percent => $percent, q_begin => 0, q_end => 0, q_align => "",

     s_begin => 0, s_end => 0, s_align => ""};

    }

    elsif (/^Query:\s+(\d+)\s+(\S+)\s+(\d+)/) {

        $HSP->{q_begin}  = $1 unless $HSP->{q_begin};

$HSP->{q_end}    = $3;

$HSP->{q_align} .= $2;

    }

    elsif (/^Sbjct:\s+(\d+)\s+(\S+)\s+(\d+)/) {

$HSP->{s_begin}  = $1 unless $HSP->{s_begin};

$HSP->{s_end}    = $3;

        $HSP->{s_align} .= $2;

    }

}

outputHSP(  );

sub outputHSP {

    return unless $HSP;

    return if $HSP->{percent}  < $PERCENT;

    return if $HSP->{bits}     < $BITS;

    return if $HSP->{expect}   > $EXPECT;

    return if ($HSP->{q_begin} < $START or $HSP->{q_end} < $START);

    return if ($HSP->{q_begin} > $END   or $HSP->{q_end} > $END);

    print join("\t", $Query, $Sbjct, $HSP->{percent},

length($HSP->{q_align}), $HSP->{mismatch},

countGaps($HSP->{q_align}) + countGaps($HSP->{s_align}),

$HSP->{q_begin}, $HSP->{q_end}, $HSP->{s_begin}, $HSP->{s_end},

$HSP->{expect}, $HSP->{bits}), "\n";

    $HSP = "";

}

sub countGaps {

    my ($string) = @_;

    my $count = 0;

    while ($string =~ /\-+/g) {$count++}

    return $count;

}