BLAST (Basic Local Alignment Search Tool)

Chapter 14. WU-BLAST Reference

WU-BLAST was developed and is maintained entirely by Warren Gish. He was one of the original authors of BLAST while at the NCBI but is now at Washington University in St. Louis (where the WU comes from). Development began in 1994 at Version 1.4, before BLAST had gapped alignments. Quite a lot has changed since then. Paradoxically, WU-BLAST is more similar to the original BLAST than the current NCBI version is.

WU-BLAST is useful because it has more command-line parameters that allow advanced users to control the program with more precision. It is also faster. Table 14-1 displays features unique to WU-BLAST or significantly different from NCBI-BLAST.

Table 14-1. WU- and NCBI-BLAST feature differences

Feature: Word size
WU-BLAST: Any word size for any program mode. Neighborhood words are turned off for word sizes of 5 or greater, but may be activated by setting an explicit value for T.
NCBI-BLAST: blastn has a minimum word size of 7. blastp, blastx, tblastn, and tblastx have word sizes of 2 or 3. Neighborhood words are never used for blastn.

Feature: Nucleotide scoring
WU-BLAST: Choice of match/mismatch or scoring matrix.
NCBI-BLAST: Only match/mismatch scoring.

Feature: Nucleotide statistics
WU-BLAST: Karlin-Altschul parameters are available for several match/mismatch values and gap costs.
NCBI-BLAST: Karlin-Altschul parameters are always computed without respect to gap costs. Reported E-values may greatly overestimate significance.

Feature: altscore
WU-BLAST: Allows score modification for any matrix (e.g., to set stop scores lower).
NCBI-BLAST: Nothing similar.

Feature: H, K, L, gapH, gapK, gapL
WU-BLAST: Allow the provision of values for Karlin-Altschul parameters; especially useful when using unsupported scoring schemes.
NCBI-BLAST: Nothing similar. Unsupported scoring schemes are fatal errors.

Feature: Alias databases
WU-BLAST: No, but virtual databases offer similar functionality.
NCBI-BLAST: Yes, both alias and virtual databases are supported.

Feature: Gapped alignment
WU-BLAST: All programs.
NCBI-BLAST: All programs except tblastx.

Feature: /etc/sysblast
WU-BLAST: Allows systems administrators to set system-wide resource restrictions.
NCBI-BLAST: Nothing similar.

Feature: Database subset selection
WU-BLAST: Yes, via dbrecmin and dbrecmax.
NCBI-BLAST: No, but alias databases can be used for static splitting.

Feature: Restricted region of query
WU-BLAST: The nwstart and nwlen parameters restrict seeding but not alignment.
NCBI-BLAST: -L restricts both seeding and alignment.

Feature: links
WU-BLAST: Displays the order of alignments in a group.
NCBI-BLAST: Nothing similar.

Feature: topcomboN
WU-BLAST: Allows restriction of the number of alignment groups. Groups are clearly labeled.
NCBI-BLAST: Nothing similar.

Feature: kap
WU-BLAST: Computes significance without sum statistics.
NCBI-BLAST: Nothing similar.

Feature: olf, golf, olmax, golmax
WU-BLAST: Allow setting of overlap rules for HSP consistency.
NCBI-BLAST: Fixed internally.

Feature: notes, warnings, errors
WU-BLAST: Descriptive messages at various levels of caution.
NCBI-BLAST: Most error messages are terse and not user friendly.

Feature: Output formats
WU-BLAST: Only the standard format.
NCBI-BLAST: Multiple output report formats including HTML, ASN.1, XML, tabular, and anchored multiple alignments. See Appendix A.

To use the most recent version of WU-BLAST, you must have a site license from Washington University in St. Louis. The product is free for academic use, but commercial users must pay a fee. Unlike NCBI-BLAST, the source code isn't freely available. For the latest information on WU-BLAST, visit the official site at http://blast.wustl.edu. If you want to try WU-BLAST, an early version is available without license.

14.1 Usage Statements

All WU-BLAST programs provide usage statements if they are executed without any arguments. They are sometimes lengthy, so it's best to pipe them through a pager such as less or more.

blastn | more

xdformat | less

xdget | less

14.2 Command-Line Syntax

WU-BLAST command-line syntax isn't uniform across all programs. The BLAST programs blastn, blastp, blastx, tblastn, and tblastx use a slightly different syntax than xdformat and xdget do.

The BLAST program options come after the mandatory arguments of database and query sequence. The command-line structure is as follows:

[program name] [blast database] [query sequence] [parameters]

The parameter names in the BLAST programs and their arguments have some flexibility. The following command lines are all equivalent:

blastn db query E=10

blastn db query -E 10

blastn db query E 10

blastn db query -E=10

This book uses the first form to avoid confusion with NCBI-BLAST.
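The four spellings above can be reduced to the first form mechanically. The following Python sketch is only an illustration: normalize_params is a hypothetical helper, not part of WU-BLAST, and it assumes that every parameter takes exactly one value (valueless switches such as top would need a table of known switch names).

```python
def normalize_params(args):
    """Rewrite WU-BLAST-style parameter tokens into NAME=VALUE form.

    Hypothetical helper; assumes every parameter takes exactly one value.
    """
    out = []
    i = 0
    while i < len(args):
        token = args[i].lstrip("-")      # "-E" and "E" name the same parameter
        if "=" in token:                 # "E=10" or "-E=10"
            out.append(token)
            i += 1
        else:                            # "E 10" or "-E 10": value is the next token
            out.append(f"{token}={args[i + 1]}")
            i += 2
    return out


# The four forms shown in the text, split the way a shell would split them.
forms = [["E=10"], ["-E", "10"], ["E", "10"], ["-E=10"]]
canonical = [normalize_params(form) for form in forms]
```

Each entry of canonical holds the single token E=10, which is the form used throughout this chapter.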

xdformat and xdget use the traditional Unix syntax where the parameters precede the mandatory arguments:

[program name] [parameters] [mandatory arguments]

The xdformat and xdget options are all single letters preceded by a single dash. For parameters that require a value, a space between the parameter and its value is optional. As is typical for Unix programs, a double dash indicates the end of command-line options and a single dash signifies stdin.

xdformat -p protein_db

xdformat -n -I nucleotide_db

zcat fasta.*.gz | xdformat -n -o my_db -- -

14.3 WU-BLAST Parameters

WU-BLAST has many control parameters, some of which are esoteric and rarely useful. The most important parameters are listed here.

altscore=[string]

 

Default: Off

 

Defines an alternate scoring system for any pair of letters. For example, altscore="M M -3" changes the score of M-M pairs to -3, and altscore="A C 4" gives a score of 4 if the query is A and the subject is C. Letters may be designated as any to change an entire row or column. The score can be given as min or max for the minimum and maximum scores in the matrix or na to make the score infinitely low. To set the score of all rows and columns containing stop codons to negative infinity, set altscore="* any na" and altscore="any * na". If you change the scoring parameters, you may also want to adjust gapL, gapH, and gapK.

 

See also

nogap, gapL, gapH, gapK
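To make the altscore directive syntax concrete, here is a small Python sketch that applies such directives to a toy scoring matrix. It is a model of the behavior described above, not WU-BLAST code: apply_altscore and the three-letter alphabet with its scores are made up for illustration, and na is represented as negative infinity.

```python
import math


def apply_altscore(matrix, directive):
    """Return a copy of `matrix` with one altscore-style directive applied.

    `matrix` maps (query_letter, subject_letter) -> score.
    `directive` is a string such as "A C 4", "* any na", or "any * min".
    """
    query, subject, value = directive.split()
    if value == "na":
        score = -math.inf                    # "infinitely low"
    elif value == "min":
        score = min(matrix.values())
    elif value == "max":
        score = max(matrix.values())
    else:
        score = int(value)
    updated = dict(matrix)
    for (q, s) in matrix:
        # "any" matches every row (query) or column (subject)
        if query in ("any", q) and subject in ("any", s):
            updated[(q, s)] = score
    return updated


# Toy 3-letter alphabet; the scores are invented for this example.
letters = "AC*"
toy = {(q, s): (4 if q == s else -2) for q in letters for s in letters}
toy = apply_altscore(toy, "A C 4")      # query A vs. subject C only
toy = apply_altscore(toy, "* any na")   # every pair with * in the query
toy = apply_altscore(toy, "any * na")   # every pair with * in the subject
```

After these three directives, the A-versus-C score is 4 while C-versus-A keeps its original value, and every pair involving the stop symbol is negative infinity.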

B=[integer]

 

Default: 250

 

Sets the number of database hits to report. A warning is issued if this number is exceeded. It is typical to set this parameter to a very high value, such as B=100000, to ensure that no alignments are missed.

 

bottom

 

Default: Off

Programs: blastn, tblastx, blastx

Search only the bottom strand of the query.

 

See also

top

cpus=[integer]

 

Default: 4 for blastn; all for blastp, blastx, tblastn, and tblastx

 

Sets the number of processors to use. If not set, all processors on the system may be used, except by blastn, which limits itself to 4. See Chapter 10 for information on the /etc/sysblast file used for setting systemwide resource limitations.

 

dbrecmax=[integer]

 

Default: Last database record

 

Last database record number to search.

 

See also

dbrecmin, qrecmin, qrecmax

dbrecmin=[integer]

 

Default: 1

 

First database record number to search. For example, by setting dbrecmin=1 dbrecmax=10, only the first 10 database sequences are searched.

 

See also

dbrecmax, qrecmin, qrecmax
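One use of dbrecmin and dbrecmax is to split a search over a database into several jobs. The Python sketch below builds the command lines for such a split. It is only an illustration: record_ranges is a hypothetical helper, db and query are the placeholder names used in Section 14.2, and the total record count is assumed to be known already (for example, from the report printed by xdformat -i).

```python
def record_ranges(total_records, chunks):
    """Split records 1..total_records into contiguous, 1-based, inclusive ranges."""
    base, extra = divmod(total_records, chunks)
    ranges = []
    start = 1
    for i in range(chunks):
        size = base + (1 if i < extra else 0)   # spread the remainder over the first chunks
        if size == 0:
            continue                            # more chunks than records
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Command lines for searching a 10-record database in three pieces.
commands = [
    f"blastn db query dbrecmin={lo} dbrecmax={hi}"
    for lo, hi in record_ranges(10, 3)
]
```

The ranges cover every record exactly once, so the separate reports together account for the whole database. Note that statistics in each job depend on the database size; see Z.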

E=[number]

 

Default: 10

 

This is the E from the Karlin-Altschul equation. Database hits whose E-value is greater than this threshold will not be reported. If both E and S are set, the more restrictive parameter is used.

 

See also

S

E2=[number]

 

Default: Variable; calculated from scoring parameters

 

Sets the alignment threshold for ungapped alignments. When E2 and S2 are set, the more restrictive parameter is used.

 

See also

S2, gapE2, gapS2

echofilter

 

Default: Off

 

Prints out the query sequence after all filtering is performed. This is useful for troubleshooting when there are no database hits, and you suspect the filtering is too aggressive.

 

See also

filter, wordmask, maskextra

errors

 

Default: Off

 

Suppresses nonfatal error messages. It is generally a good idea to pay attention to the error messages, but at times it is useful to block them.

 

See also

nonnegok, novalidctxok

filter=[string]

 

Default: Off

 

Processes the query sequence with the specified filtering method. Letters are replaced with X and N for proteins and nucleotides, respectively.

seg

Identifies low-complexity regions in both nucleotide and amino acid sequences.

dust

The standard low-complexity filter for nucleotide sequences. Generally less sensitive than seg.

xnu

Finds short repeats in protein sequences.

seg+xnu

Combines both seg and xnu.

ccp

Coiled-coil filter for proteins.

Multiple filtering methods may be specified on the same command line; for example:

blastp nr query filter=seg filter=ccp filter=xnu

 

See also

echofilter, maskextra, wordmask

gapE2=[number]

 

Default: Variable; calculated from scoring parameters

 

Expectation threshold for saving individual gapped alignments. When gapE2 and gapS2 are set, the more restrictive parameter is used.

 

See also

gapS2, E2, S2

gapH=[number]

 

Default: Variable; depends on scoring parameters

 

Sets the value of H (information per aligned letter) for gapped alignments. If a particular combination of scoring matrix (or match/mismatch scores) and gap values doesn't already have precomputed values for gapH, gapK, and gapL, WU-BLAST uses ungapped statistics. In this case, the resulting E-values may be much too low. A warning is issued when this is the case. Computing proper values for gapped Karlin-Altschul parameters requires simulations with random sequences that determine what ungapped scoring scheme is most similar to the gapped scoring scheme.

 

See also

H, K, gapK, L, gapL, warnings

gapK=[number]

 

Default: Variable; depends on scoring parameters

 

Sets the value of the Karlin-Altschul K parameter for gapped alignments. See the description for gapH.

 

See also

H, gapH, K, L, gapL

gapL=[number]

 

Default: Variable; depends on scoring parameters

 

Sets the value of the Karlin-Altschul parameter lambda (information per unit score) used for gapped alignments. See the description for gapH.

 

See also

H, gapH, K, gapK, L

gapS2=[integer]

 

Default: Variable; calculated from scoring parameters

 

Score threshold for saving individual gapped alignments. Alignments below the threshold aren't reported.

 

See also

gapE2

gapsepqmax=[int]

 

Default: Unlimited

 

Maximum separation allowed between gapped alignments along the query.

 

See also

gapsepsmax, hspsepqmax, hspsepsmax

gapsepsmax=[int]

 

Default: Unlimited

 

Maximum separation allowed between gapped alignments along the subject.

 

See also

gapsepqmax, hspsepqmax, hspsepsmax

gapX

 

Default: Variable; depends on scoring parameters

 

Sets the alignment extension cutoff for gapped alignment.

 

See also

X

gi

 

Default: Off

 

Displays the GenInfo identifiers of database hits, if present.

 

golf=[number]

 

Default: 0.1

 

Maximum fractional length overlap for gapped alignment consistency. See the description for olf.

 

golmax=[integer]

 

Default: Unlimited

 

Maximum absolute length of overlap for gapped alignment consistency. See the description for olf.

 

gspmax=[integer]

 

Default: 1000

 

Sets the maximum number of gapped alignments per subject sequence. gspmax is bounded by hspmax. A value of 0 implies no limit.

 

See also

hspmax

H=[number]

 

Default: Variable; depends on scoring parameters

 

Sets the value of the Karlin-Altschul parameter H.

 

See also

gapH, K, gapK, L, gapL

hspmax=[integer]

 

Default: 1000

 

Sets the maximum number of ungapped alignments per subject sequence. A warning is issued if this limit is exceeded. A value of 0 implies no limit.

 

See also

gspmax

hitdist=[integer]

 

Default: 0, off

 

Maximum distance between word hits for the two-hit seeding algorithm. WU-BLAST uses one-hit seeding by default.

 

hspsepqmax=[int]

 

Default: Unlimited

 

Maximum separation allowed between alignments along the query.

 

hspsepsmax=[int]

 

Default: Unlimited

 

Maximum separation allowed between alignments along the subject.

 

K=[number]

 

Default: Variable; depends on scoring parameters

 

Sets the value for K from the Karlin-Altschul equation.

 

See also

gapK, H, gapH, L, gapL

kap

 

Default: Off

 

Assesses individual alignment scores with Karlin-Altschul statistics rather than using sum statistics on groups of alignments.

 

L=[number]

 

Default: Variable; depends on scoring parameters

 

Sets lambda (nats per unit score) from the Karlin-Altschul equation.

 

See also

gapL, H, gapH, K, gapK

lcfilter

 

Default: Off

 

Filters lowercase letters in the query sequence. The lowercase letters are treated as if they had been filtered out by one of the filtering programs.

 

See also

echofilter, filter, wordmask, lcmask

lcmask

 

Default: Off

 

Masks lowercase letters in the query sequence for seeding only. Lowercase letters in the query sequence aren't used in the initial word search but are available for alignment during the extension stage. This is known as soft masking.

 

See also

echofilter, filter, wordmask, lcfilter

links

 

Default: Off

 

Displays group information. Parentheses indicate the placement of the alignment in the group. The following example shows a group of three alignments. The alignment shown, with a score of 159, is the second one reported but the last one in the chain.

Score = 159 (61.0 bits), Sum P(3) = 2.7e-38

Identities = 26/39 (66%), Positives = 32/39 (82%)

Links = 1-3-(2)

 

See also

topcomboN
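When post-processing reports, the Links line shown for the links parameter can be turned into a chain order and the position of the current alignment. The Python sketch below assumes the line looks exactly like the example in that entry (integers separated by hyphens, with the current alignment in parentheses). parse_links is a hypothetical helper, not part of any BLAST distribution.

```python
import re


def parse_links(line):
    """Parse a 'Links = 1-3-(2)' line into (chain_order, current_alignment)."""
    _, _, chain = line.partition("=")
    order = []
    current = None
    for item in chain.strip().split("-"):
        match = re.fullmatch(r"\((\d+)\)", item)
        if match:                      # parenthesized: the alignment being displayed
            current = int(match.group(1))
            order.append(current)
        else:
            order.append(int(item))
    return order, current


order, current = parse_links("Links = 1-3-(2)")
position = order.index(current) + 1    # 1-based position within the chain
```

For the example line, the chain order is 1, 3, 2; the displayed alignment is number 2, and it sits in the third (last) position of the chain.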

M=[integer]

 

Default: +5 blastn

 

Sets the match score. This parameter is usually used for blastn only but may be used for other programs.

 

See also

N

maskextra=[integer]

 

Default: Off

 

Extends masking an extra distance of [integer] letters.

 

See also

echofilter, filter, wordmask, lcfilter, lcmask

matrix=[file]

 

Default: BLOSUM62

Programs: blastp, blastx, tblastn, tblastx

Specifies a scoring matrix file. The default is BLOSUM62. A large number of scoring matrices are distributed with WU-BLAST in the matrix/aa directory. Nucleotide matrices for use with blastn are in matrix/nt.

 

N=[integer]

 

Default: -4 blastn

 

Sets the mismatch score. This parameter is usually used for blastn only but may be used for other programs.

 

See also

M

nogap

 

Default: Off

 

Turns off gapped alignment. This parameter is useful in conjunction with altscore to prevent alignments from extending across stop codons.

 

See also

altscore

nonnegok

 

Default: Off

 

Under Karlin-Altschul statistics, the expected score must be negative. WU-BLAST normally exits with a fatal error if this isn't the case. Sometimes scoring schemes with positive expected scores are useful, and setting nonnegok silences the error condition.

 

See also

novalidctxok, errors

nosegs

 

Default: Off

 

WU-BLAST doesn't allow alignments to cross hyphen characters that act as query segment boundaries (e.g., for draft sequence). nosegs effectively converts hyphens to Ns.

 

notes

 

Default: Off

 

Suppresses informational messages. For example, if you are intentionally searching for a low-complexity sequence, you may wish to disable the message that suggests that a low-complexity filter would help remove meaningless alignments.

 

See also

errors, warnings

novalidctxok

 

Default: Off

 

If a sequence can't generate any significant HSPs, WU-BLAST normally exits with an error that says there are no valid contexts. You may encounter such an error when searching with a collection of sequencing reads, some of which are mostly (or completely) Ns. Setting novalidctxok allows you to continue without error.

 

See also

nonnegok, errors

nwlen=[integer]

 

Default: End of sequence

 

Sets the length of region for seeding.

 

See also

nwstart

nwstart=[integer]

 

Default: 1

 

Sets the starting position for seeding alignments. Together, nwstart and nwlen specify the region of the query in which alignments are seeded. Alignments may extend outside of this region. For example, nwstart=500 nwlen=200 seeds the 200 positions from 500 to 699 of the query sequence.

 

See also

nwlen

o=[file]

 

Default: stdout

 

Write results to this file instead of to stdout (the screen).

 

olf=[number]

 

Default: 0.125

 

Maximum fractional length of overlap for alignment consistency.

Consistent alignments must be ordered and have minimal overlap (see Chapter 5). The amount of permitted overlap is expressed as both a relative fraction and an absolute number. A setting of 0.1, for example, prevents alignments whose overlap length is more than 10 percent of the length of either alignment from being in the same group. The golf parameter plays the same role for gapped alignments. The olmax and golmax parameters control the absolute length of the overlap.
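The overlap rule can be pictured with a short Python sketch. This is one reading of the description above, not WU-BLAST's actual implementation: overlap_allowed is a hypothetical function, lengths are counted in letters, and "more than the given fraction of the length of either alignment" is interpreted as a comparison against the shorter alignment.

```python
def overlap_allowed(len_a, len_b, overlap, olf, olmax=None):
    """Return True if two alignments may share a group under this reading of olf/olmax.

    olf is the maximum fractional overlap; olmax is the maximum absolute
    overlap in letters, with None meaning unlimited.
    """
    if overlap > olf * min(len_a, len_b):   # too large relative to either alignment
        return False
    if olmax is not None and overlap > olmax:
        return False
    return True


# Alignments of 100 and 50 letters with olf=0.1: up to 5 letters may overlap.
small = overlap_allowed(100, 50, 4, olf=0.1)            # within the fractional limit
large = overlap_allowed(100, 50, 6, olf=0.1)            # exceeds 10% of the shorter one
capped = overlap_allowed(100, 50, 4, olf=0.1, olmax=3)  # exceeds the absolute limit
```

The same function models golf and golmax if the lengths passed in are those of gapped alignments.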

 

olmax=[integer]

 

Default: Unlimited

 

Maximum absolute length of overlap for alignment consistency. See the description for olf.

 

postsw

 

Default: Off

Programs: blastp

Performs Smith-Waterman alignment after initial BLAST alignment to return the single maximum-scoring pair rather than several high-scoring pairs.

 

Q=[integer]

 

Default: 10 blastn, 9 blastp, blastx, tblastn, tblastx

 

Sets the cost for the first gap character.

 

See also

R

qoffset=[integer]

 

Default: 0

 

Adjusts the query numbering by this amount. For example, if you search with a sequence that is known to have vector sequence in its first 25 bases, setting this parameter to 25 bases the numbering on the insert sequence.

 

qrecmax=[integer]

 

Default: Last query record

 

Last query sequence to search. See the description for qrecmin.

 

qrecmin=[integer]

 

Default: 1

 

First query sequence to search. By default, WU-BLAST produces one BLAST report for each query sequence in a FASTA file with multiple sequences. Setting qrecmin and qrecmax allows you to select a subset of query sequences in much the same way as dbrecmin and dbrecmax select a subset of the database.

 

See also

qrecmax, dbrecmin, dbrecmax

R=[integer]

 

Default: 10 blastn, 2 blastp, blastx, tblastn, tblastx

 

Sets the cost for the second and remaining gap characters.

 

See also

Q

restest

 

Default: Off

 

blastp and blastx statistical tests are based on the number of residues (letters) in the database. If Z is set in conjunction with restest, blastn, tblastn, and tblastx will also be based on the number of letters.

 

See also

seqtest, Z

S=[integer]

 

Default: Variable; calculated from E

 

Sets the final score threshold. Since S and E are interconvertible through the Karlin-Altschul equation, setting S effectively sets E, and vice versa. When both are set, the more restrictive one is used.

 

See also

E
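The interconversion between S and E can be sketched with the basic Karlin-Altschul equation, E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths. The Python below is a minimal illustration under that equation alone. The K and lambda values are placeholders chosen for the example, and real programs apply further adjustments (effective lengths, sum statistics), so the numbers will not match a WU-BLAST report.

```python
import math


def expect_from_score(score, K, lam, m, n):
    """E-value for a raw score under E = K * m * n * exp(-lambda * S)."""
    return K * m * n * math.exp(-lam * score)


def score_from_expect(expect, K, lam, m, n):
    """Raw score at which the expected number of chance alignments equals `expect`."""
    return math.log(K * m * n / expect) / lam


# Placeholder parameters for illustration only.
K, lam = 0.13, 0.32
m, n = 300, 1_000_000            # query length, database length in letters

threshold = score_from_expect(10, K, lam, m, n)   # score equivalent to E=10
integer_S = math.ceil(threshold)                  # S takes an integer value
round_trip = expect_from_score(threshold, K, lam, m, n)
```

Because a higher score always gives a lower E-value, an integer S rounded up from the threshold is at least as restrictive as the E it was derived from.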

S2=[integer]

 

Default: Variable; depends on scoring parameters

 

Score threshold for individual ungapped alignments. If both S2 and E2 are set, the more restrictive one is used.

 

See also

E2, gapS2, gapE2

seqtest

 

   

blastn, tblastn, and tblastx statistical tests are based on the number of sequences in the database. If Z is set in conjunction with seqtest, blastp and blastx will also be based on the number of sequences.

 

See also

restest, Z

span, span1, span2

 

Default: span2

 

WU-BLAST normally discards HSPs that are contained completely within a larger, higher-scoring HSP. This behavior is called span2. If span1 is set, alignments are thrown out if they are subsets of the query or subject (unlike span2, both conditions aren't required). This is useful if the sequences contain many repeats. To keep all alignments, set span, but be aware that the output may become very large.

 

T=[integer]

 

Default: 11 blastp, 12 blastx, 13 tblastn, 13 tblastx

 

Sets the neighborhood word threshold score. Setting this value extremely high removes neighborhood words and makes seeding require matching words. T, W, and hitdist are the most effective parameters for controlling the sensitivity and speed of BLAST searches.

 

See also

W, hitdist

top

 

Default: Off

Programs: blastn, tblastx, blastx

Searches only the top strand of the query.

 

See also

bottom

topcomboN=[integer]

 

Default: Off

 

Limits the report to this number of consistent, or collinear, HSP combinations (alignment groups).

 

V=[integer]

 

Default: 500

 

Controls the number of one-line summaries.

 

See also

B

warnings

 

Default: Off

 

WU-BLAST reports various warning conditions. This parameter turns them off.

 

See also

notes, errors

wink=[integer]

 

Default: 1

 

Words are created by sliding a window of width W by wink letters at a time. If W equals wink, words don't overlap.

 

See also

W, T, hitdist
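The effect of wink on word generation is easy to see in a few lines of Python. The function below is a sketch of the sliding window described for wink, not WU-BLAST code; the name words and the example sequence are made up for illustration.

```python
def words(sequence, W, wink=1):
    """List the words of width W obtained by stepping `wink` letters at a time."""
    return [sequence[i:i + W] for i in range(0, len(sequence) - W + 1, wink)]


overlapping = words("ACGTACGT", W=4, wink=1)   # every position starts a word
disjoint = words("ACGTACGT", W=4, wink=4)      # W equals wink: no overlap
```

With wink=1 the eight-letter sequence yields five words; with wink=4 it yields two. Larger wink values mean fewer words to look up, and therefore fewer chances to seed an alignment.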

wordmask=[method]

 

Default: Off

 

Filters the query sequence for seeding only. Low-complexity regions in the query sequence aren't used in the initial word search but are available for alignment during the extension stage. This is called soft masking.

 

See also

filter, lcfilter, lcmask, echofilter, maskextra

W=[integer]

 

Default: 11

 

Sets the word size for seeding alignments.

 

See also

T, hitdist, wink

X=[integer]

 

Default: Variable; depends on scoring parameters

 

Controls the alignment extension cutoff for ungapped alignments.

 

See also

gapX

Y=[number]

 

Default: Variable; depends on scoring parameters

 

Sets the size of the query sequence.

 

See also

Z

Z=[number]

 

Default: Variable; depends on scoring parameters

 

Sets the size of the database in letters (restest is assumed), but Z may also be used to mean the number of sequences if seqtest is set.

 

See also

Y, seqtest, restest

14.4 xdformat Parameters

xdformat formats BLAST databases from FASTA files. It can also report descriptive information about a database and dump its entire content in FASTA format.

Here are some examples:

xdformat -n files

xdformat -p files

zcat fasta.*.gz | xdformat -o my_db -n -- -

xdformat -n -i database

xdformat -n -r database > fasta_file

-A [0..2]

 

Default: 2

 

When indexing accession.version identifiers, you have three indexing options:

0

Accession only; version isn't stored

1

Stored as accession.version

2

Stored as both accession only and accession.version

 

-a [database]

 

   

Appends sequences to the named database. If the database is indexed, the appended sequences will also be indexed.

 

-c [character]

 

Default: Off

 

If an invalid letter is encountered, xdformat terminates and reports an error message. If this occurs, check the sequence file for errors. After checking, you may either skip illegal characters with -k or change them to a legal character with -c. The typical operation for nucleotides is to set -c N, and for proteins -c X.

 

See also

-k

-D [integer]

 

Default: Unlimited

 

Sets the maximum length for definition lines.

 

-d [string]

 

Default: None

 

Sets a user-defined release date for the database. The date may have 63 characters at most.

 

See also

-v

-e [file]

 

Default: stderr

 

Appends information and errors to the named file.

 

-G

 

Default: Off

 

Prefaces each sequence with the database record number in the format of gnl|xdf|#.

 

-i

 

Default: Off

 

Reports descriptive information about a BLAST database. This is useful for determining when a database was created, how many sequences it contains, and if it is indexed.

 

-K [integer]

 

Default: Unlimited

 

Sets the maximum number of identifiers with Control-A separators. This is useful for trimming highly redundant sequences created with nrdb or another redundancy purifier that uses Control-A separators.

 

-k

 

Default: Off

 

If an invalid letter is encountered, xdformat terminates. If this occurs, you can either skip illegal characters with -k or change them to a legal letter with -c. Check the errors to ensure the input file is formatted properly.

 

See also

-c

-L [number]

 

Default: 100000000 (100 million letters)

 

Sets the maximum sequence length. For optimal performance, break up large sequences into smaller fragments no larger than 1 million letters.

 

-l [number]

 

Default: 0

 

Sets the minimum sequence length.

 

-M [number]

 

Default: 96m

 

Sets the cache size for indexing. For faster indexing, the size may be increased (for example, -M 512m).

 

-O [4..8]

 

Default: 4

 

Sets the number of bytes of precision. The default value allows databases of up to 4 billion amino acids or 16 billion nucleotides. If you expect a database to contain more than this limit, increasing precision by one level multiplies the limit by 256. Setting -O is necessary only if you append to the database because the precision automatically increases appropriately when databases are created.
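The arithmetic behind these limits can be checked with a short Python sketch. It assumes the limit is simply the number of values representable in the given number of bytes, and that nucleotide databases hold four times as many letters as protein databases for the same precision, which is consistent with the 4 billion and 16 billion figures quoted above. max_letters is a hypothetical helper.

```python
def max_letters(precision_bytes, nucleotide=False):
    """Approximate database size limit, in letters, for a given -O precision."""
    limit = 256 ** precision_bytes          # values representable in that many bytes
    return limit * 4 if nucleotide else limit


protein_default = max_letters(4)                        # about 4 billion amino acids
nucleotide_default = max_letters(4, nucleotide=True)    # about 16 billion nucleotides
growth = max_letters(5) // max_letters(4)               # one more level of precision
```

Each additional byte multiplies the limit by 256, matching the statement in the description of -O.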

 

-P [integer]

 

Default: 60

 

This option applies only when dumping the entire content of a database with -r. -P controls the length of the sequence lines; -P 0 puts the whole sequence on one line.

 

See also

-r

-q [0..3]

 

Default: 0

 

Certain files may contain numerous nonfatal errors in their identifier format. -q quiets these errors.

0

No silencing

1

Silences field 1 errors

2

Silences field 2 errors

3

Silences all fields

 

-r

 

Default: Off

 

Reports (dumps) the entire database content to stdout in FASTA format.

 

-T [string]

 

Default: Off

 

This option lets you restrict indexing of identifiers to a particular database name or tag. The [string] has two parts: part 1 is the name of the database (e.g., gb for GenBank or emb for EMBL—see Chapter 10), and part 2 is either blank or a number.

blank

Index all identifiers.

0

Don't index.

1

Index only field 1.

2

Index only field 2.

Here are some examples:

-T emb0 doesn't index EMBL records.

-T gb1 indexes GenBank accession but not locus.

-T gb2 indexes GenBank locus but not accession.

-T gb indexes both accession and locus of GenBank records.
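The two-part structure of the -T argument can be illustrated with a small Python sketch that splits a value into its database tag and its field rule. parse_tag_option is a hypothetical helper written for this explanation; it assumes the second part is at most a single digit from 0 to 2, as in the list above.

```python
def parse_tag_option(value):
    """Split a -T argument such as 'gb1' into (database_tag, field_rule)."""
    rules = {
        "": "index all identifiers",
        "0": "don't index",
        "1": "index only field 1",
        "2": "index only field 2",
    }
    # The optional second part is a trailing digit; everything before it is the tag.
    suffix = value[-1] if value and value[-1] in "012" else ""
    tag = value[:-1] if suffix else value
    return tag, rules[suffix]


examples = {arg: parse_tag_option(arg) for arg in ["emb0", "gb1", "gb2", "gb"]}
```

For instance, gb1 splits into the tag gb and the rule "index only field 1", which for GenBank records is the accession.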

 

-v

 

Default: Off

 

Sets a user-defined version string for the database (a maximum of 63 characters).

 

See also

-d

-X

 

Default: Off

 

Databases that are formatted but not indexed may be indexed or re-indexed (e.g., with a different indexing scheme) with -X. In the following examples, the two commands on Line 1 are equivalent to the one on Line 2.

xdformat -n nt_db ; xdformat -n -X nt_db

xdformat -n -I nt_db

 

14.5 xdget Parameters

xdget retrieves sequences in FASTA format from databases formatted with xdformat (not formatdb, pressdb, or setdb). The database must have been indexed prior to using xdget (see -I and -X in Section 14.4).

Here are a few example command lines. If identifiers contain vertical bars, as in the second example, you have to enclose the string in quotes to prevent the shell from interpreting them as pipes. This isn't required for identifier files.

xdget -n db 12345

xdget -p nr 'gi|11611819|gb|AAG39070.1|'

xdget -n -f db files_of_ids

-A [n, 0]

 

Default: n

 

Given an accession number without a version, xdget retrieves the latest version number. This parameter is set explicitly with -A n. If -A 0 is set, the earliest version number is retrieved.

 

See also

-d, -N

-a [integer]

 

Default: 1

 

The -a and -b parameters retrieve a subsequence. For example, if you want to retrieve just nucleotides 1 to 100, include -a 1 -b 100. For nucleotide sequences, if -a is greater than -b, the sequence is returned as its reverse complement.

 

See also

-b, -r, -t

-b [integer]

 

Default: 0, end of sequence

 

See -a above.

 

-d

 

Default: Off

 

Ordinarily, when duplicate identifiers are present, only one is retrieved. With -d, all duplicates are reported. Having duplicate identifiers is generally not a good idea.

 

See also

-A, -N

-D [integer]

 

Default: Unlimited

 

Sets the maximum definition line length. Using definition lines to store arbitrary sequence data is common. This option is useful when you don't need the whole definition line.

 

-e [file]

 

Default: stderr

 

Appends messages and errors to the named log file.

 

-F

 

Default: Off

 

Flushes the output stream after each request. This is useful for preventing I/O deadlocks between communicating processes.

 

-f

 

Default: Off

 

Indicates that files of identifiers are given on the command line. The file format is one identifier per line.

 

-G

 

Default: Off

 

Prefaces each definition line with its record number using the gnl namespace. The format is gnl|xdf|#.

 

-o [file]

 

Default: stdout

 

Writes the FASTA output to the named file rather than stdout.

 

-N [0, n]

 

Default: 0

 

For sequences with duplicate identifiers, the first one is retrieved by default. It is set explicitly with -N 0. Setting -N n retrieves the last one. Accession numbers with version numbers have different rules.

 

See also

-A, -d

-P [integer]

 

Default: 60

 

Sets the maximum line length for sequence data. Setting -P 0 puts the entire sequence on one line.

 

-r

 

Default: Off

 

Returns the reverse complement for nucleotide sequences.

 

-T [string]

 

Default: Off

 

This option lets you restrict the lookup of identifiers to a particular database name or tag. For example, to look only in GenBank sequences, use -T gb. For only local, use -T lcl. For tags with multiple identifiers, a numeric suffix identifies which one to select. For example, -T gb1 selects accessions and -T gb2 selects loci. To prevent lookups in a database name, use zero. For example, -T gb0 omits GenBank records.

 

-t

 

Default: Off

 

Translates the nucleotide sequence.