BLAST (Basic Local Alignment Search Tool)

Appendix B. Nucleotide Scoring Schemes

Nucleotide scoring schemes are often summarized by their target frequency, which is the expected frequency of each nucleotide pair in high-scoring alignments. This frequency is usually expressed as the expected percent identity. For example, the +1/-1 match/mismatch scores have a target frequency of 75 percent identity. However, this holds only for ungapped alignments between sequences of infinite length; short sequences and gapped alignment both change the actual target frequency. In the following table, target frequencies for a variety of match (+), mismatch (-), and simple gap (Gap) costs were calculated for pairs of sequences of length 100, 500, and 1,000 by performing local alignments of random nucleotide sequences of unbiased composition. The theoretical target frequency (TF) is included for comparison.
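The theoretical target frequencies, such as 75 percent identity for +1/-1, follow from Karlin-Altschul statistics for ungapped alignments. With unbiased composition, two random bases are identical with probability 1/4, so for a match score M and a mismatch penalty N the scaling parameter λ is the positive root of (1/4)e^(λM) + (3/4)e^(-λN) = 1, and the expected identity is (1/4)e^(λM). For +1/-1 the root is λ = ln 3, which gives 3/4. The Python sketch below solves for λ by bisection; `solve_lambda` and `target_identity` are hypothetical helper names used for illustration, not part of BLAST.

```python
import math


def solve_lambda(match, mismatch, p_match=0.25):
    """Return the positive lambda satisfying
    p_match * exp(lambda * match) + (1 - p_match) * exp(-lambda * mismatch) = 1.

    match is the (positive) match score; mismatch is the mismatch penalty given
    as a positive number. p_match is the chance that two random bases are
    identical: 0.25 for unbiased composition.
    """
    expected = p_match * match - (1 - p_match) * mismatch
    if expected >= 0:
        raise ValueError("expected score per aligned pair must be negative")

    def f(lam):
        return (p_match * math.exp(lam * match)
                + (1 - p_match) * math.exp(-lam * mismatch) - 1.0)

    # f(0) = 0; f dips below zero just above 0 because the expected score is
    # negative, then grows without bound. Bracket the positive root and bisect.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def target_identity(match, mismatch, p_match=0.25):
    """Expected fraction of identical pairs: p_match * exp(lambda * match)."""
    lam = solve_lambda(match, mismatch, p_match)
    return p_match * math.exp(lam * match)


for match, mismatch in [(1, 1), (1, 2), (1, 3), (5, 4), (5, 6), (5, 7), (5, 8), (5, 9)]:
    print(f"+{match}/-{mismatch}: {100 * target_identity(match, mismatch):.1f}% identity")
```

Rounded to whole percents, these values agree with the TF column (75, 95, 99, 65, 82, 87, 90, and 93). Multiplying every score by a constant only divides λ by that constant, so +5/-5 and +5/-10 have the same target frequencies as +1/-1 and +1/-2.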

Columns: + = match score; - = mismatch penalty; Gap = gap cost; TF = theoretical target frequency; 100, 500, 1,000 = target frequency observed for sequence pairs of that length. All frequencies are percent identity.

+    -  Gap   TF   100   500  1,000
1    1    1   75    55    49     49
1    1    2   75    79    70     69
1    1    3   75    85    79     79
1    2    2   95    93    89     88
1    2    3   95    98    96     96
1    2    4   95    98    97     97
1    3    3   99    99    99     98
5    4    4   65    51    48     48
5    4    5   65    53    49     49
5    4    6   65    55    50     49
5    4    7   65    59    51     50
5    4    8   65    62    52     50
5    4    9   65    64    55     53
5    4   10   65    67    59     57
5    4   11   65    69    61     60
5    4   12   65    71    63     62
5    5    5   75    55    49     49
5    5    6   75    59    51     50
5    5    7   75    64    55     53
5    5    8   75    70    61     59
5    5    9   75    72    65     64
5    5   10   75    79    70     69
5    5   11   75    80    73     71
5    5   12   75    81    75     74
5    5   13   75    82    76     76
5    5   14   75    82    77     77
5    5   15   75    85    79     79
5    6    6   82    62    53     51
5    6    7   82    69    60     58
5    6    8   82    75    67     65
5    6    9   82    79    73     71
5    6   10   82    83    77     75
5    6   11   82    85    79     79
5    6   12   82    87    81     81
5    6   15   82    90    85     84
5    6   18   82    90    87     86
5    7    7   87    73    64     63
5    7    8   87    78    72     70
5    7    9   87    83    77     76
5    7   10   87    87    82     81
5    7   11   87    89    84     83
5    7   12   87    90    86     85
5    7   13   87    91    88     87
5    7   14   87    91    88     87
5    7   21   87    93    91     90
5    8    8   90    81    75     73
5    8    9   90    85    80     79
5    8   10   90    89    85     84
5    8   11   90    91    87     86
5    8   12   90    92    89     88
5    8   13   90    93    90     89
5    8   14   90    93    91     90
5    8   15   90    94    92     91
5    8   16   90    94    93     92
5    8   24   90    95    94     93
5    9    9   93    86    82     81
5    9   10   93    90    86     85
5    9   11   93    92    89     89
5    9   12   93    93    91     90
5    9   13   93    93    92     91
5    9   14   93    94    92     91
5    9   15   93    95    93     92
5    9   16   93    95    94     93
5    9   17   93    95    94     93
5    9   18   93    95    94     94
5    9   27   93    96    95     94
5   10   10   95    93    89     88
5   10   11   95    94    92     90
5   10   12   95    95    93     91
5   10   13   95    95    94     93
5   10   14   95    95    94     96
5   10   15   95    98    96     96
5   10   20   95    98    97     97
5   10   30   95    98    98     97
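The length-dependent columns were measured empirically by aligning random sequences. The Python sketch below shows one way such an experiment could be set up; it is not the procedure used to build the table. It assumes that a "simple" gap cost is a linear penalty charged for every gap position, that percent identity counts identical columns over the whole local alignment (gap columns included), and that each value averages the best local alignment of several independent random pairs. `local_align_identity` and `mean_identity` are hypothetical helper names.

```python
import random


def local_align_identity(a, b, match, mismatch, gap):
    """Smith-Waterman local alignment with a linear gap cost.

    mismatch and gap are positive penalties; every gap position costs `gap`
    (an assumption about what "simple gap costs" means). Returns the fraction
    of identical columns in one best-scoring local alignment, counting gap
    columns as non-identical.
    """
    n, m = len(a), len(b)
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else -mismatch
            h = max(0, H[i - 1][j - 1] + s, H[i - 1][j] - gap, H[i][j - 1] - gap)
            H[i][j] = h
            if h > best:
                best, bi, bj = h, i, j

    # Trace back from the best cell until the running score returns to zero.
    i, j = bi, bj
    columns = identities = 0
    while H[i][j] > 0:
        same = a[i - 1] == b[j - 1]
        s = match if same else -mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            identities += 1 if same else 0
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return identities / columns if columns else 0.0


def mean_identity(length, match, mismatch, gap, trials=20, seed=1):
    """Average percent identity of the best local alignment between pairs of
    random DNA sequences of unbiased composition and the given length."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        a = "".join(rng.choice("ACGT") for _ in range(length))
        b = "".join(rng.choice("ACGT") for _ in range(length))
        total += local_align_identity(a, b, match, mismatch, gap)
    return 100 * total / trials


print(f"+1/-1, gap 1, length 100: {mean_identity(100, 1, 1, 1):.0f}% identity")
```

With few trials and these assumed definitions, the printed estimate is rough and need not reproduce the table exactly. Larger `trials` and `length` values give steadier averages, at a cost per trial that grows with the product of the two sequence lengths in this pure-Python version.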