The output from FASTA is divided into four sections. The first gives some information on the query sequence and the database searched. The second is a histogram of the score distribution. The third is a list of the matched sequences, with some statistical information about the strength of each match. The fourth shows the alignments themselves.
 opt         E()
< 20   188     0:==
  22     0     0: one = represents 109 library sequences
  24     0     0:
  26     2     1:*
  28     7    15:*
  30    28    91:*
  32   200   353:== *
  34   841   958:========*
  36  2217  1968:==================*==
  38  3746  3253:=============================*=====
  40  5360  4538:=========================================*========
  42  6055  5547:==================================================*=====
  44  6496  6119:========================================================*===
  46  5820  6232:====================================================== *
  48  5469  5966:=================================================== *
  50  4820  5444:============================================= *
  52  4202  4787:======================================= *
  54  3815  4089:=================================== *
  56  3271  3415:===============================*
  58  2755  2804:=========================*
  60  2268  2271:====================*
  62  1813  1821:================*
  64  1500  1448:=============*
  66  1233  1145:==========*=
  68   951   900:========*
  70   746   706:======*
  72   699   551:=====*=
  74   460   430:===*=
  76   337   335:===*
  78   287   260:==*
  80   244   202:=*=
  82   185   154:=*
  84   115   122:=*
  86   114    95:*=
  88    75    73:* inset = represents 1 library sequences
  90    70    57:*
  92    48    44:* :=======================================*
  94    26    34:* :========================== *
  96    33    26:* :=========================*=======
  98    14    20:* :============== *
 100    10    16:* :========== *
 102     7    12:* :======= *
 104     6     9:* :====== *
 106     5     7:* :===== *
 108     2     6:* :== *
 110     2     4:* :== *
 112     1     3:* := *
 114     0     3:* : *
 116     0     2:* : *
 118     0     2:* : *
>120    27     1:* :*==========================
FASTA works out an initial score for each of the database sequences matched against your sequence. The histogram gives a graphical representation of the distribution of these scores. Each row gives a score (opt), the number of library sequences observed with that score, and the number expected by chance (E()). The = signs plot the observed counts and the asterisks plot the expected counts. Scores from unrelated sequences should follow the expected curve, which in FASTA is an extreme-value distribution, skewed towards high scores, rather than a true normal distribution. Any significant matches will fall outside this curve.
You can see that at the bottom of the histogram there are 27 sequences scoring above 120 where the expected curve (represented by the asterisks) predicts only 1.
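The comparison of observed and expected counts can also be done programmatically. The following is a minimal sketch, assuming each histogram row has the shape "score observed expected:bars" as in the example above. The function names and the min_ratio threshold are hypothetical and are not part of FASTA.

```python
import re

# Assumed row shape, taken from the example histogram:
#   "<score> <observed> <expected>:<bars>", with an optional < or > prefix.
ROW = re.compile(r"^\s*([<>]?)\s*(\d+)\s+(\d+)\s+(\d+):")

def parse_histogram_row(line):
    """Return (score_label, observed, expected), or None for non-data lines."""
    m = ROW.match(line)
    if m is None:
        return None
    prefix, score, observed, expected = m.groups()
    return prefix + score, int(observed), int(expected)

def excess_bins(lines, min_ratio=5.0):
    """Hypothetical helper: labels of bins whose observed count is at least
    min_ratio times the expected count. The pooled low-score bin ("<...")
    is skipped, since only high-scoring outliers suggest real matches."""
    hits = []
    for line in lines:
        row = parse_histogram_row(line)
        if row is None:
            continue
        label, observed, expected = row
        if label.startswith("<"):
            continue
        # Treat an expected count of 0 as 1 to avoid dividing by zero.
        if observed > 0 and observed >= min_ratio * max(expected, 1):
            hits.append(label)
    return hits

rows = [
    " opt         E()",
    "< 20   188     0:==",
    "  44  6496  6119:=====*===",
    " 118     0     2:* : *",
    ">120    27     1:* :*=====",
]
print(excess_bins(rows))
```

With these sample rows only the >120 bin is reported, which matches the 27 sequences noted above.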
The best scores are:                              initn init1   opt    z-sc E(66345)
MERR_PSEAE mercuric resistance operon regu ( 144)   928   928   928  1129.8        0
MERR_SHIFL mercuric resistance operon regu ( 144)   871   871   871  1061.3        0
MERR_SERMA mercuric resistance operon regu ( 144)   810   810   810   988.1        0
MERR_STAAU mercuric resistance operon regu ( 135)   292   172   298   373.6  3.5e-14
MERR_BACSR (strain rc607). mercuric resist ( 132)   241   198   289   363.0  1.4e-13
YHDM_ECOLI hypothetical transcriptional re ( 141)   175   175   276   347.0  1.1e-12
The first part of the line gives the database name of the matched sequence, followed by the first part of the description and, in parentheses, the length of the sequence. After this are three scores: initn, init1 and opt.
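If you need to process many such lines, they can be split into fields. This is a minimal sketch, assuming the column order shown in the list above (name, description, length in parentheses, initn, init1, opt, z-score, E()). The Hit type and parse_hit function are hypothetical names used for illustration only.

```python
import re
from typing import NamedTuple, Optional

class Hit(NamedTuple):
    name: str
    description: str
    length: int
    initn: int
    init1: int
    opt: int
    z_score: float
    e_value: float

# The description may itself contain parentheses, so the length is taken
# from the last "( digits )" group that is followed by the five numeric columns.
LINE = re.compile(
    r"^(\S+)\s+(.*?)\s*\(\s*(\d+)\)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s+(\S+)\s*$"
)

def parse_hit(line: str) -> Optional[Hit]:
    """Parse one line of the best-scores list; return None if it does not fit."""
    m = LINE.match(line)
    if m is None:
        return None
    name, desc, length, initn, init1, opt, z, e = m.groups()
    return Hit(name, desc, int(length), int(initn), int(init1), int(opt),
               float(z), float(e))

hit = parse_hit(
    "MERR_STAAU mercuric resistance operon regu ( 135)   292   172   298   373.6  3.5e-14"
)
print(hit.name, hit.length, hit.opt, hit.e_value)
```

The header line ("The best scores are: ...") does not match the pattern, so parse_hit returns None for it.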
Some quick rules
The last two numbers are a statistical measure of the significance of
the match. The first (z-score) is a measure (in standard deviations) of how far
the score falls away from the mean. The second, E(), is an estimate of the
number of matches at least this good that would be expected by chance in a
search of this database (66345 sequences in this example). The lower the second
number, the less likely it is that the match is random. Generally, a figure of
0.01 or below is statistically very significant, and a figure of between 0.01
and 0.05 is borderline.
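The rule of thumb above can be written as a small function. This is only a sketch of the thresholds quoted in the text. The function name and the label used for values above 0.05 are assumptions, since the text does not name that category.

```python
def classify_e_value(e_value: float) -> str:
    """Apply the rule of thumb: <= 0.01 very significant, 0.01 to 0.05 borderline."""
    if e_value <= 0.01:
        return "very significant"
    if e_value <= 0.05:
        return "borderline"
    # Assumed label: the text gives no name for values above 0.05.
    return "not significant"

for e in (3.5e-14, 0.03, 1.0):
    print(e, classify_e_value(e))
```

A result from this function is a statistical label only and says nothing about biological relevance.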
It is important to remember that a statistically
significant match is not necessarily biologically significant, and conversely a
match may be of biological significance without passing the statistical
test. You should therefore use your knowledge of the biology of the system
to help interpret these results.
>>MERR_STAAU mercuric resistance operon regulatory protei (135 aa)
initn: 292 init1: 172 opt: 298 Z-score: 373.6 expect() 3.5e-14
Smith-Waterman score: 298; 36.923% identity in 130 aa overlap
10 20 30 40 50 60
MerR MENNLENLTIGVFAKAAGVNVETIRFYQRKGLLLEPDKPYGSIRRYGEADVTRVRFVKSA
. :. .::: :: ::.:.:.::::. : . .. : :.: . ::::.:
MERR_S MGMKISELAKACDVNKETVRYYERKGLIAGPPRNESGYRIYSEETADRVRFIKRM
10 20 30 40 50
70 80 90 100 110
MerR QRLGFSLDEIAELLRL--EDGTHCEEASSLAEHKLKDVREKMADLARMEAVLSELVCACH
..: ::: :: :. . .:: .:.. ... .: :....:. : :.. .: :: :
MERR_S KELDFSLKEIHLLFGVVDQDGERCKDMYAFTVQKTKEIERKVQGLLRIQRLLEELKEKCP
60 70 80 90 100 110
120 130 140
MerR ARRGNVSCPLIASLQGGASLAGSAMP
... .::.: .:.::
MERR_S DEKAMYTCPIIETLMGGPDK
120 130
Above is an example of the alignments. The database name is given, along
with the one-line description of the database entry. Also given are some more
statistical data, including percentages of identical amino acids and the length
of the match. Below this are the alignments themselves. Your sequence is shown
at the top and the database sequence is given below. Identities are shown with
the : symbol, and similarities with the . symbol. Where FASTA has introduced
gaps to optimise the alignment, these are shown with - symbols in the sequence,
one for each gap position (the -- in the example is a gap of two residues).
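The percentage identity reported in the alignment header is the number of identical positions divided by the length of the overlap. The sketch below counts the : symbols in a match line to show the arithmetic. The function name and the toy match line are hypothetical, and the figure of 48 identities is inferred from the reported 36.923% over 130 aa rather than printed by FASTA.

```python
def percent_identity(match_line: str, overlap: int) -> float:
    """Percentage of overlap positions marked as identities (':')."""
    if overlap <= 0:
        raise ValueError("overlap must be positive")
    identities = match_line.count(":")
    return 100.0 * identities / overlap

# Toy match line: 4 identities and 1 similarity over 6 aligned positions.
toy = percent_identity("::.: :", 6)
print(round(toy, 3))

# Inferred check against the example: 48 identities in a 130 aa overlap.
reported = round(100.0 * 48 / 130, 3)
print(reported)
```

The second value printed agrees with the 36.923% identity shown for MERR_STAAU.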