http://hi.baidu.com/lidaof/blog/item/9d29eade7f340a5895ee3724.html 项目要求,开始使用BLAST+ 我们从格式化数据库开始 1,我先获得了一个植物rRNA的序列文件,为fasta格式,用makeblastdb格式化之,如下 makeblastdb -in plant26.rna.fna -dbtype nucl -parse_seqids -hash_index -out plant_rna 注意:在BLAST+2.2.24中 不加参数 -parse_seqids ,-hash_index 不然成死循环 Building a new DB, current time: 04/29/2010 16:09:11 New DB name: plant_rna New DB title: plant26.rna.fna Sequence type: Nucleotide Keep Linkouts: T Keep MBits: T Maximum file size: 1073741824B Adding sequences from FASTA; added 6747 sequences in 0.546889 seconds. 此时目录下还是如往常一下生成如下文件 plant_rna.nhr plant_rna.nin plant_rna.nnd plant_rna.nni plant_rna.nsd plant_rna.nsi plant_rna.nsq 2,我们有如下一个fasta序列想做blast,文件如下 more test >gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probab le (VaoD) mRNA, complete cds ATGTTCAACGCGAAGAACGGTTTTTCTGAGGCACACGTGAGGGGATGTCAGACCAAACGACTCACCAAACAGAACTACGC CGAACTTTCTCGATGTGACACGTTGGAAGACATCAAGACGTACTTGCAAACGATGAGTGATTATTCAGAATATGTTCGTG ATCTTCAAGCGCCAGTGAGACCGGTTGACATTATTGAATGCTGCAGAAAGAGACAGATCGCAGAGTTTAATATTTGCTGT CAGCAGGCTTCTTCCCCTTTGTCCAATTTTTTGGAGTATTTGACGTACGGATACATGATCGATAATCTTGTGTTGGCTTT AAATGGCATGCTTCGTGGACGTACCACAGAGGCAATACTTGAGAAGTGTAGCCCCATTGGTTTTTTCGATTCTTTATCCG CGGTTGTCGTGTCGAGTAGTGTCCAAGAACTCTACAGACTAGCTCTCGTGGATACACCGCTTGCCTCTTATTTCAGTAGC TCGATTAAGGCAGAAGATCTGGATGAGTTAAATATTGAGCTCATACGGAACGTCCTATACAAGGAATATTTGCAAGATTT CATGGTTTTCTGCAACAAAATGGATCAAAACACACGTCAATTGATGGAGAAACTACTTAGCATGGAGGCCGATCGGCACG CGATAAGAATCACACTGAACTCTTTCGGAACAGAGCTTTCCAAGGCTGATCGAAGAAATCTTTATACGAATTTTGGCACC ATGTACCCCGATGGCTTCGCGCGTCTTGCGAATTGTGAAACGGTAGATGAAGTGAAACGCATACTAGTAGCTTATCCAGA ATTCAGAGAGTTGACGAAAAGTGATGATCCCCACTACATTGACAGGGGACTACGCGTTCTCGAACTGGAAGCATGTGGAC AAGCACTCGATGAGCAATTCAATTTCGCTATCTTTTATGCTTTCGTAAAGTTTCAGGAGAACGAAATAAACAACCTGATG TGGCTCACTGAGTGTGTTGCTCAAAGGCAAAAAAGTAGTCTAGGCGAGGGCATTGTCTACATACAATAG 3,blast搜索命令格式如下: blastn -db plant_rna -query test -out test.blastn 4,我们来看看输入文件test.blastn的内容 BLASTN 2.2.23+ Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller (2000), "A greedy algorithm for aligning DNA sequences", J Comput Biol 2000; 7(1-2):203-14. Database: plant26.rna.fna 6,747 sequences; 8,392,753 total letters Query= gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds Length=1029 Score E Sequences producing significant alignments: (Bits) Value ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar AT... 1901 0.0 >ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probable (VaoD) mRNA, complete cds Length=1029 Score = 1901 bits (1029), Expect = 0.0 Identities = 1029/1029 (100%), Gaps = 0/1029 (0%) Strand=Plus/Plus Query 1 ATGTTCAACGCGAAGAACGGTTTTTCTGAGGCACACGTGAGGGGATGTCAGACCAAACGA 60 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 1 ATGTTCAACGCGAAGAACGGTTTTTCTGAGGCACACGTGAGGGGATGTCAGACCAAACGA 60 Query 61 CTCACCAAACAGAACTACGCCGAACTTTCTCGATGTGACACGTTGGAAGACATCAAGACG 120 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 61 CTCACCAAACAGAACTACGCCGAACTTTCTCGATGTGACACGTTGGAAGACATCAAGACG 120 Query 121 TACTTGCAAACGATGAGTGATTATTCAGAATATGTTCGTGATCTTCAAGCGCCAGTGAGA 180 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 121 TACTTGCAAACGATGAGTGATTATTCAGAATATGTTCGTGATCTTCAAGCGCCAGTGAGA 180 Query 181 CCGGTTGACATTATTGAATGCTGCAGAAAGAGACAGATCGCAGAGTTTAATATTTGCTGT 240 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 181 CCGGTTGACATTATTGAATGCTGCAGAAAGAGACAGATCGCAGAGTTTAATATTTGCTGT 240 Query 241 CAGCAGGCTTCTTCCCCTTTGTCCAATTTTTTGGAGTATTTGACGTACGGATACATGATC 300 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 241 CAGCAGGCTTCTTCCCCTTTGTCCAATTTTTTGGAGTATTTGACGTACGGATACATGATC 300 Query 301 GATAATCTTGTGTTGGCTTTAAATGGCATGCTTCGTGGACGTACCACAGAGGCAATACTT 360 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 301 GATAATCTTGTGTTGGCTTTAAATGGCATGCTTCGTGGACGTACCACAGAGGCAATACTT 360 Query 361 GAGAAGTGTAGCCCCATTGGTTTTTTCGATTCTTTATCCGCGGTTGTCGTGTCGAGTAGT 420 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 361 GAGAAGTGTAGCCCCATTGGTTTTTTCGATTCTTTATCCGCGGTTGTCGTGTCGAGTAGT 420 Query 421 GTCCAAGAACTCTACAGACTAGCTCTCGTGGATACACCGCTTGCCTCTTATTTCAGTAGC 480 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 421 GTCCAAGAACTCTACAGACTAGCTCTCGTGGATACACCGCTTGCCTCTTATTTCAGTAGC 480 Query 481 TCGATTAAGGCAGAAGATCTGGATGAGTTAAATATTGAGCTCATACGGAACGTCCTATAC 540 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 481 TCGATTAAGGCAGAAGATCTGGATGAGTTAAATATTGAGCTCATACGGAACGTCCTATAC 540 Query 541 AAGGAATATTTGCAAGATTTCATGGTTTTCTGCAACAAAATGGATCAAAACACACGTCAA 600 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 541 AAGGAATATTTGCAAGATTTCATGGTTTTCTGCAACAAAATGGATCAAAACACACGTCAA 600 Query 601 TTGATGGAGAAACTACTTAGCATGGAGGCCGATCGGCACGCGATAAGAATCACACTGAAC 660 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 601 TTGATGGAGAAACTACTTAGCATGGAGGCCGATCGGCACGCGATAAGAATCACACTGAAC 660 Query 661 TCTTTCGGAACAGAGCTTTCCAAGGCTGATCGAAGAAATCTTTATACGAATTTTGGCACC 720 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 661 TCTTTCGGAACAGAGCTTTCCAAGGCTGATCGAAGAAATCTTTATACGAATTTTGGCACC 720 Query 721 ATGTACCCCGATGGCTTCGCGCGTCTTGCGAATTGTGAAACGGTAGATGAAGTGAAACGC 780 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 721 ATGTACCCCGATGGCTTCGCGCGTCTTGCGAATTGTGAAACGGTAGATGAAGTGAAACGC 780 Query 781 ATACTAGTAGCTTATCCAGAATTCAGAGAGTTGACGAAAAGTGATGATCCCCACTACATT 840 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 781 ATACTAGTAGCTTATCCAGAATTCAGAGAGTTGACGAAAAGTGATGATCCCCACTACATT 840 Query 841 GACAGGGGACTACGCGTTCTCGAACTGGAAGCATGTGGACAAGCACTCGATGAGCAATTC 900 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 841 GACAGGGGACTACGCGTTCTCGAACTGGAAGCATGTGGACAAGCACTCGATGAGCAATTC 900 Query 901 AATTTCGCTATCTTTTATGCTTTCGTAAAGTTTCAGGAGAACGAAATAAACAACCTGATG 960 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 901 AATTTCGCTATCTTTTATGCTTTCGTAAAGTTTCAGGAGAACGAAATAAACAACCTGATG 960 Query 961 TGGCTCACTGAGTGTGTTGCTCAAAGGCAAAAAAGTAGTCTAGGCGAGGGCATTGTCTAC 1020 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct 961 TGGCTCACTGAGTGTGTTGCTCAAAGGCAAAAAAGTAGTCTAGGCGAGGGCATTGTCTAC 1020 Query 1021 ATACAATAG 1029 ||||||||| Sbjct 1021 ATACAATAG 1029 Lambda K H 1.33 0.621 1.12 Gapped Lambda K H 1.28 0.460 0.850 Effective search space used: 8286997432 Database: plant26.rna.fna Posted date: Apr 29, 2010 4:09 PM Number of letters in database: 8,392,753 Number of sequences in database: 6,747 Matrix: blastn matrix 1 -2 Gap Penalties: Existence: 0, Extension: 0 5,我们可以看出,如果不加任何参数,和以前的输出差不多相同,大家还是可以使用以前写的bioperl或biopython parser进行结果的分析 6,下面我们看看增加输出参数呢,我们使用-outfmt 7作为控制输出结果,其中选择性输出部分内容,命令和输出如下: blastn -db plant_rna -query test -outfmt "7 qacc sacc evalue length pident" # BLASTN 2.2.23+ # Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds # Database: plant_rna # Fields: query acc., subject acc., evalue, alignment length, % identity # 1 hits found gi|145342129|ref|XM_001416109.1| XM_001416109 0.0 1029 100.00 # BLAST processed 1 queries 这个文件是自解释的,大家可以看得很清楚每项是什么意思,呵呵,其他输出的一些部分特性如下: blastn -db plant_rna -query test -outfmt "7 qacc sacc evalue length pident" # BLASTN 2.2.23+ # Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds # Database: plant_rna # Fields: query acc., subject acc., evalue, alignment length, % identity # 1 hits found gi|145342129|ref|XM_001416109.1| XM_001416109 0.0 1029 100.00 # BLAST processed 1 queries blastn -db plant_rna -query test -outfmt "7 qgi sgi evalue length pident" # BLASTN 2.2.23+ # Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds # Database: plant_rna # Fields: query gi, subject gi, evalue, alignment length, % identity # 1 hits found 0 145342129 0.0 1029 100.00 # BLAST processed 1 queries blastn -db plant_rna -query test -outfmt "7 qid sid evalue length pident" # BLASTN 2.2.23+ # Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds # Database: plant_rna # Fields: evalue, alignment length, % identity # 1 hits found 0.0 1029 100.00 # BLAST processed 1 queries blastn -db plant_rna -query test -outfmt "7 qseqid sseqid evalue length pident" # BLASTN 2.2.23+ # Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds # Database: plant_rna # Fields: query id, subject id, evalue, alignment length, % identity # 1 hits found gi|145342129|ref|XM_001416109.1| gi|145342129|ref|XM_001416109.1| 0.0 1029 100.00 # BLAST processed 1 queries blastn -db plant_rna -query test -outfmt "7 qframe sseqid evalue length pident" # BLASTN 2.2.23+ # Query: gi|145342129|ref|XM_001416109.1| Ostreococcus lucimarinus CCE9901 vacuolar ATP synthase subunit D, probale (VaoD) mRNA, complete cds # Database: plant_rna # Fields: query frame, subject id, evalue, alignment length, % identity # 1 hits found 1 gi|145342129|ref|XM_001416109.1| 0.0 1029 100.00
|
评论