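Compared with the old BLAST, the new BLAST+ splits blastn, blastx and the other programs out of the single blastall command into separate commands, which makes it much easier to customize the parameters of each one. While using blastn I have put together the parameters I find myself using most often:

    blastn -db database_name -query input_file -out output_file -evalue evalue -max_target_seqs num_sequences -num_threads int_value -outfmt "format_number field1 field2 ..."

For example:

    blastn -db plant_rna -query test.fa -out test.out -evalue 0.00001 -max_target_seqs 5 -num_threads 4 -outfmt "7 qacc sacc evalue length pident"

blastn: needs no introduction; nucleotide-vs-nucleotide alignment.
-db: the BLAST database to search (see my previous post for details).
-query: the input query sequences, in FASTA format.
-out: the output file.
-evalue: the e-value cutoff.
-max_target_seqs: the maximum number of target (subject) sequences to keep matches for. (In the old blastall I always used -b 5 -v 5; corrections are welcome if I have misunderstood this.) For reference, in blastall -v set the number of one-line descriptions and -b the number of alignments to show. Their direct BLAST+ counterparts are -num_descriptions and -num_alignments (see the help below). -max_target_seqs ("Maximum number of aligned sequences to keep") is the option that caps the number of subject sequences in tabular output, although one subject can still contribute several HSP rows.
-num_threads: the number of CPUs (threads) to run the job on (depends on your system; equivalent to the old -a option).
-outfmt "7 qacc sacc evalue length pident": this is the flashiest new feature of BLAST+. It controls the output format directly, so you no longer need a separate parser. 7 means tab-delimited output with comment lines. You choose which columns to output by listing them after the 7, separated by spaces. Wrap the whole specification in double quotes so that the shell passes it to -outfmt as a single argument. Here qacc is the query accession, sacc the subject (target) accession, evalue the e-value, length the alignment length, and pident the percentage of identical matches. The other available fields, from the blastn help, are:

    *** Formatting options
     -outfmt <String>
       alignment view options:
         0 = pairwise,
         1 = query-anchored showing identities,
         2 = query-anchored no identities,
         3 = flat query-anchored, show identities,
         4 = flat query-anchored, no identities,
         5 = XML Blast output,
         6 = tabular,
         7 = tabular with comment lines,
         8 = Text ASN.1,
         9 = Binary ASN.1
        10 = Comma-separated values
       Options 6, 7, and 10 can be additionally configured to produce
       a custom format specified by space delimited format specifiers.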
       The supported format specifiers are:
           qseqid means Query Seq-id
           qgi means Query GI
           qacc means Query accession
           sseqid means Subject Seq-id
           sallseqid means All subject Seq-id(s), separated by a ';'
           sgi means Subject GI
           sallgi means All subject GIs
           sacc means Subject accession
           sallacc means All subject accessions
           qstart means Start of alignment in query
           qend means End of alignment in query
           sstart means Start of alignment in subject
           send means End of alignment in subject
           qseq means Aligned part of query sequence
           sseq means Aligned part of subject sequence
           evalue means Expect value
           bitscore means Bit score
           score means Raw score
           length means Alignment length
           pident means Percentage of identical matches
           nident means Number of identical matches
           mismatch means Number of mismatches
           positive means Number of positive-scoring matches
           gapopen means Number of gap openings
           gaps means Total number of gaps
           ppos means Percentage of positive-scoring matches
           frames means Query and subject frames separated by a '/'
           qframe means Query frame
           sframe means Subject frame
       When not provided, the default value is:
       'qseqid sseqid pident length mismatch gapopen qstart qend sstart send
       evalue bitscore', which is equivalent to the keyword 'std'
       Default = `0'

Running blastn with the -help option prints the detailed help below:

    blastn -help

    blastn [-h] [-help] [-import_search_strategy filename]
        [-export_search_strategy filename] [-task task_name] [-db database_name]
        [-dbsize num_letters] [-gilist filename] [-negative_gilist filename]
        [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm]
        [-subject subject_input_file] [-subject_loc range] [-query input_file]
        [-out output_file] [-evalue evalue] [-word_size int_value]
        [-gapopen open_penalty] [-gapextend extend_penalty]
        [-perc_identity float_value] [-xdrop_ungap float_value]
        [-xdrop_gap float_value] [-xdrop_gap_final float_value]
        [-searchsp int_value] [-penalty penalty] [-reward reward] [-no_greedy]
        [-min_raw_gapped_score int_value] [-template_type type]
        [-template_length int_value] [-dust DUST_options]
        [-filtering_db filtering_database]
        [-window_masker_taxid window_masker_taxid]
        [-window_masker_db window_masker_db] [-soft_masking soft_masking]
        [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value]
        [-best_hit_score_edge float_value] [-window_size int_value]
        [-off_diagonal_range int_value] [-use_index boolean] [-index_name string]
        [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines]
        [-outfmt format] [-show_gis] [-num_descriptions int_value]
        [-num_alignments int_value] [-html] [-max_target_seqs num_sequences]
        [-num_threads int_value] [-remote] [-version]

    DESCRIPTION
       Nucleotide-Nucleotide BLAST 2.2.23+

    OPTIONAL ARGUMENTS
     -h
       Print USAGE and DESCRIPTION; ignore other arguments
     -help
       Print USAGE, DESCRIPTION and ARGUMENTS description; ignore other arguments
     -version
       Print version number; ignore other arguments

     *** Input query options
     -query <File_In>
       Input file name
       Default = `-'
     -query_loc <String>
       Location on the query sequence (Format: start-stop)
     -strand <String, `both', `minus', `plus'>
       Query strand(s) to search against database/subject
       Default = `both'

     *** General search options
     -task <String, Permissible values: 'blastn' 'blastn-short' 'dc-megablast' 'megablast' 'vecscreen' >
       Task to execute
       Default = `megablast'
     -db <String>
       BLAST database name
        * Incompatible with: subject, subject_loc
     -out <File_Out>
       Output file name
       Default = `-'
     -evalue <Real>
       Expectation value (E) threshold for saving hits
       Default = `10'
     -word_size <Integer, >=4>
       Word size for wordfinder algorithm (length of best perfect match)
     -gapopen <Integer>
       Cost to open a gap
     -gapextend <Integer>
       Cost to extend a gap
     -penalty <Integer, <=0>
       Penalty for a nucleotide mismatch
     -reward <Integer, >=0>
       Reward for a nucleotide match
     -use_index <Boolean>
       Use MegaBLAST database index
     -index_name <String>
       MegaBLAST database index name

     *** BLAST-2-Sequences options
     -subject <File_In>
       Subject sequence(s) to search
        * Incompatible with: db, gilist, negative_gilist, db_soft_mask
     -subject_loc <String>
       Location on the subject sequence (Format: start-stop)
        * Incompatible with: db, gilist, negative_gilist, db_soft_mask, remote

     *** Formatting options
     -outfmt <String>
       (alignment view options 0-10, format specifiers and default: identical to
       the -outfmt listing reproduced above)
     -show_gis
       Show NCBI GIs in deflines?
     -num_descriptions <Integer, >=0>
       Number of database sequences to show one-line descriptions for
       Default = `500'
     -num_alignments <Integer, >=0>
       Number of database sequences to show alignments for
       Default = `250'
     -html
       Produce HTML output?

     *** Query filtering options
     -dust <String>
       Filter query sequence with DUST (Format: 'yes', 'level window linker', or 'no' to disable)
       Default = `20 64 1'
     -filtering_db <String>
       BLAST database containing filtering elements (i.e.: repeats)
     -window_masker_taxid <Integer>
       Enable WindowMasker filtering using a Taxonomic ID
     -window_masker_db <String>
       Enable WindowMasker filtering using this repeats database.
     -soft_masking <Boolean>
       Apply filtering locations as soft masks
       Default = `true'
     -lcase_masking
       Use lower case filtering in query and subject sequence(s)?

     *** Restrict search or results
     -gilist <String>
       Restrict search of database to list of GI's
        * Incompatible with: negative_gilist, remote, subject, subject_loc
     -negative_gilist <String>
       Restrict search of database to everything except the listed GIs
        * Incompatible with: gilist, remote, subject, subject_loc
     -entrez_query <String>
       Restrict search with the given Entrez query
        * Requires: remote
     -db_soft_mask <Integer>
       Filtering algorithm ID to apply to the BLAST database as soft masking
        * Incompatible with: subject, subject_loc
     -perc_identity <Real, 0..100>
       Percent identity
     -culling_limit <Integer, >=0>
       If the query range of a hit is enveloped by that of at least this many
       higher-scoring hits, delete the hit
        * Incompatible with: best_hit_overhang, best_hit_score_edge
     -best_hit_overhang <Real, (>=0 and =<0.5)>
       Best Hit algorithm overhang value (recommended value: 0.1)
        * Incompatible with: culling_limit
     -best_hit_score_edge <Real, (>=0 and =<0.5)>
       Best Hit algorithm score edge value (recommended value: 0.1)
        * Incompatible with: culling_limit
     -max_target_seqs <Integer, >=1>
       Maximum number of aligned sequences to keep

     *** Discontiguous MegaBLAST options
     -template_type <String, `coding', `coding_and_optimal', `optimal'>
       Discontiguous MegaBLAST template type
        * Requires: template_length
     -template_length <Integer, Permissible values: '16' '18' '21' >
       Discontiguous MegaBLAST template length
        * Requires: template_type

     *** Statistical options
     -dbsize <Int8>
       Effective length of the database
     -searchsp <Int8, >=0>
       Effective length of the search space

     *** Search strategy options
     -import_search_strategy <File_In>
       Search strategy to use
        * Incompatible with: export_search_strategy
     -export_search_strategy <File_Out>
       File name to record the search strategy used
        * Incompatible with: import_search_strategy

     *** Extension options
     -xdrop_ungap <Real>
       X-dropoff value (in bits) for ungapped extensions
     -xdrop_gap <Real>
       X-dropoff value (in bits) for preliminary gapped extensions
     -xdrop_gap_final <Real>
       X-dropoff value (in bits) for final gapped alignment
     -no_greedy
       Use non-greedy dynamic programming extension
     -min_raw_gapped_score <Integer>
       Minimum raw gapped score to keep an alignment in the preliminary gapped
       and traceback stages
     -ungapped
       Perform ungapped alignment only?
     -window_size <Integer, >=0>
       Multiple hits window size, use 0 to specify 1-hit algorithm
     -off_diagonal_range <Integer, >=0>
       Number of off-diagonals to search for the 2nd hit, use 0 to turn off
       Default = `0'

     *** Miscellaneous options
     -parse_deflines
       Should the query and subject defline(s) be parsed?
     -num_threads <Integer, >=1>
       Number of threads to use in the BLAST search
       Default = `1'
        * Incompatible with: remote
     -remote
       Execute search remotely?
        * Incompatible with: gilist, negative_gilist, subject_loc, num_threads
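If you drive blastn from a script, you can pass the arguments as a list instead of a shell string. That avoids quoting problems with the `-outfmt` option, because the whole "7 qacc sacc evalue length pident" specification becomes one list element, which is exactly what `-outfmt` expects. The Python sketch below is a minimal illustration of that idea, not the only way to do it. The helper names `build_blastn_cmd` and `run_blastn` are hypothetical. The sketch assumes BLAST+ is on your PATH and that a database named plant_rna has already been built, as in the example command at the top of this post.

```python
import shlex
import subprocess
from typing import List


def build_blastn_cmd(db: str, query: str, out: str,
                     evalue: float = 1e-5,
                     max_target_seqs: int = 5,
                     num_threads: int = 4,
                     fields: str = "qacc sacc evalue length pident") -> List[str]:
    """Hypothetical helper: build the blastn argument list for outfmt 7.

    "7 <fields>" is ONE argument to -outfmt; nothing else goes between
    -outfmt and the format string.
    """
    return [
        "blastn",
        "-db", db,
        "-query", query,
        "-out", out,
        "-evalue", str(evalue),  # str(1e-5) == "1e-05", i.e. 0.00001
        "-max_target_seqs", str(max_target_seqs),
        "-num_threads", str(num_threads),
        "-outfmt", f"7 {fields}",
    ]


def run_blastn(cmd: List[str]) -> None:
    """Hypothetical helper: run the search (needs BLAST+ and the database).

    A list argument means no shell is involved, so no quoting is needed;
    check=True raises CalledProcessError if blastn exits with an error.
    """
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_blastn_cmd("plant_rna", "test.fa", "test.out")
    # Print a copy-pasteable shell version of the same command.
    print(shlex.join(cmd))
```

Running the script only prints the equivalent shell command (`shlex.join` needs Python 3.8 or later). It produces `blastn -db plant_rna -query test.fa -out test.out -evalue 1e-05 -max_target_seqs 5 -num_threads 4 -outfmt '7 qacc sacc evalue length pident'`. Call `run_blastn(cmd)` to actually launch the search.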
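Because `-outfmt 7` output is plain tab-separated text, the "no parser needed" point made earlier mostly holds. Skip the lines starting with `#`, split the rest on tabs, and the columns come back in the order you listed them after the 7. The Python sketch below shows this. It assumes the "7 qacc sacc evalue length pident" layout from the example command. The function names are hypothetical, and `best_hit_per_query`, which keeps the smallest e-value per query, is just one illustrative way to use the parsed rows.

```python
from typing import Dict, Iterable, List, Sequence, Union

Value = Union[str, int, float]

# Column order must match the specifiers passed to -outfmt;
# assumed here: "7 qacc sacc evalue length pident".
DEFAULT_FIELDS = ("qacc", "sacc", "evalue", "length", "pident")

# Numeric columns are converted; any field not listed stays a string.
CONVERTERS = {"evalue": float, "length": int, "pident": float}


def parse_outfmt7(lines: Iterable[str],
                  fields: Sequence[str] = DEFAULT_FIELDS) -> List[Dict[str, Value]]:
    """Parse BLAST+ tabular output (outfmt 6 or 7) into one dict per row."""
    hits: List[Dict[str, Value]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip outfmt 7 comment lines (they start with '#') and blank lines.
        if not line or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(fields):
            raise ValueError(
                f"expected {len(fields)} columns, got {len(values)}: {line!r}")
        hits.append({name: CONVERTERS.get(name, str)(value)
                     for name, value in zip(fields, values)})
    return hits


def best_hit_per_query(hits: Iterable[Dict[str, Value]]) -> Dict[str, Dict[str, Value]]:
    """For each query accession, keep the row with the smallest e-value."""
    best: Dict[str, Dict[str, Value]] = {}
    for hit in hits:
        query = str(hit["qacc"])
        if query not in best or hit["evalue"] < best[query]["evalue"]:
            best[query] = hit
    return best
```

Typical use is `with open("test.out") as fh: hits = parse_outfmt7(fh)`. If you request different columns after the 7, pass the same specifier names as `fields` (and extend `CONVERTERS` for any numeric ones).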