http://hi.baidu.com/lidaof/blog/item/09bf863d02e9560cbaa167de.html

blastdbcmd is the equivalent of the old fastacmd: it retrieves sequences from a formatted BLAST database.

I downloaded the refseq_rna database from NCBI. After extracting it, I wanted to look at the database information:

blastdbcmd -db refseq_rna -info
Database: NCBI Transcript Reference Sequences
        2,120,610 sequences; 3,361,629,869 total bases

Date: Apr 27, 2010  8:31 PM     Longest sequence: 101,674 bases

Volumes:
        /home/lidaof/db/refseq_rna

To retrieve the sequence with GI 224071016:

blastdbcmd -db refseq_rna -entry 224071016
>gi|224071016|ref|XM_002303295.1| Populus trichocarpa predicted protein, mRNA
GAGTCGCTCCACAAAGTCTTGAACAAGGAATTGAAAAATGGGCAATCTGTTATTGACATTCTTGAGATCAACAGACTGCG
GAGACAGTTACTATTCCAGTCTTACATGTGGGACAACCGCCTGGTCTATGCAGCCAGTTTAGATAACAACAGCTTCCATG
ATGGTTCAAACAGCTCAACTTCAGGACAGGAGGTGAAACCACTAGGGCCAGCTAATAGTGATAAGCTCATTGAGGAAAAT
GTTGATGCCAAGCTGCTTAAAGCCTCTAACCAGCAAGGAGGCTTTGGTAGCAACACAAACCAATGTGATGCAGTTGGTCA
AGAAATAGATGTTTGTCAAGGTCCCAGTCATGGAAAAGGAGGCCAAGCTAATCCTTTTGCTGCTATGCCTGCCCGTGATC
TATCTGACATTAAGGAATCTGGTGGAAATTTTTTTAGGACCCTTTCTGATGGACAGGATCCTGTCATGGCAAATCTATCA
GATACCCTTGATGCTGCATGGACAGGTGAGAATCAACCTGGAAGTGGGACATTTAAGGATGATAATAGTAGGCTTTCTGA
TTCAGCTATGGAAGAGTCTTCAACCACAGCTGTAGGGTTGGAGGGGGTAGGTTTGGAGGGCCATGTCGAAGACCAAGTTG
GATCCAAAGTGTGCTATTCTCCTTCACCTGCATTGTCTACCAAGGACCCTGATAACATGGAAGATTCTATGAGCTGGCTA
AGAATGCCCTTCTTGAATTTCTATCGTTCGTTCAACAATAATTGTTTAACAAGCTCTGAGAAGCTTGATAGTCTGAGGGA
GTATAACCCTGTCTATATTTCATCCTTTAGGAAGTTAAAACTCCAAGATCAGGCTAGGCTGCTTCTGCCTGTGGGTGTGA
ATGACACGGTCATTCCTGTATACGACGATGAACCCACAAGTCTTATATCTTATGCTTTAGTATCGCAAGAATATCATGCC
CAGCTAACTGATGAGGGGGAAAGAGTAAAAGAATCTGGAGAATTCAGTCCATTCTCAAGTTTATCTGATACGATGTTCCA
CTCTTTTGATGAAACAAGTTTTGATTCTTATAGAAGTTTTGGATCTACAGATGAGAGCATCTTATCCATGTCTGGATCAC
GTGGCTCTTTGATTTTGGACCCACTCTCCTACACAAAGGCTTTGCATGCCAGAGTTTCTTTTGGAGATGACAGCCCAGTT
GGTAAGGCAAGATATTCCGTGACATGCTACTATGCAAAACGGTTTGAAGCCTTAAGGAGGATATGTTGTCCATCTGAACT
TGATTATATAAGGTCTCTTAGTCGTTGTAAGAAGTGGGGAGCTCAAGGTGGCAAGAGCAATGTCTTCTTTGCAAAAACCT
TGGATGATCGCTTTATCATCAAACAAGTCACAAAAACAGAATTGGAGTCGTTTATAAAATTTGCTCCTGCTTACTTCAAG
TATCTCTCTGAGTCAATTAGCTCAAGAAGTCCAACATGCCTGGCAAAGATTTTGGGAATTTATCAGGTTACATCGAAGCT
TCTGAAAGGTGGGAAAGAAACGAAGATGGACGTTCTAGTTATGGAAAACCTTCTATTTAGGAGGAAAGTGACCCGCCTTT
ATGATCTTAAAGGATCTTCCCGGTCACGGTATAATTCGGATTCTAGTGGGAGCAACAAGGTTCTGCTGGATCAGAACTTG
ATTGAAGCAATGCCGACCTCTCCCATTTTTGTGGGAAACAAGGCAAAGCGGCTGCTGGAAAGAGCTGTCTGGAATGACAC
TTCTTTTCTTGCATCGATTGATGTAATGGATTACTCATTATTGGTTGGGGTGGATGAAGAGAAGCACGAGTTAGTACTTG
GGATAATTGATTTCATGAGGCAGTATACATGGGACAAGCATTTGGAAACATGGGTCAAGGCTTCAGGCATACTTGGCGGT
CCAAAGAATGCTTCACCAACTGTTATTTCTCCGAAGCAATATAAGAAGAGGTTCAGGAAAGCGATGACGACCTATTTTCT
GATGGTCCCAGATCAATGGTCCCCTCCCACTATCATTCTAAGTAAATCCCAATCTGATTTTGGCGAAGAGAACACACAAG
GTGCGACTTCAGTTGACTGATATTGTGGGTCCGTGTTCTTGTACATGTAAACTTGAATTTTGGGATCTTCCCACAATTTT
TCTCTCATTCTTAATTTTTCCTTTCATTTTTTATTTTTTATTTTTGTTTTATAGAAATTACTACTGTAACTTTAGTTAAG
AAGAGAAGCTTATAATTATTTGTTAGGAAATGCAGAACAAGGCTGTCATAGCCATGAGATTCGGTTGGGGGTATAATATT
GGATGACCTT

Pretty convenient, isn't it?

Adding the -help option prints blastdbcmd's detailed help:

blastdbcmd -help
USAGE
  blastdbcmd [-h] [-help] [-db dbname] [-dbtype molecule_type]
    [-entry sequence_identifier] [-entry_batch input_file] [-pig PIG] [-info]
    [-range numbers] [-strand strand] [-mask_sequence_with numbers]
    [-out output_file] [-outfmt format] [-target_only] [-get_dups]
    [-line_length number] [-ctrl_a] [-version]

DESCRIPTION
   BLAST database client, version 2.2.23+

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION;  ignore other arguments
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS description;  ignore other arguments
 -version
   Print version number;  ignore other arguments

 *** BLAST database options
 -db <String>
   BLAST database name
   Default = `nr'
 -dbtype <String, `guess', `nucl', `prot'>
   Molecule type stored in BLAST database
   Default = `guess'

 *** Retrieval options
 -entry <String>
   Comma-delimited search string(s) of sequence identifiers:
     e.g.: 555, AC147927, 'gnl|dbname|tag', or 'all' to select all
     sequences in the database
    * Incompatible with:  entry_batch, pig, info
 -entry_batch <File_In>
   Input file for batch processing (Format: one entry per line)
    * Incompatible with:  entry, pig, info
 -pig <Integer, >=0>
   PIG to retrieve
    * Incompatible with:  entry, entry_batch, target_only, info
 -info
   Print BLAST database information
    * Incompatible with:  entry, entry_batch, outfmt, strand, target_only,
   ctrl_a, get_dups, pig, range

 *** Sequence retrieval configuration options
 -range <String>
   Range of sequence to extract (Format: start-stop)
    * Incompatible with:  info
 -strand <String, `minus', `plus'>
   Strand of nucleotide sequence to extract
   Default = `plus'
    * Incompatible with:  info
 -mask_sequence_with <String>
   Produce lower-case masked FASTA using the algorithm IDs specified (Format:
   N,M,...)

 *** Output configuration options
 -out <File_Out>
   Output file name
   Default = `-'
 -outfmt <String>
   Output format, where the available format specifiers are:
       %f means sequence in FASTA format
       %s means sequence data (without defline)
       %a means accession
       %g means gi
       %o means ordinal id (OID)
       %t means sequence title
       %l means sequence length
       %T means taxid
       %L means common taxonomic name
       %S means scientific name
       %P means PIG
       %mX means sequence masking data, where X is an optional comma-
         separted list of integers to specify the algorithm ID(s) to
         diaplay (or all masks if absent or invalid specification).
         Masking data will be displayed as a series of 'N-M' values
         separated by ';' or the word 'none' if none are available.
     For every format except '%f', each line of output will correspond to
   a sequence.
   Default = `%f'
    * Incompatible with:  info
 -target_only
   Definition line should contain target GI only
    * Incompatible with:  pig, info, get_dups
 -get_dups
   Retrieve duplicate accessions
    * Incompatible with:  info, target_only

 *** Output configuration options for FASTA format
 -line_length <Integer, >=1>
   Line length for output
   Default = `80'
 -ctrl_a
   Use Ctrl-A as the non-redundant defline separator
    * Incompatible with:  info
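The help text says -entry_batch reads a file with one entry per line. It also says that with any -outfmt other than %f, each output line corresponds to one sequence. That combination is handy for pulling many records at once. Below is a minimal Python sketch of driving it through subprocess. It assumes blastdbcmd is on PATH and the database name resolves the same way as in the commands above. The helper names build_blastdbcmd_args and fetch_batch are hypothetical, and the script only passes options that appear in the -help output.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_blastdbcmd_args(db, entry_batch, outfmt="%f"):
    """Assemble a blastdbcmd command line (hypothetical helper; options taken from -help)."""
    return ["blastdbcmd", "-db", db, "-entry_batch", str(entry_batch), "-outfmt", outfmt]


def fetch_batch(db, identifiers, outfmt="%a %l"):
    """Write identifiers one per line (the -entry_batch format), run blastdbcmd, return stdout lines.

    Assumes blastdbcmd is installed; raises FileNotFoundError otherwise.
    """
    if shutil.which("blastdbcmd") is None:
        raise FileNotFoundError("blastdbcmd not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        batch = Path(tmp) / "ids.txt"
        batch.write_text("\n".join(identifiers) + "\n")
        result = subprocess.run(
            build_blastdbcmd_args(db, batch, outfmt),
            capture_output=True, text=True, check=True,
        )
    # With a non-%f format, each line should describe one sequence.
    return result.stdout.splitlines()
```

For example, fetch_batch("refseq_rna", ["224071016"]) should return one line with the accession and length of the record shown above. I have not run this against a real database, so check the exact output yourself.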
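The default output (-outfmt %f) is plain FASTA, wrapped at -line_length characters (default 80), like the GI 224071016 record above. Below is a small Python sketch for reading that output back. It joins the wrapped sequence lines and splits a defline such as "gi|224071016|ref|XM_002303295.1| ..." into its identifier pairs and title. The function names are my own, and the pairwise parsing of the identifier is a simplifying assumption that fits this gi|...|ref|...| style but not every NCBI identifier form.

```python
def parse_fasta(text):
    """Split FASTA text into (defline, sequence) pairs, joining wrapped sequence lines."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_seqid(defline):
    """Split 'gi|NNN|ref|ACC| title' into ({'gi': NNN, 'ref': ACC}, title); assumes tag|value pairs."""
    seqid, _, title = defline.partition(" ")
    fields = [f for f in seqid.split("|") if f]
    return dict(zip(fields[0::2], fields[1::2])), title


def max_line_length(text):
    """Longest sequence line, useful for checking the -line_length wrap."""
    return max((len(l) for l in text.splitlines() if l and not l.startswith(">")), default=0)
```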
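The help only says that -range takes "start-stop" and that -strand selects the plus or minus strand. To sanity-check what something like "-range 1-10 -strand minus" returns, here is a Python sketch of what I assume these options do. The assumption is that -range uses 1-based, inclusive coordinates and that -strand minus gives the reverse complement of that range. Neither point is stated in the help text, so compare against a real run. Only A, C, G, T and N are complemented.

```python
# Assumption (not from the help text): -range is 1-based and inclusive, and
# -strand minus returns the reverse complement of the selected range.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_range(seq, rng):
    """Return the slice for a 'start-stop' string, treated as 1-based inclusive."""
    start, stop = (int(x) for x in rng.split("-"))
    if not 1 <= start <= stop <= len(seq):
        raise ValueError(f"range {rng} outside 1-{len(seq)}")
    return seq[start - 1:stop]


def reverse_complement(seq):
    """Complement each base (A/C/G/T/N only) and reverse the order."""
    return seq.translate(COMPLEMENT)[::-1]
```

For example, reverse_complement(extract_range("GAGTCGCTCCACAAAG", "1-10")) gives "GGAGCGACTC". That is what I would expect from the first 10 bases of the record above on the minus strand, if the assumptions hold.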