
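云之南

The sounds of wind, of rain, and of reading aloud all reach the ear; affairs of family, of state, and of the world all concern the heart.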

 
 
 


 
 
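About me

Background: computer science. Research interests: JavaEE web development, bioinformatics, data mining and machine learning, intelligent information systems. Current work: genomics, transcriptomics, NGS high-throughput data analysis, biological data mining, plant phylogenetics and comparative evolutionary genomics.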

A quick look at blastdbcmd in BLAST+

2010-07-19 15:59:15 | Category: Bioinformatics

2010-04-29 18:41
http://hi.baidu.com/lidaof/blog/item/09bf863d02e9560cbaa167de.html

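blastdbcmd is the BLAST+ counterpart of the old fastacmd: it retrieves sequences from a formatted BLAST database.

I downloaded the refseq_rna database from NCBI and unpacked it. To see what the database contains: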

blastdbcmd -db refseq_rna -info
Database: NCBI Transcript Reference Sequences
        2,120,610 sequences; 3,361,629,869 total bases

Date: Apr 27, 2010 8:31 PM     Longest sequence: 101,674 bases

Volumes:
        /home/lidaof/db/refseq_rna
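
If those totals are needed in a script, the summary line can be parsed. Here is a minimal sketch, assuming the `-info` output keeps the layout shown above ("N sequences; M total bases"). The function name `db_summary` is made up for illustration and is not part of BLAST+.

```shell
# Hypothetical helper: read `blastdbcmd -info` output on stdin and print
# "<sequence count> <total bases>" with the thousands separators removed.
# Assumes a summary line like "2,120,610 sequences; 3,361,629,869 total bases".
db_summary() {
    awk '/sequences;/ && /total bases/ {
        gsub(/,/, "")      # strip commas; awk re-splits the fields afterwards
        print $1, $3       # field 1 = sequence count, field 3 = total bases
        exit
    }'
}

# Usage (requires BLAST+ and the database to be installed):
#   blastdbcmd -db refseq_rna -info | db_summary
```

For the output above this should print the two numbers without commas, ready for arithmetic or logging.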

To fetch the sequence with GI number 224071016:

blastdbcmd -db refseq_rna -entry 224071016
>gi|224071016|ref|XM_002303295.1| Populus trichocarpa predicted protein, mRNA
GAGTCGCTCCACAAAGTCTTGAACAAGGAATTGAAAAATGGGCAATCTGTTATTGACATTCTTGAGATCAACAGACTGCG
GAGACAGTTACTATTCCAGTCTTACATGTGGGACAACCGCCTGGTCTATGCAGCCAGTTTAGATAACAACAGCTTCCATG
ATGGTTCAAACAGCTCAACTTCAGGACAGGAGGTGAAACCACTAGGGCCAGCTAATAGTGATAAGCTCATTGAGGAAAAT
GTTGATGCCAAGCTGCTTAAAGCCTCTAACCAGCAAGGAGGCTTTGGTAGCAACACAAACCAATGTGATGCAGTTGGTCA
AGAAATAGATGTTTGTCAAGGTCCCAGTCATGGAAAAGGAGGCCAAGCTAATCCTTTTGCTGCTATGCCTGCCCGTGATC
TATCTGACATTAAGGAATCTGGTGGAAATTTTTTTAGGACCCTTTCTGATGGACAGGATCCTGTCATGGCAAATCTATCA
GATACCCTTGATGCTGCATGGACAGGTGAGAATCAACCTGGAAGTGGGACATTTAAGGATGATAATAGTAGGCTTTCTGA
TTCAGCTATGGAAGAGTCTTCAACCACAGCTGTAGGGTTGGAGGGGGTAGGTTTGGAGGGCCATGTCGAAGACCAAGTTG
GATCCAAAGTGTGCTATTCTCCTTCACCTGCATTGTCTACCAAGGACCCTGATAACATGGAAGATTCTATGAGCTGGCTA
AGAATGCCCTTCTTGAATTTCTATCGTTCGTTCAACAATAATTGTTTAACAAGCTCTGAGAAGCTTGATAGTCTGAGGGA
GTATAACCCTGTCTATATTTCATCCTTTAGGAAGTTAAAACTCCAAGATCAGGCTAGGCTGCTTCTGCCTGTGGGTGTGA
ATGACACGGTCATTCCTGTATACGACGATGAACCCACAAGTCTTATATCTTATGCTTTAGTATCGCAAGAATATCATGCC
CAGCTAACTGATGAGGGGGAAAGAGTAAAAGAATCTGGAGAATTCAGTCCATTCTCAAGTTTATCTGATACGATGTTCCA
CTCTTTTGATGAAACAAGTTTTGATTCTTATAGAAGTTTTGGATCTACAGATGAGAGCATCTTATCCATGTCTGGATCAC
GTGGCTCTTTGATTTTGGACCCACTCTCCTACACAAAGGCTTTGCATGCCAGAGTTTCTTTTGGAGATGACAGCCCAGTT
GGTAAGGCAAGATATTCCGTGACATGCTACTATGCAAAACGGTTTGAAGCCTTAAGGAGGATATGTTGTCCATCTGAACT
TGATTATATAAGGTCTCTTAGTCGTTGTAAGAAGTGGGGAGCTCAAGGTGGCAAGAGCAATGTCTTCTTTGCAAAAACCT
TGGATGATCGCTTTATCATCAAACAAGTCACAAAAACAGAATTGGAGTCGTTTATAAAATTTGCTCCTGCTTACTTCAAG
TATCTCTCTGAGTCAATTAGCTCAAGAAGTCCAACATGCCTGGCAAAGATTTTGGGAATTTATCAGGTTACATCGAAGCT
TCTGAAAGGTGGGAAAGAAACGAAGATGGACGTTCTAGTTATGGAAAACCTTCTATTTAGGAGGAAAGTGACCCGCCTTT
ATGATCTTAAAGGATCTTCCCGGTCACGGTATAATTCGGATTCTAGTGGGAGCAACAAGGTTCTGCTGGATCAGAACTTG
ATTGAAGCAATGCCGACCTCTCCCATTTTTGTGGGAAACAAGGCAAAGCGGCTGCTGGAAAGAGCTGTCTGGAATGACAC
TTCTTTTCTTGCATCGATTGATGTAATGGATTACTCATTATTGGTTGGGGTGGATGAAGAGAAGCACGAGTTAGTACTTG
GGATAATTGATTTCATGAGGCAGTATACATGGGACAAGCATTTGGAAACATGGGTCAAGGCTTCAGGCATACTTGGCGGT
CCAAAGAATGCTTCACCAACTGTTATTTCTCCGAAGCAATATAAGAAGAGGTTCAGGAAAGCGATGACGACCTATTTTCT
GATGGTCCCAGATCAATGGTCCCCTCCCACTATCATTCTAAGTAAATCCCAATCTGATTTTGGCGAAGAGAACACACAAG
GTGCGACTTCAGTTGACTGATATTGTGGGTCCGTGTTCTTGTACATGTAAACTTGAATTTTGGGATCTTCCCACAATTTT
TCTCTCATTCTTAATTTTTCCTTTCATTTTTTATTTTTTATTTTTGTTTTATAGAAATTACTACTGTAACTTTAGTTAAG
AAGAGAAGCTTATAATTATTTGTTAGGAAATGCAGAACAAGGCTGTCATAGCCATGAGATTCGGTTGGGGGTATAATATT
GGATGACCTT

Pretty convenient, isn't it?
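
The same lookup can be narrowed to part of a sequence with the `-range` and `-strand` options that blastdbcmd lists in its help. The wrapper below is only a sketch: `fetch_region` is a hypothetical name, and the coordinates in the usage lines are arbitrary examples.

```shell
# Hypothetical wrapper: fetch_region <db> <id> <start> <stop> [plus|minus]
# Uses only options from the blastdbcmd help: -db, -entry, -range, -strand.
fetch_region() {
    local db="$1" id="$2" start="$3" stop="$4" strand="${5:-plus}"
    # -range takes "start-stop"; -strand defaults to plus when omitted here
    blastdbcmd -db "$db" -entry "$id" -range "${start}-${stop}" -strand "$strand"
}

# Usage (requires BLAST+ and the database to be installed):
#   fetch_region refseq_rna 224071016 1 80          # bases 1-80, plus strand
#   fetch_region refseq_rna 224071016 1 80 minus    # same range, minus strand
```

Wrapping the call in a function keeps the range formatting in one place, which helps when the coordinates come from another tool's output.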

Adding the -help flag prints blastdbcmd's full help text:

blastdbcmd -help
USAGE
blastdbcmd [-h] [-help] [-db dbname] [-dbtype molecule_type]
    [-entry sequence_identifier] [-entry_batch input_file] [-pig PIG] [-info]
    [-range numbers] [-strand strand] [-mask_sequence_with numbers]
    [-out output_file] [-outfmt format] [-target_only] [-get_dups]
    [-line_length number] [-ctrl_a] [-version]

DESCRIPTION
   BLAST database client, version 2.2.23+

OPTIONAL ARGUMENTS
-h
   Print USAGE and DESCRIPTION; ignore other arguments
-help
   Print USAGE, DESCRIPTION and ARGUMENTS description; ignore other arguments
-version
   Print version number; ignore other arguments

*** BLAST database options
-db <String>
   BLAST database name
   Default = `nr'
-dbtype <String, `guess', `nucl', `prot'>
   Molecule type stored in BLAST database
   Default = `guess'

*** Retrieval options
-entry <String>
   Comma-delimited search string(s) of sequence identifiers:
        e.g.: 555, AC147927, 'gnl|dbname|tag', or 'all' to select all
        sequences in the database
    * Incompatible with: entry_batch, pig, info
-entry_batch <File_In>
   Input file for batch processing (Format: one entry per line)
    * Incompatible with: entry, pig, info
-pig <Integer, >=0>
   PIG to retrieve
    * Incompatible with: entry, entry_batch, target_only, info
-info
   Print BLAST database information
    * Incompatible with: entry, entry_batch, outfmt, strand, target_only,
   ctrl_a, get_dups, pig, range

*** Sequence retrieval configuration options
-range <String>
   Range of sequence to extract (Format: start-stop)
    * Incompatible with: info
-strand <String, `minus', `plus'>
   Strand of nucleotide sequence to extract
   Default = `plus'
    * Incompatible with: info
-mask_sequence_with <String>
   Produce lower-case masked FASTA using the algorithm IDs specified (Format:
   N,M,...)

*** Output configuration options
-out <File_Out>
   Output file name
   Default = `-'
-outfmt <String>
   Output format, where the available format specifiers are:
                %f means sequence in FASTA format
                %s means sequence data (without defline)
                %a means accession
                %g means gi
                %o means ordinal id (OID)
                %t means sequence title
                %l means sequence length
                %T means taxid
                %L means common taxonomic name
                %S means scientific name
                %P means PIG
                %mX means sequence masking data, where X is an optional comma-
                separted list of integers to specify the algorithm ID(s) to
                diaplay (or all masks if absent or invalid specification).
                Masking data will be displayed as a series of 'N-M' values
                separated by ';' or the word 'none' if none are available.
        For every format except '%f', each line of output will correspond to
        a sequence.
   Default = `%f'
    * Incompatible with: info
-target_only
   Definition line should contain target GI only
    * Incompatible with: pig, info, get_dups
-get_dups
   Retrieve duplicate accessions
    * Incompatible with: info, target_only

*** Output configuration options for FASTA format
-line_length <Integer, >=1>
   Line length for output
   Default = `80'
-ctrl_a
   Use Ctrl-A as the non-redundant defline separator
    * Incompatible with: info
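
Two of the options listed above combine well for bulk lookups: `-entry_batch` reads identifiers from a file, and `-outfmt` replaces the FASTA record with selected fields. The following is a rough sketch, assuming an ID file with one identifier per line. The function name `seq_lengths` and the file name `ids.txt` are invented for the example.

```shell
# Hypothetical helper: seq_lengths <db> <id_file>
# <id_file> holds one sequence identifier per line, as -entry_batch expects.
# Prints one "accession length" line per entry via the %a and %l specifiers.
seq_lengths() {
    blastdbcmd -db "$1" -entry_batch "$2" -outfmt '%a %l'
}

# Usage (requires BLAST+ and the database to be installed):
#   printf '224071016\n' > ids.txt
#   seq_lengths refseq_rna ids.txt
```

Because every format except `%f` produces one line per sequence, the result can be piped straight into sort or awk.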

