
云之南


BLAT  

2011-07-24 11:53:36 | Category: Bioinformatics analysis software


http://genome.ucsc.edu/FAQ/FAQblat.html
Running the Programs:

The command line options of each of the programs are described below. Similar summaries of usage are printed when a command is run with no arguments. See the next section for info on installing webBlat.

blat 

blat - Standalone BLAT sequence search command line tool

usage:

   blat database query [-ooc=11.ooc] output.psl

where:

   database is either a .fa file, a .nib file, or a list of .fa or .nib

   files, query is similarly a .fa, .nib, or list of .fa or .nib files

   -ooc=11.ooc tells the program to load over-occurring 11-mers from

               an external file.  This will increase the speed

               by a factor of 40 in many cases, but is not required

   output.psl is where to put the output.
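
The usage line above can also be assembled programmatically. The following is a minimal Python sketch, not part of the BLAT distribution: the helper name build_blat_command and the file names are hypothetical, and the function only builds the argument list in the order shown in the usage (database, query, optional -ooc, output) without running anything.

```python
from typing import List, Optional


def build_blat_command(database: str, query: str, output: str,
                       ooc: Optional[str] = None) -> List[str]:
    """Return an argv list shaped like: blat database query [-ooc=11.ooc] output.psl"""
    cmd = ["blat", database, query]
    if ooc is not None:
        # Optional file of over-occurring 11-mers; not required, but often much faster.
        cmd.append(f"-ooc={ooc}")
    cmd.append(output)
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_blat_command("genome.fa", "reads.fa", "out.psl", ooc="11.ooc")))
```

If blat is installed and on the PATH, the returned list could be handed to subprocess.run(cmd, check=True); keeping the arguments as a list avoids shell quoting problems with unusual file names.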

options:

   -t=type     Database type.  Type is one of:

                 dna - DNA sequence

                 prot - protein sequence

                 dnax - DNA sequence translated in six frames to protein

               The default is dna

   -q=type     Query type.  Type is one of:

                 dna - DNA sequence

                 rna - RNA sequence

                 prot - protein sequence

                 dnax - DNA sequence translated in six frames to protein

                 rnax - DNA sequence translated in three frames to protein

               The default is dna

   -prot       Synonymous with -t=prot -q=prot

   -ooc=N.ooc  Use overused tile file N.ooc.  N should correspond to

               the tileSize

   -tileSize=N sets the size of match that triggers an alignment.

               Usually between 8 and 12

               Default is 11 for DNA and 5 for protein.

   -oneOff=N   If set to 1 this allows one mismatch in tile and still

               triggers an alignment.  Default is 0.

 

   -minMatch=N sets the number of tile matches.  Usually set from 2 to 4

               Default is 2 for nucleotide, 1 for protein.

   -minScore=N sets minimum score.  This is twice the matches minus the

               mismatches minus some sort of gap penalty.  Default is 30

   -minIdentity=N Sets minimum sequence identity (in percent).  Default is

               90 for nucleotide searches, 25 for protein or translated

               protein searches.

   -maxGap=N   sets the size of maximum gap between tiles in a clump.  Usually

               set from 0 to 3.  Default is 2. Only relevant for minMatch > 1.

   -noHead     suppress .psl header (so it's just a tab-separated file)

   -makeOoc=N.ooc Make overused tile file

   -repMatch=N sets the number of repetitions of a tile allowed before

               it is marked as overused.  Typically this is 256 for tileSize

               12, 1024 for tile size 11, 4096 for tile size 10.

               Default is 1024.  Typically only comes into play with makeOoc

   -mask=type  Mask out repeats.  Alignments won't be started in masked region

               but may extend through it in nucleotide searches.  Masked areas

               are ignored entirely in protein or translated searches. Types are

                 lower - mask out lower cased sequence

                 upper - mask out upper cased sequence

                 out   - mask according to database.out RepeatMasker .out file

                 file.out - mask database according to RepeatMasker file.out

   -qMask=type Mask out repeats in query sequence.  Similar to -mask above but

               for query rather than target sequence.

   -minRepDivergence=NN - minimum percent divergence of repeats to allow

               them to be unmasked.  Default is 15.  Only relevant for

               masking using RepeatMasker .out files.

   -dots=N     Output dot every N sequences to show program's progress

   -trimT      Trim leading poly-T

   -noTrimA    Don't trim trailing poly-A

   -trimHardA  Remove poly-A tail from qSize as well as alignments in psl output

   -out=type   Controls output file format.  Type is one of:

                   psl - Default.  Tab separated format without actual sequence

                   pslx - Tab separated format with sequence

                   axt - blastz-associated axt format

                   maf - multiz-associated maf format

                   wublast - similar to wublast format

                   blast - similar to NCBI blast format

   -fine       For high quality mRNAs look harder for small initial and

               terminal exons.  Not recommended for ESTs 
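
The -repMatch values quoted above (256, 1024 and 4096 for tile sizes 12, 11 and 10) change by a factor of four per base of tile size, which is what one would expect for a four-letter alphabet. Here is a small Python sketch of that relationship. The function name typical_rep_match is hypothetical, and applying the formula to tile sizes other than the three listed is an extrapolation on my part, not something the usage text states.

```python
def typical_rep_match(tile_size: int) -> int:
    """Typical -repMatch for a DNA tile size, anchored at 1024 for tileSize 11."""
    if not 8 <= tile_size <= 12:
        # The usage text says tileSize is usually between 8 and 12.
        raise ValueError("tile size outside the usual 8-12 range")
    shift = 11 - tile_size
    # Each extra base makes a given tile about 4x rarer, so the threshold
    # shrinks 4x; each base removed makes it 4x more common.
    if shift >= 0:
        return 1024 * 4 ** shift
    return 1024 // 4 ** (-shift)


if __name__ == "__main__":
    for size in (12, 11, 10):
        print(size, typical_rep_match(size))
```

The three printed values reproduce the figures given for -repMatch; anything else should be checked against the genome in question before use with -makeOoc.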

Here are some blat settings for common usage scenarios: 

1) Mapping ESTs to the genome within the same species

    -ooc=11.ooc

2) Mapping full length mRNAs to the genome in the same species

    -ooc=11.ooc -fine -q=rna

3) Mapping ESTs to the genome across species

    -q=dnax -t=dnax

4) Mapping mRNA to the genome across species

    -q=rnax -t=dnax

5) Mapping proteins to the genome

    -q=prot -t=dnax

6) Mapping DNA to DNA in the same species

    -ooc=11.ooc -fastMap

7) Mapping DNA from one species to another species

    -q=dnax -t=dnax

    When mapping DNA from one species to another the

    query side of the alignment should be cut up into chunks

    of 25kb or less for best performance. 
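
The advice to cut the query into chunks of 25kb or less can be sketched in Python as follows. This is an illustration only: the function name chunk_sequence, the chunk naming scheme (name_start_end) and the reading of 25kb as 25,000 bases are all assumptions, and the chunks do not overlap, so an alignment that spans a chunk boundary would be split.

```python
from typing import Iterator, Tuple


def chunk_sequence(name: str, seq: str,
                   chunk_size: int = 25_000) -> Iterator[Tuple[str, str]]:
    """Yield (chunk_name, chunk_seq) pieces of at most chunk_size bases."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(seq), chunk_size):
        end = min(start + chunk_size, len(seq))
        # Hypothetical naming scheme: 0-based start and exclusive end, so the
        # chunk can be mapped back to coordinates in the original query.
        yield f"{name}_{start}_{end}", seq[start:end]


if __name__ == "__main__":
    demo = "ACGT" * 15_000  # a 60,000-base toy query
    for chunk_name, chunk in chunk_sequence("query1", demo):
        print(chunk_name, len(chunk))
```

Each (chunk_name, chunk_seq) pair can then be written out as one FASTA record and the resulting file used as the blat query.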
