
云之南


Clustering with blastclust

2011-05-03 15:30:36 | Category: Bioinformatics


http://szypanther.blog.hexun.com/44600967_d.html

Clustering
    blastclust
blastclust -a 4 -i proteins.fsa -o cluster_60_80_complete.ssv -S 60 -L 0.80 -e F
-a 4: use 4 CPUs
-i proteins.fsa: input file
-o cluster_60_80_complete.ssv: output file
-S 60: sequence identity threshold of 60%
-L 0.80: length coverage threshold of 80%
To cluster nucleotide sequences instead of proteins, add -p F:
blastclust -a 4 -i proteins.fsa -o cluster_60_80_complete.ssv -S 60 -L 0.80 -e F -p F
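
The command line above can also be assembled from a script, which helps when the same clustering is repeated with different thresholds. The following is a minimal sketch, not part of blastclust itself: the function name `build_blastclust_cmd` and its parameter names are hypothetical, and it only uses flags that appear in this post.

```python
def build_blastclust_cmd(infile, outfile, identity=60, coverage=0.80,
                         cpus=1, protein=True):
    """Return a blastclust argument list (hypothetical helper).

    Only flags shown in this post are used: -a, -i, -o, -S, -L and -p.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be a fraction between 0 and 1")
    cmd = [
        "blastclust",
        "-a", str(cpus),          # number of CPUs
        "-i", infile,             # FASTA input file
        "-o", outfile,            # cluster list output file
        "-S", str(identity),      # identity threshold, as in the example above
        "-L", f"{coverage:.2f}",  # length coverage threshold
    ]
    if not protein:
        cmd += ["-p", "F"]        # input sequences are nucleotides
    return cmd
```

The returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where blastclust is installed.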

 

blastclust Parameters
2008-12-02 20:02


blastclust clusters a database of protein or nucleotide sequences. Its output has one cluster per row: each row lists the identifiers of the sequences in that cluster, and clusters are sorted from largest to smallest. The program can generate a list of clusters for input into another program (e.g., an assembly program such as PHRAP); however, it should be used only on a relatively small number of sequences (10-1000) because it runs on a single computer and its RAM requirements quickly exceed what most machines offer.
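
Because each output row holds one cluster, the cluster file is easy to post-process. The sketch below assumes the identifiers on a row are separated by whitespace (an assumption, the separator is not stated here); `parse_clusters` and `summarize` are hypothetical helper names.

```python
def parse_clusters(text):
    """Split blastclust output text into a list of clusters.

    Assumes one cluster per line with whitespace-separated identifiers.
    """
    clusters = []
    for line in text.splitlines():
        ids = line.split()
        if ids:                      # skip blank lines
            clusters.append(ids)
    return clusters


def summarize(clusters):
    """Return basic counts for a list of clusters."""
    sizes = [len(c) for c in clusters]
    return {
        "clusters": len(sizes),
        "sequences": sum(sizes),
        "largest": max(sizes, default=0),
        "singletons": sum(1 for s in sizes if s == 1),
    }
```

For a real run, the text would come from the file given with -o, for example `parse_clusters(open("my_pepdb.clusters").read())`.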

Here are a few sample command lines:

blastclust -i my_nucdb -p F -o my_nucdb.clusters 
blastclust -i my_pepdb -o my_pepdb.clusters -L 0.7 -S 90

The following reference describes parameters used with blastclust.

-a [integer]

Default: 1

Specifies the number of CPUs to use on a multiprocessor machine.

-b [T/F]

Default: T

Requires coverage on both sequences. If set to T, the program requires both sequences to pass the coverage criteria set with -L before they are called neighbors and clustered together.

-c [file]

Default: Optional

Specifies a configuration file with advanced options. The configuration file is simply a list of the options that you commonly use.

-C [T/F]

Default: F

The crash recovery option. Set it to T to complete an unfinished clustering, together with the -r option pointing to the file from which the clustering is restored. Use the same command line as the crashed run, including the same -s, and add only -C T and -r. This restarts the run from the hit list file specified by -r and then appends to it (as specified by -s).

-d [file]

Default: Optional

The input file is a BLAST database, not a FASTA file.

-e [T/F]

Default: T

Enables ID parsing when the input is formatted as a database.

-i [file]

Default: stdin

Specifies the FASTA input file for clustering.

-l [file]

Default: Optional

Restricts the reclustering to the IDs specified in [file]. It can be useful when you have a very large FASTA database and wish to cluster a subset of sequences.

-L [real number]

Default: 0.9

Specifies the length coverage threshold, given as a fraction between 0.0 and 1.0 (for example, 0.9 means 90% of the sequence length).
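
How -L interacts with -b can be illustrated with a small check. This is only a sketch of the idea, not blastclust's actual code: the function name, the way aligned lengths are measured, the use of `>=`, and the behavior with -b F (at least one sequence must pass) are all assumptions.

```python
def passes_coverage(aligned_a, len_a, aligned_b, len_b,
                    threshold=0.9, both=True):
    """Decide whether two sequences meet the length coverage criterion.

    aligned_a / aligned_b: aligned residues on each sequence (assumed inputs).
    both=True mimics -b T (both sequences must pass the -L threshold);
    both=False is assumed to require only one of them to pass.
    """
    cov_a = aligned_a / len_a
    cov_b = aligned_b / len_b
    if both:
        return cov_a >= threshold and cov_b >= threshold
    return cov_a >= threshold or cov_b >= threshold
```

For instance, a 95-residue alignment between a 100-residue and a 200-residue sequence covers 95% of the first but only 47.5% of the second, so under these assumptions the pair is rejected with -b T.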

-p [T/F]

Default: T

Input sequences are proteins. Set to F for nucleotides.

-r [file]

Default: Optional

Specifies the file used to restore neighbors for reclustering. This file is created by the -s option of a previous run. Use it, together with -C T, if the program crashed during that run.

-s [file]

Default: Optional

Specifies the file in which to save the hit list. This file can restore a crashed run and is the input file specified by -r.
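
The recovery procedure described for -C, -r and -s can be sketched as a small helper that turns the command line of the crashed run into the restart command. The name `recovery_cmd` is hypothetical, and the sketch assumes the command is held as a list of arguments in which -s is followed by the hit list file name.

```python
def recovery_cmd(original_cmd):
    """Build the restart command for a crashed blastclust run.

    Keeps the original arguments (including -s) and adds only -C T and
    -r <hit list file>, where the file is the one given to -s.
    """
    cmd = list(original_cmd)
    try:
        hitlist = cmd[cmd.index("-s") + 1]
    except (ValueError, IndexError):
        raise ValueError("the crashed run must have saved a hit list with -s <file>")
    return cmd + ["-C", "T", "-r", hitlist]
```

A run started without -s cannot be restored this way, which is why the helper raises an error in that case.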

-v [file]

Default: stdout

Prints progress messages. Progress is reported to standard output if no file is specified.

-W [integer]

Default: Protein 3, Nucleotide 32

Specifies the word size.
