
云之南

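The sound of wind, of rain, and of reading aloud: each reaches the ear. Affairs of family, of state, and of the world: each one matters.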

 
 
 


 
 
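About me

Background: computer science. Research interests: JavaEE web software development, bioinformatics, data mining and machine learning, intelligent information systems. Current work: genomes, transcriptomes, NGS high-throughput data analysis, biological data mining, plant phylogenetics and comparative evolutionary genomics.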

An Introduction to Next-Generation Sequencing (NGS) Assembly, Part 3

2010-01-01 11:17:48 | Category: Bioinformatics


MIRA-related (An Automated Genome and EST Assembler; author: Bastien Chevreux; latest version: V2.9.46, 22/06 2009)

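MIRA is a multi-pass assembler, used mainly for genome or EST (expressed sequence tag) data. It can handle mixed data sets of various kinds, such as Solexa, 454 and 3730 reads, and it can be used to detect repeats and SNPs.

However, MIRA takes a long time to run, especially when it has to process mate-pair (paired-end read) information.

MIRA refines the assembled contigs step by step over successive passes. For every base in each consensus sequence, it produces a histogram of the repeat ratio, and scores and tags the base accordingly.

MIRA handles SNPs with a strategy similar to that of DNPTrapper.

MIRA gets confused if the input DNA sequences come from different individuals, because these will carry very different SNP sites. In that case you must tell MIRA that the sequences come from the genomes of different individuals.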

 

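Some suggestions:

Validate with AMOS.

Pre-screen the genome for n-mer repeats. This helps the assembly reach the expected result.

Pre-screen your sequences for the repeats you expect to be present, and mask them.

BLAST your contigs - do the results make sense?

BLAST your singlets - are they what you expect to see (e.g. sequences that translate into missing elements of pathways)?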
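The repeat pre-screening suggested above can be approximated by counting n-mers and flagging those that occur more than once. The following is only a minimal sketch under stated assumptions: the function names and the toy sequence are hypothetical, it holds the whole sequence in memory, and it ignores reverse complements, which a real screen would need to consider.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repeated_kmers(sequence, k, min_count=2):
    """Return the k-mers that occur at least min_count times."""
    counts = count_kmers(sequence, k)
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Made-up sequence, for illustration only.
    toy_genome = "ACGTACGTTTGACGTA"
    print(repeated_kmers(toy_genome, k=4))
```

For genome-sized input a dedicated k-mer counter would be the more realistic choice; the sketch only shows the idea behind the screen.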

 

Assembly and analysis software

gsAssembler - aka Newbler, from Roche. Version 2.3 was just released before the workshop. GenePool users can download the software and manuals.

Velvet

Curtain

MIRA

ABySS

The R libraries ShortRead and Rolexa

Minimus2

Phrap

Celera assembler and CABOG

Bambus - plus a document on how to use Bambus with MIRA

 

Visualising assemblies

gsAssembler - aka Newbler, has a graphical interface.

Hawkeye

Tablet

Consed

Gap5

CLCBio

EagleView

 

Other software

simhtsd - Given a reference sequence, simhtsd creates a large set of short nucleotide reads, simulating the output from high throughput DNA sequencers such as the Illumina Genome Analyzer II.

MAQ simulate - another way to simulate short nucleotide reads. Part of the MAQ suite.

Make - a utility program, usually used for specifying the building of executable programs. Also suggested as a useful scripting tool for running pipelines such as assembly pipelines.

Maker - a genome annotation pipeline for smaller eukaryotic and prokaryotic genome projects to annotate their genomes and to create genome databases.

RAST annotation server

 

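I have also written a sequence simulator of my own, simulateSeq (v0.1.9, not yet released). It simulates DNA sequences and can account for variation (SNPs, indels, SVs), methylation, sequencing errors, the various NGS read types, single-end or paired-end reads, and BAC sequences, with added adapters, primers, vectors and so on.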
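To make the idea of read simulation concrete, here is a deliberately small sketch. It is not simulateSeq, simhtsd or MAQ simulate: the function name, parameters and the uniform substitution-error model are all assumptions made for illustration, and it covers single-end reads with substitution errors only (no indels, SVs, methylation, pairing, adapters or quality values).

```python
import random


def simulate_reads(reference, read_length, n_reads, error_rate=0.01, seed=0):
    """Sample single-end reads uniformly from a reference sequence.

    Each base is replaced by a different base with probability error_rate.
    Returns a list of (start_position, read_sequence) tuples.
    """
    if read_length > len(reference):
        raise ValueError("read_length must not exceed the reference length")
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_length + 1)
        bases = list(reference[start:start + read_length])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Substitution error: choose any base other than the true one.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads


if __name__ == "__main__":
    toy_reference = "ACGT" * 25  # made-up 100 bp reference
    for start, read in simulate_reads(toy_reference, read_length=20, n_reads=3):
        print(start, read)
```

Keeping the start position with each read makes it easy to check afterwards whether an assembler or mapper placed the read correctly.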

 

Follow-up work:

Outcomes

Script everything. This helps cut down unnecessary work, and aids in reproducibility and accountability. One tool mentioned as particularly useful in this regard is Make, as it understands dependencies, and can be used to avoid re-running things that have already been done, as well as keeping a record of steps taken.

Share scripts. A discussion was held about how and where a useful script repository could be set up. Tools such as trac and subversion were mentioned. This is probably a topic to be followed up on. In the meantime, the NERC Environmental Bioinformatics Centre has a webpage where utility scripts for handling new sequencing data types are being recorded. People are invited to submit links to their scripts, or the scripts themselves if they do not have capacity to host them locally, to be added to this webpage, until an alternative script repository is established.

Use metrics. Mark Blaxter prepared an overview of the types of metrics discussed and trialled during the workshop. These are a place to start when assembling sequence data:

 

Base level metrics

high length of contigs (and full length relative to expected genome size)

high N50, few contigs in N50

many contigs over 1 kb, over 10 kb

longest contig

All of these need to be balanced with quality of the assembly of course.
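The base level metrics above are straightforward to compute from a list of contig lengths. The sketch below assumes the usual definition of N50 (the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the total assembly length). The function names and the example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return (N50, number of contigs in N50) for a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if 2 * running >= total:  # running total has reached half the assembly
            return length, count
    raise ValueError("contig_lengths must not be empty")


def base_level_metrics(contig_lengths):
    """Summarise an assembly using the base level metrics listed above."""
    n50_length, contigs_in_n50 = n50(contig_lengths)
    return {
        "total_length": sum(contig_lengths),
        "n50": n50_length,
        "contigs_in_n50": contigs_in_n50,
        "contigs_over_1kb": sum(1 for x in contig_lengths if x > 1_000),
        "contigs_over_10kb": sum(1 for x in contig_lengths if x > 10_000),
        "longest_contig": max(contig_lengths),
    }


if __name__ == "__main__":
    # Made-up contig lengths, for illustration only.
    print(base_level_metrics([12000, 5000, 1500, 800, 200]))
```

Dividing `total_length` by the expected genome size gives the "full length relative to expected genome size" figure mentioned in the list.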

Comparative assembly metrics

evenness of coverage of contigs

same assembly achieved with several parameter sets

same assembly achieved with different programs/algorithms

same assembly achieved with different data sets.

Read distribution metrics

low proportion of reads rejected as singletons (genome only)

even coverages of reads over assembly (congruent with expected fold coverage)

correct spacing of paired reads in assembly
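Two of the read distribution metrics can be put into numbers with very little code. The sketch below assumes the standard estimate of expected fold coverage, number of reads times mean read length divided by genome size, and uses the coefficient of variation of per-contig coverage as one possible (assumed, not prescribed by the workshop) measure of evenness. All names and figures are hypothetical.

```python
from statistics import mean, pstdev


def expected_fold_coverage(n_reads, mean_read_length, genome_size):
    """Expected fold coverage: total sequenced bases divided by genome size."""
    return n_reads * mean_read_length / genome_size


def coverage_cv(per_contig_coverage):
    """Coefficient of variation of coverage; 0.0 means perfectly even."""
    return pstdev(per_contig_coverage) / mean(per_contig_coverage)


def singleton_fraction(n_singletons, n_reads):
    """Proportion of reads left unassembled as singletons."""
    return n_singletons / n_reads


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(expected_fold_coverage(1_000_000, 100, 5_000_000))
    print(coverage_cv([18.0, 21.5, 19.7, 40.2]))
    print(singleton_fraction(25_000, 1_000_000))
```

A contig whose coverage sits far above the expected fold coverage is a candidate collapsed repeat, which is one reason to compare the observed values against the expectation.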

 

Biological affirmation

synteny with related genome

breaks/contig ends map to likely repetitive elements (Tn)

congruent with transcriptome data (exon mapping)

congruent with restriction (optical) map or genetic map

 

Source:

Summary report on the Next Generation Sequencing Assembly workshop and NextGenBug meeting held from Nov 30 - Dec 2, 2009 in Edinburgh.

Next Generation Sequencing Assembly Workshop

eScience Institute, Edinburgh, November 30 - December 2, 2009

Full program and list of presenters. All of the presentations will be available on the eScience Institute website.

This workshop was funded by the Scottish Bioinformatics Forum, the National eScience Institute and the GenePool
