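MIRA-related (An Automated Genome and EST Assembler; author: Bastien Chevreux; latest version: V2.9.46, 22/06/2009)
MIRA is a multipass assembler, used mainly for genome or EST (expressed sequence tag) data. It can handle mixed data types such as Solexa, 454 and 3730, and it can be used to detect repeats and SNPs.
MIRA gets confused if your input DNA sequences come from different individuals, because they will differ at many SNP sites. In that case you must tell MIRA that your DNA sequences come from the genomes of different individuals.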
BLAST your contigs - do the results make sense?
BLAST your singlets - are they what you expected to see (e.g. sequences that translate into missing elements of pathways)?
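As a rough aid for checking BLAST results on contigs or singlets, the sketch below keeps the highest-scoring hit per query sequence. It assumes the search was run with BLAST+ tabular output (-outfmt 6) and its default 12 columns; the function name best_hits is hypothetical and not part of BLAST or MIRA.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast_tabular_text):
    """Return {query id: best hit} where 'best' means highest bit score."""
    best = {}
    reader = csv.reader(io.StringIO(blast_tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (as written by -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(BLAST6_COLUMNS, row))
        hit["bitscore"] = float(hit["bitscore"])
        hit["evalue"] = float(hit["evalue"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

Contigs or singlets missing from the returned dictionary had no hit at all, which is often the first thing worth looking at.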
Assembly and analysis software
gsAssembler - aka Newbler, from Roche. Version 2.3 was just released before the workshop. GenePool users can download the software and manuals.
The R libraries ShortRead and Rolexa
Celera assembler and CABOG
Bambus - plus a document on how to use Bambus with MIRA
gsAssembler - aka Newbler, has a graphical interface.
simhtsd - Given a reference sequence, simhtsd creates a large set of short nucleotide reads, simulating the output from high throughput DNA sequencers such as the Illumina Genome Analyzer II.
MAQ simulate - another way to simulate short nucleotide reads. Part of the MAQ suite.
Make - a utility program, usually used for specifying the building of executable programs. Also suggested as a useful
scripting tool for running pipelines such as assembly pipelines.
Maker - a genome annotation pipeline for smaller eukaryotic and prokaryotic genome projects to annotate their
genomes and to create genome databases.
RAST annotation server
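I have also written a sequence simulator of my own. It can simulate DNA sequences while accounting for variation (SNPs, indels, SVs), methylation, sequencing errors, the various NGS read types, single-end or paired-end reads, BAC sequences, and added adapters, primers, vectors and so on: simulateSeq (v0.1.9, not yet released).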
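To make the idea behind these read simulators concrete, here is a minimal sketch that samples fixed-length reads uniformly from a reference and introduces substitution errors at a given rate. It is not the algorithm used by simhtsd, MAQ simulate or simulateSeq; the function name and parameters are hypothetical, and indels, quality values and paired ends are deliberately left out.

```python
import random


def simulate_reads(reference, read_length, n_reads, error_rate=0.01, seed=None):
    """Return a list of (start, read) tuples sampled from `reference`."""
    if read_length > len(reference):
        raise ValueError("read_length exceeds reference length")
    rng = random.Random(seed)  # seeded generator keeps runs reproducible
    reads = []
    for _ in range(n_reads):
        # Uniform start position so that the read fits inside the reference.
        start = rng.randrange(len(reference) - read_length + 1)
        bases = list(reference[start:start + read_length])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Substitution error: replace with a different base.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads
```

Keeping the true start position with each read makes it possible to check later whether an assembler or mapper placed the read correctly.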
Script everything. This helps cut down unnecessary work, and aids in reproducibility and accountability. On
Share scripts. A discussion was held about how and where a useful script repository could be set up. Tools such as trac and subversion were mentioned. This is probably a topic to be followed up on. In the meantime, the NERC Environmental Bioinformatics Centre has a webpage where utility scripts for handling new sequencing data
Use metrics. Mark Blaxter prepared an overview of the types of metrics discussed and trialled during the workshop. These are a place to start when assembling sequence data.
Base level metrics
high total contig length (and full length relative to expected genome size)
high N50; few contigs in N50
many contigs over 1 kb; many over 10 kb
All of these need to be balanced with quality of the assembly of course.
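The base level metrics above can be computed from a list of contig lengths alone. The following is a minimal sketch; it uses the usual definition of N50 (the contig length at which the running total, counted from the longest contig down, first reaches half of the assembly length). The function name and dictionary keys are hypothetical.

```python
def assembly_stats(contig_lengths, expected_genome_size=None):
    """Summarise an assembly from its contig lengths (in bases)."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)

    # Walk from the longest contig down until half the total is covered.
    running = 0
    n50 = 0
    contigs_in_n50 = 0
    for length in lengths:
        running += length
        contigs_in_n50 += 1
        if running * 2 >= total:
            n50 = length
            break

    stats = {
        "num_contigs": len(lengths),
        "total_length": total,
        "n50": n50,
        "contigs_in_n50": contigs_in_n50,
        "over_1kb": sum(1 for x in lengths if x > 1000),
        "over_10kb": sum(1 for x in lengths if x > 10000),
    }
    if expected_genome_size:
        # Total assembled length relative to the expected genome size.
        stats["fraction_of_expected"] = total / expected_genome_size
    return stats
```

Running this on the output of each parameter set or program gives a quick table for comparing assemblies side by side.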
Comparative assembly metrics
evenness of coverage of contigs
same assembly achieved with several parameter sets
same assembly achieved with different programs/algorithms
same assembly achieved with different data
Read distribution metrics
low proportion of reads rejected as singletons (genome on
even coverage of reads over assembly (congruent with expected fold coverage)
correct spacing of paired reads in assembly
synteny with related genome
breaks/contig ends map to likely repetitive elements (Tn)
congruent with transcriptome data
congruent with restriction (optical) map or genetic map
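Several of the read distribution metrics listed above (singleton proportion, evenness of coverage, agreement with expected fold coverage, paired-read spacing) reduce to simple ratios once the numbers have been extracted from the assembler output. The sketch below assumes those numbers are already available as plain Python values; the function name, the parameter names and the 25% spacing tolerance are illustrative assumptions, not values from the workshop.

```python
from statistics import mean, pstdev


def read_distribution_metrics(total_reads, singleton_reads, contig_coverages,
                              expected_coverage, insert_sizes,
                              expected_insert, tolerance=0.25):
    """Compute simple read distribution metrics for one assembly."""
    mean_cov = mean(contig_coverages)
    # Coefficient of variation: lower values mean more even coverage.
    cv = pstdev(contig_coverages) / mean_cov if mean_cov else float("nan")

    # A pair counts as correctly spaced if its observed insert size lies
    # within +/- tolerance of the expected insert size.
    lo = expected_insert * (1 - tolerance)
    hi = expected_insert * (1 + tolerance)
    well_spaced = sum(1 for size in insert_sizes if lo <= size <= hi)

    return {
        "singleton_fraction": singleton_reads / total_reads,
        "mean_coverage": mean_cov,
        "coverage_cv": cv,
        "coverage_vs_expected": mean_cov / expected_coverage,
        "well_spaced_pair_fraction": well_spaced / len(insert_sizes),
    }
```

A coverage_vs_expected value far from 1, or a low well_spaced_pair_fraction, may point to collapsed repeats or misjoins and is worth inspecting before trusting the contig-level numbers.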
Summary report on the Next Generation Sequencing Assembly workshop and NextGenBug meeting held from
Nov 30 - Dec 2, 2009 in Edinburgh.
Next Generation Sequencing Assembly Workshop
eScience Institute, Edinburgh, November 30 - December 2, 2009
Full program and list of presenters. All of the presentations will be available on the eScience Institute website.
This workshop was funded by the Scottish Bioinformatics Forum, the National eScience Institute and the GenePool