注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

云之南

风声,雨声,读书声,声声入耳;家事,国事,天下事,事事关心

 
 
 

日志

 
 
关于我

专业背景:计算机科学 研究方向与兴趣: JavaEE-Web软件开发, 生物信息学, 数据挖掘与机器学习, 智能信息系统 目前工作: 基因组, 转录组, NGS高通量数据分析, 生物数据挖掘, 植物系统发育和比较进化基因组学

网易考拉推荐

Analyzing 454 Sequencing Data  

2011-05-03 15:32:01|  分类: 生物信息学 |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |

http://hi.baidu.com/cnelon1133/blog/item/e7d9f06324fec6c8e7113a99.html

The signal processing phase for 454 (Titanium) data is computationally intensive, and is most efficiently done using the UA ICE cluster.  The sequence assembly phase can also be done on ICE.  In many cases, the sequencing facility will run the signal processing step on your data, and contig assembly may also be included for you.  Annotation may be done via the RAST and/or NCBI PGAAP servers.


What is the ICE cluster and how do I use it?
See http://bcf.arl.arizona.edu/high-throughput-computing/ice-cluster-computing-faqs.html
How do I copy my 454 data to ICE?

First check that you have enough disk space on ICE to hold the data, or use an xdisk allocation if you need more space (xdisk usage is covered in the ICE cluster FAQ).  If you have .sff files, you can simply transfer them to ICE using the scp command with your UANetID password: scp   *.sff  login@hpc.arizona.edu:dest_dir.                                        If you need to do signal processing (see Question 4 below), you will need to use scp to copy the entire image processing directory, e.g. 'D_yyyy_mm_dd_hh_mm_ss_FLX05080355_imageProcessingOnly', as follows: 
    scp -r  D_yyyy_mm_dd_hh_mm_ss_FLX05080355_imageProcessingOnly  login@hpc.arizona.edu:dest_dir


Can I run 454 image processing on ICE?

No, image processing must be done on the 454 instrument computer.


How do I run 454 signal processing on ICE?

You probably don't need to, as this is done as part of the sequencing service!  But if you insist: Copy the script /genome/ICEbin/454sig.pbs or /genome/ICEbin/454sigPairedEnd.pbs to your directory and modify it to use your email, group, and data file locations.  The 454 software runs best with bash, so leave the first line as is.  Submit the job using one of the commands:
     qsub 454sig.pbs
     qsub 454sigPairedEnd.pbs
Using 8 processors, your signal processing run should take 10 hours or less on ICE.  You will receive an email message when your job begins, ends, or aborts.  For more information on PBS jobs, see the HPC FAQ .


How can I assemble or map my 454 sequences?

There are several choices for de novo assemblers: Roche newbler/gsAssembler, Celera, and MIRA have been tested and seem to work reasonably well with 454 data.  Running each is described below in a separate question/answer.  Each assembler is installed in a different location, necessitating the alteration of the PATH environment variable in the .cshrc file in the user's home directory.  For an example, see the file /genome/.cshrc.ex 
Remember that having fewer contigs does not necessarily imply a better assembly, as repeat regions may have been collapsed.  For mapping, run gsMapper (see next question).


How do I use Roche newbler/gsAssembler or gsMapper?

Roche's gsMapper and gsAssembler (aka newbler) can be run interactively from the ICE command line.  To use roche software, first run the command:  'module load roche454'.  If you wish to view the resulting assembly using consed (version 17 is required) be sure to select 'Complete consed folder' in the gsAssembler Parameters Tab before starting the assembly.  After creating a new project, select the Project Tab, select the 'GS reads' Tab and click the '+' button to add sff files to the project.  Set the desired parameters from the Parameters Tab and press the 'Start' button.  After the computation has finished you can use the 'Results files', 'Alignment results', and 'Flowgrams' Tabs.  Small genome assemblies typically take about 10 minutes to complete on ICE.  For larger projects, you may use  /genome/ICEbin/454asm.pbs as a template for a batch assembly.  For more complete information see the Roche documentation.  If you are using the command line assembly tool runAssembly and you are combining paired end reads with shotgun reads, you must specify the paired end reads first, e.g. runAssembly -o MyProject -p Mypairedend.sff Myshotgun.sff


How do I use the MIRA assembler?

Before running mira, you must extract fasta, quality, and xml clipping information from the454 sff files, using the sff_extract command (one long line).  For example: sff_extract  -s  proj_in.454.fasta  -q  proj_in.454.fasta.qual  -x proj_traceinfo_in.454.xml  FX3UAMY01.sff FX3UAMY02.sff   Be sure to name the files as illustrated.  The command to run the mira assembler must be entered on one long line as well.  Example:  mira -project=proj -job=denovo,genome,normal,454  >& proj.log 
The main log file will contain any error messages.  If the log file contains a message that says 'megahubs' were found, try re-running the mira command and include the options: -SK:mnr=yes and -SK:nrr=10   There is a sample PBS script /rsgrps1/nrsc/genome/ICEmirademo.csh that you may copy and modify.  For more hints on assembling difficult projects, see: http://chevreux.org/uploads/media/mira3_hard.html. The proj_d_info directory contains an assembly.txt file that summarizes the assembly results, and the proj_d_results directory contains the .ace, .fasta, and .fasta.qual files for the resulting assembly.  For more information on mira, see  http://mira-assembler.sourceforge.net/docs/mira.html


How do I use the Celera Assembler?

The Celera assembler is installed in /genome/celera/wgs-5.4/Linux-amd64/bin so you will need to add this to your PATH environment variable.  The celera assembly process also requires a non-standard Perl library, so you will also need to add the following line to the .cshrc file in your home directory:
     setenv PERLLIB  /genome/ICEbin/lib 
After adding this line, type the command:
     source  ~/.cshrc 
There are several steps involved in running the celera assembler. The first is to convert your sff file(s) to frg format by running the sffToCA command, e.g.:
sffToCA  -trim hard  -clear 454  -libraryname MyLib  -output FX3UAMY01.frg FX3UAMY01.sff
The -trim hard option is recommended for unpaired Titanium reads to tell the assembler to ignore low-quality bases that are beyond the trimming point established by the Roche software. The libraryname is used for removing fuplicate reads, and is stored in the resulting frg file. For paired-end Titanium reads, -trim chop is recommended to make sure the linker sequence is removed.
Then create a specifications file, containing lines such as:
     doOverlapTrimming=1
     overlapper=mer
     unitigger = bog 
     utgErrorRate=0.03
     createACE=1
     /scrN/MyDir/FX3UAMY01.frg
Your spec file tells the celera assembler to use the bog 'best overlap graph' method (recommended for 454 data), and the mer overlapper, which takes into account the homopolymer length uncertainty characteristic of 454 reads. However, sometimes the mer overlapper fails or runs for too long. In that case, you can use the default setting of overlapper=ovl. Setting utgErrorRate=0.03 is recommended for 454 Titanium reads (as opposed to the default error rate of 0.015, which is appropriate for Sanger reads). The createACE=1 option produces an ace-format file that can be opened in consed . This option stores the ace file in a compressed format (bz2) that will need to be unzipped before it can be viewed (see below). Lastly, the spec file should list the frg file(s) you wish to assemble.
The command to run the assembler is:   runCA-OBT.pl  -d  asmdir  -p project  -s specfile 
If you prefer not to use a spec file, you can put the options from that file directly on the command
line. The last argument should be your .frg file(s). For example:
runCA -d asmdir -p project doOverlapTrimming=1 overlapper=mer unitigger=bog utgErrorRate=0.03 createACE=1 FX3UAMY01.frg
The command should be one long line, even if it wraps around. Do not hit enter until after the frg file list. Also, note that some options are preceded by a minus sign (-d and -p), while others are not.
Output files, including fasta files of all the assembled contigs,  can be found in the 9-terminator directory.  If you used the createACE=1 option to output an ace file, you will need to run the following command to unzip the compressed file before opening it:
     bunzip2 project.ace.bz2
Once the ace file has been unzipped, run this command to open it in consed:
     consed -nophd -ace  project.ace
If you want to create an ace file for a completed project that doesn't already have one, run the following commands::
     gatekeeper  -dumpfrg  project.gkpStore  > project.frg 
     /genome/celera/bin/ca2ace.pl project.asm 
     bunzip2 project.ace.bz2
For more information, see the runCA file at http://wgs-assembler.sourceforge.net and http://www.cbcb.umd.edu/research/CeleraAssembler.shtml
For recommendations about usage of the Celera Assembler with 454 reads, see the following pages:
http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=SFF_SOP
http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Roche_454_Platforms


How do I use consed to edit my assembly?
Consed version 19 is installed in /genome/ICEbin/consed19 so you will need to add this to your PATH environment variable.  (See http://bcf.arl.arizona.edu/high-throughput-computing/ice-cluster-computing-faqs.html for instructions on setting your PATH.)   For gsAssembler you can run consed as usual (from the edit_dir, type the command: consed).  For celera and mira assemblies you must run: consed -nophd -ace project.ace
How can I annotate my project?
For annotation, submit your assembled contigs to the RAST server or to the NCBI PGAAP server.  RAST is very easy to use. You may wish to try different assemblers and submit the results from each to RAST.  Then you may be able to judge which assembly is best suited for submission to PGAAP.
  评论这张
 
阅读(1772)| 评论(0)
推荐 转载

历史上的今天

在LOFTER的更多文章

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2016