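The discovery of the molecular clock is of great significance for evolutionary research. Not only can it be used to roughly
Based on differences in protein sequence or structure, a molecular evolutionary tree can be built (evolut
There are two kinds of methods for building evolutionary trees: one is sequence similarity comparison, based mainly on amino
Sequence-based evolutionary trees
The main steps in building a sequence-based tree are alignment, choosing a substitution model, building the tree, and
The basic steps in setting up an alignment include: choosing a suitable alignment program; then, from the align
The substitution model affects both the alignment and the tree building, so a recursive approach is needed. For nucle
The three main tree-building methods are distance, maximum parsimony (MP), and maximum likelihood (ML). The maximum likelihood method examines the sequences in the data set
The distance-matrix method simply counts the number of differences between two sequences. This number is taken as the evolutionary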
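As a minimal sketch of the difference count described above, the following Python builds a pairwise matrix for sequences that are assumed to be already aligned to equal length. The function names and the toy sequences are made up for illustration and do not come from any particular package.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def difference_matrix(seqs):
    """Return a symmetric dict-of-dicts of pairwise difference counts."""
    names = list(seqs)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = count_differences(seqs[a], seqs[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix


# Hypothetical toy alignment, for illustration only.
toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA"}
matrix = difference_matrix(toy)
```

Raw counts like these underestimate the true amount of change once sites have mutated more than once, which is why distance methods usually apply a model-based correction on top of them.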
The number of distinct trees grows explosively (faster than exponentially) with the number of taxa, and so becomes
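To make this growth concrete: for n ≥ 3 taxa, the number of distinct unrooted, fully bifurcating trees is (2n − 5)!! = 3 × 5 × … × (2n − 5). The short Python sketch below evaluates that product; the function name is chosen here for illustration only.

```python
def num_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n_taxa labelled tips: (2n-5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (an empty product when n == 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (4, 5, 10):
    print(n, num_unrooted_trees(n))
```

With 10 taxa there are already more than two million candidate trees, which is why exhaustive search quickly becomes impractical.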
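The vast majority of analyses use a "heuristic" search. A heuristic search first finds a nearby
Besides the most widely used methods above, there are many other ways to build and search for evolutionary trees
All of the tree-building methods above produce unrooted trees (trees with no evolutionary polarity). To
Some programs are now available for assessing the phylogenetic signal in the data and the tree's
Structure-based evolutionary trees
With advances in experimental techniques such as X-ray crystallography and NMR, the amount of protein structure data
There are currently many approaches to comparing protein structures, chiefly rigid-body structural superposition
Trees built from rigid-body superposition are suited to choosing the backbone structure in homologous protein structure prediction
When the crystal structures of two or more homologous proteins are known, the atomic coordinates of each pair of structures can be
Rigid-body superposition requires high-quality crystal structure data. In fact,
The principles of multi-feature structure comparison and of building "structure-like" evolutionary trees, and of the residue-match
Related software: PHYLIP
PHYLIP is a package of about 30 programs, which basically
PAUP was developed to provide a simple, menu-driven
Besides PAUP and PHYLIP, there are other phylogenetic programs, which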
http:/
PHYLOGENY PROGRAMS
http:/
PHYLOGENETIC ANALYSIS COMPUTER PROGRAMS
http:/
BIOCATALOG MOLECULAR EVOLUTION
http:/
PHYLIP http:/
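Anyone who wants to learn tree building should read the relevant chapters of Nei's green book. Drawing on my own exper
First, choosing a method. There are three: NJ (the representative distance method), parsimony (maximum parsimony), and ML (maximum likelihood). Generally speaking, if the model is appropriate, ML gives the best results.
NJ and ML both require choosing a model. NJ first: its model is used to compute distances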
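To illustrate what "a model used to compute distances" can look like, here is a sketch of the Jukes-Cantor (JC69) correction, d = −(3/4) ln(1 − 4p/3), where p is the observed proportion of differing sites. JC69 is picked here only because it is the simplest nucleotide model; the post does not say which model to use. The function name and the gap handling are assumptions of this sketch.

```python
import math


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor corrected distance between two aligned DNA sequences.

    Positions where either sequence has a gap ('-') are skipped.
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    p = sum(1 for a, b in pairs if a != b) / len(pairs)
    if p >= 0.75:
        # The correction is undefined once sequences look saturated.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# One difference in ten sites: p = 0.1, corrected distance slightly above 0.1.
d = jc69_distance("ACGTACGTAC", "ACGTACGTAA")
```

The corrected distance is always at least as large as p, because the model accounts for multiple substitutions at the same site.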
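ML always uses maximum likelihood models. The tree-puzzle documentation, for the various
As for software, for ML trees I recommend phyml, which is the fastest, or paml,
tree-puzzle is a good program; it uses the so-called quartet ML approximation,
Also, for very closely related sequences one generally uses nucleotides; sometimes the proteins show no differ
PS: In practice, as long as the method and model are reasonable, the resulting tree is meaningful, and you can freely
Experts are welcome to add to this; I would like to learn too. Thanks in advance.
mediocrebeing, on dxy, 2005-09-08 16:25
Some methodological issues in phylogenetic tree analysis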
http:/
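An evolutionary tree is also called a phylogenetic tree. A complete phylogenetic analysis requires the following steps:
⑴ Align the multiple target sequences (to align sequences). There are many alignment programs; the most
⑵ Build a phylogenetic tree (to reconstruct the phylogenetic tree). Tree-building algorithms fall into two main classes: discrete-character methods (di
⑶ Evaluate the tree, mainly by bootstrapping. The evolutionary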
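The resampling step behind bootstrapping in step ⑶ can be sketched as follows: alignment columns are drawn with replacement to form a pseudo-replicate of the same length, and a tree would then be rebuilt from each replicate. This Python sketch covers only the resampling; the tree building and the counting of branch support are left out, and all names are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    rng: a random.Random instance, passed in so results are reproducible.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


rng = random.Random(0)
toy = {"A": "ACGT", "B": "AGGT"}  # hypothetical toy alignment
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
```

Each replicate keeps whole columns together, so the relationship between sequences at a given site is preserved while the weighting of sites varies.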
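In general, maximum parsimony suits sequence sets that meet the following conditions: (i) the sequences being compared differ at few bases; (ii) every base position in the sequence has an approximately equal rate of change; (iii) there is no excess of transversions/
It should also be pointed out that, for some particular sets of sequences, there may not be any
Title: [Sample essay] MP and ML as commonly used in phylogenetic tree construction, with an introduction to related software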
Posted from: 水木社区 (SMTH community) (Thu Sep
als using molecular data
nships are maximum likelihood (ML) and maximum parsimony (MP).
Maximum likelihood evaluates a hypothesis about evolutionary history in terms of the probability that the proposed model and the hypothesized history would give rise to the observed data
) is chosen.
tal number of evolutionary steps required to explain a given set of data
Maximum Likelihood
e lower variance than other methods (least affected by sampling error), tend to be robust to violations of the assumptions in the evolutionary model, are statistically well founded, can statistically evaluate different tree topologies and use all of the sequence information.
computationally intensive (slow) and the result depends on the model of evolution (Opperdoes, 1997a).
Program           Data
Tree-Puzzle       DNA or Protein sequence
PAML (CODEML)     DNA or Protein sequence
DNAML (PHYLIP)    DNA sequence
fastDNAml         DNA sequence
Tree-Puzzle
Tree-Puzzle is a program for maximum likelihood analysis of DNA or protein sequence data
ng, that allows analysis of large data
s of branch support to each internal branch.
likelihood distances as well as branch lengths for user specified trees.
Input: Sequence input is requested as an alignment file in PHYLIP interleaved format.
Options: The user may choose the model of substitution to be applied; HKY (Hasegawa et al. 1985) is the default for DNA and Dayhoff (Dayhoff et al. 1978) is the default for protein sequence.
atio and nucleotide frequencies, however if these are left blank the program will estimate them from the data
erogeneity, the default is uniform rate.
pecified tree and the output options.
uence to be designated as the outgroup, this should be the number of the individual in the alignment file (for example, the first sequence would be 1, the fourth sequence would be 4).
Output: Tree-Puzzle, when used with the default options, gives a summary of the sequence data
any other trees that occurred more than 5% of the time in the 1000 (default) puzzling steps.
MAXIMUM LIKELIHOOD BRANCH LENGTHS ON QUARTET PUZZLING TREE (NO CLOCK)
Branch lengths are computed using the selected model of
substitution and rate heterogeneity.
:-----7
:
:
:---2 AF157941
:
:--1 AF157928
AF157928
AF157941
AF157877
AF157953
GVO389531
Quartet puzzling tree with maximum likelihood branch lengths
(in CLUSTAL W notation):
(AF157928:0.01919,((AF157877:0
GVO389531:0.23022)100:0.03991,
PAML
In the PAML package on iNquiry is the program codeml, which does maximum likelihood for DNA or protein sequence.
re combined to create codeml.
Input: DNA or protein sequence may be directly pasted in or a file may be specified.
racters, followed by the sequence name, then the sequence (see example input for ProtPars).
Options: There are options for the general run of the program and on
for DNA and protein.
e of sequence, the tree, and other parameters for estimating trees.
imp
he pull-down list, the default is 0, or user-specified tree, if not supplying a tree).
on frequency, genetic code
ecify the genetic code
k for mammalian mitochondrial DNA sequence).
he model, alpha and the matrix.
lldown menu the user must specify a matrix file.
Output: There are three output files from paml: rst gives codon sites with position differences and star trees, mlc gives site patterns, sequence differences, codon usage in sequences, a distance matrix and the best tree.
best tree: (((1, 2), 4), 3, 5); lnL: -2853.476553
DNAml
DNAml is part of the PHYLIP package; fastDNAml performs the same functions using less memory.
fastDNAml
FastDNAml performs unrooted maximum likelihood on aligned DNA sequence.
faster than DNAml and has the ability to save progress toward finding a tree (can be restarted from a checkpoint).
Input: Aligned DNA sequence.
Options: The user may specify the base frequencies or check the box for the program to derive them from the sequence data
y the order of the sequences as in Tree-Puzzle) and the transition/
ratio.
uence from FASTA format to PHYLIP interleaved format.
otstrapping the tree(s) found by the program.
isplay of the output and the rearrangements of trees.
for user-specified weights and trees.
Output: The first output file is a tree and the second is a summary of the results.
ed distinct data
length values and approximate confidence limits.
Maximum Parsimony
timal (or minimal) tree.
hared and derived characters, therefore a cladistic method, it tries to provide information on the ancestral sequences and evaluates different trees.
tages are:
does not correct for multiple mutations (no model of evolution), does not provide information on branch lengths and it is sensitive to codon bias (Opperdoes, 1997b).
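To show what "counting the number of evolutionary steps a tree requires" means in practice, here is a small sketch of Fitch's algorithm, a standard way to score a fixed tree under parsimony. It is not taken from the guide quoted above. The tree is written as nested 2-tuples of leaf names, and the alignment and names are invented for illustration.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):              # a leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:                             # children agree: no extra change needed
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site minimum changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical four-taxon alignment and two candidate topologies.
toy = {"A": "AAG", "B": "AAG", "C": "ACG", "D": "ACT"}
score_ab_cd = parsimony_score((("A", "B"), ("C", "D")), toy)
score_ac_bd = parsimony_score((("A", "C"), ("B", "D")), toy)
```

Here the topology grouping A with B needs fewer changes than the one grouping A with C, so parsimony would prefer it. A full parsimony program would also have to search over topologies, which this sketch does not do.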
wo maximum parsimony programs for sequence data
rom the PHYLIP package.
Program     Data
PROTPARS    Protein sequence
DNAPARS     DNA sequence
PROTPARS
This program applies a novel method for inferring unrooted phylogeny from protein sequences.
ssumptions of the method.
Input: Aligned protein sequence, where the first line contains the number of species and the number of amino acid positions, then the species data
nce starts on a new line, has a ten-character species name, immediately followed by the species data
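The input layout described above (a header line with the number of species and the number of positions, then one line per species with a ten-character name immediately followed by the sequence) can be produced with a few lines of Python. This is a sketch of the simple one-line-per-sequence layout only; it does not attempt the interleaved variant, and the function name and example sequences are hypothetical.

```python
def to_phylip(alignment):
    """Format an alignment as a header line plus ten-character names and data."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("sequences must be aligned to the same length")
    lines = [f"{len(names)} {length}"]
    for name in names:
        # Truncate or pad the name to exactly ten characters.
        lines.append(f"{name[:10]:<10}{alignment[name]}")
    return "\n".join(lines) + "\n"


text = to_phylip({"Human": "MKV", "Mouse": "MRV"})
print(text)
```

Names longer than ten characters are truncated here, so distinct long names could collide; real data would need unique short labels chosen beforehand.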
Options: There is an option for using threshold parsimony and specifying the threshold value as well as specifying the genetic code
options for randomizing and bootstrapping as well as input for a user-specified tree.
gnating the sequence by the order (the first sequence is 1, etc.).
Output: The program gives the most parsimonious tree (or trees).
On
remember: (although rooted by outgroup) this is an unrooted tree!
requires a total of
DNAPARS
This program searches bifurcating and multifurcating trees for the most parsimonious trees and saves a number of trees tied for best and rearranges all of the saved trees.
Input: Aligned DNA sequence, where the first line contains the number of species and the number of sites, then the species data
nce starts on a new line, has a ten-character species name, immediately followed by the species data
Options: There is an option for using threshold parsimony and specifying the threshold value.
as input for a user-specified tree.
options and specify an outgroup, by designating the sequence by the order (the first sequence is 1, etc.).
Output: The program gives the most parsimonious tree (or trees) and distances.
References
All information contained in this document was obtained from the respective fine manual of the program or Nei, M. and Kumar, S. 2000. Molecular Evolution and Phylogenetics. Oxford University Press, Inc., New York, unless cited otherwise.
Felsenstein, J. 2004. Statistical properties of parsimony, pp. 97-122 and A digression on history and philosophy, pp. 123-146, in Inferring Phylogenies. Sinauer Associates, Inc., Sunderland, Massachusetts.
Opperdoes, F. 1997a. Maximum Likelihood. Retrieved 20 April 2004. http:/
ucl.ac.be/
Opperdoes, F. 1997b. Maximum Parsimony Analysis. Retrieved 20 April 2004. http:/
www.icp.ucl.ac.be/
The relationship between multiple sequence alignment and phylogenetic trees (reposted)
From: chevalier (burn myself to warm her), Board: Board_Apply
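Posted from: 水木社区 (SMTH community) (Thu Sep
A brief answer to polyhedron's question :)
First, the result of a multiple sequence alignment is not definitive: there is no single, ultimately correct solution; rather,
Put simply, the Clustal algorithm works like this: take all n sequ
2. Following the Neighbor-Joining principle and the results computed above
The next step relies on a trick, namely:
every aligned sequence pair (an alignment) can itself be aligned against a third sequence or against another new alignment;
this is algorithmically feasible.
So step 3 goes like this. Suppose the NJ tree is: (A,B),(C,D)
3. First, align the closest pair, AB; then align the second-closest pair, CD;
finally, align AB with CD, which gives the final alignment.
If instead the NJ tree is: ((A,B),C),D
then first align A with B, then AB with C, and finally ABC with D.
Clearly, the NJ tree obtained in step 2 serves as a guide: ordered by distance, it decides what is brought into the alignment next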
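The way the guide tree fixes the order of the pairwise alignments in step 3 can be sketched as a post-order walk over the tree. The sketch below only reports which groups would be aligned at each step; it performs no actual alignment and is not Clustal's implementation. The tree is written as nested 2-tuples, and group labels are formed by concatenating names, both of which are conventions assumed here for illustration.

```python
def merge_order(guide_tree):
    """List the (left group, right group) alignments implied by a guide tree."""
    steps = []

    def visit(node):
        if isinstance(node, str):      # a single sequence
            return node
        left, right = node
        left_group = visit(left)       # align everything below the left child first
        right_group = visit(right)     # then everything below the right child
        steps.append((left_group, right_group))
        return left_group + right_group

    visit(guide_tree)
    return steps


balanced = merge_order((("A", "B"), ("C", "D")))
ladder = merge_order(((("A", "B"), "C"), "D"))
print(balanced)
print(ladder)
```

For the two trees in the post this gives A with B, C with D, then AB with CD, and A with B, AB with C, then ABC with D. A plain post-order walk does not by itself sort sibling subtrees by distance, so it matches the "closest pair first" rule only when the tree is written in that order.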
4. From the final alignment, one can then run ML (Maxim