http://genome.ucsc.edu/FAQ/FAQblat.html
Running the Programs: The command line options of each of the programs is described below. Similar summaries of usage are printed when a command is run with no arguments. See the next section for info on installing webBlat.
blat
blat - Standalone BLAT sequence search command line tool
usage:
blat database query [-ooc=11.ooc] output.psl
where:
database is either a .fa file, a .nib file, or a list of .fa or .nib
files, query is similarly a .fa, .nib, or list of .fa or .nib files
-ooc=11.ooc tells the program to load over-occurring 11-mers from
and external file. This will increase the speed
by a factor of 40 in many cases, but is not required
output.psl is where to put the output.
options:
-t=type Database type. Type is one of:
dna - DNA sequence
prot - protein sequence
dnax - DNA sequence translated in six frames to protein
The default is dna
-q=type Query type. Type is one of:
dna - DNA sequence
rna - RNA sequence
prot - protein sequence
dnax - DNA sequence translated in six frames to protein
rnax - DNA sequence translated in three frames to protein
The default is dna
-prot Synonymous with -d=prot -q=prot
-ooc=N.ooc Use overused tile file N.ooc. N should correspond to
the tileSize
-tileSize=N sets the size of match that triggers an alignment.
Usually between 8 and 12
Default is 11 for DNA and 5 for protein.
-oneOff=N If set to 1 this allows one mismatch in tile and still
triggers an alignments. Default is 0.
-minMatch=N sets the number of tile matches. Usually set from 2 to 4
Default is 2 for nucleotide, 1 for protein.
-minScore=N sets minimum score. This is twice the matches minus the
mismatches minus some sort of gap penalty. Default is 30
-minIdentity=N Sets minimum sequence identity (in percent). Default is
90 for nucleotide searches, 25 for protein or translated
protein searches.
-maxGap=N sets the size of maximum gap between tiles in a clump. Usually
set from 0 to 3. Default is 2. Only relevent for minMatch > 1.
-noHead suppress .psl header (so it's just a tab-separated file)
-makeOoc=N.ooc Make overused tile file
-repMatch=N sets the number of repetitions of a tile allowed before
it is marked as overused. Typically this is 256 for tileSize
12, 1024 for tile size 11, 4096 for tile size 10.
Default is 1024. Typically only comes into play with makeOoc
-mask=type Mask out repeats. Alignments won't be started in masked region
but may extend through it in nucleotide searches. Masked areas
are ignored entirely in protein or translated searches. Types are
lower - mask out lower cased sequence
upper - mask out upper cased sequence
out - mask according to database.out RepeatMasker .out file
file.out - mask database according to RepeatMasker file.out
-qMask=type Mask out repeats in query sequence. Similar to -mask above but
for query rather than target sequence.
-minRepDivergence=NN - minimum percent divergence of repeats to allow
them to be unmasked. Default is 15. Only relevant for
masking using RepeatMasker .out files.
-dots=N Output dot every N sequences to show program's progress
-trimT Trim leading poly-T
-noTrimA Don't trim trailing poly-A
-trimHardA Remove poly-A tail from qSize as well as alignments in psl output
-out=type Controls output file format. Type is one of:
psl - Default. Tab separated format without actual sequence
pslx - Tab separated format with sequence
axt - blastz-associated axt format
maf - multiz-associated maf format
wublast - similar to wublast format
blast - similar to NCBI blast format
-fine For high quality mRNAs look harder for small initial and
terminal exons. Not recommended for ESTs
Here are some blat settings for common usage scenarios:
1) Mapping ESTs to the genome within the same species
-ooc=11.ooc
2) Mapping full length mRNAs to the genome in the same species
-ooc=11.ooc -fine -q=rna
3) Mapping ESTs to the genome across species
-q=dnax -t=dnax
4) Mapping mRNA to the genome across species
-q=rnax -t=dnax
5) Mapping proteins to the genome
-q=prot -t=dnax
6) Mapping DNA to DNA in the same species
-ooc=11.ooc -fastMap
7) Mapping DNA from one species to another species
-q=dnax -t=dnax
When mapping DNA from one species to another the
query side of the alignment should be cut up into chunks
of 25kb or less for best performance.
评论