blastclust clusters a database of protein or nucleotide sequences. It outputs rows of sequence identifiers from the database, with the sequences of each cluster on the same row and the clusters sorted from largest to smallest. The program can generate a list of clusters for input into another program (e.g., an assembly program such as PHRAP). However, it should be used only on a relatively small number of sequences (10-1000): it runs on a single computer only, and its RAM requirements quickly exceed the memory of most machines.
Here are a few sample command lines:
blastclust -i my_nucdb -p F -o my_nucdb.clusters
blastclust -i my_pepdb -o my_pepdb.clusters -L 0.7 -S 90
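A cluster file written with -o can be read back with a few lines of Python. The following is a minimal sketch, assuming that the identifiers on each row are separated by whitespace; the function name read_clusters and the sample identifiers are hypothetical, chosen only for illustration.

```python
from io import StringIO


def read_clusters(handle):
    """Return one list of sequence IDs per non-empty row of a cluster file."""
    clusters = []
    for line in handle:
        ids = line.split()  # assumption: IDs on a row are whitespace-separated
        if ids:             # skip blank rows
            clusters.append(ids)
    return clusters


# Hypothetical stand-in for the contents of a file such as my_pepdb.clusters.
example = StringIO("seqA seqB seqC\nseqD seqE\nseqF\n")
clusters = read_clusters(example)

for number, ids in enumerate(clusters, start=1):
    print(f"cluster {number}: {len(ids)} sequences, first ID {ids[0]}")
```

To read a real file, pass an open file handle instead of the StringIO object, for example open("my_pepdb.clusters"). Because the rows are already sorted from largest to smallest, the first list returned is the largest cluster.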
The following reference describes parameters used with blastclust.
-a: Specifies the number of CPUs to use on a multiprocessor machine.
-b: Requires coverage on both sequences. If set to T, both sequences must pass the coverage criterion set with -L before they are called neighbors and clustered together.
-c: Specifies a configuration file with advanced options. The configuration file is simply a list of the options that you commonly use.
-C: The crash recovery option, which completes an unfinished clustering. Set it to T and use -r with a saved hit list file to restore the clustering. Use the same command line as the crashed run, including the same -s file, adding only -C T and -r. The run restarts from the hit list file specified by -r and then appends to it (as specified by -s).
-d: Specifies that the input is a BLAST database, not a FASTA file.
-e: Enables ID parsing when the database is formatted.
-i: Specifies the FASTA input file for clustering.
-l: Restricts the reclustering to the IDs listed in [file]. It can be useful when you have a very large FASTA database and wish to cluster a subset of sequences.
-L: Specifies the length coverage threshold.
-p: Specifies whether the input sequences are proteins. Set to F for nucleotides.
-r: Specifies the file used to restore neighbors for reclustering. Set -C to T when using it. This file is created by the -s option of a previous run. Use it if the program crashes during a run.
-s: Specifies the file in which to save the hit list. This file can restore a crashed run and is the input file specified by -r.
-v: Prints progress messages to the specified file. Progress is reported to standard output if no file is specified.
-W: Specifies the word size. Default: protein 3, nucleotide 32.
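The crash recovery procedure described for -C, -r, and -s comes down to rerunning the original command line with two additions. The sketch below only builds and prints the two command lines; it does not run blastclust. The helper blastclust_args and the hit list file name my_nucdb.hits are hypothetical, introduced for illustration.

```python
import shlex


def blastclust_args(fasta, clusters_out, hit_list, protein=True, recover=False):
    """Build a blastclust argument list; nothing is executed here."""
    args = ["blastclust", "-i", fasta, "-o", clusters_out, "-s", hit_list]
    if not protein:
        args += ["-p", "F"]  # -p F marks the input as nucleotide sequences
    if recover:
        # Crash recovery: same command line as before, plus -C T and -r
        # pointing at the hit list file that -s saved during the crashed run.
        args += ["-C", "T", "-r", hit_list]
    return args


first_run = blastclust_args("my_nucdb", "my_nucdb.clusters", "my_nucdb.hits",
                            protein=False)
restart = blastclust_args("my_nucdb", "my_nucdb.clusters", "my_nucdb.hits",
                          protein=False, recover=True)

print(shlex.join(first_run))
print(shlex.join(restart))
```

Keeping the arguments as a list means the same value can be passed to subprocess.run(restart, check=True) once blastclust is installed, without any shell quoting concerns.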