Publication | Closed Access
Clustering of highly homologous sequences to reduce the size of large protein databases
Citations: 1.1K | References: 1 | Year: 2001
Cluster Computing, Large Protein Databases, Engineering, Genetics, Molecular Biology, Homologous Sequences, Output Database, Genomics, Sequence Alignment, Bioinformatics Database, Data Science, Data Mining, Proteomics, Biological Database, Sequence Analysis, Knowledge Discovery, Omics, Functional Genomics, Bioinformatics, Protein Bioinformatics, Flexible Program, Computational Biology, Large Protein, Systems Biology, Medicine
We present a fast and flexible program for clustering large protein databases at different sequence identity levels. It takes less than 2 h for the all-against-all sequence comparison and clustering of the non-redundant protein database of over 560,000 sequences on a high-end PC. The output database, including only the representative sequences, can be used for more efficient and sensitive database searches.
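The abstract does not describe the algorithm itself, so the following is only an illustrative sketch of what clustering at a chosen sequence identity level and keeping one representative per cluster could look like. It is not the authors' implementation. It assumes a simple greedy scheme (longest sequences first, each sequence joining the first representative it matches) and a toy ungapped identity measure in place of a real alignment. The names `identity` and `cluster`, and the example sequences, are hypothetical.

```python
from __future__ import annotations


def identity(a: str, b: str) -> float:
    """Toy identity (an assumption): fraction of matching positions when the
    two sequences are compared ungapped from their first residue, relative to
    the shorter sequence. A real tool would use an alignment instead."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / shorter


def cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Greedy clustering sketch: process sequences from longest to shortest.
    A sequence joins the first representative it matches at or above the
    threshold; otherwise it becomes the representative of a new cluster."""
    clusters: dict[str, list[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters


# Hypothetical input: three related sequences and one unrelated sequence.
example = {
    "a": "MKTAYIAKQR",
    "b": "MKTAYIAKQK",
    "c": "GGGGGGGG",
    "d": "MKTAYIAK",
}

clusters_90 = cluster(example, 0.90)
clusters_95 = cluster(example, 0.95)

# The reduced "output database" keeps only the representative sequences.
representatives_90 = {rep: example[rep] for rep in clusters_90}
```

Raising the threshold from 0.90 to 0.95 splits `b` into its own cluster in this example, which illustrates how the identity level controls how much the database is reduced.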