Publication | Open Access
Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison
2010 | 1.7K citations | 32 references
DNA-DNA hybridization (DDH) underpins the delineation of bacterial and archaeal species but is laborious and error-prone, and it cannot be used to build up comparative databases. The study evaluates in-silico genome-to-genome methods that can stand in for DDH in species delineation. The authors use algorithms that identify high-scoring segment pairs or maximally unique matches to compute whole-genome distances. The evaluated distance functions outperform earlier ones both in their correlation with DDH and in their error ratios when emulating it, and some remain robust when genomes are incompletely sequenced. The digitally derived distances also correlate better with 16S rRNA gene sequence distances than DDH values do, and the methods are provided as a web service for species delineation.
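The summary mentions maximally unique matches (MUMs) as one basis for whole-genome distances. As a minimal sketch, assuming a MUM is a maximal exact match that occurs exactly once in each of the two sequences, the deliberately naive scan below finds MUMs in short strings. The function names and the `min_len` parameter are hypothetical, only the forward strand is searched, and real genome-scale tools rely on suffix trees or suffix arrays rather than this quadratic loop.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count possibly overlapping occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_unique_matches(a: str, b: str, min_len: int = 20) -> list[tuple[int, int, int]]:
    """Return (start_in_a, start_in_b, length) for every MUM of at least min_len.

    Naive illustration only (forward strand, no reverse complements).
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Start only at left-maximal positions: the match cannot extend leftwards.
            if a[i] != b[j] or (i > 0 and j > 0 and a[i - 1] == b[j - 1]):
                continue
            # Extend rightwards while both sequences agree.
            length = 0
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1
            if length < min_len:
                continue
            match = a[i:i + length]
            # Keep the maximal exact match only if it is unique in both sequences.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, length))
    return mums


print(maximal_unique_matches("GATTACAGG", "CCGATTACATT", min_len=3))
```

For these two toy sequences the only MUM of length 3 or more is `GATTACA`, reported as `(0, 2, 7)`. The shorter shared match `ATT` is rejected because it occurs twice in the second sequence.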
The pragmatic species concept for Bacteria and Archaea is ultimately based on DNA-DNA hybridization (DDH). While enabling the taxonomist, in principle, to obtain an estimate of the overall similarity between the genomes of two strains, this technique is tedious and error-prone and cannot be used to incrementally build up a comparative database. Recent technological progress in the area of genome sequencing calls for bioinformatics methods to replace the wet-lab DDH by in-silico genome-to-genome comparison. Here we investigate state-of-the-art methods for inferring whole-genome distances in their ability to mimic DDH. Algorithms to efficiently determine high-scoring segment pairs or maximally unique matches perform well as a basis for inferring intergenomic distances. The examined distance functions, which are able to cope with heavily reduced genomes and repetitive sequence regions, outperform previously described ones both in their correlation with DDH and in their error ratios when emulating it. Simulation of incompletely sequenced genomes indicates that some distance formulas are very robust against missing fractions of genomic information. Digitally derived genome-to-genome distances show a better correlation with 16S rRNA gene sequence distances than DDH values do. The future perspectives of genome-informed taxonomy are discussed, and the investigated methods are made available as a web service for genome-based species delineation.
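The abstract does not state the distance formulas themselves, so the following is only a sketch under stated assumptions, not the paper's implementation. It contrasts two simplified, hypothetical HSP-based distances to show why a formula can be robust against missing fractions of a genome. `coverage_distance` divides the total HSP length found in both search directions by the summed genome lengths. `identity_distance` divides identical base pairs by total HSP length, so genome length never enters. The `HSP` record and both function names are illustrative; HSPs are assumed to be precomputed by some local-alignment search and non-overlapping on their query genome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    """Hypothetical minimal record for one high-scoring segment pair."""
    length: int      # aligned length of the HSP on the query genome (bp)
    identities: int  # identical base pairs within the HSP


def coverage_distance(hsps_xy, hsps_yx, len_x, len_y):
    """1 - (HSP length in both search directions) / (sum of genome lengths).

    Assumes HSPs do not overlap on their query genome.
    """
    covered = sum(h.length for h in hsps_xy) + sum(h.length for h in hsps_yx)
    return 1.0 - covered / (len_x + len_y)


def identity_distance(hsps_xy, hsps_yx):
    """1 - identical base pairs / total HSP length; genome lengths do not appear."""
    hsps = list(hsps_xy) + list(hsps_yx)
    total = sum(h.length for h in hsps)
    if total == 0:
        return 1.0  # assumption: no shared segments means maximal distance
    return 1.0 - sum(h.identities for h in hsps) / total


# Complete genomes X and Y (1,000 bp each), HSPs from both search directions.
full_xy = [HSP(400, 380), HSP(400, 380)]
full_yx = [HSP(400, 380), HSP(400, 380)]
# Same pair, but only half of X was sequenced, so half of the HSPs are lost.
part_xy = [HSP(400, 380)]
part_yx = [HSP(400, 380)]

print(coverage_distance(full_xy, full_yx, 1000, 1000))  # ~0.2
print(coverage_distance(part_xy, part_yx, 500, 1000))   # ~0.467
print(identity_distance(full_xy, full_yx))              # ~0.05
print(identity_distance(part_xy, part_yx))              # ~0.05
```

In this toy case, losing half of genome X more than doubles the coverage-based distance. The identity-based distance is unchanged because the remaining HSPs are as similar as the lost ones. Real missing regions need not be that representative, so the invariance here holds only under that assumption.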