Publication | Open Access

NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes

Citations: 1K

References: 22

Year: 2006

TLDR

Microbiologists surveying bacterial and archaeal diversity often need comparative alignments of thousands of 16S rRNA genes, but the computational resources and bioinformatics expertise required have limited high‑throughput analysis. The authors hypothesized that an online tool could efficiently align thousands of 16S rRNA genes using the NAST algorithm to create multiple sequence alignments. They implemented a web interface that compares each user sequence to Greengenes’ Core Set of ~10,000 aligned non‑chimeric sequences, orients it, uses its closest Core Set match as a template for inserting gap characters, and removes non‑16S data from the returned alignment. Distance matrices and taxonomic classifications can then be derived from the resulting alignment. The tool enables researchers of varying experience to align and categorize large sequence sets, facilitating bacterial and archaeal community profiling.

Abstract

Microbiologists conducting surveys of bacterial and archaeal diversity often require comparative alignments of thousands of 16S rRNA genes collected from a sample. The computational resources and bioinformatics expertise required to construct such an alignment have inhibited high-throughput analysis. It was hypothesized that an online tool could be developed to efficiently align thousands of 16S rRNA genes via the NAST (Nearest Alignment Space Termination) algorithm for creating multiple sequence alignments (MSA). The tool was implemented with a web interface at http://greengenes.lbl.gov/NAST. Each user-submitted sequence is compared with Greengenes' ‘Core Set’, comprising ∼10 000 aligned non-chimeric sequences representative of the currently recognized diversity among bacteria and archaea. User sequences are oriented and paired with their closest match in the Core Set, which serves as a template for inserting gap characters. Non-16S data (sequence from vector or surrounding genomic regions) are removed in the returned alignment. From the resulting MSA, distance matrices can be calculated for diversity estimates and organisms can be classified by taxonomy. The ability to align and categorize large sequence sets using a simple interface has enabled researchers with various experience levels to obtain bacterial and archaeal community profiles.
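
Two of the steps described in the abstract, orienting a sequence against its closest template and computing a distance matrix from aligned sequences, can be illustrated with a minimal Python sketch. This is not the NAST implementation: the abstract does not say how the closest Core Set match is found, so shared k-mer counting is used here purely as a stand-in, and the uncorrected p-distance is only one simple choice of distance measure. All function names, the value of k, and the toy sequences are hypothetical, and the sketch assumes DNA letters (A, C, G, T).

```python
# Illustrative sketch only; names, k, and the similarity measure are assumptions.
GAP_CHARS = "-."
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of k-mers in a sequence after stripping gap characters."""
    s = "".join(c for c in seq.upper() if c not in GAP_CHARS)
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def orient_and_pick_template(query, templates, k=7):
    """Try both orientations of `query` and return (oriented_query, name)
    for the template sharing the most k-mers (an assumed stand-in for the
    server's template search). Ties go to the first candidate seen."""
    best = None
    for oriented in (query.upper(), reverse_complement(query)):
        query_kmers = kmers(oriented, k)
        for name, aligned in templates.items():
            shared = len(query_kmers & kmers(aligned, k))
            if best is None or shared > best[0]:
                best = (shared, oriented, name)
    return best[1], best[2]


def p_distance(a, b):
    """Fraction of mismatches over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue
        compared += 1
        mismatches += x != y
    return mismatches / compared if compared else float("nan")


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}
```

With a toy template set such as `{"t1": "AAGGCTTC--GATTCAG", "t2": "CCCCGGGG--TTTTAAAA"}`, a query submitted in reverse orientation is flipped and paired with `t1`. Real 16S rRNA genes are roughly 1,500 nucleotides long, so a practical tool would need a much faster search than this all-against-all k-mer comparison.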
