Publication | Closed Access
U<sub>50</sub>: A New Metric for Measuring Assembly Output Based on Non-Overlapping, Target-Specific Contigs
Citations: 25
References: 14
Year: 2017
Advances in next-generation sequencing technologies enable routine genome sequencing, generating millions of short reads. A crucial step in full genome analysis is de novo assembly, and the performance of different assembly methods is currently measured by a metric called N<sub>50</sub>. However, the N<sub>50</sub> value can produce skewed, inaccurate results when complex data are analyzed, especially for viral and microbial datasets. To provide a better assessment of assembly output, we developed a new metric called U<sub>50</sub>. The U<sub>50</sub> identifies unique, target-specific contigs by using a reference genome as a baseline, with the aim of circumventing some limitations that are inherent to the N<sub>50</sub> metric. Specifically, the U<sub>50</sub> program uses a mask array to remove sequence that overlaps between multiple contigs, so the performance of the assembly is measured only by unique contigs. We compared simulated and real datasets using U<sub>50</sub> and N<sub>50</sub>, and our results demonstrated that U<sub>50</sub> has the following advantages over N<sub>50</sub>: (1) reducing erroneously large N<sub>50</sub> values due to a poor assembly, (2) eliminating overinflated N<sub>50</sub> values caused by large measurements from overlapping contigs, (3) eliminating diminished N<sub>50</sub> values caused by an abundance of small contigs, and (4) allowing comparisons across different platforms or samples based on the new percentage-based metric UG<sub>50</sub>%. The use of the U<sub>50</sub> metric allows for a more accurate measure of assembly performance by analyzing only the unique, non-overlapping contigs. In addition, most viral and microbial sequencing datasets have high background noise (i.e., host and other non-target sequences), which contributes to a skewed, misrepresentative N<sub>50</sub> value; U<sub>50</sub> corrects for this. Finally, UG<sub>50</sub>% can be used to compare assembly results from different samples or studies, a cross-comparison that cannot be performed with N<sub>50</sub>.
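The contrast between the two metrics can be illustrated with a minimal Python sketch. This is not the authors' U<sub>50</sub> program: the function names, the representation of contigs as 0-based half-open `(start, end)` intervals on the reference, and the longest-first masking order are all assumptions made for illustration. The sketch also assumes that contigs have already been aligned to the target reference, so non-target contigs are absent. The UG<sub>50</sub>% formula used here (the unique contig length at which half of the reference is covered, expressed as a percentage of the reference length) is likewise an assumed definition, since the abstract does not spell it out.

```python
from typing import List, Tuple


def n50(lengths: List[int]) -> int:
    """Length of the contig at which the running total, taken longest
    first, reaches half of the summed contig length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def unique_lengths(hits: List[Tuple[int, int]], genome_len: int) -> List[int]:
    """For each contig (longest first), count the reference positions that
    no earlier contig has covered, then mark them in the mask array.

    `hits` are hypothetical (start, end) alignment intervals, 0-based and
    half-open; this input format is an assumption of the sketch.
    """
    mask = [False] * genome_len
    unique = []
    for start, end in sorted(hits, key=lambda h: h[1] - h[0], reverse=True):
        new_positions = 0
        for pos in range(max(start, 0), min(end, genome_len)):
            if not mask[pos]:
                mask[pos] = True
                new_positions += 1
        if new_positions:  # fully redundant contigs contribute nothing
            unique.append(new_positions)
    return unique


def u50(hits: List[Tuple[int, int]], genome_len: int) -> int:
    """N50-style statistic computed over unique (masked) lengths only."""
    return n50(unique_lengths(hits, genome_len))


def ug50_percent(hits: List[Tuple[int, int]], genome_len: int) -> float:
    """Assumed definition: the unique length at which half of the reference
    is covered, as a percentage of the reference length."""
    running = 0
    for length in sorted(unique_lengths(hits, genome_len), reverse=True):
        running += length
        if 2 * running >= genome_len:
            return 100.0 * length / genome_len
    return 0.0  # less than half of the reference is covered


# Toy data: four redundant copies of a 300 bp contig plus seven 100 bp
# contigs tiling the rest of a 1000 bp reference.
hits = [(0, 300)] * 4 + [(s, s + 100) for s in range(300, 1000, 100)]
raw_lengths = [end - start for start, end in hits]

print(n50(raw_lengths))          # 300
print(u50(hits, 1000))           # 100
print(ug50_percent(hits, 1000))  # 10.0
```

In this toy case the redundant 300 bp contigs pull N<sub>50</sub> up to 300, while the masked calculation counts that region once and reports 100, which corresponds to advantage (2) listed in the abstract.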