Publication | Closed Access
Using feature generation and feature selection for accurate prediction of translation initiation sites.
Year: 2002 | Citations: 64 | References: 7
Topics: Translation Initiation Site, Engineering, Multilingualism, Genetics, Feature Selection, Genomics, Sequence Alignment, Gene Recognition, Natural Language Processing, Data Science, Computational Linguistics, Translation Initiation Sites, Machine Translation, Computer-assisted Translation, Sequence Modelling, Sequence Analysis, Correct Prediction, Bioinformatics, Functional Genomics, Feature Generation, Neural Machine Translation, Biology, Computational Biology, Systems Biology, Medicine, Linguistics
Correct prediction of the translation initiation site (TIS) is an important problem in genomic research. We show that feature generation, combined with correlation-based feature selection, can be used with a variety of machine learning algorithms to give highly accurate TIS prediction. Only a few features are needed, and the results achieve accuracy comparable to that of the best existing approaches. Our approach has the advantage that it does not require devising a special-purpose prediction method; rather, standard machine learning classifiers are shown to perform very well on the selected features. The raw and generated features that we found to be important are: the nucleotides at positions -3 and -1 relative to the ATG; upstream k-grams for k = 3, 4, and 5; stop-codon frequency; downstream in-frame 3-grams; and the distance of the ATG from the beginning of the sequence. The best result, an overall accuracy of 90%, is obtained by selecting only seven features from this set. Retraining on the same features with a scanning model achieves an overall accuracy of 94% on this dataset.
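The feature list in the abstract can be made concrete with a minimal Python sketch. It is not the authors' implementation: the window handling, the feature names (`up_...`, `down_...`, `stop_freq`) and the helpers `tis_features` and `cfs_merit` are hypothetical. It also assumes that the correlation-based selection step follows Hall's CFS heuristic (the one behind Weka's CfsSubsetEval), which scores a subset of k features as k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation.

```python
from collections import Counter
from math import sqrt

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kgram_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-grams in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def tis_features(seq: str, atg: int) -> dict:
    """Generate candidate features for the ATG starting at 0-based index `atg`.

    Hypothetical choices for illustration: upstream k-grams are counted at
    every position before the ATG; downstream codons (and stop codons) are
    counted in frame, starting right after the ATG.
    """
    if seq[atg:atg + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    upstream = seq[:atg]
    downstream = seq[atg + 3:]

    # The A of ATG is position +1, so -1 and -3 are 1 and 3 bases before it.
    feats = {
        "pos-3": seq[atg - 3] if atg >= 3 else None,
        "pos-1": seq[atg - 1] if atg >= 1 else None,
        "atg_distance": atg,  # bases between sequence start and the ATG
    }

    # Upstream k-gram counts for k = 3, 4, 5.
    for k in (3, 4, 5):
        for gram, n in kgram_counts(upstream, k).items():
            feats[f"up_{gram}"] = n

    # Downstream in-frame 3-grams (codons) and in-frame stop-codon frequency.
    codons = [downstream[i:i + 3] for i in range(0, len(downstream) - 2, 3)]
    for codon, n in Counter(codons).items():
        feats[f"down_{codon}"] = n
    stops = sum(c in STOP_CODONS for c in codons)
    feats["stop_freq"] = stops / len(codons) if codons else 0.0
    return feats


def cfs_merit(feature_class_corr: list, feature_feature_corr: list) -> float:
    """Hall-style CFS merit: k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff))."""
    k = len(feature_class_corr)
    r_cf = sum(feature_class_corr) / k
    r_ff = (sum(feature_feature_corr) / len(feature_feature_corr)
            if feature_feature_corr else 0.0)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)
```

For example, `tis_features("GCCACCATGGCTTAAGGA", 6)` gives `pos-3` = "A", `pos-1` = "C", `atg_distance` = 6 and a stop-codon frequency of 1/3 (in-frame codons GCT, TAA, GGA). A subset search would keep the feature set with the highest `cfs_merit`, and only those columns would then be passed to an ordinary classifier.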