Publication | Open Access
Normalizing Early English Letters to Present-day English Spelling
Citations: 16
References: 7
Year: 2018
This paper presents multiple methods for normalizing the most deviant and infrequent historical spellings in a corpus consisting of personal correspondence from the 15th to the 19th century. The methods include machine translation (neural and statistical), edit distance and rule-based FST. Different normalization methods are compared and evaluated. All of the methods have their own strengths in word normalization. This calls for finding ways of combining the results from these methods to leverage their individual strengths.

Related Work

In the past, normalization of old texts has received some attention as an NLP task. There are ready-made tools available for normalization such as VARD2 (Baron and Rayson, 2008) and MorphAdorner (Burns, 2013). These tools, however, are not sufficient to solve the problem automatically for our corpus. Using VARD2 requires manual work and MorphAdorner does not provide enough coverage for our data.
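
To make the edit-distance method named in the abstract more concrete, the following is a minimal sketch of lexicon-based normalization with Levenshtein distance. It is not the paper's implementation: the function names, the tiny `LEXICON`, and the `max_distance` threshold are all hypothetical choices made for illustration.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalize(word: str, lexicon: Sequence[str], max_distance: int = 2) -> str:
    """Return the closest lexicon entry, or the word itself if none is close.

    Assumption: max_distance=2 is an arbitrary illustrative cutoff.
    """
    if not lexicon:
        return word
    target = word.lower()
    # Ties are broken alphabetically so the result is deterministic.
    best = min(lexicon, key=lambda cand: (levenshtein(target, cand), cand))
    return best if levenshtein(target, best) <= max_distance else word


# Hypothetical modern-spelling lexicon, for illustration only.
LEXICON = ["letter", "receive", "friend", "would"]

if __name__ == "__main__":
    for historical in ["lettre", "frend", "wold"]:
        print(historical, "->", normalize(historical, LEXICON))
```

A plain nearest-neighbour lookup like this ignores word frequency and context, which is one reason a comparison against the other methods listed in the abstract is of interest.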