Publication | Open Access
Char_align
Citations: 191
References: 10
Year: 1993
Unknown Venue
Natural Language Processing, Parallel Texts, Syntax, Engineering, Sentence Level, Text Normalization, Corpus Linguistics, Computational Linguistics, Character Level, Computer Science, Grammar, Language Studies, Text Processing, Linguistics, Machine Translation
There have been a number of recent papers on aligning parallel texts at the sentence level, e.g., Brown et al. (1991), Gale and Church (to appear), Isabelle (1992), Kay and Röscheisen (to appear), Simard et al. (1992), Warwick-Armstrong and Russell (1990). On clean inputs, such as the Canadian Hansards, these methods have been very successful (at least 96% correct by sentence). Unfortunately, if the input is noisy (due to OCR and/or unknown markup conventions), these methods tend to break down, because the noise can make it difficult to find paragraph boundaries, let alone sentence boundaries. This paper describes a new program, char_align, that aligns texts at the character level rather than at the sentence/paragraph level, based on the cognate approach proposed by Simard et al.
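To make the idea of character-level evidence concrete, the following is a minimal Python sketch and not the char_align implementation. It assumes, purely for illustration, that cognates can be approximated by character 4-grams shared between the two texts, and it collects the offset pairs at which such 4-grams match. The function names (`ngram_positions`, `dotplot_points`), the `max_freq` cutoff, and the sample sentences are all hypothetical and do not come from the paper.

```python
from collections import defaultdict


def ngram_positions(text, n=4):
    """Map each character n-gram to the list of offsets where it starts."""
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i)
    return positions


def dotplot_points(source, target, n=4, max_freq=10):
    """Return sorted (i, j) pairs where source[i:i+n] == target[j:j+n].

    n-grams that occur more than max_freq times in either text are
    skipped (an assumption of this sketch): very frequent n-grams say
    little about where one text corresponds to the other.
    """
    src = ngram_positions(source, n)
    tgt = ngram_positions(target, n)
    points = []
    for gram, src_offsets in src.items():
        tgt_offsets = tgt.get(gram)
        if not tgt_offsets:
            continue
        if len(src_offsets) > max_freq or len(tgt_offsets) > max_freq:
            continue
        points.extend((i, j) for i in src_offsets for j in tgt_offsets)
    return sorted(points)


# Hypothetical sample pair, invented for this sketch.
english = "The government presented the budget to parliament."
french = "Le gouvernement a présenté le budget au parlement."
points = dotplot_points(english, french)
```

In a pair of genuinely parallel texts, matching offsets like these would be expected to cluster near a roughly diagonal path through the (source offset, target offset) plane. Estimating that path is the alignment problem proper and is not attempted here.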