Char_align: A Program for Aligning Parallel Texts at the Character Level
1993 · 24 citations · 0 references · Venue unknown
There have been a number of recent papers on aligning parallel texts at the sentence level, e.g., Brown et al. (1991), Gale and Church (to appear), Isabelle (1992), Kay and Röscheisen (to appear), Simard et al. (1992), and Warwick-Armstrong and Russell (1990). On clean inputs, such as the Canadian Hansards, these methods have been very successful (at least 96% correct by sentence). Unfortunately, if the input is noisy (due to OCR errors and/or unknown markup conventions), these methods tend to break down, because the noise can make it difficult to find paragraph boundaries, let alone sentence boundaries. This paper describes a new program, char_align, that aligns texts at the character level rather than at the sentence/paragraph level, based on the cognate approach proposed by Simard et al.

1. Introduction

Parallel texts have recently received considerable attention in machine translation (e.g., Brown et al., 1990), bilingual lexicography (e.g., Klavans and Tzoukermann, 1990), and terminology resea...
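The abstract names the idea (character-level alignment driven by cognates) without spelling out the procedure. As a rough illustration only, the sketch below assumes that identical character 4-grams in the two texts count as cognate evidence, records each match as a dot at (source offset, target offset), keeps the dots near the length-scaled diagonal, and chains the survivors into a path that moves forward in both texts. The function names, the 4-gram size, the band width, and the longest-increasing-chain step are all hypothetical choices made for this example; they are not presented as char_align's actual method.

```python
from bisect import bisect_left
from collections import defaultdict


def ngram_dots(source, target, n=4):
    """Hypothetical helper: dots (i, j) where the n characters starting at
    source[i] equal the n characters starting at target[j]."""
    # Index every n-gram position in the target text.
    index = defaultdict(list)
    for j in range(len(target) - n + 1):
        index[target[j:j + n]].append(j)
    # Pair each source position with every target position sharing its n-gram.
    return [(i, j)
            for i in range(len(source) - n + 1)
            for j in index.get(source[i:i + n], [])]


def near_diagonal(dots, len_source, len_target, band):
    """Hypothetical filter: keep dots within `band` characters of the
    diagonal, assuming the two texts are roughly proportional in length."""
    if len_source == 0:
        return []
    slope = len_target / len_source
    return [(i, j) for i, j in dots if abs(j - slope * i) <= band]


def monotone_path(dots):
    """Longest chain of dots strictly increasing in both i and j.

    A simple stand-in for a path-finding step (longest increasing
    subsequence via binary search); NOT the paper's own procedure."""
    # Sorting ties on i by descending j prevents two dots with the same i
    # from both entering a chain that must strictly increase in j.
    dots = sorted(dots, key=lambda d: (d[0], -d[1]))
    tails = []               # tails[p]: smallest last j of a chain of length p+1
    tail_idx = []            # index into `dots` of that chain's last dot
    prev = [None] * len(dots)
    for k, (_, j) in enumerate(dots):
        pos = bisect_left(tails, j)
        if pos == len(tails):
            tails.append(j)
            tail_idx.append(k)
        else:
            tails[pos] = j
            tail_idx[pos] = k
        prev[k] = tail_idx[pos - 1] if pos > 0 else None
    # Walk back from the end of the longest chain.
    path = []
    k = tail_idx[-1] if tail_idx else None
    while k is not None:
        path.append(dots[k])
        k = prev[k]
    return path[::-1]


if __name__ == "__main__":
    en = "The House met at 2 p.m. in Ottawa."
    fr = "La Chambre se réunit à 14 h à Ottawa."
    dots = near_diagonal(ngram_dots(en, fr), len(en), len(fr), band=10)
    print(monotone_path(dots))
```

Because it relies only on raw character matches, a sketch like this needs no sentence or paragraph boundaries, which is the property the abstract emphasizes for noisy OCR input; the quadratic dot set and the crude chaining step would need refinement before use on real corpora.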