Building a Tree-Bank of Modern Hebrew Text

Abstract

This paper describes the process of building the first tree-bank for Modern Hebrew texts. A major concern in this process is the need for reducing the cost of manual annotation by the use of automatic means. To this end, the joint utility of an automatic morphological analyzer, a probabilistic parser and a small manually annotated tree-bank was explored. An initial tree-bank that consists of 500 annotated sentences from a daily newspaper is described. The annotation scheme that underlies the analyses in the tree-bank integrates morphology and syntax. An existing morphological analyzer and a language-independent probabilistic parser were applied to this tree-bank. Based on the results of some experiments with these tools, a semi-automatic procedure for future enlargement of the tree-bank is outlined. This procedure starts out with the manual segmentation of words into morpheme sequences, then continues with automatic POS tagging and parsing of these morpheme sequences, followed by manual correction of the resulting parse-trees. The proposed procedure is expected to reduce the cost of the next annotation stages in this corpus.
