Publication | Closed Access

Prague Czech-English Dependency Treebank: Any Hopes for a Common Annotation Scheme?

Citations: 13

References: 9

Year: 2004

Abstract

This paper introduces the Prague Czech-English Dependency Treebank (PCEDT), a new Czech-English parallel resource suitable for experiments in structural machine translation. We describe the process of building the core parts of the resources: a bilingual syntactically annotated corpus and translation dictionaries. A part of the Penn Treebank has been translated into Czech, and the dependency annotation of the Czech translation has been produced automatically from plain text. The annotation of the Penn Treebank has been transformed into a dependency annotation scheme. A subset of corresponding Czech and English sentences has been annotated by humans. First experiments in Czech-English machine translation using these data have already been carried out. The resources, which are being created at Charles University in Prague, are scheduled for release as a Linguistic Data Consortium data collection in 2004.
