Publication | Open Access
Data-driven approaches for information structure identification
Citations: 55
References: 9
Year: 2005
Venue: Unknown
Syntactic Parsing, Engineering, Part-of-speech Tagging, Structured Data, Data-driven Approaches, Corpus Linguistics, Text Mining, Applied Linguistics, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Document Classification, Data Integration, Grammar, Language Studies, Machine Translation, Annotation Guidelines, Knowledge Discovery, Topic Focus Articulation, Computer Science, Information Management, Information Extraction, Treebanks, Structure Discovery, Structure Mining, Decision Trees, Linguistics, Data Modeling
This paper investigates the automatic identification of Information Structure (IS) in texts. The experiments use the Prague Dependency Treebank, which is annotated with IS following the Praguian approach of Topic Focus Articulation. We automatically detect t(opic) and f(ocus), using node attributes from the treebank as basic features, together with derived features inspired by the annotation guidelines. We present the performance of decision tree (C4.5), maximum entropy, and rule induction (RIPPER) classifiers on all tectogrammatical nodes. We compare the results against a baseline system that always assigns f(ocus) and against a rule-based system. The best system achieves an accuracy of 90.69%, a 44.73% relative improvement (28.03 percentage points) over the baseline accuracy of 62.66%.
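As a rough illustration of the evaluation arithmetic in the abstract, the sketch below implements an always-f(ocus) baseline and shows how the reported accuracies relate. The function names and the toy label list are hypothetical and are not taken from the paper or from the Prague Dependency Treebank; only the two accuracy figures (90.69 and 62.66) come from the abstract.

```python
def always_focus_baseline(nodes):
    """Label every node 'f' (focus), mirroring the baseline described in the abstract."""
    return ["f" for _ in nodes]


def accuracy(predicted, gold):
    """Fraction of nodes whose predicted label matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def relative_improvement(system_acc, baseline_acc):
    """Improvement over the baseline, as a percentage of the baseline accuracy."""
    return (system_acc - baseline_acc) / baseline_acc * 100


# Toy gold labels (hypothetical, NOT treebank data): 5 focus nodes, 3 topic nodes.
gold = ["f", "t", "f", "f", "t", "f", "f", "t"]
baseline_acc = accuracy(always_focus_baseline(gold), gold)
print(baseline_acc)  # 0.625

# Figures reported in the abstract: best system 90.69%, baseline 62.66%.
print(round(relative_improvement(90.69, 62.66), 2))  # 44.73
```

The always-focus baseline scores exactly the share of focus nodes in the data, so the reported 62.66% also indicates how skewed the label distribution is toward f(ocus).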