Extracting and evaluating general world knowledge from the Brown corpus

Abstract

We have been developing techniques for extracting general world knowledge from miscellaneous texts by a process of approximate interpretation and abstraction, focusing initially on the Brown corpus. We apply interpretive rules to clausal patterns and patterns of modification, and concurrently abstract general "possibilistic" propositions from the resulting formulas. Two examples are "A person may believe a proposition" and "Children may live with relatives". Our methods currently yield over 117,000 such propositions of variable quality for the Brown corpus, or more than 2 per sentence. We report here on our efforts to evaluate these results with a judging scheme aimed at determining how many of these propositions pass muster as "reasonable general claims" about the world in the opinion of human judges. We find that nearly 60% of the extracted propositions are favorably judged according to our scheme by any given judge. The percentage unanimously judged to be reasonable claims by multiple judges is lower, but still sufficiently high to suggest that our techniques may be of some use in tackling the long-standing "knowledge acquisition bottleneck" in AI.
