Text Mining-Supported Information Extraction: An Extended Methodology for Developing Information Extraction Systems

Abstract

Information extraction (IE) and knowledge discovery in databases (KDD) are both useful approaches for discovering information in textual corpora, but both have deficiencies. Information extraction can identify relevant sub-sequences of text, but it is usually unaware of emerging, previously unknown knowledge and regularities in a text and thus cannot form new facts or hypotheses. Emerging data mining methods and techniques complement information extraction and promise to overcome these deficiencies. This research combines the benefits of both approaches by integrating data mining and information extraction methods. The aim is to provide a new high-quality information extraction methodology and, at the same time, to improve the performance of the underlying extraction system. The new methodology should therefore shorten the life cycle of information extraction engineering: information predicted in early extraction phases can be reused in later extraction steps, and the resulting extraction rules require fewer arduous test-and-debug iterations. Effectiveness and applicability are validated by processing online documents from the areas of eHealth and eRecruitment.
