Publication | Closed Access
Constructing biological knowledge bases by extracting information from text sources.
Citations: 597
References: 14
Year: 1999
Engineering, Machine Learning, Knowledge Extraction, Databases, Semantic Web, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Text Sources, Relational Learning Method, Biomedical Text Mining, Named-entity Recognition, Knowledge Representation, Knowledge Discovery, Machine-learning Methods, Information Extraction, Medline Records, Knowledge Base, Relationship Extraction, Systems Biology
Efforts are underway to make molecular biology databases more accessible and interoperable, yet text sources such as MEDLINE remain underutilized. We aim to automatically map information from text sources into structured knowledge bases. We employ machine‑learning techniques, including statistical text classification and relational learning, to induce fact‑extraction routines, and we reduce learning cost by leveraging weakly labeled data. Initial experiments show that the statistical and relational learning methods can successfully induce extraction routines, and using weakly labeled data lowers the cost of learning.
Recently, there has been much effort to make databases for molecular biology more accessible and interoperable. However, information in text form, such as MEDLINE records, remains a greatly underutilized source of biological information. We have begun a research effort aimed at automatically mapping information from text sources into structured representations, such as knowledge bases. Our approach to this task is to use machine-learning methods to induce routines for extracting facts from text. We describe two learning methods that we have applied to this task, a statistical text classification method and a relational learning method, and our initial experiments in learning such information-extraction routines. We also present an approach to decreasing the cost of learning information-extraction routines by learning from "weakly" labeled training data.