Publication | Closed Access
Algorithms That Learn to Extract Information. BBN: Description of the SIFT System as Used for MUC-7
Citations: 72
References: 6
Year: 1998
BBN fielded its first fully trained system for the MUC-7 named entity (NE), template element (TE), and template relation (TR) tasks. All results come from statistical language models trained on annotated data rather than from handwritten rules; such models can be ported to new domains by annotating data, and they learn complex interactions automatically. The NE task reused the fully statistical, HMM-based name-finding model already used in MET-1. For the TE and TR tasks, BBN developed SIFT, a single integrated trained model that replaces PLUM, a system that required handwritten patterns. The authors take the evaluation results as evidence that trained systems can perform roughly on a par with rules hand-tailored by experts.
Abstract: For MUC-7, BBN has for the first time fielded a fully-trained system for NE, TE, and TR; results are all the output of statistical language models trained on annotated data, rather than programs executing handwritten rules. Such trained systems have some significant advantages: 1. They can be easily ported to new domains by simply annotating data with semantic answers. 2. The complex interactions that make rule-based systems difficult to develop and maintain can here be learned automatically from the training data. We believe that the results in this evaluation are evidence that such trained systems, even at their current level of development, can perform roughly on a par with rules hand-tailored by experts. Since MUC-3, BBN has been steadily increasing the proportion of the information extraction process that is statistically trained. Already in MET-1, our name-finding results were the output of a fully statistical, HMM-based model, and that statistical Identifinder™ model was also used for the NE task in MUC-7. For the MUC-7 TE and TR tasks, BBN developed SIFT, a new model that represents a significant further step along this path, replacing PLUM, a system requiring handwritten patterns, with SIFT, a single integrated trained model.