Publication | Closed Access

New Methods in Automatic Extracting

Citations: 1.5K

References: 3

Year: 1969

TLDR

Earlier work on automatic extracting relied on a single indicator of sentence significance: the presence of high-frequency content words (keywords). This study adds three more: pragmatic cue words, title and heading words, and sentence location. The paper proposes automatic extracting methods for document screening that select the sentences most likely to convey the document's substance. The extracting system is parameterized so that the influence of each of the four components can be controlled and varied. The accompanying methodology covers compiling the required dictionaries, setting the control parameters, and comparing the automatic extracts with manually produced ones. The work yielded a working system and a research methodology, and the results indicate that the three new components contribute more than the frequency component to producing better extracts.

Abstract

This paper describes new methods of automatically extracting documents for screening purposes, i.e. the computer selection of sentences having the greatest potential for conveying to the reader the substance of the document. While previous work has focused on one component of sentence significance, namely, the presence of high-frequency content words (key words), the methods described here also treat three additional components: pragmatic words (cue words); title and heading words; and structural indicators (sentence location). The research has resulted in an operating system and a research methodology. The extracting system is parameterized to control and vary the influence of the above four components. The research methodology includes procedures for the compilation of the required dictionaries, the setting of the control parameters, and the comparative evaluation of the automatic extracts with manually produced extracts. The results indicate that the three newly proposed components dominate the frequency component in the production of better extracts.
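
The parameterized combination of the four components described in the abstract can be pictured as a weighted sum of per-sentence scores. The following is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the toy word lists, the scoring rule used for each component, and the default weights are all hypothetical and chosen only for illustration. The `overlap` helper is likewise just one simple way to compare an automatic extract with a manual one.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries; the paper compiles its own dictionaries.
BONUS_CUES = {"significant", "conclude", "results"}
STIGMA_CUES = {"hardly", "impossible"}
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "this", "that", "are", "we"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_sentences(title, sentences, weights=(1.0, 1.0, 1.0, 1.0)):
    """Return one combined score per sentence.

    weights = (cue, key, title, location); each is an assumed tunable parameter.
    """
    w_cue, w_key, w_title, w_loc = weights
    tokens = [tokenize(s) for s in sentences]
    # Document-wide frequency of content words (stop words excluded).
    freq = Counter(t for sent in tokens for t in sent if t not in STOP_WORDS)
    title_words = set(tokenize(title)) - STOP_WORDS
    last = len(sentences) - 1

    scores = []
    for i, sent in enumerate(tokens):
        # Cue component: bonus words add, stigma words subtract (assumed rule).
        cue = sum(t in BONUS_CUES for t in sent) - sum(t in STIGMA_CUES for t in sent)
        # Key component: count content words that recur in the document.
        key = sum(1 for t in sent if freq[t] > 1)
        # Title component: count words shared with the title.
        title_score = sum(t in title_words for t in sent)
        # Location component: favour first and last sentences (assumed rule).
        loc = 1 if i in (0, last) else 0
        scores.append(w_cue * cue + w_key * key + w_title * title_score + w_loc * loc)
    return scores


def extract(title, sentences, k=2, weights=(1.0, 1.0, 1.0, 1.0)):
    """Select the k highest-scoring sentences, returned in document order."""
    scores = score_sentences(title, sentences, weights)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def overlap(auto_extract, manual_extract):
    """Fraction of manually chosen sentences that the automatic extract also chose."""
    if not manual_extract:
        return 0.0
    return len(set(auto_extract) & set(manual_extract)) / len(manual_extract)
```

Setting a weight to zero switches that component off, so a frequency-only baseline would correspond to `weights=(0, 1, 0, 0)` in this sketch. Comparing `overlap` values across weight settings is one plausible way to mimic the kind of comparison against manual extracts that the abstract mentions.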
