Concepedia

Abstract

This paper presents an algorithm for extracting keywords that represent the main point asserted in a document, without relying on external resources such as natural-language processing tools or a document corpus. The algorithm, KeyGraph, segments a graph of term co-occurrences within the document into clusters. Each cluster corresponds to a concept on which the author's idea is based, and the top-ranked terms are selected as keywords using a statistic based on each term's relationship to these clusters. This strategy follows from viewing a document as constructed like a building, in which new ideas are expressed on a foundation of traditional concepts. Experimental results show that the extracted terms match the author's main point quite accurately, even though KeyGraph does not use each term's average frequency in a corpus. KeyGraph is therefore a content-sensitive, domain-independent indexing device.
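
The pipeline described in the abstract (co-occurrence graph, clusters, cluster-based ranking) can be illustrated with a minimal Python sketch. This is not the published KeyGraph procedure: the function name `keygraph_sketch`, the parameters `n_frequent` and `n_edges`, the use of sentence-level co-occurrence counts, connected components as clusters, and the scoring statistic are all simplifying assumptions chosen for illustration. The only property carried over from the abstract is that every quantity is computed from the single input document, with no corpus statistics.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    """Split text into sentences, each a list of lowercase word tokens."""
    sentences = []
    for raw in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z]+", raw.lower())
        if tokens:
            sentences.append(tokens)
    return sentences


def keygraph_sketch(text, n_frequent=5, n_edges=4, stopwords=frozenset()):
    """Return (clusters, ranked), where ranked is a list of (term, score).

    Illustrative only: the thresholds and the score are assumptions,
    not the statistic defined in the KeyGraph paper.
    """
    sentences = [[t for t in s if t not in stopwords]
                 for s in split_sentences(text)]
    sent_sets = [set(s) for s in sentences]

    # 1. Candidate "foundation" terms: the most frequent terms in this
    #    document alone (no corpus frequencies are consulted).
    freq = Counter(t for s in sentences for t in s)
    frequent = [t for t, _ in freq.most_common(n_frequent)]

    # 2. Co-occurrence graph over frequent terms. An edge weight is the
    #    number of sentences containing both terms; keep the strongest edges.
    cooc = Counter()
    for s in sent_sets:
        for a, b in combinations(sorted(s & set(frequent)), 2):
            cooc[(a, b)] += 1
    edges = [pair for pair, _ in cooc.most_common(n_edges)]

    # 3. Clusters: connected components of the kept edges (union-find).
    parent = {t: t for t in frequent}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for t in frequent:
        groups.setdefault(find(t), set()).add(t)
    clusters = list(groups.values())

    # 4. Score every term by its relationship to the clusters. For each
    #    cluster, link = share of sentences touching the cluster (through
    #    terms other than w) that also contain w. The links are combined
    #    as 1 - prod(1 - link), so the score stays within [0, 1].
    scores = {}
    for w in freq:
        miss = 1.0
        for g in clusters:
            others = g - {w}
            touching = [s for s in sent_sets if s & others]
            if touching:
                link = sum(1 for s in touching if w in s) / len(touching)
                miss *= 1.0 - link
        scores[w] = 1.0 - miss

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return clusters, ranked
```

Calling `keygraph_sketch(document_text)` and taking the first few entries of `ranked` gives candidate keywords. In this simplified score, terms inside a tight cluster tend to rank highly, so a closer reproduction of the paper's behaviour would need its actual statistic and graph-segmentation rules.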
