Publication | Closed Access
Automatic keyphrase extraction from scientific documents using N-gram filtration technique
Citations: 70
References: 11
Year: 2008
Venue: Unknown
Engineering, Corpus Linguistics, Text Mining, Natural Language Processing, Language Documentation, Information Retrieval, Data Science, Computational Linguistics, Scientific Domain, Language Studies, Machine Translation, English Documents, Computational Lexicology, Knowledge Discovery, Terminology Extraction, Keyword Search, Automatic Keyphrase Extraction, Information Extraction, Keyword Extraction, Data Extraction, Linguistics
In this paper we present an automatic keyphrase extraction technique for English documents in the scientific domain. The devised algorithm uses an n-gram filtration technique, which filters sophisticated n-grams (1 ≤ n ≤ 4), along with their weights, from the words of the input document. To develop the n-gram filtration technique, we have used (1) an LZ78 data compression based technique, (2) a simple refinement step, (3) a simple pattern filtration algorithm, and (4) a term weighting scheme. In the term weighting scheme, we introduce two positional factors for scientific documents (which are typically more organized than documents from other domains): the position in the document of the sentence where a given phrase first occurs, and the position of the phrase within that sentence. The entire system is based upon statistical observations, simple grammatical facts, heuristics, and lexical information of the English language. We remark that the devised system does not require a learning phase. Our experimental results on a publicly available text dataset show that the devised system is comparable with other known algorithms.
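The abstract gives no implementation details, so the following is only a minimal sketch of how an LZ78-style parse over words could yield candidate n-grams with 1 ≤ n ≤ 4, and how the position of a phrase's first occurrence could feed a weight. The function names `lz78_phrases` and `toy_weight`, the `max_n` parameter, and the reciprocal weighting formula are all assumptions made for illustration; they are not taken from the paper, and the refinement and pattern filtration steps are omitted.

```python
def lz78_phrases(words, max_n=4):
    """LZ78-style parse at word level (illustrative, not the paper's code).

    Returns a list of (start_index, phrase) pairs, where phrase is a tuple
    of at most max_n words.
    """
    dictionary = set()   # phrases seen so far, as in the LZ78 dictionary
    phrases = []
    current = ()         # phrase being extended
    start = 0
    for i, word in enumerate(words):
        if not current:
            start = i
        candidate = current + (word,)
        if candidate in dictionary and len(candidate) < max_n:
            # Known phrase that may still grow: keep extending it.
            current = candidate
        else:
            # New phrase (or length cap reached): record it and restart.
            dictionary.add(candidate)
            phrases.append((start, candidate))
            current = ()
    if current:
        phrases.append((start, current))
    return phrases


def toy_weight(sentence_index, offset_in_sentence):
    """Hypothetical positional weight: earlier sentences and earlier
    offsets score higher. The paper's actual formula is not given."""
    return 1.0 / (1 + sentence_index) + 1.0 / (1 + offset_in_sentence)


# Usage: tokenised sentences, flattened with a position lookup per word.
sentences = [["keyphrase", "extraction", "method"],
             ["keyphrase", "extraction", "works"]]
words = [w for s in sentences for w in s]
positions = [(si, wi) for si, s in enumerate(sentences) for wi, _ in enumerate(s)]

for start, phrase in lz78_phrases(words):
    print(" ".join(phrase), toy_weight(*positions[start]))
```

In this sketch a repeated word sequence grows into a longer dictionary entry each time it recurs, which is why a compression-style parse tends to surface recurring multi-word phrases without a training phase.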