An algorithm for suffix stripping

TLDR

The paper was first published in Program in 1980 and is republished as part of a series marking the journal's 40th anniversary. It addresses the automatic removal of suffixes from English words, a task of particular interest in information retrieval. The algorithm, implemented as a short, fast BCPL program, treats complex suffixes as compounds of simple suffixes and removes the simple suffixes in successive steps. Each removal depends on the form of the remaining stem, usually through a measure of its syllable length. Although simple, the algorithm performs slightly better than a much more elaborate system with which it was compared, and the paper serves as a useful historical document on information retrieval.

Abstract

Purpose: The automatic removal of suffixes from words in English is of particular interest in the field of information retrieval. This work was originally published in Program in 1980 and is republished as part of a series of articles commemorating the 40th anniversary of the journal.

Design/methodology/approach: An algorithm for suffix stripping is described, which has been implemented as a short, fast program in BCPL.

Findings: Although simple, it performs slightly better than a much more elaborate system with which it has been compared. It effectively works by treating complex suffixes as compounds made up of simple suffixes, and removing the simple suffixes in a number of steps. In each step the removal of the suffix is made to depend upon the form of the remaining stem, which usually involves a measure of its syllable length.

Originality/value: The piece provides a useful historical document on information retrieval.
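The stepwise, stem-dependent removal described in the findings can be illustrated with a minimal Python sketch. It is not the BCPL program and not the full published rule set: the abstract gives no rules, so the three-step table below is a small illustrative subset, and the names `measure`, `STEPS` and `strip_suffixes` are hypothetical. The sketch assumes that the "measure of syllable length" is the number of vowel-to-consonant transitions in the stem, with `y` treated as a vowel when it follows a consonant.

```python
VOWELS = "aeiou"


def is_consonant(word, i):
    """True if word[i] is a consonant; 'y' is a vowel after a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-run-to-consonant transitions (assumed syllable measure)."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        vowel = not is_consonant(stem, i)
        if prev_vowel and not vowel:
            m += 1
        prev_vowel = vowel
    return m


# Illustrative subset only: each step is a list of
# (suffix, replacement, threshold) and a rule fires when measure(stem) > threshold.
STEPS = [
    [("ization", "ize", 0), ("fulness", "ful", 0), ("iveness", "ive", 0)],
    [("ful", "", 0), ("ness", "", 0)],
    [("ize", "", 1), ("ive", "", 1)],
]


def strip_suffixes(word):
    """Apply each step once; a compound suffix is peeled off piece by piece."""
    for rules in STEPS:
        for suffix, replacement, threshold in rules:
            if word.endswith(suffix):
                stem = word[: -len(suffix)]
                # Removal depends on the remaining stem, not just the suffix.
                if measure(stem) > threshold:
                    word = stem + replacement
                break  # at most one rule is tried per step
    return word


if __name__ == "__main__":
    for w in ["generalization", "hopefulness", "effectiveness", "size"]:
        print(w, "->", strip_suffixes(w))
```

Under these assumptions, "generalization" is reduced in two steps ("ization" to "ize", then "ize" removed) to "general", while "size" is left unchanged because the stem "s" that would remain has measure 0. That contrast is the point of making each removal conditional on the stem.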
