Concepedia

TLDR

Text mining requires preprocessing steps such as case folding, tokenizing, filtering, and stemming, and in Indonesian documents stemming errors like over‑ and under‑stemming can degrade classification accuracy. This study aims to enhance preprocessing for Indonesian student complaint documents by applying the Sastrawi library to improve filtering and stemming. The authors implemented Sastrawi’s stemming and filtering modules, replacing the previously used Tala stemmer to address stemming inaccuracies. Results show that Sastrawi reduces over‑ and under‑stemming and achieves faster processing times compared to the Tala stemmer.

Abstract

Text mining includes a mandatory text preprocessing stage. Text preprocessing selects the data to keep in each document through case folding, tokenizing, filtering, and stemming. The results of preprocessing can affect the accuracy of document classification. In Indonesian-language (Bahasa Indonesia) documents, over-stemming and under-stemming still occur frequently, so the stemming process needs improvement. This study proposes using the Sastrawi library to improve on previous studies whose preprocessing results were still not optimal, especially in the filtering and stemming steps. The results show that the Sastrawi library reduces over-stemming and under-stemming and achieves a faster processing time than the Tala stemmer.
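The four preprocessing stages named in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names, the tiny stopword set, and the sample complaint sentence are all assumptions made for the example. Stemming uses the PySastrawi package (`StemmerFactory().create_stemmer()`) when it is installed, and otherwise falls back to leaving tokens unchanged so the rest of the pipeline still runs.

```python
import re

# Hypothetical, deliberately tiny stopword set for illustration only.
# A real run would use a full Indonesian stopword list.
DEMO_STOPWORDS = {"yang", "di", "dan", "untuk", "sangat"}


def case_fold(text):
    """Case folding: lowercase the whole document."""
    return text.lower()


def tokenize(text):
    """Tokenizing: keep runs of ASCII letters, dropping digits and punctuation."""
    return re.findall(r"[a-z]+", text)


def filter_tokens(tokens, stopwords):
    """Filtering: remove tokens that appear in the stopword set."""
    return [t for t in tokens if t not in stopwords]


try:
    # PySastrawi, if available, provides the Indonesian stemmer.
    from Sastrawi.Stemmer.StemmerFactory import StemmerFactory

    stem = StemmerFactory().create_stemmer().stem
except ImportError:
    # Assumption for this sketch: without Sastrawi, skip stemming.
    def stem(token):
        return token


def preprocess(text, stopwords=DEMO_STOPWORDS, stemmer=stem):
    """Run case folding, tokenizing, filtering, then stemming, in that order."""
    tokens = filter_tokens(tokenize(case_fold(text)), stopwords)
    return [stemmer(t) for t in tokens]


if __name__ == "__main__":
    # Made-up complaint sentence, not taken from the study's data.
    print(preprocess("Pelayanan di perpustakaan sangat MENGECEWAKAN!"))
```

Passing the stemmer in as a parameter keeps the comparison described in the abstract easy to set up: a different stemmer can be swapped in without touching the other three stages.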
