Fast methods for kernel-based text analysis

TLDR

Kernel-based learning methods such as SVMs achieve high accuracy in NLP by implicitly expanding feature combinations without extra computational cost, yet they remain too slow for large-scale text analysis. The authors extend a Basket Mining algorithm to convert a kernel-based classifier into a simple, fast linear classifier. Experiments on English BaseNP Chunking, Japanese Word Segmentation, and Japanese Dependency Parsing show that the new classifiers are about 30 to 300 times faster than standard kernel-based classifiers.

Abstract

Kernel-based learning (e.g., Support Vector Machines) has been successfully applied to many hard problems in Natural Language Processing (NLP). In NLP, although feature combinations are crucial to improving performance, they are heuristically selected. Kernel methods change this situation. The merit of kernel methods is that effective feature combinations are implicitly expanded without loss of generality and without increasing the computational cost. Kernel-based text analysis shows excellent performance in terms of accuracy; however, these methods are usually too slow to apply to large-scale text analysis. In this paper, we extend a Basket Mining algorithm to convert a kernel-based classifier into a simple and fast linear classifier. Experimental results on English BaseNP Chunking, Japanese Word Segmentation and Japanese Dependency Parsing show that our new classifiers are about 30 to 300 times faster than the standard kernel-based classifiers.
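
To illustrate why a kernel-based classifier can in principle be rewritten as a linear one, here is a minimal sketch. It assumes binary features (each example is a set of active features) and a second-degree polynomial kernel K(x, z) = (1 + |x ∩ z|)^2; neither assumption is stated in the abstract. Under those assumptions, (1 + n)^2 = 1 + 3n + 2·C(n, 2), so the kernel sum over support vectors equals a sum of precomputed weights over feature subsets of size at most 2. The function names are hypothetical, and the sketch enumerates subsets exhaustively; it does not implement the Basket Mining extension described above.

```python
from itertools import combinations
from collections import defaultdict

# Assumption: degree-2 polynomial kernel on binary features.
# (1 + n)^2 = 1 + 3*n + 2*C(n, 2), with n = number of shared features,
# so a shared subset of size 0, 1, 2 contributes 1, 3, 2 respectively.
SUBSET_COEFF = {0: 1.0, 1: 3.0, 2: 2.0}


def kernel_decision(support_vectors, coeffs, bias, x):
    """Kernel form: one kernel evaluation per support vector."""
    return bias + sum(
        a * (1 + len(sv & x)) ** 2 for sv, a in zip(support_vectors, coeffs)
    )


def expand_to_linear(support_vectors, coeffs):
    """Precompute one weight for every feature subset of size <= 2."""
    weights = defaultdict(float)
    for sv, a in zip(support_vectors, coeffs):
        items = sorted(sv)  # sorted so subset tuples are canonical
        for size, c in SUBSET_COEFF.items():
            for subset in combinations(items, size):
                weights[subset] += c * a
    return dict(weights)


def linear_decision(weights, bias, x):
    """Linear form: sum the weights of the subsets present in x."""
    items = sorted(x)
    total = bias
    for size in SUBSET_COEFF:
        for subset in combinations(items, size):
            total += weights.get(subset, 0.0)
    return total
```

In the linear form, the cost of classifying an example depends on the number of its active features rather than on the number of support vectors. The expansion step is where the number of subsets can grow large, which is the part a mining algorithm would need to handle efficiently.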
