Concepedia

Publication | Closed Access

Applying Conditional Random Fields to Japanese Morphological Analysis

Citations: 723

References: 15

Year: 2004

TLDR

Japanese morphological analysis with CRFs is challenged by ambiguous word boundaries, unlike prior CRF work that assumed fixed boundaries. The paper aims to show how CRFs can be applied to Japanese morphological analysis despite word boundary ambiguity. The authors trained and evaluated CRFs on a standard Japanese morphological analysis corpus, comparing results to HMMs and MEMMs. CRFs resolve long‑standing issues, allow flexible hierarchical tagset features, reduce label and length bias, and outperform HMMs and MEMMs.

Abstract

This paper presents Japanese morphological analysis based on conditional random fields (CRFs). Previous work on CRFs assumed that observation sequence (word) boundaries were fixed. However, word boundaries are not clear in Japanese, and hence a straightforward application of CRFs is not possible. We show how CRFs can be applied to situations where word boundary ambiguity exists. CRFs offer a solution to the long-standing problems in corpus-based or statistical Japanese morphological analysis. First, flexible feature designs for hierarchical tagsets become possible. Second, the influences of label and length bias are minimized. We experiment with CRFs on the standard testbed corpus used for Japanese morphological analysis, and evaluate our results using the same experimental dataset as the HMMs and MEMMs previously reported for this task. Our results confirm that CRFs not only solve the long-standing problems but also improve performance over HMMs and MEMMs.
