Deep Learning for Extreme Multi-label Text Classification

TLDR

Extreme multi-label text classification (XMTC) assigns each document a subset of labels drawn from a very large label set, which raises data sparsity and scalability challenges. Recent tree-based and embedding methods have begun to address these challenges, but deep learning had not been explored for the task. This study presents the first attempt at applying deep learning to XMTC, using a family of CNN models tailored to multi-label classification. In a comparative evaluation of seven state-of-the-art methods on six benchmark datasets with up to 670,000 labels, the CNN approach produces the best or second-best results on every dataset. On the Wikipedia benchmark with 500,000 labels, it surpasses the runner-up by 11.7–15.3% in precision@K.

Abstract

Extreme multi-label text classification (XMTC) refers to the problem of assigning to each document its most relevant subset of class labels from an extremely large label collection, where the number of labels could reach hundreds of thousands or millions. The huge label space raises research challenges such as data sparsity and scalability. Significant progress has been made in recent years by the development of new machine learning methods, such as tree induction with large-margin partitions of the instance spaces and label-vector embedding in the target space. However, deep learning has not been explored for XMTC, despite its big successes in other related areas. This paper presents the first attempt at applying deep learning to XMTC, with a family of new Convolutional Neural Network (CNN) models which are tailored for multi-label classification in particular. With a comparative evaluation of 7 state-of-the-art methods on 6 benchmark datasets where the number of labels is up to 670,000, we show that the proposed CNN approach successfully scaled to the largest datasets, and consistently produced the best or the second best results on all the datasets. On the Wikipedia dataset with over 2 million documents and 500,000 labels in particular, it outperformed the second best method by 11.7%~15.3% in precision@K and by 11.5%~11.7% in NDCG@K for K = 1,3,5.
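
The results above are reported as ranking metrics over the top K predicted labels. As a rough illustration of how precision@K and NDCG@K are commonly computed for a single document, the sketch below assumes binary relevance and a log2 position discount. The function names and the toy scores are hypothetical and are not taken from the paper, whose exact evaluation code may differ.

```python
import math


def top_k_labels(scores, k):
    """Return the indices of the k highest-scoring labels, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def precision_at_k(scores, relevant, k):
    """Fraction of the top-k predicted labels that are truly relevant."""
    hits = sum(1 for i in top_k_labels(scores, k) if i in relevant)
    return hits / k


def ndcg_at_k(scores, relevant, k):
    """DCG of the top-k predictions divided by the best achievable DCG.

    Assumes binary relevance and a 1 / log2(rank + 1) discount,
    with ranks starting at 1.
    """
    dcg = sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so rank + 1 == pos + 2
        for pos, i in enumerate(top_k_labels(scores, k))
        if i in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example (made-up numbers): 5 candidate labels, labels 0 and 4 are relevant.
scores = [0.9, 0.1, 0.8, 0.3, 0.7]
relevant = {0, 4}
for k in (1, 3, 5):
    print(k, precision_at_k(scores, relevant, k), ndcg_at_k(scores, relevant, k))
```

In an XMTC setting these per-document values would typically be averaged over all test documents, and the score vector would have one entry per label in the collection.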
