Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification

Abstract

Extreme multi-label text classification (XMC) seeks to find relevant labels from an extremely large label collection for a given text input. Many real-world applications can be formulated as XMC problems, such as recommendation systems, document tagging, and semantic search. Recently, transformer-based XMC methods, such as X-Transformer and LightXML, have shown significant improvement over other XMC methods. Although these methods leverage pre-trained transformer models for text representation, fine-tuning transformer models on a large label space still requires lengthy computation, even with powerful GPUs. In this paper, we propose a novel recursive approach, XR-Transformer, which accelerates the procedure by recursively fine-tuning transformer models on a series of multi-resolution objectives related to the original XMC objective function. Empirical results show that XR-Transformer takes significantly less training time than other transformer-based XMC models while yielding new state-of-the-art results. In particular, on the public Amazon-3M dataset with 3 million labels, XR-Transformer is not only 20x faster than X-Transformer but also improves Precision@1 from 51% to 54%.
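
For readers unfamiliar with the Precision@1 metric quoted above, the following is a minimal sketch of how Precision@k is commonly computed in multi-label ranking: the fraction of the k highest-scored labels that are relevant, averaged over inputs. The function name `precision_at_k` and the toy scores and label sets are assumptions made for illustration only; they are not taken from the paper or its code.

```python
def precision_at_k(scores, relevant, k=1):
    """Mean fraction of the top-k scored labels that are relevant, over all inputs.

    scores:   list of per-input score lists, one score per label (hypothetical data)
    relevant: list of per-input sets of relevant label indices
    """
    total = 0.0
    for row_scores, row_relevant in zip(scores, relevant):
        # Rank label indices by descending score and keep the top k.
        ranked = sorted(range(len(row_scores)),
                        key=lambda j: row_scores[j], reverse=True)
        top_k = ranked[:k]
        # Count how many of the top-k labels are actually relevant.
        total += sum(1 for j in top_k if j in row_relevant) / k
    return total / len(scores)


# Toy example (made up): 3 inputs, 4 labels.
scores = [
    [0.9, 0.1, 0.3, 0.2],
    [0.2, 0.8, 0.1, 0.7],
    [0.4, 0.3, 0.6, 0.1],
]
relevant = [{0, 2}, {3}, {2}]

p1 = precision_at_k(scores, relevant, k=1)
p2 = precision_at_k(scores, relevant, k=2)
```

With millions of labels, a practical implementation would use sparse matrices and a partial top-k selection rather than a full sort; the dense lists here only keep the sketch short.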