Large Scale Diagnostic Code Classification for Medical Patient Records

TLDR

Accurately labeling patient records with ICD9 and CPT codes is an essential but understudied medical coding problem. Each record can carry several codes, many of them correlated with specific diseases, and government regulations on patient data have limited prior work on automatic coding. The study frames coding as a multi-label classification task and compares two efficient algorithms for diagnosis coding on a large patient dataset, as an alternative to the labor-intensive manual labeling common in current practice.

Abstract

A critical, yet not very well studied, problem in medical applications is that of accurately labeling patient records according to the diagnoses patients have received and the procedures they have undergone. This labeling problem, known as coding, consists of assigning standard medical codes (ICD9 and CPT) to patient records. Each patient record can have several corresponding labels/codes, many of which are correlated with specific diseases. The most common coding approach today is manual labeling, which requires considerable human effort and is cumbersome for large patient databases. In this paper we view medical coding as a multi-label classification problem, where we treat each code as a label for patient records. Due to government regulations concerning patient medical data, previous studies in automatic coding have been quite limited. We compare two efficient algorithms for diagnosis coding on a large patient dataset.
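
As a rough illustration of the multi-label framing described in the abstract, the sketch below encodes a handful of records as a binary label matrix, with one column per code. The record identifiers, the toy data, and the helper name build_label_matrix are assumptions made for illustration only; the abstract does not describe the dataset format or name the two algorithms compared, so no classifier is shown.

```python
# Hypothetical toy data: each record ID maps to the set of ICD9 codes
# assigned to it. A record may carry several codes at once.
records = {
    "rec1": {"250.00", "401.9"},
    "rec2": {"401.9"},
    "rec3": {"428.0", "250.00"},
}


def build_label_matrix(records):
    """Encode records as a binary indicator matrix.

    Returns (record_ids, codes, matrix), where matrix[i][j] is 1 if
    record_ids[i] carries codes[j] and 0 otherwise.
    """
    record_ids = sorted(records)
    # Collect every distinct code across all records; each becomes a label.
    codes = sorted({code for assigned in records.values() for code in assigned})
    matrix = [
        [1 if code in records[rid] else 0 for code in codes]
        for rid in record_ids
    ]
    return record_ids, codes, matrix


record_ids, codes, Y = build_label_matrix(records)
for rid, row in zip(record_ids, Y):
    print(rid, row)
```

Each column of the matrix is one binary target, so a multi-label learner can either predict the columns independently or exploit the correlations between them that the abstract mentions.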
