Large Language Models Are Poor Medical Coders — Benchmarking of Medical Code Querying

Publication | Open Access

Citations: 105

Year: 2024

Ali Soroush, Benjamin S. Glicksberg, Eyal Zimlichman, Yiftach Barash, Robert Freeman, Alexander W. Charney, Girish N. Nadkarni, Eyal Klang

NEJM AI

Abstract

BACKGROUND Large language models (LLMs) have attracted significant interest for automated clinical coding. However, early data show that LLMs are highly error-prone when mapping medical codes. We sought to quantify and benchmark LLM medical code querying errors across several available LLMs.
