Evaluating accuracy and reproducibility of large language model performance on critical care assessments in pharmacy education

Publication | Open Access

Year: 2025

Huibo Yang, Mengxuan Hu, Amoreena Most, W. Anthony Hawkins, Brian Murray, Susan Smith, Sheng Li, Andrea Sikora

Frontiers in Artificial Intelligence

Abstract

ChatGPT-4 was the most accurate LLM on critical care pharmacy questions, and few-shot chain-of-thought (CoT) prompting improved accuracy the most. Average student accuracy was similar to that of the LLMs overall, and higher on knowledge-application questions. These findings support the need for future assessment of customized training for the type of output needed. Reliance on LLMs is supported only for recall-based questions.
