Evaluating Capabilities of Large Language Models: Performance of GPT4 on Surgical Knowledge Assessments

Publication | Open Access

Year: 2023

Brendin R. Beaulieu‐Jones, Sahaj Shah, Margaret T. Berrigan, Jayson S. Marwaha, Shuo-Lun Lai, Gabriel A. Brat

Unknown Venue

Abstract

Consistent with prior findings, we demonstrate robust, near- or above-human-level performance of ChatGPT within the surgical domain. Unique to this study, we demonstrate substantial inconsistency in ChatGPT responses to repeated queries. This finding warrants future consideration and presents an opportunity to further train these models to provide safe and consistent responses. Without mental and/or conceptual models, it is unclear whether language models such as ChatGPT would be able to safely assist clinicians in providing care.
