BioXP-0.5B: Explainable Medical-AI via RL-GRPO

Publication | Open Access

Year: 2024

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song

arXiv (Cornell University)

Abstract

BioXP-0.5B is a 🤗 Medical-AI model trained using our two-stage fine-tuning approach. Stage 1, Supervised Fine-Tuning (SFT): the model was first fine-tuned on labeled data (MedMCQA) to achieve strong baseline accuracy on multiple-choice medical QA tasks. Stage 2, Group Relative Policy Optimization (GRPO): GRPO was then applied to further align the model with human-like reasoning patterns. This reinforcement learning technique enhances the model's ability to generate coherent, high-quality explanations and improves answer reliability.
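
To make the second stage more concrete, the following is a minimal sketch of the group-relative advantage computation at the core of GRPO: several completions are sampled for the same question, each is scored, and each score is normalized against the mean and standard deviation of its own group. The abstract does not describe the reward used for BioXP-0.5B, so `toy_reward` (answer correctness plus a small bonus for giving a reason) is purely hypothetical, as are the example completions. Using the population standard deviation and the `eps` stabilizer are also assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    GRPO replaces a learned value baseline with the group mean:
    advantage_i = (r_i - mean(r)) / (std(r) + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_reward(completion, gold_choice):
    """Hypothetical reward, NOT the one used for BioXP-0.5B.

    1.0 for ending with the correct choice, plus 0.5 if the
    completion contains an explanation marker ("because").
    """
    correct = 1.0 if completion.strip().endswith(f"Answer: {gold_choice}") else 0.0
    explained = 0.5 if "because" in completion.lower() else 0.0
    return correct + explained


# One group: four sampled completions for the same multiple-choice question
# whose gold answer is "B" (illustrative strings only).
completions = [
    "The drug blocks beta receptors because of its structure. Answer: B",
    "Answer: B",
    "It is likely C because of the symptoms. Answer: C",
    "Answer: A",
]

rewards = [toy_reward(c, "B") for c in completions]
advantages = group_relative_advantages(rewards)

for completion, adv in zip(completions, advantages):
    print(f"{adv:+.3f}  {completion}")
```

In this sketch, completions that are both correct and explained receive the largest positive advantage, while incorrect, unexplained ones receive the most negative. A full GRPO step would then use these advantages to weight a clipped policy-gradient objective, typically with a KL penalty against a reference model; that part is omitted here.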