Hugging Face has released a benchmark for testing generative artificial intelligence (AI) on health tasks. The benchmark, called Open Medical-LLM, is part of a broader effort to improve the performance and safety of large language models (LLMs) across applications, including healthcare.
Open Medical-LLM is a collection of existing test sets — MedQA, PubMedQA, MedMCQA, etc. — designed to evaluate models on general medical knowledge and on health domains such as pharmacology and clinical practice. The benchmark stitches these sets together into a single platform spanning multiple-choice and open-ended questions, including question banks drawn from medical licensing examinations, so that models can be evaluated and compared on a common footing.
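To make the multiple-choice evaluation concrete, here is a minimal sketch of how answers on MedQA-style items might be scored for accuracy. The items and the `predict` function are hypothetical stand-ins for illustration, not part of the actual Open Medical-LLM harness.

```python
# Illustrative sketch: scoring a model's answers on multiple-choice
# medical QA items, in the style of benchmarks like MedQA or MedMCQA.

def accuracy(predictions, gold):
    """Fraction of items where the predicted choice matches the gold answer."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Hypothetical multiple-choice items (question, options, gold letter).
items = [
    {"question": "Which vitamin deficiency causes scurvy?",
     "options": {"A": "Vitamin A", "B": "Vitamin C",
                 "C": "Vitamin D", "D": "Vitamin K"},
     "answer": "B"},
    {"question": "Which organ produces insulin?",
     "options": {"A": "Liver", "B": "Kidney",
                 "C": "Pancreas", "D": "Spleen"},
     "answer": "C"},
]

def predict(item):
    # Stand-in for a model call; a real harness would prompt an LLM
    # with the question and options and parse a choice letter from
    # its response.
    return "B" if "vitamin" in item["question"].lower() else "C"

preds = [predict(it) for it in items]
gold = [it["answer"] for it in items]
print(f"accuracy = {accuracy(preds, gold):.2f}")  # both items correct here
```

A real evaluation run would replace `predict` with calls to the model under test and aggregate accuracy per test set, which is roughly how leaderboards of this kind rank models.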