First public benchmark for AI radiology report generation launched by Rajpurkar Lab and Gradient Health
Press releases may be edited for formatting or style | December 18, 2024
December 18, 2024 BOSTON, MA — The Rajpurkar Lab at Harvard Medical School's Department of Biomedical Informatics today announced the launch of ReXrank (rexrank.ai), the first standardized public leaderboard for evaluating AI models that generate radiology reports from chest X-rays.
With almost 1,000 FDA-cleared AI-enabled medical devices already in clinical use in the USA, it can be challenging for health systems, physicians, and researchers to tell these tools apart and compare their performance. To address this, the Rajpurkar Lab created ReXrank, a public leaderboard and challenge built on objective, state-of-the-art performance assessments. Gradient Health, the healthcare data company, supported the initiative by providing the ReXGradient dataset, the largest test dataset of its kind, with 10,000 chest X-ray studies.
To feature on the ReXrank leaderboard, creators of AI chest X-ray tools must first test their models on a series of large public datasets and then submit them for testing on the private ReXGradient dataset. The results are made publicly available.
Initial benchmarking results show the MedVersa system, developed by the Rajpurkar Lab, in the top position on the leaderboard, significantly outperforming other models, including OpenAI's GPT-4V, which ranked 15th. MedVersa demonstrated superior performance across multiple metrics, including RadCliQ, RadGraph, and RaTEScore.
"Many commercial and research groups are making claims about AI-powered radiology report generation without benchmarking against each other. Rather than rely on anecdotes, having standardized benchmarks allows us to separate scientific progress from hype," said Pranav Rajpurkar, Assistant Professor of Biomedical Informatics at Harvard Medical School. "The significant performance gap between specialized medical AI models and general-purpose AI reveals both the progress we've made and the challenges that remain in medical AI development."
"ReXrank provides the medical AI community with a much-needed standardized way to evaluate and compare automated report generation systems," said Xiaoman Zhang, lead researcher at the Rajpurkar Lab. "By incorporating multiple datasets and eight different evaluation metrics, we can better understand how these AI models perform in real-world scenarios."
"The ability to benchmark AI models against real-world, diverse radiology datasets is crucial for advancing the field," added Ouwen Huang, co-founder of Gradient Health. "ReXrank’s large-scale test set provides the comprehensive evaluation framework needed to validate these technologies before clinical deployment and accelerate the development of reliable AI systems."