AI produced more false positives and lower positive predictive values than human radiologists in its assessment of chest X-rays.

Study finds chest X-ray AI far from ready to make real world diagnoses on its own

September 28, 2023

by John R. Fischer, Senior Reporter

Although the FDA has approved AI-based chest X-ray solutions to assist radiologists, a new study indicates that more research and refinement are needed before these tools can be trusted to make accurate diagnoses on their own in real-world clinical settings.

Researchers in Denmark found that human radiologists were significantly better at identifying the presence and absence of three common lung diseases, while the AI programs were prone to producing more false positives and became less accurate in cases with multiple findings or smaller targets.

“Too many false-positive diagnoses would result in unnecessary imaging, radiation exposure, and increased costs,” said lead researcher Dr. Louis Plesner, resident radiologist and Ph.D. fellow at Herlev and Gentofte Hospital in Copenhagen, in a statement.

Plesner and his team compared assessments of four commercially available AI systems to those of 72 radiologists for identifying airspace disease (a chest X-ray pattern caused by pneumonia or lung edema), pneumothorax (collapsed lung), and pleural effusion (a buildup of water around the lungs) in 2,040 consecutive adult chest X-rays from four Danish hospitals. Of the X-rays, 669 (32.8%) had at least one target finding.

AI sensitivity ranged from 72% to 91% for airspace disease, 63% to 90% for pneumothorax, and 62% to 95% for pleural effusion. Although the AI tools showed moderate to high sensitivity compared with the radiologists, they produced more false positives and had significantly lower positive predictive values. For pneumothorax, for example, the AI's positive predictive values ranged from 56% to 86%, compared with 96% for the radiologists. The AI's positive predictive values were even lower for airspace disease, ranging from 40% to 50%.

“In this difficult and elderly patient sample, the AI predicted airspace disease where none was present five to six out of 10 times. You cannot have an AI system working on its own at that rate,” said Plesner.
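To make the link between a 40% to 50% positive predictive value and Plesner's "five to six out of 10" remark concrete, here is a minimal Python sketch of the standard definitions of sensitivity, positive predictive value (PPV) and its complement, the false discovery rate. The counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly diseased cases that the reader flags as positive."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: share of positive calls that are correct."""
    return tp / (tp + fp)


def false_discovery_rate(tp: int, fp: int) -> float:
    """Share of positive calls where no disease was present (1 - PPV)."""
    return fp / (tp + fp)


# Hypothetical counts for illustration only, NOT the study's data:
# 100 diseased cases, of which 90 are caught (10 missed), plus
# 110 healthy cases wrongly flagged as diseased.
tp, fn, fp = 90, 10, 110

print(f"sensitivity = {sensitivity(tp, fn):.0%}")  # 90%
print(f"PPV = {ppv(tp, fp):.0%}")  # 45%
print(f"false calls per 10 positives = {10 * false_discovery_rate(tp, fp):.1f}")  # 5.5
```

The example shows how a tool can catch 90% of diseased cases and still be wrong on more than half of its positive calls: the many false positives dilute its positive predictions.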

The main issue, he says, is that these solutions are rarely tested in real-world clinical settings, where patients often present with multiple diseases. Researchers have also focused more on whether the technology can detect the presence or absence of disease than on the accuracy of its diagnoses. Additionally, in most prior studies where AI surpassed radiologists in accuracy, the radiologists reviewed images without access to the patient's clinical history or previous imaging studies. In everyday practice, a radiologist's diagnosis is a synthesis of all three sources: the current images, the clinical history, and prior imaging.

“We speculate that the next generation of AI tools could become significantly more powerful if capable of this synthesis as well, but no such systems exist yet,” he said.

The findings were published in Radiology, a journal of the Radiological Society of North America.