
When 99% simply isn’t good enough

August 21, 2023
Artificial Intelligence | Business Affairs

By Narinder Singh

The rapid evolution of artificial intelligence (AI) promises profound paradigm shifts across many sectors, and healthcare is no exception. With its capacity for machine learning, data analysis, and predictive modeling, AI stands to revolutionize patient care and healthcare administration. Intuitively, we know AI in this domain involves people's lives, so it demands more stringent performance requirements. But how much is enough?

AI for administrative work
Standards for AI in billing and process management are similar to those in other business domains: errors are costly but rarely catastrophic. AI that can clearly identify which drugs an insurance plan covers, or how bills can be processed more efficiently, can meaningfully reduce the overhead burden of medical administration.

AI for medical decisions
AI for medical decisions is more nuanced, however. While 99% sounds like great performance for diagnosing a disease or recommending a treatment, context matters. If your doctor said they had 99% certainty that a drug would fix your severe migraine, you might jump at it - until they noted that the drug also carries a one percent chance of death. Even this kind of high-stakes decision making is well understood within discussions of AI's potential and can be evaluated in a straightforward manner.
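As a minimal sketch of that straightforward evaluation, consider an expected-utility calculation. The utility values below are invented purely for illustration, not drawn from any clinical source:

```python
# Hypothetical expected-utility check for the migraine example:
# a 99% chance the drug works versus a 1% chance of death.
p_cure, p_death = 0.99, 0.01

# Assumed utilities: relief from a migraine is a modest gain,
# death is catastrophically negative. Illustrative numbers only.
u_cure, u_death = 1.0, -1000.0

expected_utility = p_cure * u_cure + p_death * u_death
print(f"Expected utility: {expected_utility:+.2f}")  # -9.01: decline the drug
```

Under almost any plausible weighting, the one percent mortality risk dominates the decision, which is why a headline figure of "99%" alone tells you very little.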

For anyone who has spent time in a hospital, the sound of beeping alarms is at first jarring and then simply background noise. For some patients, alarms go off every minute of the day - even though some studies have noted that more than 90% of alarms are false or not actionable. Yet these simple rules-based alarms are seldom adjusted down, because no one wants to miss even one case. Clinicians prioritize sensitivity of detection (never miss a case) over specificity (tolerate many false positives).
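A back-of-the-envelope calculation shows how this trade-off plays out. The rates below are assumptions chosen for illustration (loosely consistent with the studies cited above), not measurements from any particular hospital:

```python
# Why a highly sensitive alarm still produces mostly false alarms
# when the event it watches for is rare.

sensitivity = 0.99   # assumed: alarm fires for 99% of true events
specificity = 0.85   # assumed: alarm stays quiet for 85% of non-events
prevalence  = 0.01   # assumed: 1% of monitored intervals hold a true event

true_alarms  = sensitivity * prevalence
false_alarms = (1 - specificity) * (1 - prevalence)

# Fraction of all alarms that are false (1 minus positive predictive value)
false_fraction = false_alarms / (true_alarms + false_alarms)
print(f"{false_fraction:.0%} of alarms are false")  # ~94%
```

Even with near-perfect sensitivity, a rare event plus imperfect specificity means the overwhelming majority of alarms are false - exactly the pattern clinicians experience on the ward.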

However, there is a substantive cost to these false alarms: people ignore them or react to them slowly. Some studies show that over 60% of alarms go unanswered for more than ten minutes. At first glance, this reality may seem to lower the bar for AI in hospitals, but it is actually part of why performance expectations for AI systems are higher.

Alarms are simply reporting a fact: heart rate (HR) is over X, mean arterial pressure (MAP) is over Y. They are not performing a clinical assessment; that is left to the responder or the care team. The bar for making a decision using alarm data (or lab data, or radiology data) is much, much higher, precisely because so many false positives are entering the system.

For example, an alarm that triggers on HR > X is easy to understand. Yet an algorithm that predicts whether the patient is having a cardiac event is very different. The former brings data to an expert; the latter makes an expert decision.
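To make the distinction concrete, here is a hypothetical sketch. The threshold, the model weights, and both function names are invented for illustration and do not come from any real monitoring system:

```python
import math

HR_LIMIT = 120  # beats per minute; an assumed threshold, not a clinical standard

def threshold_alarm(heart_rate: float) -> bool:
    """Rules-based alarm: reports the fact that HR exceeded a limit.
    Deciding what that fact means is left to the care team."""
    return heart_rate > HR_LIMIT

def predict_cardiac_event(heart_rate: float, map_mmhg: float) -> float:
    """Toy stand-in for a predictive model: returns an estimated probability
    of a cardiac event. The weights are made up; a real model would need
    training and validation, because its output is itself a clinical judgment."""
    score = 0.05 * (heart_rate - 80) + 0.08 * (65 - map_mmhg)
    return 1 / (1 + math.exp(-score))  # logistic squashing to a probability

print(threshold_alarm(130))            # True: a data point handed to an expert
print(predict_cardiac_event(130, 55))  # ~0.96: an expert-style decision
```

The first function can tolerate false positives because a human still interprets every alert; the second replaces that interpretation, which is why it must clear a far higher bar than 99%.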
