A group of researchers is calling attention to the lack of transparency in the information needed to validate the efficacy of AI solutions

Researchers critique Google Health mammo AI findings for lack of transparency

October 20, 2020
by John R. Fischer, Senior Reporter
A group of researchers is critiquing a recent publication touting a Google Health AI system, asserting that it demonstrates how concerns over data privacy lead to less transparency in the information needed to validate the efficacy of such solutions.

The international group claims that the study, which was published in Nature, exemplifies their concerns about the lack of transparency in published work on artificial intelligence algorithms for health applications. They note that restrictive data-access procedures, a lack of published computer code, and unreported model parameters make it difficult for other researchers to validate or build upon the work, and they say that sharing such information in an appropriate manner would still maintain patient privacy.

“For many kinds of data there is well-established precedent for what has been safely shared,” Levi Waldron, associate professor, department of epidemiology and biostatistics, CUNY Graduate School of Public Health and Health Policy, told HCB News. “Data that can't safely be made fully public is best shared through government databases that will independently handle access requests by researchers with a valid research use, such as the Database of Genotypes and Phenotypes (dbGaP) and the Sequence Read Archive (SRA).”

The AI model was released by Google Health back in January and was trained on more than 90,000 mammograms, according to VentureBeat. A team of U.S. and British researchers evaluated the system on a test set of 28,000 mammogram results (25,000 from the U.K. and 3,000 from the U.S.). They found that the AI was not only as accurate as human radiologists, but that it cut false positives by 5.7 percent in U.S. results and by 1.2 percent in those read by British physicians.

The more than 19 co-authors, including Waldron, are affiliated with McGill University, the City University of New York (CUNY), Harvard University, and Stanford University. They assert that the work lacks scientific value because the publication shares little of the detailed methods and code behind Google’s research: there is no description of model development, of the data-processing and training pipelines used, or of several hyperparameters of the model’s architecture (the settings that govern how the model is structured and trained). The paper also does not disclose which transformations were used to augment the data set on which the model was trained, a choice that can “significantly” affect performance, according to the co-authors.

They reason that researchers are more focused on publishing their findings than on spending the time and resources needed to ensure those findings can be replicated. They also say that the researchers who develop AI solutions dictate the terms of data sharing and draft the informed-consent language for patients, raising questions about how well informed patients are about how their data will be shared.

“The concern stated about privacy attacks against the learned parameters of a deep learning model could not reveal more than what went into the model, which is a mammogram, and whether radiologists identified that image as containing cancer cells,” said Waldron. “It takes a lot of twisting of the imagination to come up with any scenario where that could affect any patient volunteer.”

He and his colleagues add that third-party validation is essential to ensuring AI solutions are assessed in an unbiased manner. This can be done through containerization (bundling an application together with all of its dependencies and configuration files); platforms built specifically for sharing AI systems; and cloud platforms with authentication, they say.
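As a rough illustration of the containerization approach the researchers describe — this sketch is not from the paper, and the file names (requirements.txt, inference.py, model_weights.pt, config.yaml) are hypothetical — a minimal Dockerfile could bundle a trained model, its inference code, and pinned dependencies so a third party can reproduce results in the same environment:

```dockerfile
# Hypothetical sketch: package a model and its exact environment together
# so outside reviewers can rerun inference without rebuilding anything.
FROM python:3.10-slim

WORKDIR /app

# Pin exact library versions (listed in requirements.txt) for reproducibility
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Ship the model weights, inference code, and configuration as one unit
COPY model_weights.pt inference.py config.yaml ./

# A single entry point runs the bundled evaluation
ENTRYPOINT ["python", "inference.py", "--config", "config.yaml"]
```

With an image like this published to a registry, a reviewer could run the same evaluation with a single `docker run` command, which is the kind of unbiased, independent validation the co-authors argue current publications make impossible.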

The critique was published as an opinion piece, also in Nature.