As companies make big claims that their AI systems can detect cancer better than radiologists, researchers are warning that these algorithms might not be ready for prime time.
A review of 12 studies of algorithms used to detect breast cancer found that they were of “poor methodological quality” and that none were applicable to clinical practice. The review included a study touted by Google’s DeepMind last year.
“Current evidence on the use of AI systems in breast cancer screening is a long way from having the quality and quantity required for its implementation into clinical practice,” the study’s authors wrote in the paper, published in the BMJ this month.
They hoped their findings would encourage those considering adding AI to their breast cancer screening programs to push for high-quality evidence first.
According to the review, written by researchers at the University of Warwick’s Division of Health Sciences, the studies shared one common limitation: they were all retrospective. That makes it difficult to know how well the algorithms would work in clinical practice, whether in supporting radiologists’ decisions or in making triage decisions independently. And if an algorithm flagged a cancer case that a radiologist had previously missed, it’s difficult to verify the true cancer status of that patient.
Five of the studies tested AI as a replacement for radiologists, four of them tested it as a screening tool, and three tested it as a reader aid. All of the algorithms used deep learning.
Another concern the researchers raised was that many of the studies used skewed datasets, which might not be reflective of breast cancer in the general population. Four studies enrolled women randomly, but the rest picked specific cases or used controls to add more patients with cancer to their datasets.
Finally, promising results in smaller studies were not replicated in larger ones. Five studies claimed their AI tools performed better than radiologists, but they were small, carried a high risk of bias, and their results could not be generalized. Three larger studies that compared 36 AI tools to a single radiologist found that 94% of the tools were less accurate than the radiologist, and all of them were less accurate than two or more radiologists.
The researchers’ findings are supported by past papers, including a 2019 review of 23 studies that also focused on AI models for breast cancer detection. The authors of that paper likewise found that the studies were predominantly small, retrospective, and based on highly selected image datasets with a high proportion of cancers.
A more recent review of FDA-cleared AI devices raised similar concerns. Although it wasn’t specific to breast cancer, its authors warned that almost all of the algorithms were based on retrospective data and lacked basic information, such as how many sites they had been evaluated at or how they performed across different patient demographics.