
Human-AI interaction in skin cancer diagnosis: a systematic review and meta-analysis

The development of diagnostic tools for skin cancer based on artificial intelligence (AI) is increasing rapidly and will likely soon be widely implemented in clinical use. Even though the performance of these algorithms is promising in theory, there is limited evidence on the impact of AI assistance on human diagnostic decisions. Therefore, the aim of this systematic review and meta-analysis was to study the effect of AI assistance on the accuracy of skin cancer diagnosis. We searched PubMed, Embase, IEEE Xplore, Scopus and conference proceedings for articles from 1/1/2017 to 11/8/2022. We included studies comparing the performance of clinicians diagnosing at least one skin cancer with and without deep learning-based AI assistance. Summary estimates of sensitivity and specificity of diagnostic accuracy with versus without AI assistance were computed using a bivariate random effects model. We identified 2983 studies, of which ten were eligible for meta-analysis. For clinicians without AI assistance, pooled sensitivity was 74.8% (95% CI 68.6–80.1) and specificity was 81.5% (95% CI 73.9–87.3). For AI-assisted clinicians, the overall sensitivity was 81.1% (95% CI 74.4–86.5) and specificity was 86.1% (95% CI 79.2–90.9). AI benefitted medical professionals of all experience levels in subgroup analyses, with the largest improvement among non-dermatologists. No publication bias was detected, and sensitivity analysis revealed that the findings were robust. AI in the hands of clinicians has the potential to improve diagnostic accuracy in skin cancer diagnosis. Given that most studies were conducted in experimental settings, we encourage future studies to further investigate these potential benefits in real-life settings.
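The abstract describes pooling per-study sensitivities and specificities with a bivariate random effects model. As a rough illustration only (not the authors' actual analysis, which jointly models sensitivity and specificity), a univariate DerSimonian–Laird random-effects pooling of proportions on the logit scale can be sketched in Python; the study counts below are hypothetical:

```python
import math

def pool_random_effects(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    A univariate simplification of the paper's bivariate model: here each
    accuracy measure (e.g. sensitivity) would be pooled separately.
    Returns (pooled proportion, (lower 95% CI, upper 95% CI)).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        e_adj, n_adj = e + 0.5, n + 1.0                 # continuity correction
        p = e_adj / n_adj
        y.append(math.log(p / (1.0 - p)))               # logit proportion
        v.append(1.0 / e_adj + 1.0 / (n_adj - e_adj))   # approximate variance
    w = [1.0 / vi for vi in v]                          # fixed-effect weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)             # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]              # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    back = lambda x: 1.0 / (1.0 + math.exp(-x))         # inverse logit
    return back(y_re), (back(y_re - 1.96 * se), back(y_re + 1.96 * se))

# Hypothetical example: three studies with 80/100, 70/100 and 90/100
# correctly classified malignant lesions.
pooled, ci = pool_random_effects([80, 70, 90], [100, 100, 100])
```

The logit transform keeps the pooled estimate and its confidence limits inside the (0, 1) range, which is why it is the usual scale for meta-analysis of diagnostic proportions.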


Introduction

As a result of increasing data availability and computational power, artificial intelligence (AI) algorithms have reached a level of sophistication that enables them to take on complex tasks previously only conducted by human beings1. Several AI algorithms are now approved by the United States Food and Drug Administration (FDA) for medical use2,3,4. Though there are currently no image-based dermatology AI applications that have FDA approval, several are in development2.

Skin cancer diagnosis relies heavily on the interpretation of visual patterns, making it a complex task that requires extensive training in dermatology and dermatoscopy5,6. However, AI algorithms have been shown to accurately diagnose skin cancers, even outperforming experienced dermatologists in image classification tasks in constrained settings7,8,9. Yet these algorithms can be sensitive to data distribution shifts. Therefore, AI-human partnerships could provide performance improvements that surmount the limitations of either human clinicians or AI alone. Notably, Tschandl et al. demonstrated in their 2020 paper that the accuracy of clinicians supported by AI algorithms surpassed that of either clinicians or AI algorithms working separately10. This approach of an AI-clinician partnership is considered the most likely clinical use of AI in dermatology, given the ethical and legal concerns of automated diagnosis alone. Therefore, there is an urgent need to better understand how the use of AI by clinicians affects decision making11. The goal of this study was to evaluate the diagnostic accuracy of clinicians with vs. without AI assistance using a systematic review and meta-analysis of the available literature.

Discussion

This systematic review and meta-analysis included 12 studies and 67,700 diagnostic evaluations of potential skin cancer by clinicians with and without AI assistance. Our findings highlight the potential of AI-assisted decision-making in skin cancer diagnosis. All clinicians, regardless of their training level, showed improved diagnostic performance when assisted by AI algorithms. The degree of improvement, however, varied across specialties, with dermatologists exhibiting the smallest increase in diagnostic accuracy and non-dermatologists, including primary care providers, demonstrating the largest improvement. These results suggest that AI assistance may be especially beneficial for clinicians without extensive training in dermatology. Given that many dermatological AI devices have recently obtained regulatory approval in Europe, including some CE marked algorithms utilized in the analyzed studies24,25, AI assistance may soon be a standard part of a dermatologist’s toolbox. It is therefore important to better understand the interaction between humans and AI in clinical decision-making.

While several studies have been conducted to evaluate the dermatologic use of new AI tools, our review of published studies found that most have only compared human clinician performance with that of AI tools, without considering how clinicians interact with these tools. Two of the studies in this systematic review and meta-analysis reported that clinicians perform worse when the AI tool provides incorrect recommendations10,19. This finding underscores the importance of accurate and reliable algorithms in ensuring that AI implementation enhances clinical outcomes, and highlights the need for further research to validate AI-assisted decision-making in medical practice. Notably, in a recent study by Barata et al.26, the authors demonstrated that a reinforcement learning model that incorporated human preferences outperformed a supervised learning model. Furthermore, it improved the performance of participating dermatologists in terms of both diagnostic accuracy and optimal management decisions of potential skin cancer when compared to either a supervised learning model or no AI assistance at all. Hence, the development of algorithms in collaboration with clinicians appears to be important for optimizing clinical outcomes.

Only two studies explored the impact of one explainability technique, content-based image retrieval (CBIR), on physicians’ diagnostic accuracy or perceived usefulness. The real clinical utility of explainability methods needs to be further examined, and current methods should be viewed as tools to interrogate and troubleshoot AI models27. Additionally, prior research has shown that human behavioral traits can affect trust and reliance on AI assistance in general28,29. For example, a clinician’s perception of and confidence in the AI’s performance on a given task may influence whether they decide to incorporate AI advice in their decision30. Moreover, research has also shown that the human’s confidence in their decision, the AI’s confidence level, and whether the human and AI agree all influence whether the human incorporates the AI’s advice30. To ensure that AI assistance supports and improves diagnostic accuracy, future research should investigate how factors such as personality traits29, cognitive style28 and cognitive biases31 affect diagnostic performance in real clinical situations. Such research would help inform the integration of AI into clinical practice.

Our findings suggest that AI assistance may be particularly beneficial for less experienced clinicians, consistent with prior studies of human-AI interaction in radiology32. This highlights the potential of AI assistance as an educational tool for non-dermatologists and for improving diagnostic performance in settings such as primary care or for dermatologists in training. In a subgroup analysis, we observed no significant difference between other medical professionals assisted by AI and unassisted dermatologists (data not shown). However, this area warrants further research.

Some limitations need to be considered when interpreting the findings. First, among the ten studies that provided sufficient data to conduct meta-analysis, there were differences in design, number and experience level of participants, target condition definition, classification task, and algorithm output and training. Taken together, this heterogeneity implies that direct comparisons should be interpreted carefully. Furthermore, caution is warranted in the interpretation of the subgroup analyses due to the small sample size of the subgroups (up to seven) and the data structure (i.e., repeated measures), since the same participants examined the clinical images both without and with AI assistance in most studies. Given the low number of studies, we refrained from performing further subgroup analyses, such as comparing specific cancer diagnoses in the subset of articles where these are available. Despite these limitations, our results from this meta-analysis support the notion that AI assistance can yield a positive effect on clinician diagnostic performance. We were able to adjust for potential sources of heterogeneity, including diagnostic task and clinician experience level, when comparing the diagnostic accuracy of clinicians with vs. without AI assistance. Moreover, no signs of publication bias and low likelihood of threshold effects were observed. Lastly, the findings were robust in that the pooled sensitivity and specificity remained nearly the same after excluding outliers or low-quality studies.
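Publication bias in meta-analyses is commonly screened for with funnel-plot asymmetry tests such as Egger's regression, which regresses each study's standardized effect on its precision and inspects the intercept. As a minimal, self-contained sketch of that idea (the inputs below are hypothetical, not data from the included studies):

```python
def eggers_intercept(effects, standard_errors):
    """Egger's regression asymmetry test (sketch).

    Regresses the standardized effect (effect / SE) on precision (1 / SE)
    by ordinary least squares; an intercept far from zero suggests
    funnel-plot asymmetry, i.e. possible publication bias.
    """
    x = [1.0 / s for s in standard_errors]             # precision
    y = [e / s for e, s in zip(effects, standard_errors)]  # standardized effect
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx                             # regression intercept

# Hypothetical example: four studies with identical effects and varying SEs
# produce a perfectly symmetric funnel, so the intercept is ~0.
intercept = eggers_intercept([2.0, 2.0, 2.0, 2.0], [0.1, 0.2, 0.3, 0.4])
```

A formal application would also compute a standard error and significance test for the intercept; the sketch shows only the asymmetry statistic itself.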

Of note, few studies provided participating clinicians with both clinical data and dermoscopic images, which would be available in a real-life clinical situation. Previous research has shown that the use of dermoscopy enables a relative improvement of diagnostic accuracy of melanoma by almost 50% compared to the naked eye5. In one such study, participants were explicitly not allowed to use dermoscopy during the patient examination19. Overall, only four studies were conducted in a prospective clinical setting, and three of these could be included for meta-analysis. Thus, most diagnostic ratings in this meta-analysis were made in experimental settings and do not necessarily reflect the decisions made in a clinical real-world situation.

One of the main concerns regarding the accuracy of AI tools relates to the quality of the data they have been trained on33. As only three studies used publicly available datasets, evaluation of the data quality is difficult. Furthermore, darker skin tones were underrepresented in the datasets of the included studies, which is a known problem in the field, as most papers do not report skin tone outputs34. However, datasets with diverse skin tones have been developed and made publicly available as an effort to reduce disparity in AI performance in dermatology35,36. Moreover, few studies provided detailed information about the origin and number of images that had been used for training, validation, and testing of the AI tool, and different definitions of these terms were used across studies. There is a need for better transparency guidelines for AI tool reporting to enable users and readers to understand the limits and capabilities of these diagnostic tools. Efforts are being made to develop guidelines that are adapted for this purpose, including the STARD-AI37, TRIPOD-AI, and PROBAST-AI38 guidelines, as well as the dermatology-specific CLEAR Derm guidelines39. In addition, PRISMA-AI40 guidelines for systematic reviews and meta-analyses are being developed. These are promising initiatives that will hopefully make both the reporting and evaluation of AI diagnostic tool research more transparent.

Conclusion

The results of this systematic review and meta-analysis indicate that clinicians benefit from AI assistance in skin cancer diagnosis regardless of their experience level. Clinicians with the least experience in dermatology may benefit the most from AI assistance. Our findings are timely as AI is expected to be widely implemented in clinical work globally in the near future. Notably, only four of the identified studies were conducted in clinical settings, three of which could be included in the meta-analysis. Therefore, there is an urgent need for more prospective clinical studies conducted in real-life settings where AI is intended to be used, in order to better understand and anticipate the effect of AI on clinical decision making.

Original article published at https://www.nature.com/articles/s41746-024-01031-w by Isabelle Krakowski, Jiyeong Kim, Zhuo Ran Cai, Roxana Daneshjou, Jan Lapins, Hanna Eriksson, Anastasia Lykou & Eleni Linos
