Application of ChatGPT as a content generation tool in continuing medical education: acne as a test topic

Luigi Naldi; Vincenzo Bettoli; Eugenio Santoro; Maria Rosa Valetto; Anna Bolzon; Fortunato Cassalia; Simone Cazzaniga; Sergio Cima; Andrea Danese; Silvia Emendi; Monica Ponzano; Nicoletta Scarpa; Pietro Dri

doi:10.4081/dr.2024.10138

The large language model (LLM) ChatGPT can answer open-ended and complex questions, but the reliability of the medical information it provides requires careful assessment. As part of the AICHECK (Artificial Intelligence for CME Health E-learning Contents and Knowledge) Study, which aims to evaluate the potential of ChatGPT in continuing medical education (CME), we compared educational content generated by ChatGPT with the recommendations of the National Institute for Health and Care Excellence (NICE) guideline on acne vulgaris. ChatGPT version 4 was given a 23-item questionnaire developed by an experienced dermatologist. A panel of five dermatologists rated the answers positively for “quality” (87.8%), “readability” (94.8%), “accuracy” (75.7%), “thoroughness” (85.2%), and “consistency” with the guideline (76.8%). The references provided by ChatGPT received positive ratings for “pertinence” (94.6%), “relevance” (91.2%), and “update” (62.3%). Internal reproducibility was adequate for both answers (93.5%) and references (67.4%). Answers on issues subject to uncertainty and/or controversy in the scientific community scored lowest. This study underscores the need for rigorous evaluation criteria for AI-generated medical content and for expert oversight to ensure accuracy and adherence to guidelines.
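
The figures above are reported as percentages of positive ratings. The following is a minimal sketch of how such a percentage could be computed from a panel's scores. It is not the study's documented method: the abstract states neither the rating scale nor the cut-off for a "positive" rating. The 1 to 5 scale, the threshold of 4, the function names `percent_positive` and `summarize`, and the toy scores are all hypothetical.

```python
from typing import Dict, List

# Hypothetical assumption: scores lie on a 1-5 scale and a score >= 4 counts
# as "positive". Neither choice is taken from the study.
POSITIVE_THRESHOLD = 4


def percent_positive(ratings: List[List[int]], threshold: int = POSITIVE_THRESHOLD) -> float:
    """Share of all rater-by-answer scores at or above `threshold`, in percent.

    `ratings[r][q]` is rater r's score for answer q.
    """
    scores = [score for rater in ratings for score in rater]
    if not scores:
        raise ValueError("no ratings supplied")
    positive = sum(1 for score in scores if score >= threshold)
    return 100.0 * positive / len(scores)


def summarize(criteria: Dict[str, List[List[int]]]) -> Dict[str, float]:
    """Percentage of positive ratings per evaluation criterion, rounded to 0.1."""
    return {name: round(percent_positive(r), 1) for name, r in criteria.items()}


if __name__ == "__main__":
    # Toy scores: 5 raters x 3 answers for one criterion. NOT data from the study.
    toy = {
        "accuracy": [
            [5, 4, 3],
            [4, 4, 2],
            [5, 3, 4],
            [4, 5, 4],
            [3, 4, 5],
        ],
    }
    print(summarize(toy))
```

With these toy scores, 11 of the 15 ratings reach the assumed threshold, so the script prints `{'accuracy': 73.3}`. Pooling every rater's score into one denominator is only one possible convention. Counting an answer as positive when a majority of raters agree would give a different figure.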

Naldi, L., Bettoli, V., Santoro, E., Valetto, M. R., Bolzon, A., Cassalia, F., Cazzaniga, S., Cima, S., Danese, A., Emendi, S., Ponzano, M., Scarpa, N., & Dri, P. (2024). Application of ChatGPT as a content generation tool in continuing medical education: acne as a test topic. Dermatology Reports. https://doi.org/10.4081/dr.2024.10138