Application of ChatGPT as a content generation tool in continuing medical education: acne as a test topic
Accepted: 5 November 2024
The large language model (LLM) ChatGPT can answer open-ended and complex questions, but its reliability as a source of medical information requires careful assessment. As part of the AICHECK (Artificial Intelligence for CME Health E-learning Contents and Knowledge) Study, which evaluates the potential of ChatGPT in continuing medical education (CME), we compared ChatGPT-generated educational content with the recommendations of the National Institute for Health and Care Excellence (NICE) guideline on acne vulgaris. ChatGPT version 4 was presented with a 23-item questionnaire developed by an experienced dermatologist. A panel of five dermatologists rated the answers positively for “quality” (87.8%), “readability” (94.8%), “accuracy” (75.7%), “thoroughness” (85.2%), and “consistency” with guidelines (76.8%). The references provided by ChatGPT received positive ratings for “pertinence” (94.6%), “relevance” (91.2%), and “update” (62.3%). Internal reproducibility was adequate for both answers (93.5%) and references (67.4%). Answers on topics that remain uncertain or controversial in the scientific community scored the lowest. This study underscores the need for rigorous evaluation criteria for AI-generated medical content and for expert oversight to ensure accuracy and guideline adherence.
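The percentages above are shares of positive ratings. As a minimal sketch of that arithmetic, the snippet below assumes that each of the five panelists rated each of the 23 answers once and that the ratings were pooled (115 in total). The abstract does not state the rating scale or how the percentages were calculated, so the pooling, the function name `positive_share`, and the count of 101 positive ratings are illustrative assumptions, not the authors' method.

```python
from typing import Sequence


def positive_share(ratings: Sequence[bool]) -> float:
    """Return the percentage of positive ratings, rounded to one decimal."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return round(100 * sum(ratings) / len(ratings), 1)


# Assumption: 5 raters x 23 items, pooled into a single list of 115 ratings.
N_RATERS, N_ITEMS = 5, 23
TOTAL = N_RATERS * N_ITEMS

# Hypothetical data: 101 positive ratings out of 115 (not taken from the study).
ratings = [True] * 101 + [False] * (TOTAL - 101)
print(positive_share(ratings))  # 87.8
```

Under these assumptions, 101 positive ratings out of 115 would give 87.8%. The actual denominators may differ from one criterion to another.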
Copyright (c) 2024 the Author(s)
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.