Quality and Readability of Large Language Models' Responses to Oral Lichen Planus Patients' FAQs

Vito Carlo Alberto Caponio
2026-01-01

Abstract

Objective: To evaluate the quality and readability of responses generated by large language models (LLMs) to frequently asked questions (FAQs) about oral lichen planus (OLP). Methods: We evaluated the responses of three LLMs (ChatGPT-4o, Gemini 2.0 Flash Experimental, and Copilot) to 13 patient-centered FAQs about OLP. Questions were identified using query tools, and answers were assessed by 14 oral medicine experts using the Quality Assessment of Medical Artificial Intelligence (QAMAI) tool. Readability was analyzed with the Flesch Reading Ease (FRE) and Flesch–Kincaid Grade Level (FKG) tools. Results: All LLMs provided generally accurate and relevant responses, with median QAMAI scores indicating “good” to “very good” quality. ChatGPT achieved slightly higher completeness, particularly for questions on OLP definition and treatment. Reference provision was inconsistent across all chatbots. Readability analysis revealed that most responses required college-level literacy: ChatGPT produced the most complex texts, Gemini occasionally achieved more accessible outputs, and Copilot fell in between. Conclusions: LLMs may have potential as adjunctive tools for patient education in OLP, although they remain limited by incomplete information, inconsistent references, and suboptimal readability. Future research should incorporate longitudinal LLM evaluations and training to develop models that deliver accurate, accessible information tailored to users' literacy levels.
Keywords: accuracy; large language models; oral lichen planus; oral potentially malignant disorders; patient education; readability
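
The two readability measures named in the abstract are standard published formulas based on average sentence length and average syllables per word. The sketch below is illustrative only and is not the authors' tooling: the function names are hypothetical, and the syllable counter is a rough vowel-group heuristic (an assumption), so its scores will differ somewhat from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    # Standard Flesch Reading Ease formula: higher means easier to read.
    fre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Standard Flesch-Kincaid Grade Level formula: approximate U.S. school grade.
    fkg = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fre, fkg
```

A higher FRE score indicates easier text, while FKG approximates the U.S. school grade needed to understand it. Calling `readability(answer_text)` on a chatbot response returns both scores as a tuple.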
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14085/60741