Accuracy and Completeness of ChatGPT-Generated Information on Interceptive Orthodontics: A Multicenter Collaborative Study

Hatia A; Doldo T; Parrini S; Chisci E; Cipriani L; Montagna L; Lagana G; Guenza G; Agosta E; Vinjolli F; Hoxha M; D'Amelio C; Favaretto N; Chisci G

Background: This study aims to investigate the accuracy and completeness of ChatGPT in answering questions and solving clinical scenarios in interceptive orthodontics. Materials and Methods: Ten specialized orthodontists from ten Italian postgraduate orthodontics schools developed 21 open-ended clinical questions encompassing all of the subspecialities of interceptive orthodontics and 7 comprehensive clinical cases. Questions and scenarios were entered into ChatGPT-4, and the resulting answers were evaluated by the researchers using predefined accuracy (range 1–6) and completeness (range 1–3) Likert scales. Results: For the open-ended questions, the overall median score was 4.9/6 for accuracy and 2.4/3 for completeness. In addition, the reviewers rated the accuracy of open-ended answers as entirely correct (score 6 on the Likert scale) in 40.5% of cases and completeness as entirely correct (score 3 on the Likert scale) in 50.5% of cases. For the clinical cases, the overall median score was 4.9/6 for accuracy and 2.5/3 for completeness. Overall, the reviewers rated the accuracy of clinical case answers as entirely correct in 46% of cases and the completeness of clinical case answers as entirely correct in 54.3% of cases. Conclusions: The results showed a high level of accuracy and completeness in AI responses and a great ability to solve difficult clinical cases, but the answers were not 100% accurate and complete. ChatGPT is not yet sophisticated enough to replace the intellectual work of human beings.