ChatGPT vs. Orthopedic Residents! Who Is The Winner?

Authors

  • Semih Yas, Gazi University, Faculty of Medicine, Department of Orthopedics and Traumatology, Ankara, Turkey
  • Asim Ahmadov, Gazi University, Faculty of Medicine, Ankara, Turkey
  • Alimcan Baymurat, Gazi University, Faculty of Medicine, Department of Orthopedics and Traumatology, Ankara, Turkey
  • Mehmet Ali Tokgoz, Gazi University, Faculty of Medicine, Department of Orthopedics and Traumatology, Ankara, Turkey
  • Secdegul Coskun Yas, Ankara Training and Research Hospital, Department of Emergency Medicine, Ankara, Turkey
  • Mustafa Odluyurt, Caycuma State Hospital, Department of Orthopedics and Traumatology, Zonguldak, Turkey
  • Tolga Tolunay, Gazi University, Faculty of Medicine, Department of Orthopedics and Traumatology, Ankara, Turkey

Keywords:

chatgpt, ai, examination, orthopedics, traumatology

Abstract

Introduction: With recent advances in artificial intelligence, ChatGPT, developed by OpenAI, has emerged as a versatile tool capable of performing a wide range of tasks, yet its application in medicine is constrained by the complexity of the field and by limitations in accuracy. This article compares ChatGPT's performance with that of orthopedic residents at Gazi University on a multiple-choice examination to assess its applicability and reliability in orthopedics.

Materials and Methods: In this observational study at Gazi University, 31 orthopedic residents were stratified by experience level and assessed using a 50-question multiple-choice test covering various orthopedic topics. ChatGPT 3.5 was given the same questions, and its responses were evaluated for both correctness and the reasoning behind the answers; data were analyzed using IBM SPSS.

Results: The orthopedic residents tested, whose experience ranged from 6 months to 5 years, scored between 23 and 40 out of 50 on the multiple-choice exam, with an average score of 30.81 that varied by seniority. ChatGPT answered 25 of the 50 questions correctly and gave consistent answers across different languages and at different times, but it also showed limitations, giving incorrect responses or stating that the correct answer was not among the choices for some questions.

Discussion: The study evaluates ChatGPT's capabilities in the medical field, finding it comparable to orthopedic residents with up to two years of experience in a multiple-choice exam setting, but limited in interpretation and source reliability. ChatGPT 3.5 was prone to "hallucinations", raising concerns about the use of AI in academic and clinical settings. Despite these limitations, ChatGPT has potential roles in healthcare, but improvements are needed to ensure accuracy and ethical application; future research should focus on targeted AI training and hybrid decision-making models.

Conclusion: The study concludes that while ChatGPT can accurately answer some theoretical questions, its effectiveness is limited in interpretive scenarios and in situations with multiple variables, though its accuracy may improve with personalized updates over time.

Published

01.04.2024

Issue

Section

Original Research
