Abstract:
Objective: To evaluate the effectiveness of a general-purpose large language model (Nano AI Deepseek-R1-Full Version) in generating clinical diagnoses from text descriptions of chest computed tomography (CT).
Methods: A total of 101 CT cases with multiple and varied pulmonary lesions were selected: infectious pneumonia (bacterial, fungal, or viral; n=23), tuberculosis (n=22), fibrosis (n=20), pulmonary edema (n=18), allergic pneumonia (n=7), and rare conditions (n=11), the last comprising granulomatosis with polyangiitis (GPA; n=1), metastatic tumor (n=1), Pneumocystis pneumonia (PCP; n=1), allergic bronchopulmonary aspergillosis (ABPA; n=1), pulmonary alveolar proteinosis (PAP; n=2), lymphoma (n=2), and mucinous adenocarcinoma (n=3). Input texts were prepared under two conditions: Condition A (imaging description only) and Condition B (medical history, laboratory test results, and imaging description). For each input, the model returned five diagnostic and differential diagnostic suggestions ranked by likelihood. With the final clinical diagnosis as the gold standard, the investigators calculated the agreement rates of the TOP1, TOP3, and TOP5 suggestions, together with Likert scores and Kappa coefficients, under each condition.
Results: Under Condition A, the agreement rates for TOP1, TOP3, and TOP5 were 70%, 92%, and 100%, respectively (Kappa coefficients, 0.63, 0.80, and 1.00); under Condition B, they were 85%, 97%, and 100%, respectively (Kappa coefficients, 0.81, 0.93, and 1.00).
Conclusion: General-purpose large language models can generate likelihood-ranked differential diagnosis suggestions from text descriptions of chest CT, and adding information such as medical history and laboratory test results can significantly improve the accuracy of the TOP1 diagnosis (agreement rate, 70% under Condition A vs. 85% under Condition B).
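The TOP-k agreement rate described in the Methods can be illustrated with a short sketch. A case counts as a match at level k if the gold-standard diagnosis appears among the model's first k ranked suggestions. The abstract does not state how the Kappa coefficients were derived for TOP3 and TOP5, so the labeling rule below (`top_k_label`: use the gold diagnosis when it is within the top k, otherwise the TOP1 suggestion) is a hypothetical assumption, as are all function names and the toy cases, which are not study data. Cohen's kappa itself is the standard formula (p_o - p_e) / (1 - p_e).

```python
from collections import Counter
from typing import Sequence


def top_k_agreement(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of cases whose gold-standard diagnosis is among the first k ranked suggestions."""
    if not gold or len(ranked) != len(gold):
        raise ValueError("need the same, non-zero number of predictions and gold labels")
    hits = sum(1 for preds, truth in zip(ranked, gold) if truth in preds[:k])
    return hits / len(gold)


def top_k_label(preds: Sequence[str], truth: str, k: int) -> str:
    """Hypothetical rule: credit the gold label if it is in the top k, else fall back to TOP1."""
    return truth if truth in preds[:k] else preds[0]


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two categorical label sequences: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("label sequences must be non-empty and of equal length")
    # Observed agreement: share of cases with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each category's marginal frequencies, summed.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)


# Toy, made-up cases: (ranked model suggestions, final clinical diagnosis).
cases = [
    (["tuberculosis", "fungal pneumonia", "lymphoma"], "tuberculosis"),
    (["pulmonary edema", "PAP", "viral pneumonia"], "PAP"),
    (["fibrosis", "allergic pneumonia", "PCP"], "PCP"),
    (["viral pneumonia", "PCP", "pulmonary edema"], "fibrosis"),
]
ranked = [preds for preds, _ in cases]
gold = [truth for _, truth in cases]

top1 = top_k_agreement(ranked, gold, k=1)
top3 = top_k_agreement(ranked, gold, k=3)
kappa3 = cohens_kappa([top_k_label(p, t, 3) for p, t in cases], gold)
print(top1, top3, round(kappa3, 3))  # 0.25 0.75 0.692
```

Note that under this hypothetical rule, a 100% TOP-k agreement rate makes every predicted label equal to the gold label, which yields a kappa of 1.00; this is consistent with, but does not confirm, the TOP5 values reported above.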