ISSN 1004-4140
    CN 11-3017/P

    基于通用大语言模型的胸部CT文本描述自动生成诊断效能评价

    Automatically Generated Diagnoses from Text Descriptions of Computed Tomography of the Chest Based on a General Large Language Model

    • Abstract:
      Objective: To evaluate the performance of a general large language model (Nano AI Deepseek-R1-Full Version) in generating clinical diagnoses from descriptive texts of chest computed tomography (CT).
      Methods: CT imaging data from 101 cases with multiple, polymorphic pulmonary lesions were selected: infectious pneumonia (bacterial, fungal, or viral; n=23), tuberculosis (n=22), fibrosis (n=20), pulmonary edema (n=18), hypersensitivity pneumonitis (n=7), and rare diseases (n=11; GPA, metastatic tumor, PCP, and ABPA, n=1 each; PAP, n=2; lymphoma, n=2; mucinous adenocarcinoma, n=3). Input texts were divided into Condition A (imaging description only) and Condition B (medical history, laboratory test results, and imaging description). For each input, the model provided five diagnostic and differential diagnostic suggestions ranked by likelihood. With the final clinical diagnosis as the gold standard, the investigators calculated the TOP1, TOP3, and TOP5 diagnostic agreement rates, as well as the Likert scores and Kappa coefficients under each condition.
      Results: Under Condition A, the TOP1, TOP3, and TOP5 agreement rates were 70%, 92%, and 100%, respectively (Kappa coefficients: 0.63, 0.80, and 1.00). Under Condition B, the agreement rates were 85%, 97%, and 100%, respectively (Kappa coefficients: 0.81, 0.93, and 1.00).
      Conclusion: General large language models can generate differential diagnosis suggestions ranked by probability from text descriptions of chest CT, and integrating information such as medical history and laboratory test results can significantly improve the accuracy of the TOP1 diagnosis.
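The TOP1, TOP3, and TOP5 agreement rates reported in the abstract can be illustrated with a minimal sketch. It assumes that a case counts as a match when the gold-standard diagnosis appears, as an exact string, among the first k ranked suggestions; in the study itself the investigators judged agreement, so this rule is only a stand-in. The function name `top_k_agreement` and the four toy cases are hypothetical and are not study data.

```python
from typing import Sequence


def top_k_agreement(
    ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int
) -> float:
    """Fraction of cases whose gold diagnosis is among the first k suggestions.

    Assumption: agreement is decided by exact string match, which simplifies
    the investigator judgement described in the abstract.
    """
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same number of cases")
    if not gold:
        raise ValueError("at least one case is required")
    hits = sum(1 for preds, truth in zip(ranked, gold) if truth in preds[:k])
    return hits / len(gold)


# Hypothetical toy data: five ranked suggestions per case (not study data).
ranked = [
    ["tuberculosis", "bacterial pneumonia", "fungal pneumonia", "lymphoma", "metastatic tumor"],
    ["pulmonary edema", "fibrosis", "viral pneumonia", "PAP", "PCP"],
    ["fibrosis", "hypersensitivity pneumonitis", "pulmonary edema", "tuberculosis", "PAP"],
    ["bacterial pneumonia", "tuberculosis", "lymphoma", "mucinous adenocarcinoma", "GPA"],
]
gold = [
    "tuberculosis",
    "fibrosis",
    "hypersensitivity pneumonitis",
    "mucinous adenocarcinoma",
]

for k in (1, 3, 5):
    print(f"TOP{k} agreement: {top_k_agreement(ranked, gold, k):.2f}")
```

Because the first k suggestions for a smaller k are always contained in those for a larger k, the rate can only stay equal or rise as k grows, which matches the pattern of the reported TOP1, TOP3, and TOP5 figures.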

       
