Machine Learning Prediction Models for Staging of Non-small Cell Lung Cancer Patients Using Radiomics

周洁; 郑燕婷; 江舒琪; 安杰; 邱士军; 陈淮

doi:10.15953/j.ctta.2024.186

Abstract

Objective: To explore the value of computed tomography (CT) radiomics machine learning classification models for predicting the stage of non-small cell lung cancer (NSCLC). Methods: The Lung1 dataset was downloaded from The Cancer Imaging Archive (TCIA), and 291 eligible NSCLC patients were selected and divided into two groups: Group 1 (Stages I and II) and Group 2 (Stages III and IV). From each tumor lesion, 1037 radiomics features were extracted, and feature selection was performed using the t-test and the least absolute shrinkage and selection operator (LASSO) algorithm. The CT morphological features of the lesions were screened using t-tests and chi-squared tests. Five machine learning classifiers (logistic regression, random forest, Gaussian naive Bayes, support vector machine, and AdaBoost) were trained to build prediction models. The performance of these models was evaluated using receiver operating characteristic (ROC) curves, and the optimal model was selected. Finally, external validation was conducted using data from 91 patients collected at our hospital. Results: After feature selection, 13 radiomics features with relatively high diagnostic value were retained for building the NSCLC staging prediction model. Among the five machine learning classification models, the random forest model performed best, with the highest validation-set area under the curve (AUC) of 0.740. On external validation, the model had an AUC of 1.000, a sensitivity of 1.000, and a specificity of 1.000 in the training set; in the test set, the AUC was 0.698, the sensitivity 0.873, and the specificity 0.500. Among the CT morphological features, only lesion size differed significantly between patients at different stages; the differences in the other features were not statistically significant. Conclusion: CT radiomics machine learning classification models have a certain ability to predict the stage of patients with NSCLC.
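
The abstract reports AUC, sensitivity, and specificity for the staging models. The sketch below shows how these three metrics can be computed for a two-group classifier. It is only an illustration: the labels and scores are invented toy values, not the study's data. The function names, the 0.5 decision threshold, and the choice of Group 2 (Stages III and IV) as the positive class are all assumptions made for this example.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
# Assumption: 1 = Group 2 (Stages III and IV), 0 = Group 1 (Stages I and II).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]  # hypothetical predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can show a high sensitivity together with a low specificity at a single operating point.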
