Abstract:
Objective: To explore the value of computed tomography (CT) radiomics machine learning classification models for predicting the stage of non-small cell lung cancer (NSCLC). Methods: We downloaded the Lung 1 dataset from The Cancer Imaging Archive (TCIA), selected 291 eligible NSCLC patients, and divided them into two groups: Group 1 (Stages I and II) and Group 2 (Stages III and IV). We extracted 1037 radiomics features from each lesion and selected features using the t-test and the least absolute shrinkage and selection operator (LASSO) algorithm. CT signs of the lesions were screened using t-tests and chi-squared tests. Prediction models were trained with five machine learning classifiers: logistic regression, random forest, Gaussian naive Bayes (Gaussian NB), support vector machine, and AdaBoost. The performance of the five models was evaluated using receiver operating characteristic (ROC) curves, and the best-performing model was selected. Finally, external validation was performed using data from 91 patients at our hospital. Results: Feature screening yielded 13 radiomics features with high diagnostic value, which were used to build the NSCLC staging prediction model. Among the five classifiers, the random forest model performed best, with the highest validation-set AUC (0.740). In external validation, the model had an AUC of 1.000, a sensitivity of 1.000, and a specificity of 1.000 in the training set; in the test set, the AUC was 0.698, with a sensitivity of 0.873 and a specificity of 0.500. Among the CT morphological features, only lesion size differed significantly between patients at different stages. Conclusion: The CT radiomics machine learning classification model can predict the stage of patients with NSCLC.
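The workflow described in the Methods (t-test filtering, LASSO selection, five classifiers compared by ROC AUC) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn and SciPy, a synthetic feature table in place of the Lung 1 radiomics data, an assumed 70/30 split, an assumed p < 0.05 threshold, and default hyperparameters, none of which are specified in the abstract.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the radiomics table (NOT the Lung 1 data):
# 291 cases x 1037 features; label 0 = Stage I-II, 1 = Stage III-IV.
X, y = make_classification(n_samples=291, n_features=1037, n_informative=10,
                           n_clusters_per_class=1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)  # split ratio is assumed

# Fit the scaler on the training split only, so no validation data leaks in.
scaler = StandardScaler().fit(X_train)
X_train, X_val = scaler.transform(X_train), scaler.transform(X_val)

# Step 1: univariate t-test filter (Welch variant; p < 0.05 is an assumption).
_, p_values = ttest_ind(X_train[y_train == 0], X_train[y_train == 1],
                        axis=0, equal_var=False)
keep = np.flatnonzero(p_values < 0.05)
if keep.size == 0:  # fall back to all features if the filter removes everything
    keep = np.arange(X_train.shape[1])

# Step 2: LASSO with a cross-validated penalty; keep nonzero coefficients.
lasso = LassoCV(cv=5, random_state=0).fit(X_train[:, keep], y_train)
nonzero = np.flatnonzero(lasso.coef_)
selected = keep[nonzero] if nonzero.size else keep

# Step 3: the five classifier families named in the Methods, default settings.
classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gaussian_nb": GaussianNB(),
    "svm": SVC(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
}


def continuous_scores(model, features):
    """Return a continuous score per case for ROC analysis."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(features)[:, 1]
    return model.decision_function(features)  # e.g. SVC without probabilities


# Step 4: fit each model and compare validation-set AUC.
results = {}
for name, model in classifiers.items():
    model.fit(X_train[:, selected], y_train)
    results[name] = roc_auc_score(
        y_val, continuous_scores(model, X_val[:, selected]))

best = max(results, key=results.get)
print(f"{selected.size} features selected; best model on validation: {best}")
```

In this sketch both selection steps are fitted on the training split alone; selecting features on the full dataset before splitting would inflate the validation AUC. A training-set AUC of 1.000 next to a test-set AUC of 0.698, as reported above, is the pattern typically seen when a flexible model such as a random forest fits its training data almost perfectly.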