Abstract:
Objective: To compare the performance of four cutting-edge domestic large language models (LLMs) in identifying intracranial hemorrhage on head computed tomography (CT) and to explore the applicability of general-purpose LLMs to medical imaging.
Materials and Methods: Head CT images of intracranial hemorrhage were retrospectively collected at our hospital between July 2025 and September 2025. The images were uploaded to four LLMs: DeepSeek-V3.2 Yuanbao Edition (DeepSeek), Doubao-seed-1.6 (Doubao), Qwen-VL-Max (Qwen), and ERNIE-X1.1 (ERNIE). Three core questions were asked sequentially for each image: imaging technique (Q1), presence of hemorrhage (Q2), and hemorrhage type (Q3). A new conversation was started for each case to avoid context interference, and the inquiry was repeated one week later. All answers were recorded and statistically analyzed.
Results: A total of 102 intracranial hemorrhage cases and 102 matched normal CT scans were included. All LLMs achieved 100% accuracy in recognizing the imaging technique (Q1). For Q2, Doubao performed best, with an accuracy of 91% and a sensitivity of 83%, significantly outperforming the other models (P < 0.001); all models showed high specificity (98% to 99%). For Q3, Doubao again achieved the highest overall accuracy (67%) and the highest sensitivity across hemorrhage subtypes: 19% for epidural hematoma, 98% for intracerebral hemorrhage, 33% for subarachnoid hemorrhage, and 71% for subdural hematoma. All LLMs were more sensitive to intracerebral hemorrhage than to the other three subtypes, on which they performed less well. In the test-retest consistency analysis, Doubao showed the highest agreement for both Q2 (kappa = 0.87) and Q3 (kappa = 0.71), whereas the remaining models showed poor agreement for Q3.
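As a worked check on the Q2 figures, assuming accuracy was computed over all 204 scans: with equal group sizes (102 hemorrhage, 102 normal), accuracy equals the mean of sensitivity and specificity, so 91% accuracy with 83% sensitivity implies a specificity of about 2 × 0.91 − 0.83 = 0.99, consistent with the reported 98% to 99%. The minimal Python sketch below shows these standard definitions together with Cohen's kappa, which is assumed here (the abstract does not name the statistic) to be the kappa used for the one-week test-retest agreement. The function names, toy counts, and labels are hypothetical illustrations, not the study's data or analysis code.

```python
import math
from collections import Counter


def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from 2x2 confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of hemorrhage scans called "hemorrhage"
    specificity = tn / (tn + fp)  # share of normal scans called "no hemorrhage"
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all scans answered correctly
    return sensitivity, specificity, accuracy


def cohens_kappa(session_1, session_2):
    """Cohen's kappa between two answer sessions on the same cases.

    Works for binary (Q2-style) or multi-class (Q3-style) labels.
    Returns NaN when chance agreement is 1, where kappa is undefined.
    """
    if len(session_1) != len(session_2) or not session_1:
        raise ValueError("sessions must be non-empty and of equal length")
    n = len(session_1)
    # Observed agreement: fraction of cases given the same answer both times.
    p_observed = sum(a == b for a, b in zip(session_1, session_2)) / n
    # Chance agreement: product of each label's marginal frequencies, summed over labels.
    counts_1, counts_2 = Counter(session_1), Counter(session_2)
    labels = counts_1.keys() | counts_2.keys()
    p_chance = sum(counts_1[k] * counts_2[k] for k in labels) / (n * n)
    if p_chance == 1:
        return math.nan
    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Hypothetical toy counts with equal group sizes (10 positive, 10 negative).
    sens, spec, acc = diagnostic_metrics(tp=8, fn=2, tn=9, fp=1)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
    # With balanced groups, accuracy equals the mean of sensitivity and specificity.
    print(math.isclose(acc, (sens + spec) / 2))

    # Hypothetical Q3-style subtype answers from two sessions one week apart.
    week_0 = ["ICH", "ICH", "SAH", "SDH"]
    week_1 = ["ICH", "SAH", "SAH", "SDH"]
    print(f"kappa={cohens_kappa(week_0, week_1):.3f}")
```

Kappa measures agreement beyond chance between sessions, not correctness: a model that repeats the same wrong answers a week later still scores high, which is why stability and accuracy are reported separately.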
Conclusion: Although domestic LLMs have shown preliminary capability in medical image interpretation, their performance and stability in detecting intracranial hemorrhage vary considerably. Among the models studied, Doubao showed the best overall performance in identifying intracranial hemorrhage.