Predicting axillary lymph node metastasis in invasive breast cancer using machine learning models based on serum biomarkers and other clinical features

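  • Abstract:
    Objective This study aimed to predict the risk of axillary lymph node metastasis (ALNM) in patients with invasive breast cancer by combining serum tumor markers (STMs) with clinicopathological factors.
    Methods Data on 11 clinical features were collected from 3,360 patients diagnosed and treated at the Affiliated Cancer Hospital of Xinjiang Medical University between January 2015 and December 2019. Five machine learning (ML) algorithms were used to build ALNM prediction models, and their performance was compared by area under the curve (AUC), accuracy, Kappa value, and Brier score. The best-performing model was then compared with a nomogram based on logistic regression (LR) to determine the final model. Finally, the risk factors for ALNM were ranked by importance according to the Shapley additive explanations (SHAP) values of the final model.
    Results The eXtreme gradient boosting (XGBoost) model showed the best predictive performance (AUC=0.769, accuracy=0.735, Kappa=0.450) and outperformed the traditional LR-based nomogram in both the training and validation sets [training set AUC and Brier score: 0.822 (0.810~0.820) vs. 0.742 (0.721~0.763), 0.170 (0.163~0.177) vs. 0.197 (0.189~0.204); validation set AUC and Brier score: 0.769 (0.740~0.770) vs. 0.747 (0.716~0.779), 0.190 (0.178~0.202) vs. 0.195 (0.189~0.204)]. XGBoost was therefore selected as the best model in this study. SHAP analysis showed that the four most influential factors for ALNM were tumor stage, age, molecular subtype, and CEA level.
    Conclusions The XGBoost model based on STMs and clinical features predicts the risk of ALNM in invasive breast cancer with reasonable accuracy and outperforms the traditional model, with tumor stage being the most critical predictor.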

     

    Abstract:
    Objective Serum tumor markers (STMs) are important indicators associated with metastasis in patients with breast cancer (BC). This study aimed to predict the risk of axillary lymph node metastasis (ALNM) in patients with invasive BC in Xinjiang by combining STMs and clinicopathological factors.
    Methods Data from 3,360 patients diagnosed with invasive BC and treated at the Affiliated Cancer Hospital of Xinjiang Medical University between 2015 and 2019 were analyzed, focusing on 11 relevant demographic and clinical factors. Five machine learning (ML) algorithms were used to develop predictive models for ALNM. Their performance was compared using area under the curve (AUC), accuracy, Kappa value, and Brier score. The best-performing model was then compared with a nomogram based on logistic regression (LR) to determine the final model. Shapley additive explanations (SHAP) values were used to rank the importance of factors contributing to ALNM.
    Results Of the 3,266 patients studied, 1,368 (41.89%) developed ALNM. Among the five constructed ML models, eXtreme gradient boosting (XGBoost) demonstrated the best predictive performance with an AUC of 0.768, an accuracy of 0.735, and a Kappa value of 0.450. In both the training and validation sets, the XGBoost model outperformed the LR-based nomogram (training set AUC and Brier score: 0.822 (0.810~0.820) vs. 0.742 (0.721~0.763), 0.170 (0.163~0.177) vs. 0.197 (0.189~0.204); validation set AUC and Brier score: 0.769 (0.740~0.770) vs. 0.747 (0.716~0.779), 0.190 (0.178~0.202) vs. 0.195 (0.189~0.204)). Therefore, XGBoost was selected as the final predictive model. SHAP analysis identified T stage, age, molecular subtype, and CEA level as the four most influential factors for ALNM prediction.
    Conclusions The XGBoost model effectively predicts the risk of ALNM in patients with invasive BC based on STMs and clinicopathological features, outperforming traditional nomograms. SHAP analysis highlighted T stage as the most critical factor influencing ALNM.
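
    The four comparison metrics named in the Methods (AUC, accuracy, Kappa value, and Brier score) can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and probabilities are all assumptions made for illustration, and none of the numbers come from the study.

```python
# Illustrative only: toy implementations of the four metrics named in the
# abstract. Data, threshold and function names are hypothetical.

def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    # Chance agreement: product of the marginal class frequencies, summed.
    p_expected = sum(
        (list(y_true).count(c) / n) * (list(y_pred).count(c) / n)
        for c in labels
    )
    return (p_observed - p_expected) / (1 - p_expected)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive case is scored higher
    than a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in pos
        for b in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = ALNM present, 0 = ALNM absent.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
# Assumed decision threshold of 0.5 to turn probabilities into classes.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

print("AUC     :", roc_auc(y_true, y_prob))
print("Accuracy:", accuracy(y_true, y_pred))
print("Kappa   :", cohen_kappa(y_true, y_pred))
print("Brier   :", brier_score(y_true, y_prob))
```

    AUC and the Brier score are computed from the predicted probabilities, whereas accuracy and Kappa depend on the chosen threshold. Higher AUC, accuracy, and Kappa indicate better performance, while a lower Brier score indicates better-calibrated probabilities, which is why the abstract reports the XGBoost model's lower Brier scores as an advantage over the nomogram.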

     
