Abstract:
Objective Serum tumor markers (STMs) are important indicators associated with metastasis in patients with breast cancer (BC). This study aimed to predict the risk of axillary lymph node metastasis (ALNM) in patients with invasive BC in Xinjiang by combining STMs with clinicopathological factors.
Methods Data from 3,360 patients diagnosed with invasive BC and treated at the Affiliated Cancer Hospital of Xinjiang Medical University between 2015 and 2019 were analyzed, focusing on 11 relevant demographic and clinical factors. Five machine learning (ML) algorithms were used to develop predictive models for ALNM. Their performance was compared using area under the curve (AUC), accuracy, Kappa value, and Brier score. The best-performing ML model was then compared with a nomogram based on logistic regression (LR) to select the final model. Shapley additive explanations (SHAP) values were used to rank the importance of the factors contributing to ALNM prediction.
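The four comparison metrics named above can be illustrated with a minimal sketch. It uses only the Python standard library and a small made-up set of labels and predicted probabilities. The function names, the 0.5 classification threshold, and the toy data are assumptions for illustration and are not taken from the study.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classify(y_prob: Sequence[float], threshold: float = 0.5) -> list:
    """Turn probabilities into 0/1 predictions (the threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in y_prob]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(int(a == b) for a, b in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Agreement beyond chance for binary labels (undefined if chance agreement is 1)."""
    n = len(y_true)
    p_obs = accuracy(y_true, y_pred)
    p_true1 = sum(y_true) / n
    p_pred1 = sum(y_pred) / n
    p_exp = p_true1 * p_pred1 + (1 - p_true1) * (1 - p_pred1)
    return (p_obs - p_exp) / (1 - p_exp)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data, invented for illustration only (1 = ALNM, 0 = no ALNM).
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.1, 0.8]
y_pred = classify(y_prob)

print(f"AUC={auc(y_true, y_prob):.3f} "
      f"accuracy={accuracy(y_true, y_pred):.3f} "
      f"kappa={cohen_kappa(y_true, y_pred):.3f} "
      f"Brier={brier_score(y_true, y_prob):.3f}")
# AUC=0.875 accuracy=0.750 kappa=0.500 Brier=0.150
```

AUC and the Brier score are computed from the predicted probabilities, whereas accuracy and Kappa depend on the chosen threshold, so the two pairs of metrics capture different aspects of model quality.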
Results Of the 3,266 patients studied, 1,368 (41.89%) developed ALNM. Among the five ML models constructed, eXtreme gradient boosting (XGBoost) demonstrated the best predictive performance, with an AUC of 0.768, an accuracy of 0.735, and a Kappa value of 0.450. In both the training and validation sets, the XGBoost model outperformed the LR-based nomogram. In the training set, the AUC was 0.822 (0.810~0.820) vs. 0.742 (0.721~0.763) and the Brier score was 0.170 (0.163~0.177) vs. 0.197 (0.189~0.204); in the validation set, the AUC was 0.769 (0.740~0.770) vs. 0.747 (0.716~0.779) and the Brier score was 0.190 (0.178~0.202) vs. 0.195 (0.189~0.204). Therefore, XGBoost was selected as the final predictive model. SHAP analysis identified T stage, age, molecular subtype, and CEA level as the four most influential factors for ALNM prediction.
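The idea behind a SHAP-based importance ranking can be sketched by computing exact Shapley values for a toy model and ordering features by their mean absolute contribution. This is an illustration only: it does not use the SHAP library or XGBoost, and the linear toy model, its weights, the zero baseline, and the generic feature names are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample; features outside a coalition take baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that does not contain feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_features(model, samples, baseline, names):
    """Rank features by mean absolute Shapley value across samples, largest first."""
    mean_abs = [0.0] * len(names)
    for x in samples:
        for i, v in enumerate(shapley_values(model, x, baseline)):
            mean_abs[i] += abs(v) / len(samples)
    return sorted(zip(names, mean_abs), key=lambda t: t[1], reverse=True)


def toy_model(z):
    """Hypothetical linear risk score with arbitrary weights (not the study's model)."""
    return 0.5 * z[0] + 0.2 * z[1] + 0.1 * z[2]


names = ["f0", "f1", "f2"]              # generic placeholder feature names
baseline = [0.0, 0.0, 0.0]              # assumed reference input
samples = [[1.0, 1.0, 1.0], [2.0, 0.0, 3.0]]

ranking = rank_features(toy_model, samples, baseline, names)
print([name for name, _ in ranking])
```

For each sample, the Shapley values sum to the difference between the model output for that sample and the output for the baseline, which is the property that lets per-feature contributions be compared and ranked.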
Conclusions The XGBoost model effectively predicts the risk of ALNM in patients with invasive BC based on STMs and clinicopathological features, outperforming the traditional LR-based nomogram. SHAP analysis highlighted T stage as the most influential factor for ALNM.