Development and Validation of a 28-Day Mortality Prediction Model for Patients with Sepsis Complicated by Autoimmune Diseases Using Two Machine Learning Methods

Zhiyang Wang,¹,²,* Xin Xiao,¹,* Shifeng Li,¹,* Jiachen He,¹ Yanou Li,¹ Fang Huang,¹ Jun Wang¹

¹Department of Critical Care Medicine, The First Affiliated Hospital of Soochow University, Suzhou, 215006, People’s Republic of China; ²Department of Emergency and Critical Care Medicine, The Second Affiliated Hospital of Soochow University, Suzhou, 215006, People’s Republic of China

*These authors contributed equally to this work

Correspondence: Jun Wang, Email [email protected]; Fang Huang, Email [email protected]

Objective: This study aimed to develop a prognostic model for patients with sepsis complicated by autoimmune diseases using machine learning methods and validate the model.
Methods: Data on patients with sepsis and autoimmune diseases were extracted from the MIMIC-IV database. Participants were randomly divided into a training set and a validation set at a ratio of 7:3. Predictors of 28-day prognosis were selected using LASSO regression and the Boruta algorithm. A nomogram was developed from the independent risk factors for 28-day mortality identified by logistic regression and was validated internally and externally using calibration curves and decision curve analysis (DCA). Patients were then stratified into high- and low-score groups according to their nomogram scores, and Kaplan-Meier (KM) analysis was used to compare mortality between the groups.
Results: A total of 1,481 patients from the MIMIC-IV database met the inclusion criteria, and the external validation set comprised 57 patients from the Department of Critical Care Medicine of the First Affiliated Hospital of Soochow University. Ten overlapping predictors (age, gender, BMI, WBC, BUN, PT, APTT, history of cerebrovascular disease, history of liver disease, and CRRT) were identified by the LASSO and Boruta algorithms and were subsequently confirmed as statistically significant independent risk factors by logistic regression. The prediction model built on these ten predictors achieved higher AUCs than the SOFA score in the training (AUC = 0.772), internal validation (AUC = 0.771), and external validation (AUC = 0.787) cohorts, although the difference in the small external cohort did not reach statistical significance (P = 0.307). Hosmer-Lemeshow tests and calibration curves indicated good agreement between predicted and observed outcomes in all cohorts, and DCA suggested clinical utility. KM curves showed that mortality was significantly higher in the high-score group than in the low-score group.
Conclusion: A prognostic model for predicting 28-day mortality in sepsis patients with autoimmune diseases demonstrated robust predictive performance and clinical applicability upon internal and external validation.

Keywords: sepsis, LASSO regression, Boruta algorithm, dynamic nomogram, MIMIC-IV

Introduction

Sepsis is a life-threatening condition resulting from a dysregulated host response to infection and is associated with high global mortality.¹,² Studies have shown that sepsis affects approximately 48.9 million individuals and causes 11 million deaths annually, accounting for 19.7% of all global deaths.³ The pathophysiological mechanisms of sepsis are highly complex. Currently, it is believed that intricate interactions occur within the immune system throughout the progression of sepsis, involving both activation and dysregulation of innate and adaptive immunity.⁴ Studies have reported that the development and progression of sepsis are often accompanied by a combination of excessive inflammation and immunosuppression.⁵,⁶

Autoimmune diseases are a group of chronic inflammatory conditions that affect joints, soft tissues, and internal organs, and their pathogenesis is also closely associated with immune system dysfunction.⁷⁻⁹ They mainly include rheumatoid arthritis, systemic lupus erythematosus, scleroderma, and ankylosing spondylitis, among others. In sepsis, an infection-induced excessive inflammatory response can lead to immune cell dysfunction and thereby to multiple organ damage.¹⁰ In autoimmune diseases, the immune system erroneously attacks the body’s own tissues, leading to chronic inflammation and tissue damage. In addition, treatment of autoimmune diseases often involves glucocorticoids and immunosuppressive drugs, which may further impair immune function. Sepsis and autoimmune diseases are therefore closely related in their occurrence and development: each may aggravate the other and make treatment more difficult.¹¹,¹² Multiple studies have shown that patients with sepsis complicated by autoimmune diseases have higher mortality, more severe illness, and poorer prognoses.¹³⁻¹⁵ Researchers have also pointed out that sepsis patients are highly heterogeneous, so studying subgroups with different underlying diseases, immune backgrounds, and infection sites is particularly important for treatment and prognosis.¹⁶⁻¹⁸ The scoring systems commonly used for critically ill patients, such as the SOFA and APACHE II scores, are practical in clinical use but are not tailored to individual patient characteristics and cannot readily provide precise prognostic prediction for specific sepsis subgroups. Moreover, prognostic models for patients with sepsis combined with autoimmune diseases are still severely lacking. In clinical practice, it is therefore essential for clinicians to understand this relationship, promptly assess the prognosis of sepsis patients with coexisting autoimmune diseases, and adopt more aggressive treatment strategies at an early stage to improve survival and quality of life.

Based on these considerations, the present study aimed to use machine learning–based methods to identify key prognostic factors and to construct a prediction model for the timely assessment of 28-day mortality in patients with sepsis complicated by autoimmune diseases. We employed both LASSO regression and the Boruta algorithm for variable selection, leveraging their complementary methodological strengths. LASSO, as an embedded method, provides a sparse and interpretable set of predictors within a linear framework, whereas Boruta, a random forest–based wrapper method, can robustly identify all features related to the outcome, including both linear and nonlinear effects. In the context of critical care, where data are high-dimensional, variables are often correlated, and relationships may be nonlinear, this hybrid feature selection strategy helps to build a model that is both robust and parsimonious. We determined the final set of predictors by taking the consensus of the two methods, thereby enhancing the credibility of the selected variables and preserving good predictive performance together with clinical interpretability. Specifically, we first developed a prognostic model using the large MIMIC-IV ICU database and selected the nomogram with the best predictive performance, and then externally validated this model using clinical data from our own center. To the best of our knowledge, this is the first study to develop and externally validate a 28-day mortality prediction nomogram specifically for patients with sepsis complicated by autoimmune diseases based on large ICU datasets, and to systematically compare its predictive performance with that of the SOFA score.

Materials and Methods

Data Sources and Patient Population

This study used data from the MIMIC-IV database (version 3.1) as both the training set and the internal validation set. MIMIC-IV is a comprehensive dataset comprising electronic health records of over 50,000 patients admitted to the intensive care unit (ICU) at Beth Israel Deaconess Medical Center in Boston, Massachusetts, between 2008 and 2019.¹⁹ The Institutional Review Board (IRB) of Beth Israel Deaconess Medical Center granted a waiver of informed consent and approved the dissemination of research data. One author (WZY) obtained authorized access to the database (certificate number: 10518271). Additionally, we retrospectively collected clinical data from patients diagnosed with sepsis and admitted to the ICU of the First Affiliated Hospital of Soochow University between July 2022 and July 2024 for external validation. This external dataset received ethical approval from the Medical Ethics Committee of the First Affiliated Hospital of Soochow University (Ethics Number: 2025459).

We selected patients diagnosed with sepsis from the MIMIC database based on the diagnostic criteria outlined in the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3): suspected infection and a Sequential Organ Failure Assessment (SOFA) score ≥ 2. Subsequently, we identified patients with underlying autoimmune diseases, including systemic lupus erythematosus (SLE), rheumatoid arthritis, systemic sclerosis, psoriasis, ankylosing spondylitis, vasculitis, idiopathic inflammatory myopathy, autoimmune liver disease, and inflammatory bowel disease (Supplementary Table S1). To refine the study cohort, patients younger than 18 years, those with repeated ICU admissions, and those with an ICU length of stay less than 24 hours were excluded, resulting in the final study population. Concurrently, we retrospectively collected clinical data from 494 septic patients admitted to the ICU of the First Affiliated Hospital of Soochow University between July 2022 and July 2024. The inclusion criteria were as follows: (1) age > 18 years; (2) presence of autoimmune diseases; (3) first ICU admission; and (4) ICU stay exceeding 24 hours. Ultimately, 57 patients with sepsis complicated by autoimmune diseases were included in the external validation cohort. The specific screening criteria are shown in Figure 1.

Figure 1 Flow chart of the patient selection and grouping.

Data Collection

Guided by clinical expertise, published literature, and the MIMIC-IV data documentation, we used structured query language (SQL) with Navicat Premium to extract the following seven categories of information: (1) demographic details, such as gender, age, and weight; (2) vital signs within 24 h of ICU admission, including heart rate (HR), mean blood pressure (MBP), respiratory rate (RR), and temperature; (3) laboratory test results within 24 h of ICU admission, such as pH, FiO₂, partial pressure of oxygen in arterial blood (PaO₂), partial pressure of carbon dioxide (PaCO₂), white blood cell count (WBC), hemoglobin, platelet count, red blood cell count (RBC), red cell distribution width (RDW), blood urea nitrogen (BUN), creatinine, chloride, calcium, sodium, potassium, glucose, prothrombin time (PT), activated partial thromboplastin time (APTT), and international normalized ratio (INR); (4) treatment within 24 h of ICU admission, such as mechanical ventilation and continuous renal replacement therapy (CRRT); (5) comorbidities, such as diabetes, chronic obstructive pulmonary disease (COPD), chronic kidney disease (CKD), cerebrovascular disease, and liver disease; (6) outcome: 28-day ICU mortality; and (7) the Sequential Organ Failure Assessment (SOFA) score within 24 h of ICU admission.

Statistical Analysis

As this was a retrospective analysis, no sample size calculation was performed. Variables with a missing data rate exceeding 40% were excluded, and multiple imputation was used for variables with a missing data rate of 40% or less. Categorical variables were described as percentages (%); non-normally distributed continuous variables were presented as medians and quartiles, and normally distributed continuous variables as means and standard deviations. The chi-square test was used to compare categorical variables, and the t-test or a nonparametric test was used to compare continuous variables between two groups. Feature selection was performed in the training cohort using both LASSO (Least Absolute Shrinkage and Selection Operator) regression and the Boruta algorithm. Logistic regression analysis was then used to identify significant risk factors associated with the clinical outcome, and these final predictors were incorporated into a nomogram to visualize the risk prediction model. The area under the ROC curve (AUC) was used to assess the discrimination of the model, the calibration curve was used to assess agreement between predicted and observed values, and decision curve analysis was used to assess the clinical benefit of the model. Kaplan-Meier (KM) curves were used to compare mortality between the two risk groups. The tableone package was used for data description, the glmnet package for LASSO regression, the Boruta package for Boruta analysis, the rms package for plotting the nomogram and calibration curves, and the pROC package for plotting ROC curves. All statistical analyses were performed in R 4.3.0 (https://www.r-project.org). A two-sided P value < 0.05 was considered statistically significant. This study was designed and analyzed with reference to the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) statement.
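The two data-preparation rules described above (dropping variables with more than 40% missing values and splitting patients 7:3 at random) can be illustrated with a minimal Python sketch. The study itself was run in R, so this is only an illustration under stated assumptions: the function names, the toy records, the `lactate` column, and the random seed are hypothetical, and multiple imputation is not shown.

```python
import random


def columns_within_missing_limit(rows, max_missing=0.40):
    """Return column names whose share of missing (None) values is <= max_missing."""
    n = len(rows)
    kept = []
    for col in rows[0].keys():
        missing = sum(1 for r in rows if r[col] is None)
        if missing / n <= max_missing:
            kept.append(col)
    return kept


def split_train_validation(ids, train_fraction=0.7, seed=2024):
    """Shuffle patient ids reproducibly and split them into training and validation lists."""
    rng = random.Random(seed)  # hypothetical seed, fixed only for reproducibility
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy records (hypothetical values, not study data); None marks a missing value.
rows = [
    {"age": 70, "bun": 30.0, "lactate": None},
    {"age": 55, "bun": None, "lactate": None},
    {"age": 81, "bun": 42.0, "lactate": 2.1},
    {"age": 63, "bun": 18.0, "lactate": None},
    {"age": 47, "bun": 25.0, "lactate": None},
]

kept_columns = columns_within_missing_limit(rows)
train_ids, validation_ids = split_train_validation(range(10))
print(kept_columns, len(train_ids), len(validation_ids))
```

In this toy input, `lactate` is missing in four of five records and is therefore dropped, while `bun` (one of five missing) is kept for later imputation.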

Results

Characteristics of the Study Cohort

A total of 1,481 patients with sepsis complicated by autoimmune diseases were ultimately included, of whom 1,016 were allocated to the training set and 465 to the internal validation set. Table 1 summarizes the demographic and clinical data of the study cohort. The training and validation cohorts were comparable, with almost no statistically significant differences between them (P > 0.05).

Table 1 Characteristics Description of Patients

Predictor Selection

In the training cohort, the 32 covariates included in the study were analyzed using LASSO regression to screen for outcome-related factors (Figure 2a and b). Eleven factors were identified: age, gender, BMI, white blood cell count (WBC), blood urea nitrogen (BUN), calcium, prothrombin time (PT), activated partial thromboplastin time (APTT), history of cerebrovascular disease, history of liver disease, and continuous renal replacement therapy (CRRT). In parallel, we applied the Boruta algorithm for feature selection (Figure 3). Boruta, which is based on random forests, evaluates the importance of each feature by comparing it with randomly generated shadow features. In the figure, green represents important features retained to enhance predictive power; red indicates features deemed unimportant and excluded; yellow denotes features of uncertain importance that require further investigation; and blue represents shadow features used for comparison but not for model training. After combining the LASSO and Boruta results with basic demographic characteristics, ten common predictors were finally determined: age, gender, BMI, WBC, BUN, PT, APTT, history of cerebrovascular disease, history of liver disease, and CRRT. Logistic regression analysis indicated that increasing age, lower BMI, female gender, prolonged APTT and PT, elevated BUN, increased WBC, a history of cerebrovascular or liver disease, and the initiation of CRRT within 24 hours were all independent risk factors for 28-day mortality in patients with sepsis complicated by autoimmune diseases (Figure 4).
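Taking the consensus of two feature selection methods amounts to a set intersection. The Python sketch below illustrates this with the eleven LASSO-selected factors named in the text. The Boruta set is an assumption: the text does not list the features Boruta confirmed, so the set used here (the ten final predictors plus two arbitrary extras) is hypothetical, and the additional step of combining with demographic characteristics is not modeled.

```python
# The eleven factors the text reports as selected by LASSO.
lasso_selected = {
    "age", "gender", "bmi", "wbc", "bun", "calcium", "pt", "aptt",
    "cerebrovascular_disease", "liver_disease", "crrt",
}

# ASSUMPTION: hypothetical Boruta-confirmed set, for illustration only.
# The two extras (heart_rate, creatinine) are arbitrary placeholders.
boruta_confirmed = {
    "age", "gender", "bmi", "wbc", "bun", "pt", "aptt",
    "cerebrovascular_disease", "liver_disease", "crrt",
    "heart_rate", "creatinine",
}

# Consensus: features selected by both methods.
consensus = sorted(lasso_selected & boruta_confirmed)

# Features selected by LASSO but not carried into the consensus set.
dropped_from_lasso = sorted(lasso_selected - boruta_confirmed)

print(len(consensus), consensus)
print(dropped_from_lasso)
```

Under this assumed Boruta set, the intersection contains the ten final predictors, and calcium is the one LASSO-selected factor that is not retained, which matches the difference between the eleven and ten variables reported above.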

Figure 2 Variable selection using the LASSO model for binary logistic regression. (a) Coefficient paths of different variables using the LASSO model: eleven variables with nonzero coefficients were chosen by the optimal lambda. (b) Cross-validation plot with 1SE bounds using the LASSO model: the left and right dotted vertical lines represent the values of log(lambda.min) and log(lambda.1se), respectively. Following validation of the optimal parameter (lambda) in the LASSO model, we plotted the binomial deviance curve versus log(lambda) and drew dotted vertical lines based on the 1 standard error criterion. 10-fold cross-validation was conducted in the LASSO regression.

Figure 3 Feature selection based on the Boruta algorithm. The horizontal axis is the name of each variable, and the vertical axis is the Z-value of each variable. The box plot shows the Z-value of each variable during model calculation. In the figure, green indicates confirmed important features, red indicates rejected features, yellow indicates tentative features that require further confirmation.

Figure 4 Multivariate Logistic Regression Analysis of the Prediction model in Training Cohort.

Nomogram Construction

Based on the combined selection results of LASSO regression and the Boruta algorithm, the following variables were included in the final model: age, gender, BMI, white blood cell count (WBC), blood urea nitrogen (BUN), prothrombin time (PT), activated partial thromboplastin time (APTT), history of cerebrovascular disease, history of liver disease, and renal replacement therapy (CRRT). A nomogram was then constructed from these variables (Figure 5). In the training cohort (Figure 6a), the model had an AUC of 0.772 (95% CI: 0.734–0.809), significantly higher than that of the SOFA score (AUC = 0.664, Z = 4.009, P < 0.001). In the internal validation cohort (Figure 6b), the model also significantly outperformed the SOFA score (AUC = 0.771 (95% CI: 0.715–0.828) vs AUC = 0.664 (95% CI: 0.564–0.701), Z = 3.536, P < 0.001). In the external validation cohort (Figure 6c), the model’s AUC was numerically higher than that of the SOFA score (AUC = 0.787 (95% CI: 0.651–0.922) vs AUC = 0.679 (95% CI: 0.536–0.823)), but the difference was not statistically significant (Z = 1.021, P = 0.307), which may reflect the small sample size (n = 57). Overall, these findings indicate that the prediction model discriminates better than the SOFA score.
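As a check on the reported comparisons, a Z statistic can be converted to a two-sided P value under the standard normal distribution, P = 2(1 − Φ(|Z|)). The Python sketch below is an illustration only (the function name is hypothetical and the study's tests were run in R); it applies this conversion to the three Z values given above.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided P value for a standard normal Z statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Z statistics reported in the text for model vs SOFA score.
p_training = two_sided_p_from_z(4.009)
p_internal = two_sided_p_from_z(3.536)
p_external = two_sided_p_from_z(1.021)

for name, p in [("training", p_training),
                ("internal validation", p_internal),
                ("external validation", p_external)]:
    print(f"{name}: P = {p:.4f}")
```

The training and internal validation values fall below 0.001, whereas the external validation value is about 0.307 and does not fall below the 0.05 significance level used in this study.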

Figure 5 Nomogram for predicting 28-day mortality in sepsis patients. Each predictor (PT, CRRT, APTT, cerebrovascular disease, gender, liver disease, WBC, BMI, BUN, and age) corresponds to a point value on the “Points” axis. The blue boxes and red dots indicate an example patient’s values (or category levels), the vertical dotted lines show the projection to the “Points” axis, and the arrows illustrate the mapping from “Total points” to the predicted probability (eg, total points = 350, predicted probability = 0.102).

Figure 6 Receiver operating characteristic (ROC) curves for the Prediction model and SOFA model. (a) Training cohort; (b) Internal Validation cohort; (c) External Validation cohort.
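The AUC summarized by these ROC curves can be read as the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor. A minimal Python sketch of this pairwise (Mann-Whitney) definition follows; it is not the pROC implementation used in the study, and the scores and labels are hypothetical toy values.

```python
def auc_pairwise(scores, labels):
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count one half."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(events) * len(non_events))


# Toy predicted risks and 28-day outcomes (1 = died), hypothetical values.
toy_scores = [0.10, 0.35, 0.40, 0.80, 0.65, 0.20]
toy_labels = [0, 1, 0, 1, 1, 0]

toy_auc = auc_pairwise(toy_scores, toy_labels)
print(round(toy_auc, 3))
```

The quadratic pairwise loop is chosen for readability; rank-based formulas give the same value more efficiently on cohorts of the size analyzed here.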

Predictive Performance Evaluation

According to the Hosmer-Lemeshow goodness-of-fit test, there was no statistically significant difference between the predicted and actual 28-day mortality rates in sepsis patients (training cohort: χ² = 2.3658, degrees of freedom = 8, P = 0.9677; internal validation cohort: χ² = 10.884, degrees of freedom = 8, P = 0.2083; external validation cohort: χ² = 13.581, degrees of freedom = 8, P = 0.0934; all P > 0.05). After 1,000 bootstrap resamples, the mean absolute error of the calibration curve was 0.011 in the training cohort (n = 1,016), 0.019 in the internal validation cohort (n = 465), and 0.038 in the external validation cohort (n = 57), indicating that the predicted 28-day mortality remained in good agreement with the observed mortality (Figure 7).
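The Hosmer-Lemeshow P values follow from the upper tail of the chi-square distribution. For an even number of degrees of freedom the tail probability has a closed form, P(X > x) = exp(−x/2) · Σ (x/2)^i / i! for i = 0 to df/2 − 1, so the reported values can be checked with a short Python sketch. The function name is hypothetical, and this is a verification aid rather than the routine used in the study.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Statistics reported in the text (df = 8 in every cohort).
p_hl_training = chi2_upper_tail_even_df(2.3658, 8)   # reported P = 0.9677
p_hl_internal = chi2_upper_tail_even_df(10.884, 8)   # reported P = 0.2083
p_hl_external = chi2_upper_tail_even_df(13.581, 8)   # reported P = 0.0934

print(f"{p_hl_training:.4f} {p_hl_internal:.4f} {p_hl_external:.4f}")
```

The computed values agree with the reported P values to about three decimal places.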

Figure 7 Calibration curves for the nomogram. (a) Training cohort; (b) Internal Validation cohort; (c) External Validation cohort.

Decision curve analysis (Figure 8) showed that the model provided a net benefit over threshold probabilities of 0.15–0.75 in the training cohort, 0.15–0.75 in the internal validation cohort, and 0.10–0.70 in the external validation cohort. Within these ranges, the prediction model yielded a greater net benefit than the SOFA score, supporting its clinical utility.
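The net benefit plotted in a decision curve is conventionally defined, at a threshold probability p_t, as TP/n − (FP/n) · p_t/(1 − p_t), where TP and FP are the numbers of true and false positives when patients with predicted risk of at least p_t are classed as high risk. The Python sketch below illustrates this standard definition; the function names and the toy predictions are hypothetical and are not study data.

```python
def net_benefit(probabilities, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(outcomes)
    true_pos = sum(1 for p, y in zip(probabilities, outcomes) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(probabilities, outcomes) if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: every patient is treated as high risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy predicted risks and outcomes (1 = died within 28 days), hypothetical values.
toy_probs = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.70, 0.85]
toy_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

nb_model = net_benefit(toy_probs, toy_outcomes, threshold=0.25)
nb_all = net_benefit_treat_all(toy_outcomes, threshold=0.25)
print(round(nb_model, 4), round(nb_all, 4))
```

A decision curve repeats this calculation across a grid of thresholds and compares the model with the treat-all and treat-none (net benefit of zero) strategies.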

Figure 8 DCA curves for the Prediction model and the SOFA model. (a) Training cohort; (b) Internal Validation cohort; (c) External Validation cohort.

Risk Stratification Based on Nomogram Scores

Based on the nomogram constructed from the training cohort, a cut-off value for the nomogram score was determined, and this threshold was then applied to the internal validation cohort. Using this cut-off value, patients in the internal validation cohort were divided into a high-point group and a low-point group. Kaplan-Meier survival analysis showed that patients in the high-point group had significantly lower survival rates than those in the low-point group (Figure 9), indicating that the nomogram can usefully stratify patients by risk.
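The Kaplan-Meier (product-limit) estimate used for this comparison multiplies, at each time with at least one death, the previous survival probability by (1 − deaths / number at risk). A minimal Python sketch follows; the function name is hypothetical, the follow-up times and event indicators are toy values rather than study data, and the log-rank comparison between groups is not shown.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one death.

    times  : follow-up time in days for each patient
    events : 1 if the patient died at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at time t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy groups (hypothetical): patients alive at day 28 are censored there.
high_times, high_events = [3, 5, 5, 9, 28, 28], [1, 1, 0, 1, 0, 0]
low_times, low_events = [12, 28, 28, 28, 28, 28], [1, 0, 0, 0, 0, 0]

high_curve = kaplan_meier(high_times, high_events)
low_curve = kaplan_meier(low_times, low_events)
print(high_curve)
print(low_curve)
```

In this toy example, the estimated survival at the last event time is lower in the high-point group than in the low-point group, which is the pattern the stratified curves are intended to display.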

Figure 9 KM curves comparing mortality between the high-point and low-point groups.

Discussion

Sepsis remains one of the most common causes of ICU admission and has long been a focus of research because of its complex mechanisms, high incidence, and high mortality. In recent years, numerous sepsis prediction models have emerged,²⁰⁻²² and many researchers have focused on the overall diagnosis, treatment, and prognosis of sepsis patients.²³ However, a growing number of studies have revealed considerable heterogeneity among sepsis patients,¹⁰ highlighting the urgent need to investigate subgroups with different underlying diseases, immune backgrounds, and infection sites. Our study therefore focused on patients with sepsis complicated by autoimmune diseases and used both LASSO regression and the Boruta algorithm to jointly screen for prognostic factors and build a 28-day mortality prediction model. Internal and external validation supported the model’s predictive performance.

Feature selection is a central part of feature engineering in machine learning, and in clinical research, determining the predictors is the most critical step in building a prediction model. The goal of feature selection is to find the optimal subset of outcome-related features, identifying truly relevant variables so as to simplify the model; it also aids in understanding the data-generating process. In this study, two machine learning algorithms were used together for feature selection. LASSO regression, a commonly used regularization-based variable selection method, handles large datasets well and mitigates multicollinearity among covariates by shrinking some coefficients to zero. The Boruta algorithm is a random forest–based feature selection method that aims to identify truly important features in a given feature set and distinguish them from irrelevant ones. It performs well on high-dimensional and nonlinear data and can be combined with other models to improve predictive performance. Using both methods, we identified ten predictors: age, gender, BMI, WBC, BUN, PT, APTT, history of cerebrovascular disease, history of liver disease, and CRRT. This approach yielded a model with good predictive performance. Figure 4 shows that increasing age, lower BMI, female gender, prolonged APTT and PT, elevated BUN, increased WBC, a history of cerebrovascular or liver disease, and the initiation of CRRT within 24 hours were all independent risk factors for 28-day mortality in patients with sepsis complicated by autoimmune diseases. Each of these indicators has previously been reported to be associated with sepsis prognosis.

For example, Martin GS et al demonstrated that increased age is an independent risk factor for sepsis mortality and that elderly sepsis patients tend to die earlier during hospitalization than younger patients.²⁴ Numerous studies have shown that elevated levels of inflammatory markers, including white blood cell count and C-reactive protein (CRP), are significantly associated with adverse outcomes in patients with sepsis.²⁵ Consistent with our findings, obese patients have been reported to have higher survival rates in sepsis, and low BMI is an independent risk factor for poor prognosis.²⁶⁻²⁸ Patients with liver failure or chronic liver disease also have a higher risk of developing sepsis than patients without underlying liver disease.²⁹⁻³¹ Increased BUN has been reported as an independent risk factor for poor outcomes in critically ill patients and is associated with the development of sepsis-associated acute kidney injury (SA-AKI).³² Prolonged APTT has been shown to be related to increased 28-day mortality in SA-AKI patients, indicating a poor prognosis.³³ The other selected indicators have also been confirmed in different studies to be associated with 28-day mortality in sepsis patients.³⁴,³⁵

The SOFA score is a commonly used tool for assessing prognosis and disease severity in critically ill patients, but it has some limitations in practical application.³⁶ For instance, the SOFA score is based on physiological parameters within the first 24 hours of admission and requires many test indicators across multiple organ systems, making it unsuitable for rapid scoring. It also cannot accurately assess the prognosis of different sepsis subtypes. Moreover, the SOFA score focuses primarily on organ dysfunction and ignores other important prognostic factors in critically ill patients, such as age, underlying diseases, and nutritional status. In contrast, the prediction model we established for patients with sepsis complicated by autoimmune diseases incorporates age, gender, BMI, comorbidities, and renal and coagulation parameters. It enables early prognostic evaluation and, compared with the SOFA score, showed better discrimination for 28-day mortality in patients with sepsis and autoimmune diseases while relying on fewer variables and without requiring repeated measurements.

Overall, internal and external validation indicate that our prediction model can provide a simple and rapid estimate of 28-day mortality in patients with sepsis and autoimmune diseases. The Hosmer-Lemeshow test and calibration curves showed good agreement between predicted and observed outcomes, and decision curve analysis further suggested that the model has greater clinical utility than the SOFA score. Our findings have several potential clinical implications. First, the nomogram can help clinicians rapidly estimate the 28-day mortality risk of patients with sepsis and autoimmune diseases early after ICU admission, using routinely collected data. This may support timely escalation of care, closer monitoring of high-risk patients, and informed discussions with patients and families. Second, risk stratification could be used to enrich future clinical trials in this subgroup by identifying patients at particularly high risk, thereby improving study efficiency. Third, by highlighting the prognostic importance of coagulation and renal parameters, our results underscore the need for careful management of organ dysfunction and early recognition of patients who may benefit from targeted interventions. The study also has methodological strengths. We used a large, well-characterized ICU database (MIMIC-IV) for model development and internal validation and further tested the model in an independent external cohort. The predictors are objective, readily obtainable variables that do not require advanced biomarkers or imaging, which enhances feasibility in routine practice. The use of two machine learning–based feature selection methods, followed by conventional multivariable logistic regression, provides a balance between modern data-driven techniques and the interpretability that is essential for clinical adoption.

However, this study has several limitations. ① Autoimmune diseases comprise a variety of conditions, such as rheumatoid arthritis, lupus, and dermatomyositis, and glucocorticoids are used both in their treatment and in the treatment of septic shock. Whether glucocorticoid use in sepsis improves patient survival remains an active research question,³⁷⁻⁴¹ and the answer may depend on sepsis subtype, immune status, the infecting pathogen, and glucocorticoid dosage. In this study, early glucocorticoid use after ICU admission was not included among the predictors, and because we could not accurately obtain the dosage and frequency of the glucocorticoids and immunosuppressants used to treat autoimmune diseases, bias may have been introduced. ② This study predicted 28-day prognosis from early laboratory indicators and physiological conditions in order to provide clinical evidence for early identification and intervention; it did not assess long-term survival. ③ This was a retrospective, observational study, so unmeasured or residual confounding cannot be excluded. Some data in the database were missing, and although we applied imputation to address this, some bias may remain. ④ Although we performed external validation in an independent cohort, the external dataset was relatively small (n = 57), which may limit the precision of the performance estimates and raises the possibility of overfitting. Larger, multi-center external validation studies are needed to confirm the generalizability of our findings. ⑤ We relied on routinely available variables in the MIMIC-IV database, and some potentially relevant factors, such as specific autoimmune disease subtypes, disease activity scores, cumulative glucocorticoid exposure, and detailed immunosuppressive regimens, were not consistently recorded and therefore could not be incorporated into the model.

In summary, we developed and externally validated a simple, interpretable nomogram for predicting 28-day mortality in patients with sepsis complicated by autoimmune diseases. The model, derived using LASSO and Boruta-based feature selection, showed moderate but consistently better discriminatory ability than the SOFA score and demonstrated good calibration and clinical utility in decision curve analysis. Future work should focus on prospective, multi-center validation; incorporation of autoimmune disease–specific variables and dynamic clinical trajectories; and evaluation of whether model-guided management strategies can improve outcomes in this vulnerable patient population.

Conclusion

This model demonstrates good discrimination and calibration in predicting the 28-day mortality risk among patients with sepsis complicated by autoimmune diseases. The predictive factors were systematically selected using two distinct machine learning approaches. Both internal and external validation supported the model’s robustness and its potential for clinical application.

Data Sharing Statement

The datasets presented in the current study are available in the MIMIC IV 3.1 database (https://physionet.org/content/mimiciv/3.1/).

Ethics Approval and Consent to Participate

The MIMIC-IV database is a publicly available, de-identified database; therefore, separate ethical approval and informed consent were waived. The external dataset received ethical approval from the Medical Ethics Committee of the First Affiliated Hospital of Soochow University (Ethics Number: 2025459). The study was conducted in accordance with the principles of the Declaration of Helsinki.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Funding

This work was supported by the Science Foundation of Jiangsu Commission of Health (M2022086), the Key Discipline Project of Suzhou (szxk202503), the Suzhou Basic Research Project (SSD2024053), and the Natural Science Foundation of Jiangsu Province (BK20241798).

Disclosure

The authors declare no competing interests.

References

1. Angus DC, van der Poll T. Severe sepsis and septic shock. N Engl J Med. 2013;369(9):840–851. doi:10.1056/NEJMra1208623

2. Boomer JS, To K, Chang KC, et al. Immunosuppression in patients who die of sepsis and multiple organ failure. JAMA. 2011;306(23):2594–2605. doi:10.1001/jama.2011.1829

3. Li J, Shen L, Qian K. Global, regional, and national incidence and mortality of neonatal sepsis and other neonatal infections, 1990-2019. Front Public Health. 2023;11:1139832. doi:10.3389/fpubh.2023.1139832

4. van der Poll T, Shankar-Hari M, Wiersinga WJ. The immunology of sepsis. Immunity. 2021;54(11):2450–2464. doi:10.1016/j.immuni.2021.10.012

5. Giamarellos-Bourboulis EJ, Aschenbrenner AC, Bauer M, et al. The pathophysiology of sepsis and precision-medicine-based immunotherapy. Nat Immunol. 2024;25(1):19–28. doi:10.1038/s41590-023-01660-5

6. Torres LK, Pickkers P, van der Poll T. Sepsis-induced immunosuppression. Annu Rev Physiol. 2022;84:157–181. doi:10.1146/annurev-physiol-061121-040214

7. Sidiropoulos PI, Karvounaris SA, Boumpas DT. Metabolic syndrome in rheumatic diseases: epidemiology, pathophysiology, and clinical implications. Arthritis Res Ther. 2008;10(3):207. doi:10.1186/ar2397

8. Cutolo M, Soldano S, Smith V. Pathophysiology of systemic sclerosis: current understanding and new insights. Expert Rev Clin Immunol. 2019;15(7):753–764. doi:10.1080/1744666X.2019.1614915

9. Singh JA, Cleveland JD. Hospitalized infections in lupus: a nationwide study of types of infections, time trends, health care utilization, and in-hospital mortality. Arthritis Rheumatol. 2021;73(4):617–630. doi:10.1002/art.41577

10. van Wesemael TJ, Huizinga TWJ, Toes REM, van der Woude D. From phenotype to pathophysiology-placing rheumatic diseases in an immunological perspective. Lancet Rheumatol. 2022;4(3):e166–e167. doi:10.1016/S2665-9913(21)00369-6

11. Aydin K, Turk I. The diagnostic profile and clinical course of patients with rheumatic diseases in the medical intensive care unit. Turk J Med Sci. 2023;53(5):1084–1093. doi:10.55730/1300-0144.5673

12. Li H, Pan X, Zhang S, et al. Association of autoimmune diseases with the occurrence and 28-day mortality of sepsis: an observational and Mendelian randomization study. Crit Care. 2023;27(1):476. doi:10.1186/s13054-023-04763-5

13. Jinno S, Lu N, Jafarzadeh SR, Dubreuil M. Trends in hospitalizations for serious infections in patients with rheumatoid arthritis in the US between 1993 and 2013. Arthritis Care Res. 2018;70(4):652–658. doi:10.1002/acr.23328

14. Krasselt M, Baerwald C, Petros S, Seifert O. Sepsis mortality is high in patients with connective tissue diseases admitted to the Intensive Care Unit (ICU). J Intensive Care Med. 2022;37(3):401–407. doi:10.1177/0885066621996257

15. Yang J, Chen J, Zhang M, Zhou Q, Yan B. Prognostic impacts of repeated sepsis in intensive care unit on autoimmune disease patients: a retrospective cohort study. BMC Infect Dis. 2024;24(1):197. doi:10.1186/s12879-024-09072-y

16. Bhavani SV, Semler M, Qian ET, et al. Development and validation of novel sepsis subphenotypes using trajectories of vital signs. Intensive Care Med. 2022;48(11):1582–1592. doi:10.1007/s00134-022-06890-z

17. Bhavani SV, Spicer A, Sinha P, et al. Distinct immune profiles and clinical outcomes in sepsis subphenotypes based on temperature trajectories. Intensive Care Med. 2024;50(12):2094–2104. doi:10.1007/s00134-024-07669-0

18. van Amstel RBE, Kennedy JN, Scicluna BP, et al. Uncovering heterogeneity in sepsis: a comparative analysis of subphenotypes. Intensive Care Med. 2023;49(11):1360–1369. doi:10.1007/s00134-023-07239-w

19. Goldberger AL, Amaral LA, Glass L, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):E215–20. doi:10.1161/01.CIR.101.23.e215

20. Rahman MS, Islam KR, Prithula J, et al. Machine learning-based prognostic model for 30-day mortality prediction in Sepsis-3. BMC Med Inform Decis Mak. 2024;24(1):249. doi:10.1186/s12911-024-02655-4

21. Liu C, Wang Y. Development and validation of a model for prediction of septic shock in neonates with sepsis. Shock. 2024;62(2):173–178. doi:10.1097/SHK.0000000000002380

22. Yuan Y, Meng Y, Li Y, et al. Development and validation of a nomogram for predicting 28-day in-hospital mortality in sepsis patients based on an optimized Acute Physiology and Chronic Health Evaluation II score. Shock. 2024;61(5):718–727. doi:10.1097/SHK.0000000000002335

23. Wang W, Liu CF. Sepsis heterogeneity. World J Pediatr. 2023;19(10):919–927. doi:10.1007/s12519-023-00689-8

24. Martin GS, Mannino DM, Moss M. The effect of age on the development and outcome of adult sepsis. Crit Care Med. 2006;34(1):15–21. doi:10.1097/01.CCM.0000194535.82812.BA

25. Yang AP, Liu J, Yue LH, Wang HQ, Yang WJ, Yang GH. Neutrophil CD64 combined with PCT, CRP and WBC improves the sensitivity for the early diagnosis of neonatal sepsis. Clin Chem Lab Med. 2016;54(2):345–351. doi:10.1515/cclm-2015-0277

26. Hou C, Qi Y, Zhang T, et al. Evaluating the obesity paradox in patients with sepsis and cancer. Int J Obes (Lond). 2025;49(9):1723–1732. doi:10.1038/s41366-025-01805-6

27. Fan ZK, Yi RQ, Feng W, et al. The impact of body mass index on clinical outcomes in elderly sepsis patients: a retrospective study based on the MIMIC IV database. Aging Clin Exp Res. 2025;37(1):211. doi:10.1007/s40520-025-03115-3

28. Cui K, Teng X, Liu W, Zhao X, Xu S, Bai L. L-shaped association of body mass index with prognosis in individuals with sepsis: a multicenter cohort study. Diabetol Metab Syndr. 2025;17(1):43. doi:10.1186/s13098-025-01607-w

29. Yan J, Li S, Li S. The role of the liver in sepsis. Int Rev Immunol. 2014;33(6):498–510. doi:10.3109/08830185.2014.889129

30. Canabal JM, Kramer DJ. Management of sepsis in patients with liver failure. Curr Opin Crit Care. 2008;14(2):189–197. doi:10.1097/MCC.0b013e3282f6a435

31. Gentilello LM. Alcohol and the intensive care unit: it’s not just an antiseptic. Crit Care Med. 2007;35(2):627–628. doi:10.1097/01.CCM.0000254070.23023.34

32. Guan C, Gong A, Zhao Y, et al. Interpretable machine learning model for new-onset atrial fibrillation prediction in critically ill patients: a multi-center study. Crit Care. 2024;28(1):349. doi:10.1186/s13054-024-05138-0

33. Lin C, Wang J, Cai K, et al. Elevated activated partial thromboplastin time as a predictor of 28-day mortality in sepsis-associated acute kidney injury: a retrospective cohort analysis. Int J Gen Med. 2024;17:1739–1753. doi:10.2147/IJGM.S459583

34. Wang G, He Y, Guo Q, et al. Continuous renal replacement therapy with the adsorptive oXiris filter may be associated with the lower 28-day mortality in sepsis: a systematic review and meta-analysis. Crit Care. 2023;27(1):275. doi:10.1186/s13054-023-04555-x

35. Ren Y, Zhang L, Xu F, et al. Risk factor analysis and nomogram for predicting in-hospital mortality in ICU patients with sepsis and lung infection. BMC Pulm Med. 2022;22(1):17. doi:10.1186/s12890-021-01809-8

36. Yu S, Leung S, Heo M, et al. Comparison of risk prediction scoring systems for ward patients: a retrospective nested case-control study. Crit Care. 2014;18(3):R132. doi:10.1186/cc13947

37. Fujii T, Salanti G, Belletti A, et al. Effect of adjunctive vitamin C, glucocorticoids, and vitamin B1 on longer-term mortality in adults with sepsis or septic shock: a systematic review and a component network meta-analysis. Intensive Care Med. 2022;48(1):16–24. doi:10.1007/s00134-021-06558-0

38. Park YJ, Lee MJ, Bae J, et al. Effects of glucocorticoid therapy on sepsis depend both on the dose of steroids and on the severity and phase of the animal sepsis model. Life. 2022;12(3):421. doi:10.3390/life12030421

39. Einarsdottir MJ, Ekman P, Molin M, et al. High mortality rate in oral glucocorticoid users: a population-based matched cohort study. Front Endocrinol. 2022;13:918356. doi:10.3389/fendo.2022.918356

40. Ying P, Yang C, Wu X, Cai Q, Xin W. Effect of hydrocortisone on the 28-day mortality of patients with septic acute kidney injury. Ren Fail. 2019;41(1):794–799. doi:10.1080/0886022X.2019.1658605

41. Venkatesh B, Finfer S, Cohen J, et al. Hydrocortisone compared with placebo in patients with septic shock satisfying the sepsis-3 diagnostic criteria and APROCCHSS study inclusion criteria: a post hoc analysis of the ADRENAL trial. Anesthesiology. 2019;131(6):1292–1300. doi:10.1097/ALN.0000000000002955

Creative Commons License © 2026 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms and incorporate the Creative Commons Attribution - Non Commercial (unported, 4.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.