Infection and Drug Resistance, Volume 19

Explainable Machine Learning Integrating Patient and Environmental Factors for Predicting Multidrug-Resistant Organism Colonization or Infection on ICU Admission

Authors Gu G, Ji Y, Xiong X, Chen M, Pan J, Yang Y, Yang M, Wang B

Received 13 November 2025

Accepted for publication 12 May 2026

Published 3 June 2026 Volume 2026:19 581390

DOI https://doi.org/10.2147/IDR.S581390

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 7

Editor who approved publication: Dr Oliver Planz



Genying Gu,1 Yan Ji,2 Xinglin Xiong,3 Ming Chen,4 Junchen Pan,5 Yue Yang,1 Ming Yang,6 Binbin Wang5

1Department of Neurosurgical Intensive Care Unit, The Affiliated BenQ Hospital of Nanjing Medical University, Nanjing, Jiangsu, 210019, People’s Republic of China; 2School of Nursing, Nanjing Medical University, Nanjing, Jiangsu, 211166, People’s Republic of China; 3Department of Nursing, The Affiliated BenQ Hospital of Nanjing Medical University, Nanjing, Jiangsu, 210019, People’s Republic of China; 4Department of Rehabilitation Medicine, The Affiliated BenQ Hospital of Nanjing Medical University, Nanjing, Jiangsu, 210019, People’s Republic of China; 5Department of Neurosurgery, The Affiliated BenQ Hospital of Nanjing Medical University, Nanjing, Jiangsu, 210019, People’s Republic of China; 6Department of Emergency, The Affiliated BenQ Hospital of Nanjing Medical University, Nanjing, Jiangsu, 210019, People’s Republic of China

Correspondence: Yan Ji, Email [email protected]; Xinglin Xiong, Email [email protected]

Objective: Multidrug-resistant organisms (MDROs) pose a serious threat to global public health, particularly in intensive care units (ICU). Few studies have employed machine learning (ML) to capture complex clinical interactions. This study aimed to develop an explainable ML model for early risk stratification of MDRO colonization or infection by integrating patient-specific clinical features with environmental exposure factors.
Methods: We analyzed the data of 420 ICU patients (210 MDRO-positive cases and 210 matched controls) admitted between January 2020 and October 2023. Predictors were selected using least absolute shrinkage and selection operator (LASSO) regression. Six ML models—Logistic Regression, Random Forest, Gradient Boosting, AdaBoost, XGBoost, and LightGBM—were developed and evaluated using internal validation on a randomly split test set. The best performing model was interpreted using SHapley Additive exPlanations (SHAP), and a web-based tool was developed for clinical applications.
Results: Five predictors were identified through LASSO regression and were independently associated with the composite endpoint in subsequent multivariable logistic regression, including residence in a long-term care facility, MDRO-positive status of the prior bed occupant, central venous catheterization, surgery prior to infection, and duration of arterial catheterization. The XGBoost model demonstrated the highest performance, with an area under the curve of 0.926 for the training set and 0.862 for the validation set. SHAP analysis improved interpretability by quantifying feature contributions and illustrating the rationale behind individual predictions. A web-based tool was developed to facilitate real-time clinical risk assessment.
Conclusion: This study demonstrates the utility of integrating environmental risk factors into an ML framework for improved MDRO prediction, resulting in a web-based tool with the potential to support clinical decisions and enhance infection control workflows.

Keywords: machine learning, ICU, MDRO, prediction model, SHapley Additive exPlanations

Introduction

Multidrug-resistant organisms (MDROs) are microorganisms—primarily bacteria—that have developed resistance to three or more classes of antimicrobial agents through various genetic mechanisms. These pathogens significantly limit the available treatment options owing to their broad-spectrum resistance, thereby increasing therapeutic challenges and patient risks.1 Every year, drug-resistant infections cause millions of deaths worldwide and have emerged as a major challenge in modern healthcare systems.2 Because of their extensive drug resistance, rapid transmission, and high pathogenicity, the World Health Organization (WHO) identified antimicrobial resistance as one of the top 10 global public health threats in 2019.3,4

Infections caused by MDROs are of particular concern in intensive care unit (ICU) patients owing to their substantial impact on treatment outcomes.5 These infections are associated with increased in-hospital mortality, higher readmission rates, prolonged hospital stays, and elevated healthcare costs.6–8 Multidrug-resistant (MDR) strains have been implicated in a wide spectrum of healthcare-associated infections, including urinary tract infections, surgical site infections, bloodstream infections, and ventilator-associated pneumonia, leading to increased morbidity, mortality, and healthcare costs.9–11 The severity, complexity, and persistence of MDROs in hospital environments, particularly their resistance to last-line antibiotics such as carbapenems and colistin, have been well documented in recent molecular surveillance studies,12 further emphasizing the critical need for early prediction and enhanced infection control strategies. Globally, several MDR strains have emerged as problematic in healthcare settings, including carbapenem-resistant Klebsiella pneumoniae (K. pneumoniae), methicillin-resistant Staphylococcus aureus (S. aureus), MDR Acinetobacter baumannii (A. baumannii), and MDR Proteus mirabilis (P. mirabilis).13–17 The pathogenicity and virulence of these organisms are often enhanced by mechanisms such as the production of siderophores and metallophores, which facilitate nutrient acquisition and survival within the host.13–15

The burden of MDROs is not only confined to the healthcare setting. A study conducted in Shenzhen, China, reported a carriage rate of 26.7% among individuals who had not been hospitalized or used antibiotics in the preceding 6 months.18 Therefore, early identification of patients colonized or infected with MDROs, along with timely implementation of effective infection control measures—particularly interrupting transmission routes—is crucial for reducing the incidence of healthcare-associated infections.19

Numerous studies have developed early prediction models to forecast MDRO infection and colonization.20,21 Commonly identified risk factors include infection characteristics, pathogen type, invasive procedures, medication history, serological markers, and other clinical features.22 However, the primary transmission route of MDROs is contact transmission, which is influenced not only by the clinical factors of susceptible hosts but also by their healthcare environment. Environmental contamination persists on high-touch surfaces and shared equipment, creating a prolonged risk of exposure for patients subsequently admitted to the same room. Therefore, relying solely on the clinical characteristics of susceptible individuals may not fully capture the actual risk. MDRO infection or colonization in the prior bed occupant has been reported to constitute a significant exposure risk for the current patient.23 Based on this, we incorporated not only the patient’s own clinical features but also environmental exposure factors from prior bed occupants to develop a more comprehensive risk prediction model.

ICU populations pose unique modeling challenges owing to high rates of invasive device utilization, rapid changes in clinical status, and dense exposure networks among patients and healthcare workers. These complexities make ICUs an ideal setting for advanced machine learning (ML) approaches that can capture nonlinear interactions and temporal dynamics that traditional statistical methods may miss. ML has become a highly valuable approach for modeling complex clinical data to develop predictive models for disease prognosis and diagnosis. Substantial and growing evidence indicates that ML models play an important role in predicting infections caused by MDROs.1,24,25 Despite these advancements, a critical knowledge gap remains: few studies have systematically integrated environmental exposure factors, such as the MDRO status of a prior bed occupant, with patient-level data within an ML framework to predict MDRO acquisition in a high-risk ICU environment.

This study makes several key contributions: (1) integration of a novel environmental exposure variable, the MDRO status of prior bed occupants, with host-specific clinical factors within an ML framework; (2) application of SHAP (SHapley Additive exPlanations) explainability methods to provide clinically interpretable individual-level risk predictions; (3) development of a publicly accessible web-based tool to facilitate real-time clinical risk assessment; and (4) demonstration of the value of this integrated approach in high-risk ICU populations. Therefore, this study aimed to develop and validate an explainable ML model that combines patient-specific clinical characteristics with environmental exposure factors to predict the risk of MDRO colonization or infection within 48 h of ICU admission.

Methods

Study Population

This retrospective, case-control study evaluated the performance of ML models in predicting MDRO infection and colonization in ICU patients. This study was approved by the Institutional Review Board (IRB) of The Affiliated BenQ Hospital of Nanjing Medical University (No. 2025-KL032). The requirement for informed consent was waived by the same ethics committee because of the study’s retrospective design and minimal risk to the participants.

Data were collected from 1615 ICU patients admitted between January 2020 and October 2023. Among them, 210 patients (13.0%) were identified as having MDRO infection/colonization. Using a 1:1 ratio, 210 control patients without MDRO acquisition were selected from the remaining 1405 patients and matched based on sex, age, Glasgow Coma Scale (GCS) score, Acute Physiology and Chronic Health Evaluation II (APACHE II) score, quick Sequential Organ Failure Assessment (qSOFA) score, and length of ICU stay to control for these major confounders. Although this matching approach reduces selection bias, it may not eliminate all potential residual confounding factors.

The inclusion criteria were: (1) age ≥ 18 years; (2) ICU stay ≥ 48 hours; and (3) at least one microbiological culture performed during the ICU stay. The exclusion criteria were: (1) patients with documented MDRO infection/colonization prior to ICU admission; (2) patients who tested positive for MDRO within the first 48 h of ICU admission; (3) patients with missing or incomplete essential medical records; (4) patients readmitted to the ICU during the same hospitalization; and (5) those without any microbiological culture results.

MDROs were defined in accordance with the Interim Standard Definitions of MDR, XDR, and PDR Multidrug-Resistant Bacteria-International Expert Recommendations. A case was classified as colonization if an MDRO was detected ≥ 48 hours after ICU admission or transfer in the absence of clinical signs of infection. If infection symptoms were present, it was classified as an MDRO infection following the 2011 Technical Guidelines for the Prevention and Control of Multidrug-resistant Bacteria Hospital Infections. The MDROs identified in our cohort included carbapenem-resistant Enterobacteriaceae, methicillin-resistant Staphylococcus aureus, multidrug-resistant Acinetobacter baumannii, and multidrug-resistant Pseudomonas aeruginosa. Repeated isolations of the same bacterial strain from the same site in a single patient were performed.

Data Collection

The following information was collected for each patient: (1) baseline characteristics: sex, age, dates of admission and discharge, duration of ICU stay, prior place of residence, time of MDRO acquisition, previous MDRO history, and MDRO status of the previous bed occupant; (2) medical history: including chronic respiratory, cardiovascular, and cerebrovascular diseases; end-stage renal disease requiring dialysis; malignancy; and pneumonia present at ICU admission; (3) severity of illness scores: GCS, APACHE II, and qSOFA; (4) laboratory parameters within 24 hours of ICU admission: complete blood count, neutrophil-to-lymphocyte ratio (NLR), C-reactive protein (CRP), procalcitonin (PCT), lactate, albumin, glucose, and prothrombin time—using the most abnormal values recorded during this period; and (5) clinical interventions: use of mechanical ventilation, urinary catheterization, central venous and arterial catheterization, nasogastric tube insertion, bronchoscopy, surgery, antibiotic administration, and sedative use.

Statistical Analysis

All statistical analyses were performed using R (version 4.4.1) and Python (version 3.7.3). Descriptive statistics and between-group comparisons were conducted in R. Categorical variables were compared using the chi-square test or Fisher’s exact test, as appropriate, whereas continuous variables were compared using the independent t-test or Mann–Whitney U test, depending on their distribution. Missing data were infrequent (<5% for laboratory parameters and <3% for clinical intervention variables) and were handled using multiple imputation by chained equations (MICE) for continuous variables, with mode imputation applied for categorical variables. MICE was performed with five iterations and five imputed datasets using predictive mean matching for continuous variables and logistic regression for binary variables. The results were pooled according to Rubin’s rules. The patients were randomly divided into training and validation sets in a 7:3 ratio using the createDataPartition function in the R caret package. By design, the matched dataset comprised 50% cases and 50% controls, whereas MDRO prevalence in the underlying population was 13%. To address potential model bias, we applied the synthetic minority oversampling technique (SMOTE) to the training set only, generating synthetic samples of the minority class to achieve a balanced distribution for model training. The validation set remained unchanged. Feature selection was performed using least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation, implemented via the glmnet package in R, to retain variables with non-zero coefficients.
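
The ordering described above (split first, then oversample the training portion only) can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the source does not name the SMOTE implementation used, the helper names `stratified_split` and `smote` are hypothetical, the toy data are random, and the interpolation step assumes purely numeric features.

```python
import math
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split indices into train/validation, sampling each class separately."""
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train_idx.extend(idx[:cut])
        valid_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(valid_idx)


def smote(minority, n_new, k=5, seed=0):
    """Make n_new synthetic rows, each on the line segment between a minority
    row and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # 0 <= gap < 1: position between base and other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (hypothetical): 20 "cases" and 30 "controls", two numeric features.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(50)]
y = [1] * 20 + [0] * 30

# 1) Split first, so validation rows never influence the oversampling.
train_idx, valid_idx = stratified_split(y, train_frac=0.7, seed=1)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_valid = [X[i] for i in valid_idx]
y_valid = [y[i] for i in valid_idx]

# 2) Oversample the minority class in the training set only.
minority = [row for row, label in zip(X_train, y_train) if label == 1]
n_needed = y_train.count(0) - y_train.count(1)
synthetic = smote(minority, n_needed, k=5, seed=1)
X_train_bal = X_train + synthetic
y_train_bal = y_train + [1] * n_needed
```

Because each synthetic row is a convex combination of two observed minority rows, it always lies within the range of the observed minority data; the validation rows are left exactly as sampled.
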
Six algorithms were implemented in Python (version 3.7.3) using the scikit-learn (version 1.2.2) and xgboost (version 1.7.6) libraries: Logistic Regression (LR),26 Random Forest (RF),27 Gradient Boosting Classifier (GBC),28 Adaptive Boosting (AdaBoost),29 Extreme Gradient Boosting (XGBoost),30 and LightGBM (LGBM).31 Hyperparameters for each model were tuned by grid search with 5-fold cross-validation on the training set to optimize performance and limit overfitting. The hyperparameter search spaces for the six algorithms are provided in Supplementary Table S1. Final model performance was evaluated on the held-out validation set, which was not used during any phase of training or hyperparameter optimization. Model evaluation was based on the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, precision, and F1 score. To improve the interpretability of complex ensemble models, which are often regarded as black boxes, we used SHAP values. SHAP is a post hoc explainability technique grounded in cooperative game theory that quantifies the marginal contribution of each feature to individual predictions, in line with the principles of explainable ML.
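
To make the game-theoretic idea behind SHAP concrete, the sketch below computes exact Shapley values by brute force for a tiny hypothetical model. It is an illustration only: the function `shapley_values`, the toy model, and the convention of replacing "absent" features with a baseline value are assumptions made here, and practical tree-based SHAP implementations use far more efficient algorithms than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(x):
    """Hypothetical model: one additive term plus one interaction term."""
    return 2.0 * x[0] + x[1] * x[2]


instance = [1.0, 1.0, 1.0]   # the prediction being explained
baseline = [0.0, 0.0, 0.0]   # reference values for "absent" features


def value_fn(subset):
    """Model output when only the features in `subset` take instance values."""
    mixed = [instance[j] if j in subset else baseline[j] for j in range(3)]
    return toy_model(mixed)


phi = shapley_values(value_fn, 3)
```

In this toy case the additive feature receives its full effect, the two interacting features share their joint effect equally, and the contributions sum to the difference between the model output for the instance and for the baseline.
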

To facilitate clinical application of the final predictive model, a web-based tool was developed using the Streamlit framework (Streamlit Inc., USA). This interface enables users to input key clinical variables derived from the most influential features identified by the ML algorithms and returns the predicted probability of MDRO colonization or infection. The tool is publicly available at https://gugengying.streamlit.app.

Results

Baseline Characteristics

Among the 420 ICU patients included in the study, ages ranged from 24 to 94 years (mean 69.98 ± 15.73), with 287 males (68.3%) and 133 females (31.7%). The cohort was randomly divided into a training set (n = 294) and a validation set (n = 126) in a 7:3 ratio. No statistically significant differences in baseline characteristics or clinical data were observed between the training and validation sets (Table 1).

Table 1 Comparison of Baseline Characteristics Between Training and Validation Groups

Variable Selection for MDRO Infection/Colonization Predictors

To address multicollinearity among the candidate predictors, we applied LASSO regression and retained variables with non-zero coefficients. The coefficient path plot is shown in Figure 1A. Using 10-fold cross-validation and the one-standard-error rule, the optimal regularization parameter was determined to be λ = 0.046861, as indicated by the vertical dashed line in Figure 1B. Nine predictors were selected: long-term care facility residence, MDRO infection or colonization of the prior bed occupant, central venous catheterization, surgery prior to infection/colonization, GCS score, duration of arterial catheterization, use of mechanical ventilation, platelet count, and serum albumin level. Subsequently, a multivariate LR incorporating these predictors was performed (Supplementary Table S2). Five independent risk factors were significantly associated with MDRO acquisition (P < 0.05; Table 2): residence in a long-term care facility, MDRO-positive status of the prior bed occupant, central venous catheterization, surgery prior to infection, and duration of arterial catheterization.
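
The one-standard-error rule mentioned above can be sketched as follows. The function name `lambda_one_se` and all numbers in the example are hypothetical and unrelated to the study's cross-validation curve; in glmnet the two returned quantities correspond to `lambda.min` and `lambda.1se`.

```python
def lambda_one_se(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimizes the mean CV error; lambda_1se is the largest
    (most regularizing) lambda whose mean CV error is still within one
    standard error of that minimum.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return lambdas[best], max(eligible)


# Hypothetical CV summary: one entry per candidate lambda.
lambdas = [0.200, 0.100, 0.050, 0.020, 0.010, 0.005]
cv_mean = [1.30, 1.12, 1.03, 1.00, 1.01, 1.02]   # mean CV deviance
cv_se = [0.05, 0.05, 0.05, 0.05, 0.05, 0.05]     # standard error of the mean

lam_min, lam_1se = lambda_one_se(lambdas, cv_mean, cv_se)
```

Choosing the larger of the two values trades a small, statistically indistinguishable loss in cross-validated fit for a sparser model, which is why this rule tends to retain fewer predictors than the error-minimizing choice.
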

Table 2 Multivariate Logistic Regression Analysis for MDRO Acquisition in ICU Patients

Figure 1 LASSO Regression Variable Selection Mechanism. (A) LASSO Coefficient Path Analysis for Variable Selection; (B) 10-fold cross-validation curve.

Evaluation of Predictive Performance Across Models

We assessed the predictive performance of six ML models for stratifying MDRO risk, with all performance metrics calculated at optimal cutoff points derived from ROC analysis in the training set. In the training cohort (n = 294), discriminatory ability varied among the models (Table 3 and Figure 2A). XGBoost and LightGBM achieved the highest AUC values (0.926 and 0.922, respectively). XGBoost also attained the highest F1 score (0.871; 95% confidence interval [CI]: 0.833–0.910), reflecting a favorable balance between precision and recall. In the validation cohort (n = 126), XGBoost maintained robust performance, with an AUC of 0.862 (95% CI: 0.821–0.903) and an F1 score of 0.734 (95% CI: 0.691–0.777) (Table 3 and Figure 2B).
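
The evaluation logic described above, in which a cutoff is fixed on the training set and then applied unchanged to the validation set, can be sketched in plain Python. The source does not state which ROC criterion defined the optimal cutoff; Youden's J is assumed here purely for illustration, and the labels and scores are hypothetical toy values rather than study data.

```python
def auc_rank(y_true, scores):
    """AUC as the fraction of case-control pairs ranked correctly (ties = 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_at(y_true, scores, cutoff):
    """Sensitivity, specificity, precision, F1 with score >= cutoff as positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < cutoff)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= cutoff)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < cutoff)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"sensitivity": sens, "specificity": spec, "precision": prec, "f1": f1}


def youden_cutoff(y_true, scores):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1 (assumed)."""
    def j_stat(cutoff):
        m = metrics_at(y_true, scores, cutoff)
        return m["sensitivity"] + m["specificity"] - 1
    return max(sorted(set(scores)), key=j_stat)


# Hypothetical predicted probabilities (toy values, not study data).
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
p_train = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_valid = [1, 1, 0, 0, 1, 0]
p_valid = [0.85, 0.35, 0.45, 0.15, 0.55, 0.25]

cutoff = youden_cutoff(y_train, p_train)               # chosen on training data
train_auc = auc_rank(y_train, p_train)
train_metrics = metrics_at(y_train, p_train, cutoff)
valid_metrics = metrics_at(y_valid, p_valid, cutoff)   # same cutoff, new data
```

AUC does not depend on the cutoff, whereas sensitivity, specificity, precision, and F1 all do, so reusing the training-derived cutoff on the validation set keeps those threshold-dependent metrics free of validation-set tuning.
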

Table 3 Evaluation of Model Performance in the Training Set and Validation Set

Figure 2 Comparison of ROC curves from six machine learning models for predicting MDRO infection/colonization. (A) ROC curve for the training set; (B) ROC curve for the validation set.

Abbreviations: rf, random forest; lr, logistic regression; gbc, Gradient Boosting; abc, Adaptive Boosting; xgb, eXtreme Gradient Boosting; lgb, Light Gradient Boosting Machine.

SHAP-Based Interpretation of Model Predictions

To improve the interpretability of the XGBoost model, we performed a SHAP analysis. The feature importance ranking (Figure 3A) revealed that the duration of arterial catheterization, residence in a long-term care facility, and central venous catheterization were the three most influential factors in predicting MDRO risk. The SHAP summary plot (Figure 3B) further depicts the direction and magnitude of the effect of each feature on the model predictions: higher SHAP values correspond to an increased predicted risk of MDRO infection/colonization. Specifically, a longer arterial catheterization duration was consistently associated with elevated risk predictions. This finding highlights a potential area for clinical intervention, suggesting that minimizing the duration of invasive device use, where clinically feasible, could be a key strategy for reducing the risk of MDRO acquisition.
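
A global ranking of the kind shown in Figure 3A is typically obtained by averaging the absolute SHAP value of each feature across patients. The sketch below shows that aggregation with a hypothetical helper, `mean_abs_shap`, on a made-up matrix with placeholder feature names; none of the numbers come from the study.

```python
def mean_abs_shap(shap_matrix, feature_names):
    """Rank features by mean absolute SHAP value (rows = patients)."""
    n_rows = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-patient SHAP values for three placeholder features.
feature_names = ["feature_a", "feature_b", "feature_c"]
shap_matrix = [
    [0.5, -0.1, 0.2],
    [-0.5, 0.1, 0.2],
    [0.5, -0.1, -0.2],
]

ranking = mean_abs_shap(shap_matrix, feature_names)
signed_mean_a = sum(row[0] for row in shap_matrix) / len(shap_matrix)
```

Taking absolute values matters: `feature_a` pushes predictions strongly in both directions, so its signed mean understates its influence, while its mean absolute value places it first. The direction of each effect is what the summary plot (Figure 3B) adds on top of this ranking.
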

Figure 3 SHAP analysis of the XGBoost model. (A) Feature importance ranking based on mean absolute SHAP values. (B) SHAP summary plot showing the impact of each feature on model output.

At the individual level, SHAP force plots were used to visualize the contribution of each feature to specific predictions (Figure 4). For instance, in the case of a true-positive patient (Patient A), the model’s SHAP output value (f(x)) was 1.40, which exceeded the baseline value and led to a correct positive prediction (Figure 4A). Conversely, a true-negative patient (Patient B) exhibited an f(x) value of –0.64, which was below the baseline value owing to the absence of major risk factors, resulting in a correct negative classification (Figure 4B). These visualizations illustrate how the model synthesizes multiple clinical variables to generate individualized risk assessments.
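
If the f(x) values in the force plots are expressed in log-odds units, which is the usual default when tree-based SHAP explanations are computed for a binary XGBoost classifier but is an assumption here because the source does not state the output scale, they can be converted to probabilities with the logistic function:

```python
import math


def log_odds_to_probability(fx):
    """Logistic (sigmoid) transform from log-odds to probability."""
    return 1.0 / (1.0 + math.exp(-fx))


# f(x) values reported for the two example patients (assumed to be log-odds).
p_patient_a = log_odds_to_probability(1.40)    # roughly 0.80
p_patient_b = log_odds_to_probability(-0.64)   # roughly 0.35
```

Under that assumption, the two outputs fall on opposite sides of a probability of 0.5, since f(x) = 0 corresponds to even odds. Whether a given probability is classified as positive still depends on the cutoff chosen for the model, not on 0.5 itself.
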

Figure 4 SHAP force plots for individual predictions from the XGBoost model. (A) Patient A (true positive) with a predicted high risk of MDRO infection/colonization (f(x) = 1.40). (B) Patient B (true negative) with a predicted low risk of MDRO infection/colonization (f(x) = –0.64).

Development of a Web-Based Prediction Tool for MDRO Colonization/Infection

To facilitate the clinical use of this predictive model, we developed a web-based tool designed to estimate the risk of MDRO colonization/infection in ICU patients. This tool, which is publicly accessible at https://gugengying.streamlit.app, allows healthcare providers to input key clinical variables and obtain a probability-based risk assessment. A screenshot of the interface is shown in Figure 5.

Figure 5 Screenshot of the MDRO colonization/infection prediction tool interface. The tool allows clinicians to input key variables—such as long-term care facility residency, prior bed exposure, presence of central venous catheter, recent surgery, and arterial tube days—to calculate the predicted risk of MDRO colonization or infection.

Discussion

This retrospective case-control study integrated ICU patients’ intrinsic clinical characteristics with environmental factors, notably the MDRO exposure history of the previous bed occupant, to develop an ML-based prediction model for MDRO acquisition. Among the candidate predictors selected via LASSO regression, MDRO infection/colonization in prior bed occupants was confirmed to be an independent risk factor, underscoring the critical role of contact transmission in the spread of MDROs within ICUs. After evaluating the six ML models, XGBoost demonstrated the highest predictive performance. Based on this model, we developed a user-friendly online risk calculator to support early prediction and intervention.

LASSO regression identified nine predictors associated with MDRO infection/colonization, five of which were further established as independent risk factors through multivariate LR: residence in a long-term care facility, MDRO-positive status of the previous bed occupant, central venous catheterization, history of surgery before infection, and duration of arterial catheterization. These findings align with existing evidence and offer novel insights. Invasive procedures such as central venous and arterial catheterization are well-documented risk factors for MDRO infection as they compromise the skin and mucosal barriers and facilitate biofilm formation, thus promoting MDRO colonization and proliferation.32,33 Similarly, residence in long-term care facilities is a recognized risk factor owing to the high prevalence of MDROs in these settings and the immunocompromised status of residents, thereby establishing such facilities as potential reservoirs for MDRO transmission.34

A key innovation of this study is the incorporation of the “MDRO status of the previous bed occupant” as an environmental exposure variable within an ML prediction model, confirming its role as an independent risk factor. This result supports previous epidemiological studies indicating that contact transmission is the primary route of MDRO spread, and that prior bed occupancy by an MDRO-positive patient significantly increases the exposure risk for subsequent patients.35 While traditional prediction models have predominantly focused on patient-specific clinical factors such as comorbidities and antibiotic use,20,21 our approach integrates environmental exposure, offering a more holistic “host-environment” perspective on MDRO transmission and improving relevance to real-world clinical practice.

Among the six ML models evaluated, XGBoost achieved the highest predictive performance, outperforming traditional methods such as LR. This advantage likely stems from the capability of XGBoost to capture complex nonlinear relationships and variable interactions through its gradient boosting framework without relying on strict distributional assumptions, making it particularly suitable for high-dimensional multifactorial ICU data.36 While direct comparisons with prior models are challenging owing to differences in study populations and outcome definitions, the XGBoost model’s validation AUC of 0.862 compares favorably with recently published MDRO prediction models in ICU settings, which have reported AUCs ranging from 0.79 to 0.83.1,37 More importantly, our model’s inclusion of prior bed occupant MDRO status as a predictor represents a novel contribution beyond traditional models that focus exclusively on patient-level factors, capturing an important dimension of transmission dynamics that has been underexplored in ML-based risk prediction. This suggests that the incorporation of environmental exposure factors may contribute to improved discriminatory performance. SHAP analysis was employed to enhance the interpretability of the model. This approach identified the duration of arterial catheterization, residence in a long-term care facility, and central venous catheterization as the three most influential predictors. Longer arterial catheterization duration was associated with higher SHAP values, indicating an increased predicted risk of MDRO acquisition. Although our primary XGBoost model functions as a complex ensemble, its alignment with the principles of explainable ML was achieved through the application of SHAP. This post hoc explainability framework allowed us to move beyond simple feature importance rankings to quantify the direction and magnitude of each variable’s impact on individual risk predictions, transforming the “black box” output into clinically interpretable insights.

The findings of this study should be viewed in the broader context of early risk stratification in critically ill patients. Foundational work has demonstrated the value of simple, readily available bedside physiological indices, such as the Shock Index and its derivatives, for rapidly identifying patients at an increased risk of adverse outcomes, including those with sepsis.38 These tools offer the advantages of simplicity and immediacy. In contrast, the ML model presented herein requires more data input but offers a more comprehensive and individualized risk assessment by integrating a wider array of clinical and environmental variables, including the novel predictor of prior bed occupant MDRO status. As such, our model should be seen not as a replacement for simpler indices but as a complementary and more sophisticated step in a multi-tiered approach to risk stratification, potentially triggered after initial screening with those indices to provide a more detailed risk profile for targeted interventions.

The integration of this model into clinical practice could follow a multitiered approach for risk stratification. Upon ICU admission, the patient’s data can be input into a web-based tool, generating a real-time MDRO risk probability. For high-risk patients, this could trigger a bundle of enhanced infection control measures, such as preemptive contact precautions, prioritized screening, and increased environmental cleaning. This tool is not intended to replace clinical judgment but to serve as a decision-support system, enabling a more targeted and efficient allocation of infection prevention resources. Furthermore, it can serve as an educational tool for healthcare workers, reinforcing the importance of key risk factors such as the duration of invasive devices and the environmental reservoir of MDROs.

Beyond the development of predictive models, the fight against MDROs requires a multifaceted approach that includes educating healthcare workers and the community. Studies have consistently highlighted gaps in knowledge, attitudes, and practices (KAP) regarding MDROs among healthcare workers, which can directly affect infection control compliance.39,40 Similarly, community-based factors such as caretaker KAP have been linked to the carriage of resistant organisms, such as ESBL-producing Escherichia coli in children.41 These findings underscore the critical importance of ongoing awareness campaigns and educational interventions targeting both healthcare professionals and the public to promote rational antibiotic use, improve hand hygiene, and strengthen adherence to infection prevention measures, thereby complementing risk stratification tools, such as those presented in this study.

Concurrently, the scientific community is actively pursuing innovative therapeutic strategies to circumvent existing resistance mechanisms. One promising approach is the “Trojan Horse” strategy, in which antibiotics are conjugated to siderophores (iron-chelating molecules) to exploit bacterial iron uptake pathways and achieve active transport into Gram-negative bacteria, effectively bypassing outer membrane permeability barriers.42 Another area of active investigation involves targeting bacterial virulence factors, such as metallophores, which are metal-scavenging molecules essential for the survival and virulence of pathogens such as S. aureus, K. pneumoniae, and P. mirabilis in the host.15,16,43 By disarming the bacteria rather than killing them directly, such strategies may exert less selective pressure for the development of resistance. These novel therapeutic avenues, along with predictive modeling and stewardship efforts, represent critical components of a comprehensive global strategy for combating antimicrobial resistance.

This study had some limitations. First, the incidence of MDRO infection or colonization in our ICU was 13%, resulting in a class imbalance between the MDRO-positive and MDRO-negative groups in the source population, which may have affected model training and performance. Second, the high heterogeneity among ICU patients and the single-center retrospective study design may have introduced unmeasured confounding factors. Although case-control matching was used to minimize bias, this approach may itself have introduced selection bias. Third, data on the types and duration of antibiotic use, which are factors strongly associated with MDRO acquisition in previous studies, were not systematically collected. Fourth, the sample size used for model development (n = 294) was relatively modest. Although XGBoost’s regularization and ensemble architecture are designed to mitigate overfitting in such contexts, the sample size may still limit the complexity of the interactions that the model can reliably learn and may affect the stability of feature importance estimates. Fifth, owing to the relatively modest sample size of the training cohort, formal subgroup analyses stratified by demographic or clinical severity categories were not performed to avoid underpowered comparisons and unstable estimates. Sixth, while SHAP improved interpretability, it did not fully quantify the interaction effects between the variables. Finally, the model was internally validated using a single-center cohort. Although internal validation provides an initial assessment of model performance, it does not guarantee generalizability to other patient populations, healthcare settings, or geographical regions with different MDRO epidemiologies and clinical practices.

Conclusion

This study demonstrates that an XGBoost model incorporating both patients’ intrinsic clinical characteristics and the MDRO exposure history of the previous bed occupant can effectively predict the risk of MDRO infection/colonization in ICU patients. Confirmation of the prior bed occupant’s status as a key predictor underscores the importance of environmental exposure in MDRO transmission dynamics and supports the integration of such factors into risk assessment protocols. The strong performance of the model, coupled with SHAP-based interpretability, offers a clinically relevant tool for identifying high-risk patients who may benefit from enhanced infection control measures. While a web-based application was developed to facilitate potential clinical translation, the primary contribution of this study lies in demonstrating the value of an integrated framework for MDRO risk prediction and highlighting the potential of explainable ML to support infection prevention and control strategies.

Future research should prioritize prospective validation of this model across diverse multicenter ICU populations to evaluate its transportability, stability, and real-world clinical utility. Such studies should assess the impact of the model on clinical workflow, cost-effectiveness, and acceptance by healthcare providers. Additionally, future iterations of the model could benefit from incorporating dynamic data, such as daily antibiotic exposure and changes in clinical status, to provide a continuously updated risk profile.

Integrating this framework with molecular surveillance data on resistance mechanisms and with ongoing antimicrobial stewardship programs is a promising direction toward a comprehensive approach to combating MDROs.

Data Sharing Statement

Data, code, and scripts to reproduce the results or replicate the procedures of this study are available from the corresponding authors (Yan Ji and Xinglin Xiong) upon reasonable request.

Ethics Approval and Consent to Participate

Approval was obtained from the ethics committee of the Affiliated BenQ Hospital of Nanjing Medical University (No. 2025-KL032). All research was performed in accordance with the relevant guidelines and regulations, including the Declaration of Helsinki. Informed consent was waived by the same ethics committee due to the study’s retrospective design and minimal risk to participants.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Funding

This work was supported by the Jiangsu Provincial Hospital Association and Jiangsu Modern Hospital Management Research Center’s 2023 Hospital Management Innovation Research Project (Grant No. JSYGY-3-2023-403).

Disclosure

The authors declare that they have no competing interests.

References

1. Zhao W, Sun P, Li W, Shang L. Machine learning-based prediction model for multidrug-resistant organisms infections: performance evaluation and interpretability analysis. Infect Drug Resist. 2025;18:2255–14. doi:10.2147/IDR.S459830

2. Murray CJL, Ikuta KS, Sharara F; Antimicrobial Resistance Collaborators. Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis. Lancet. 2022;399(10325):629–655. doi:10.1016/S0140-6736(21)02724-0

3. World Health Organization. Ten threats to global health in 2019. Available from: https://www.who.int/news-room/spotlight/ten-threats-to-global-health-in-2019. Accessed August 26, 2025.

4. Li XJ, Liu Y, Du L, Kang Y. The effect of antibiotic-cycling strategy on antibiotic-resistant bacterial infections or colonization in intensive care units: a systematic review and meta-analysis. Worldviews Evid Based Nurs. 2020;17(4):319–328. doi:10.1111/wvn.12454

5. Li ZJ, Wang KW, Liu B, et al. The distribution and source of MRDOs infection: a retrospective study in 8 ICUs, 2013-2019. Infect Drug Resist. 2021;14:4983–4991. doi:10.2147/IDR.S332196

6. Oliveira ABS, Sacillotto GH, Neves MFB, et al. Prevalence, outcomes, and predictors of multidrug-resistant nosocomial lower respiratory tract infections among patients in an ICU. J Bras Pneumol. 2023;49(1):e20220235. doi:10.36416/1806-3756/e20220235

7. Tsuzuki S, Yu J, Matsunaga N, Ohmagari N. Length of stay, hospitalisation costs and in-hospital mortality of methicillin-susceptible and methicillin-resistant Staphylococcus aureus bacteremia in Japan. Public Health. 2021;198:292–296. doi:10.1016/j.puhe.2021.07.046

8. Nelson RE, Hyun D, Jezek A, Samore MH. Mortality, length of stay, and healthcare costs associated with multidrug-resistant bacterial infections among elderly hospitalized patients in the United States. Clin Infect Dis. 2022;74(6):1070–1080. doi:10.1093/cid/ciab696

9. Sokhn ES, Salami A, El Roz A, Salloum L, Bahmad HF, Ghssein G. Antimicrobial susceptibilities and laboratory profiles of Escherichia coli, Klebsiella pneumoniae, and Proteus mirabilis isolates as agents of urinary tract infection in Lebanon: paving the way for better diagnostics. Med Sci. 2020;8(3):32. doi:10.3390/medsci8030032

10. Hassan S, Ghssein G, Kassem Z, Alarab S, El Aris J, Ezzeddine Z. Antimicrobial resistance patterns of Escherichia coli isolates from female urinary tract infection patients in Lebanon: an age-specific analysis. Microbiol Res. 2025;16:240. doi:10.3390/microbiolres16110240

11. Kawtharani I, Ghssein G, Srour O, Chaaban AA, Salameh P. Molecular epidemiology of different bacterial pathogens and their antimicrobial resistance genes among patients suffering from surgical site infections in Lebanon. Microbiol Res. 2025;16:216. doi:10.3390/microbiolres16100216

12. Abbas DA, Al-Ouqaili MTS, Alfeehan MJ. Molecular and bacteriological characterization of colistin and carbapenem-resistant nosocomial isolates of Acinetobacter baumannii isolated from different Iraqi hospitals. J Infect Public Health. 2026;19(3):103118. doi:10.1016/j.jiph.2025.103118

13. Abbas R, Chakkour M, Zein El Dine H, et al. General overview of Klebsiella pneumonia: epidemiology and the role of siderophores in its pathogenicity. Biology. 2024;13(2):78. doi:10.3390/biology13020078

14. Chakkour M, Hammoud Z, Farhat S, El Roz A, Ezzeddine Z, Ghssein G. Overview of Proteus mirabilis pathogenicity and virulence. Insights into the role of metals. Front Microbiol. 2024;15:1383618. doi:10.3389/fmicb.2024.1383618

15. Ghssein G, Ezzeddine Z. The key element role of metallophores in the pathogenicity and virulence of Staphylococcus aureus: a review. Biology. 2022;11(10):1525. doi:10.3390/biology11101525

16. Yehya A, Ezzeddine Z, Chakkour M, et al. The intricacies of Acinetobacter baumannii: a multifaceted comprehensive review of a multidrug-resistant pathogen and its clinical significance and implications. Front Microbiol. 2025;16:1565965. doi:10.3389/fmicb.2025.1565965

17. Al-Ouqaili MTS, Hussein RA, Kanaan BA, Al-Neda ATS. Investigation of carbapenemase-encoding genes in Burkholderia cepacia and Aeromonas sobria isolates from nosocomial infections in Iraqi patients. PLoS One. 2025;20(8):e0315490. doi:10.1371/journal.pone.0315490

18. Liu D, Li G, Hong Z, et al. Prevalence of multidrug-resistant organisms in healthy adults in Shenzhen, China. Health Secur. 2023;21(2):122–129. doi:10.1089/hs.2022.0111

19. Mutters NT, Günther F, Frank U, Mischnik A. Costs and possible benefits of a two-tier infection control management strategy consisting of active screening for multidrug-resistant organisms and tailored control measures. J Hosp Infect. 2016;93(2):191–196. doi:10.1016/j.jhin.2016.02.013

20. Tseng WP, Chen YC, Yang BJ, et al. Predicting multidrug-resistant gram-negative bacterial colonization and associated infection on hospital admission. Infect Control Hosp Epidemiol. 2017;38(10):1216–1225. doi:10.1017/ice.2017.178

21. Cano A, Gutiérrez-Gutiérrez B, Machuca I, et al. Risks of infection and mortality among patients colonized with Klebsiella pneumoniae carbapenemase-producing K. pneumoniae: validation of scores and proposal for management. Clin Infect Dis. 2018;66(8):1204–1210. doi:10.1093/cid/cix991

22. Attaway HH 3rd, Fairey S, Steed LL, Salgado CD, Michels HT, Schmidt MG. Intrinsic bacterial burden associated with intensive care unit hospital beds: effects of disinfection on population recovery and mitigation of potential infection risk. Am J Infect Control. 2012;40(10):907–912. doi:10.1016/j.ajic.2011.11.019

23. Datta R, Platt R, Yokoe DS, Huang SS. Environmental cleaning intervention and risk of acquiring multidrug-resistant organisms from prior room occupants. Arch Intern Med. 2011;171(6):491–494. doi:10.1001/archinternmed.2011.64

24. Cui Z, Dong Y, Yang H, et al. Machine learning prediction models for multidrug-resistant organism infections in ICU ventilator-associated pneumonia patients: analysis using the MIMIC-IV database. Comput Biol Med. 2025;190:110028. doi:10.1016/j.compbiomed.2025.110028

25. Yang L, Lu G, Diao H, et al. Predicting infections with multidrug-resistant organisms (MDROs) in neurocritical care patients with hospital-acquired pneumonia (HAP): development of a novel multivariate prediction model. Microbiol Spectr. 2025;13(6):e0246024. doi:10.1128/spectrum.02460-24

26. Thabtah F, Abdelhamid N, Peebles D. A machine learning autism classification based on logistic regression analysis. Health Inf Sci Syst. 2019;7(1):12. doi:10.1007/s13755-019-0073-5

27. Melo CFOR, Navarro LC, de Oliveira DN, et al. A machine learning application based in random forest for integrating mass spectrometry-based metabolomic data: a simple screening method for patients with Zika Virus. Front Bioeng Biotechnol. 2018;6:31. doi:10.3389/fbioe.2018.00031

28. Liu Y, Jin S, Song L, Han Y, Yu B. Prediction of protein ubiquitination sites via multi-view features based on eXtreme gradient boosting classifier. J Mol Graph Model. 2021;107:107962. doi:10.1016/j.jmgm.2021.107962

29. Saravanan V, Lakshmi PT. SCLAP: an adaptive boosting method for predicting subchloroplast localization of plant proteins. OMICS. 2013;17(2):106–115. doi:10.1089/omi.2012.0070

30. Sheridan RP, Wang WM, Liaw A, Ma J, Gifford EM. Extreme gradient boosting as a method for quantitative structure-activity relationships. J Chem Inf Model. 2016;56(12):2353–2360. doi:10.1021/acs.jcim.6b00591

31. Zhang J, Mucs D, Norinder U, Svensson F. LightGBM: an effective and scalable algorithm for prediction of chemical toxicity-application to the Tox21 and mutagenicity data sets. J Chem Inf Model. 2019;59(10):4150–4158. doi:10.1021/acs.jcim.9b00633

32. Li J, Zheng Y, Ma J, et al. The relationship between catheter-related bloodstream infection and multi-drug resistant bacteria: a five-year retrospective study. BMC Infect Dis. 2025;25(1):988. doi:10.1186/s12879-025-11367-7

33. Li C, He L, Xu J, et al. Analysis of risk factors for Multidrug-Resistant Organism (MDRO) infections and construction of a risk prediction model in a cancer specialty hospital. Br J Hosp Med. 2024;85(10):1–11. doi:10.12968/hmed.2024.0353

34. Farid A, Han W, Kwan JKC, Yeung KL. Enhancing bedding hygiene in long-term care facilities: investigating the impact of multilevel antimicrobial polymers (MAP-1) on bacterial and MDRO reduction. Antimicrob Resist Infect Control. 2025;14(1):36. doi:10.1186/s13756-025-01555-0

35. Gu GY, Chen M, Pan JC, Xiong XL. Risk of multi-drug-resistant organism acquisition from prior bed occupants in the intensive care unit: a meta-analysis. J Hosp Infect. 2023;139:44–55. doi:10.1016/j.jhin.2023.06.020

36. Wang B, Zhang S, Meng L, Feng J. Development and validation of a nomogram model for predicting MDRO infections in elderly ICU patients with pulmonary infections. Aging Clin Exp Res. 2025;37(1):218. doi:10.1007/s40520-025-03136-y

37. Li Y, Cao Y, Wang M, et al. Development and validation of machine learning models to predict MDRO colonization or infection on ICU admission by using electronic health record data. Antimicrob Resist Infect Control. 2024;13(1):74. doi:10.1186/s13756-024-01428-y

38. Uluç K, Akkütük Öngel E, Köylü Ilkaya N, Devran Ö, Çolakoğlu ŞM, Kutbay Özçelik H. Analysis of 332 fiberoptic bronchoscopies performed in a respiratory intensive care unit: a retrospective study. Eur Rev Med Pharmacol Sci. 2024;28(4):1433–1438. doi:10.26355/eurrev_202402_35465

39. Abbas R, Salami A, Ghssein G. Knowledge, attitude, and practices of healthcare workers towards tuberculosis, multidrug-resistant tuberculosis, and extensively drug-resistant tuberculosis. Acta Microbiol Hell. 2025;70:12. doi:10.3390/amh70020012

40. Routray A, Mane A. Knowledge, Attitude, and Practice (KAP) survey on the management of multidrug-resistant gram-negative infections with innovative antibiotics: focus on ceftazidime-avibactam. Cureus. 2023;15(5):e39245. doi:10.7759/cureus.39245

41. Marusinec R, Kurowski KM, Amato HK, et al. Caretaker knowledge, attitudes, and practices (KAP) and carriage of extended-spectrum beta-lactamase-producing E. coli (ESBL-EC) in children in Quito, Ecuador. Antimicrob Resist Infect Control. 2021;10(1):2. doi:10.1186/s13756-020-00867-7

42. Ezzeddine Z, Ghssein G. Towards new antibiotics classes targeting bacterial metallophores. Microb Pathog. 2023;182:106221. doi:10.1016/j.micpath.2023.106221

43. El Roz A, Chaaban T, Issa H, Ibrahim JN, Ezzeddine Z, Ghssein G. Assessment of Methicillin-Resistant Staphylococcus aureus (MRSA) knowledge and awareness among healthcare workers in South-Lebanon. Infect Prev Pract. 2025;7(2):100451. doi:10.1016/j.infpip.2025.100451

Creative Commons License © 2026 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms and incorporate the Creative Commons Attribution - Non Commercial (unported, 4.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.