Medical Devices: Evidence and Research, Volume 17

Artificial Intelligence in Emergency Trauma Care: A Preliminary Scoping Review

Authors Ventura CAI, Denton EE, David JA

Received 4 March 2024

Accepted for publication 17 May 2024

Published 23 May 2024 Volume 2024:17 Pages 191–211


Checked for plagiarism Yes

Review by Single anonymous peer review


Editor who approved publication: Dr Scott Fraser

Christian Angelo I Ventura,1 Edward E Denton,2 Jessica A David3

1Department of Health, Behavior and Society, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA; Department of Allied Health, Baltimore City Community College, Baltimore, MD, USA; 2Department of Emergency Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, USA; Fay W. Boozman College of Public Health, University of Arkansas for Medical Sciences, Little Rock, AR, USA; 3Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA

Correspondence: Christian Angelo I Ventura, Department of Health, Behavior and Society, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA, Tel +1 732 372-2141, Email [email protected]; [email protected]

Abstract: This study aimed to analyze the use of generative artificial intelligence in the emergency trauma care setting through a brief scoping review of literature published between 2014 and 2024. An exploration of the NCBI repository was performed using a search string of selected keywords that returned N=87 results; articles that met the inclusion criteria (n=28) were reviewed and analyzed. Heterogeneity sources were explored and identified by a significance threshold of P < 0.10 or an I² value exceeding 50%. If applicable, articles were categorized within three primary domains: triage, diagnostics, or treatment. Findings suggest that CNNs demonstrate strong diagnostic performance for diverse traumatic injuries, but generalized integration requires expanded prospective multi-center validation. Injury scoring models currently experience calibration gaps in mortality quantification and lesion localization that can undermine clinical utility by permitting false negatives. Triage predictive models now confront transparency, explainability, and healthcare ecosystem integration barriers limiting real-world translation. The most significant literature gap centers on treatment-oriented generative AI applications that provide real-time guidance for urgent trauma interventions rather than just analytical support.

Keywords: artificial intelligence, machine-learning, emergency medicine, traumatology


Generative artificial intelligence (AI) refers to a rapidly advancing set of machine learning techniques that can synthesize realistic artifacts such as images, text, audio, and video when supplied with basic contextual inputs.1 Unlike discriminative models that classify inputs into existing categories, generative models create novel outputs based on patterns learned from training data. Popular examples include generative adversarial networks (GANs), variational autoencoders (VAEs), and diffusion models.2 These technologies have demonstrated the ability to generate highly realistic synthetic data across various modalities. At this juncture, applications within time-sensitive and high-stakes environments such as emergency trauma care remain largely conceptual.

Trauma care epitomizes the need for quick yet accurate diagnostics, triage decisions, and clinical interventions where small errors can have catastrophic consequences. Generative AI offers untapped potential to enhance human capabilities in each of these domains. However, rigorous validation is required before generative AI can be safely entrusted to inform such high-stakes trauma decisions. There remain open questions about the clinical validity, safety, explainability, and real-world viability of these technologies.2 Further research into protocols for testing generative AI validity, safety compliance, transparency measures, and clinician interface designs could help unlock trauma-care-focused applications. Striking the right human–AI balance could enable better trauma resource allocation, treatment, and reduce morbidity and mortality for patients with traumatic injuries. This brief scoping review aims to map a preliminary landscape of the current research frontiers and barriers to practical usage of generative AI in emergency trauma diagnostics, triage, and care, leveraging robust methodology akin to conventional systematic review recommendations.


Search Strategy

On 26 February 2024, a PubMed National Center for Biotechnology Information (NCBI) repository search was conducted for articles published between 2014 and 2024 using the following search string: (“Generative Artificial Intelligence” [Title/Abstract] OR “Generative AI” [Title/Abstract] OR “Deep Learning” [Title/Abstract] OR “Neural Networks” [Title/Abstract]) AND (“Emergency Medicine” [MeSH Terms] OR “Emergency Care” [Title/Abstract] OR “Trauma Care” [Title/Abstract] OR “Emergency Department” [Title/Abstract]) AND (“Use” [Title/Abstract] OR “Application” [Title/Abstract] OR “Implications” [Title/Abstract]). The returned English results were uploaded to Rayyan for comprehensive abstract and full-text review.3
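The boolean structure of the search string can be made explicit with a short sketch. The Python below is illustrative only: the helper name `or_group` and the variable names are assumptions, and the code merely assembles the string reported above (with straight quotes); it does not query PubMed.

```python
# Illustrative sketch: rebuild the reported search string from its three
# concept groups. `or_group` and the variable names are hypothetical.

def or_group(terms):
    """Join (phrase, field) pairs with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{phrase}" [{field}]' for phrase, field in terms) + ")"

technique_terms = [
    ("Generative Artificial Intelligence", "Title/Abstract"),
    ("Generative AI", "Title/Abstract"),
    ("Deep Learning", "Title/Abstract"),
    ("Neural Networks", "Title/Abstract"),
]
setting_terms = [
    ("Emergency Medicine", "MeSH Terms"),
    ("Emergency Care", "Title/Abstract"),
    ("Trauma Care", "Title/Abstract"),
    ("Emergency Department", "Title/Abstract"),
]
purpose_terms = [
    ("Use", "Title/Abstract"),
    ("Application", "Title/Abstract"),
    ("Implications", "Title/Abstract"),
]

# A record must match at least one term from every group.
query = " AND ".join(or_group(g) for g in (technique_terms, setting_terms, purpose_terms))
print(query)
```

Keeping each concept group in its own list makes it easier to see which terms widen the search (OR within a group) and which narrow it (AND between groups).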

Data Extraction

Duplicate results were identified and removed. Editorials, commentaries, and non-peer-reviewed manuscripts were excluded. Two investigators independently reviewed abstracts to identify articles eligible for full-text review. The investigators then independently reviewed full-text articles to identify studies that met the PICOS-guided inclusion criteria.4 Included studies focused specifically on applications of generative artificial intelligence (AI) within emergency trauma care contexts, whether examining clinical outcomes, workflow efficiency gains, or decision-making improvements. Only original, English-language, peer-reviewed articles published within the last decade were incorporated. Studies were excluded if there were concerns regarding the methodological quality or integrity of the data, at the discretion of the two investigators and a third consultant. SIGN appraisal tools were used to exclude retrospective and cohort-based studies that did not meet an acceptable level of evidence for inclusion in the review.5 Conflicts were resolved through discussion and, when necessary, mediation by a third consultant. Study methods were consistent with PRISMA recommendations, and although PROSPERO registration was not sought, the work remained faithful to conventional review standards.6


Statistical analysis was conducted using Stata/BE software, focusing on the aggregate prevalence of outcomes.7 Studies that did not report sufficient statistical information did not undergo quantitative analysis. Heterogeneity among studies was assessed through the I² statistic and Chi-squared test. Significant heterogeneity was defined as a p-value of less than 0.10 or an I² value exceeding 50%, in accordance with the guidelines proposed in the Cochrane Handbook for Systematic Reviews of Interventions.8 In instances of significant heterogeneity, a fixed effects model was utilized for data analysis. Conversely, in the absence of significant heterogeneity, a random effects model was adopted to accommodate the variability among the studies included. To elucidate potential sources of heterogeneity, studies were categorized into three distinct domains based on their primary focus: triage, diagnostics, or treatment when applicable.
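For readers unfamiliar with the heterogeneity criteria, the following Python sketch shows one conventional way to compute Cochran's Q, its chi-squared p-value, and the I² statistic from study-level effect estimates and variances, and then apply the P < 0.10 or I² > 50% rule. It is not the Stata routine used in this review; the function names and the example inputs are hypothetical and chosen only for illustration.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-squared probability via the series for the lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total) and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def heterogeneity(effects, variances):
    """Return (Q, I2 in percent, p-value) using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2, chi2_sf(q, df)

def significant_heterogeneity(i2, p_value):
    """Apply the threshold stated in the text: P < 0.10 or I2 > 50%."""
    return p_value < 0.10 or i2 > 50.0

# Hypothetical effect estimates and variances, NOT data from the included studies.
example_effects = [0.82, 0.91, 0.64, 0.88]
example_variances = [0.004, 0.006, 0.005, 0.003]
q_stat, i2_stat, p_val = heterogeneity(example_effects, example_variances)
print(round(q_stat, 2), round(i2_stat, 1), p_val, significant_heterogeneity(i2_stat, p_val))
```

I² expresses the share of total variation attributable to between-study differences rather than chance, which is why it is truncated at zero when Q falls below its degrees of freedom.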

Ethical Considerations

Because the work did not involve the use of human research subjects, it did not require approval or review by an institutional review board or bioethics committee.


The NCBI repository returned N=87 results, with n=28 articles utilized for analysis and inclusion in this study. Results were excluded if they did not satisfy the inclusion criteria. Figure 1 depicts an overview of the exclusion schema. AI identified n=6 duplicative results, n=45 results were excluded due to irrelevance with respect to the area of investigation, and n=2 studies were excluded after investigators performed full-text reviews and found studies to be of unsatisfactory evidence levels in accordance with SIGN appraisal guidelines. Table 1 depicts the characteristics of studies selected for inclusion.

Table 1 Selected Characteristics of Studies Identified for Inclusion

Figure 1 Study selection flow chart and overview of exclusion schema.

Of the included articles, the most common country of origin was the United States (n=8) and the most common design was the retrospective cohort study (n=8). Primary taxonomy schema revealed the following prevalence of data: n=21 diagnostics, n=1 treatment, n=6 triage. Deep learning demonstrates high diagnostic accuracy for various traumatic injuries identifiable on medical imaging. Convolutional neural networks attained over 90% sensitivity and specificity for detecting solid organ abdominal trauma like spleen, liver, and kidney lesions on CT scans.9 Additional models achieved up to 97% accuracy in diagnosing distal radius fractures on radiographs,11 98% sensitivity for hip fractures on pelvic X-rays,14 and AUC exceeding 0.80 for intracranial hemorrhage detection on head CT scans.19 Deep learning also shows precision in localizing traumatic findings, with activation mapping techniques precisely pinpointing 95.9% of hip fracture lesions14 and models consistently highlighting displaced ribs on chest CTs.32
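Because sensitivity, specificity, and accuracy are reported side by side above, a brief sketch of how each is derived from a confusion matrix may help. The counts in this Python example are hypothetical and are not taken from any included study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        # Share of truly injured patients that the model flags.
        "sensitivity": tp / (tp + fn),
        # Share of uninjured patients that the model correctly clears.
        "specificity": tn / (tn + fp),
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts for 100 radiographs (50 with fracture, 50 without).
metrics = diagnostic_metrics(tp=49, fp=3, tn=47, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Accuracy depends on how common the injury is in the test set, so a high accuracy figure alone does not guarantee that both sensitivity and specificity are high.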

Beyond binary classification, deep learning shows an aptitude for real-time injury severity quantification to guide downstream care decisions. One model leveraging trusted outcomes of length of stay, discharge disposition, and mortality for 176 potential injuries demonstrated comparable or superior performance to the Injury Severity Score.13 However, the model requires additional assessment to address mortality overestimation. Workflow and efficiency dividends represent another promising application area. Algorithmic fracture nominations focus radiologists’ attention on suspicious areas, improving detection rates by 65.7% and reducing reading times.21 Minimizing false positives and negatives remains an open challenge.19,32
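Mortality overestimation is a calibration problem, and one simple way to screen for it is the observed-to-expected (O/E) ratio. The Python sketch below is a generic illustration under assumed inputs; it is not the calibration method of the cited model, and the predicted risks and outcomes are hypothetical.

```python
def observed_expected_ratio(predicted_risks, outcomes):
    """Ratio of observed deaths to the number the model expected.

    A ratio below 1 suggests the model overestimates mortality;
    a ratio above 1 suggests it underestimates mortality.
    """
    expected = sum(predicted_risks)
    observed = sum(outcomes)
    if expected == 0:
        raise ValueError("expected number of events is zero")
    return observed / expected

# Hypothetical predicted mortality risks and outcomes (1 = died, 0 = survived).
risks = [0.2, 0.4, 0.6, 0.8]
deaths = [0, 0, 0, 1]
print(observed_expected_ratio(risks, deaths))
```

An O/E ratio summarizes calibration across a whole cohort; it does not show whether miscalibration is concentrated in low-risk or high-risk patients, which requires a calibration plot or grouped comparison.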

Cardiac and respiratory deterioration prediction represents an emerging area harnessing deep learning’s pattern recognition capabilities. An algorithm analyzed echocardiogram videos for abnormal cardiac function with AUC exceeding 0.90.18 Another model predicted cardiac arrest and respiratory failure 1–6 hours in advance using only vital signs and history, surpassing traditional early warning scores.35 Considerations span collecting diverse and standardized validation data23 to enhancing localization to avoid false negatives19 and addressing ethical concerns related to black-box recommendations and over-reliance.17 Partnership with emergency and trauma radiologists is critical for translating technical potential into clinical practice improvements.33

Studies in the triage domain demonstrated that advanced machine learning approaches, especially deep neural network architectures, attain state-of-the-art performance across important emergency department triage outcomes. For critical care prediction, neural networks achieved a median C-statistic of 0.871, significantly outperforming conventional triage methods like vital sign cutoffs (0.832) and the Emergency Severity Index (0.809) scale.30 This indicates superior discriminative ability to stratify patients likely to require critical care interventions. Deep learning methods combining convolutional and recurrent neural networks attained even higher predictive accuracy with AUCs exceeding 0.95 for outcomes like hospitalization, mortality, and ICU admission.25 The high AUC values reflect the precise delineation of patients at low versus high risk for adverse outcomes. Beyond binary classification, machine learning methods also show strong calibration for predicting length of stay with mean absolute errors averaging around only 24 to 48 hours.30 Overall, the studies validate machine learning, especially modern deep neural networks, as valuable clinical decision support tools for improved patient acuity assessment and risk segmentation early in the emergency care process.
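The C-statistic (equivalent to the AUC for a binary outcome) can be read as the probability that a randomly chosen patient who experienced the outcome received a higher risk score than a randomly chosen patient who did not. The Python sketch below computes it by direct pairwise comparison; the scores and labels are hypothetical and serve only to illustrate the definition.

```python
def c_statistic(scores, labels):
    """Concordance probability over all (event, non-event) pairs; ties count as half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))

# Hypothetical triage risk scores; label 1 = required critical care.
example_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
print(c_statistic(example_scores, example_labels))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for comparing figures such as 0.871 and 0.809.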

While machine learning methods strongly outperform standard triage approaches, simpler methods like logistic regression still demonstrate utility. Despite attaining lower median predictive accuracy than neural networks, linear models provide transparency into how different clinical variables are weighted and combined for overall risk estimations.30 This interpretability promotes clinician trust and understanding of model recommendations, a key element influencing real-world adoption. Future triage systems should explore ensembling complex deep learning components with explainable modeling techniques to optimize performance and explicability.
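The transparency attributed to logistic regression comes from the fact that each coefficient maps directly to an odds ratio. The Python sketch below illustrates this with entirely hypothetical coefficients and variable names; none of the values are drawn from the reviewed studies.

```python
import math

# Hypothetical fitted coefficients (log-odds per unit); for illustration only.
INTERCEPT = -3.0
COEFFICIENTS = {
    "age_per_decade": 0.25,
    "heart_rate_per_10_bpm": 0.15,
    "systolic_bp_per_10_mmhg": -0.20,
}

def predicted_risk(features):
    """Logistic model: risk = 1 / (1 + exp(-(intercept + sum of coefficient * value)))."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Each coefficient exponentiates to an odds ratio a clinician can inspect.
odds_ratios = {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}

patient = {"age_per_decade": 7.0, "heart_rate_per_10_bpm": 11.0, "systolic_bp_per_10_mmhg": 9.0}
print(odds_ratios)
print(round(predicted_risk(patient), 3))
```

An odds ratio above 1 indicates that a higher value of the variable raises the estimated odds of the outcome, and a value below 1 indicates the opposite; a deep neural network offers no comparably direct reading of its weights.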

Findings also revealed missed opportunities for leveraging diverse patient data, both structured and unstructured, to further enhance predictive insights. For instance, models utilizing only structured triage data (eg, vital signs, demographics) achieved AUCs of 0.77 and 0.70 for hospitalization and fast-track predictions, respectively. However, models also fed long-format clinical notes attained substantially higher discrimination with an AUC of 0.87 for both outcomes.15 This underscores the wealth of nuanced clinical information within free-text assessments. Structuring and embedding these heterogeneous data into neural networks could strengthen patient acuity evaluations. Few systems currently draw data directly from electronic health records, representing another untapped data source.10


Most of the investigated studies demonstrated that CNNs can achieve exceptional diagnostic accuracy, with sensitivity and specificity exceeding 90% for diverse traumatic injuries identifiable on CT, X-ray, and MRI.9,11,14,19,24 However, reliance on single-center, retrospective data risks optimism bias and overfitting, requiring further multi-institutional prospective validation encompassing diverse patient populations and scanners before responsible clinical integration.9,11,14,19,21,24 Additional gaps emerge in standardized injury quantification and localization. Injury severity scoring models demonstrate comparable performance to validated classifications like ISS but currently suffer from mortality overestimation, requiring refinements to improve calibration.13 Enhancing lesion localization also remains critical for avoiding missed diagnoses, with hybrid human-AI workflows emphasizing fracture nominations showing particular promise to improve interpreter sensitivity.19,21,32 Regarding operational efficiency, while studies hypothesize expedited interpretations and reduced reading volumes, robust quantifications through metrics like interpretation times, reporting throughput, protocol adherence rates, or workload reductions are lacking.21,33 Methodical workflow studies leveraging process mining techniques could elucidate AI’s concrete efficiency dividends.

Advanced deep learning models achieve impressive triage predictions, but considerable IT ecosystem integration barriers persist, with only 11.7% of identified clinical decision support systems interfacing with EHRs.10 Implementation research fusing predictive models with health information exchanges could strengthen risk analytics through expanded data interoperability. Another persistent gap emerges between predictive prowess and model transparency, a key element for clinical adoption. While complex neural networks boast strong performance, simpler regression models provide greater visibility into risk calculations.30 Hybrid human-AI approaches blending complex and interpretable models could balance performance and explicability. Additionally, prediction-centric studies dominate over evidence confirming meaningful care pathway improvements, with assessments of tangible patient or system-level benefits remaining sparse. Significant translational research confirming clinical decision support tools that safely utilize enhanced predictions to guide impactful protocols is urgently required. Application-focused treatment investigations represent the largest literature void, with only one study examining an AI-enabled point-of-care ultrasound tool for cardiac assessments.18 Studies evaluating real-time AI guidance for urgent trauma interventions like ventilator titrations, smart wound analytics, and specialty consultation recommendations are sparse but sorely needed. Robust treatment-oriented research assessing AI’s downstream therapy optimization potential will be critical for patient outcome improvements.


This review has several limitations worth noting. First, the literature extraction was conducted in only one database, which poses the risk of excluding relevant studies indexed elsewhere. Expanding the search across more databases could reduce this risk. Second, limiting the date range from 2014 onward may omit important prior foundational research. Third, while attempts were made to appraise study quality, meta-analyses innately rely upon the methodological rigor of the included works. Variability between study designs is a common source of heterogeneity. Fourth, manual screening and data extraction introduce the potential for human error or bias, which could be mitigated through dual independent reviewer methods for all stages. Fifth, parameters like the confidence interval and statistical tests for assessing heterogeneity and publication bias, while standard, still involve some subjectivity. Finally, the qualitative synthesis was designed to summarize key themes but was not a fully comprehensive overview of all variables and outcomes reported across heterogeneous trauma-focused generative AI research. A more exhaustive quantification of specific clinical endpoints could better direct practical applications and implementations.


While studies demonstrate that artificial intelligence and deep learning models can achieve impressive diagnostic, prognostic, and workflow efficiency capabilities within emergency trauma contexts, substantial research gaps remain before widespread clinical integration. Diagnostically, CNNs attain exceptional sensitivity and specificity for diverse injury detection, but reliance on retrospective single-center data risks optimism and overfitting biases, necessitating expanded prospective multi-institutional validation on heterogeneous scanners and populations. Injury severity scoring requires calibration refinements to address mortality overestimation, while enhanced lesion localization techniques can heighten model utility by reducing false negatives. Robust workflow studies leveraging process mining methods could better quantify efficiency gains beyond conjectured reading or interpretation time reductions. Prognostically, deep neural networks boast predictive accuracy for triage acuity metrics but confront integration, transparency, and translation obstacles limiting clinical adoption. Significant investigation confirming improved care pathways utilizing enhanced predictions is lacking.

Most prominently, the literature lacks application-oriented treatment studies evaluating real-time AI guidance for augmenting urgent trauma interventions through analytical techniques like inferential sensor fusion, physiology-based parameter customization, and multi-modal feature extraction. Limited ultrasound point-of-care analysis signifies an early use case, but substantial outcome-centric research is urgently required. In conclusion, realizing artificial intelligence’s radiant potential necessitates confronting these gaps through rigorous expansion of prospective, multicenter studies as well as an investigational emphasis on model explainability, systems integration, and therapy-centric assessments linking AI utilization to patient benefits uniquely salient within emergency trauma settings.


The work is solely that of the authors and does not necessarily represent the views, policies, or opinions of their affiliated institutions, employers, or partners. It was not reviewed or endorsed by any specific institution.

Author Contributions

All authors contributed to data analysis, drafting or revising the article, have agreed on the journal to which the article will be submitted, gave final approval of the version to be published, and agree to be accountable for all aspects of the work.


The work is not funded by any specific source.


The authors report no known conflicts of interest, financial or otherwise, in this work.


1. Martinelli DD. Generative machine learning for de novo drug discovery: a systematic review. Comput Biol Med. 2022;145:105403. doi:10.1016/j.compbiomed.2022.105403

2. Paladugu PS, Ong J, Nelson N, et al. Generative adversarial networks in medicine: Important considerations for this emerging innovation in artificial intelligence. Ann Biomed Eng. 2023;51(10):2130–2142. doi:10.1007/s10439-023-03304-z

3. Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan: a web and mobile app for systematic reviews. Syst Rev. 2016;5:210. doi:10.1186/s13643-016-0384-4

4. Amir-Behghadami M, Janati A. Population, intervention, comparison, outcomes and Study (PICOS) design as a framework to formulate eligibility criteria in systematic reviews. Emerg Med J. 2020;37(6):387. doi:10.1136/emermed-2020-209567

5. Methodology Checklist 1: Systematic Reviews and Meta-Analyses. Scottish Intercollegiate Guidelines Network, Available from: Accessed May 18, 2024.

6. Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement.

7. Harris JD, Quatman CE, Manring MM, Siston RA, Flanigan DC. How to write a systematic review. Am J Sports Med. 2014;42(11):2761–2768. doi:10.1177/0363546513497567

8. Higgins JPT, Thomas J, Chandler J, et al. Cochrane Handbook for Systematic Reviews of Interventions. Version 6.2. Cochrane; 2021.

9. Cheng CT, Lin HH, Hsu CP, et al. Deep Learning for automated detection and localization of traumatic abdominal solid organ injuries on CT scans. J Imaging Inform Med. 2024. doi:10.1007/s10278-024-01038-5

10. Michel J, Manns A, Boudersa S, et al. Clinical decision support system in emergency telephone triage: a scoping review of technical design, implementation and evaluation. Int J Med Inform. 2024;184:105347. doi:10.1016/j.ijmedinf.2024.105347

11. Russe MF, Rebmann P, Tran PH, et al. AI-based X-ray fracture analysis of the distal radius: accuracy between representative classification, detection and segmentation deep learning models for clinical practice. BMJ Open. 2024;14(1):e076954. doi:10.1136/bmjopen-2023-076954

12. Piliuk K, Tomforde S. Artificial intelligence in emergency medicine. A systematic literature review. Int J Med Inform. 2023;180:105274. doi:10.1016/j.ijmedinf.2023.105274

13. Choi J, Vendrow EB, Moor M, Spain DA. Development and validation of a model to quantify injury severity in real time. JAMA Netw Open. 2023;6(10):e2336196. doi:10.1001/jamanetworkopen.2023.36196

14. Gao Y, Soh NYT, Liu N, et al. Application of a deep learning algorithm in the detection of hip fractures. iScience. 2023;26(8):107350. doi:10.1016/j.isci.2023.107350

15. Sax DR, Warton EM, Sofrygin O, et al. Automated analysis of unstructured clinical assessments improves emergency department triage performance: a retrospective deep learning analysis. J Am Coll Emerg Physicians Open. 2023;4(4):e13003. doi:10.1002/emp2.13003

16. Ouyang CH, Chen CC, Tee YS, et al. The application of design thinking in developing a deep learning algorithm for hip fracture detection. Bioengineering. 2023;10(6):735. doi:10.3390/bioengineering10060735

17. Masoumian Hosseini M, Masoumian Hosseini ST, Qayumi K, Ahmady S, Koohestani HR. The aspects of running artificial intelligence in emergency care; a scoping review. Arch Acad Emerg Med. 2023;11(1):e38. doi:10.22037/aaem.v11i1.1974

18. He B, Dash D, Duanmu Y, Tan TX, Ouyang D, Zou J. AI-enabled assessment of cardiac function and video quality in emergency department point-of-care echocardiograms. J Emerg Med. 17:2023. doi:10.1016/j.jemermed.2023.02.005

19. Abrigo JM, Ko KL, Chen Q, et al. Artificial intelligence for detection of intracranial haemorrhage on head computed tomography scans: diagnostic accuracy in Hong Kong. Hong Kong Med J. 2023;29(2):112–120. doi:10.12809/hkmj209053

20. Sundrani S, Chen J, Jin BT, Abad ZSH, Rajpurkar P, Kim D. Predicting patient decompensation from continuous physiologic monitoring in the emergency department. NPJ Digit Med. 2023;6(1):60. doi:10.1038/s41746-023-00803-0

21. Inoue T, Maki S, Furuya T, et al. Automated fracture screening using an object detection algorithm on whole-body trauma computed tomography. Sci Rep. 2022;12(1):16549. doi:10.1038/s41598-022-20996-w

22. Rashid T, Zia MS, Najam-Ur-Rehman, Meraj T, Rauf HT, Kadry S. A minority class balanced approach using the DCNN-LSTM method to detect human wrist fracture. Life. 2023;13(1):133. doi:10.3390/life13010133

23. Zech JR, Santomartino SM, Yi PH. Artificial intelligence (AI) for fracture diagnosis: an overview of current products and considerations for clinical adoption, from the AJR special series on AI applications. AJR Am J Roentgenol. 2022;219(6):869–878. doi:10.2214/AJR.22.27873

24. Wei J, Li D, Sing DC, et al. Detecting total hip arthroplasty dislocations using deep learning: clinical and Internet validation. Emerg Radiol. 2022;29(5):801–808. doi:10.1007/s10140-022-02060-2

25. Yao LH, Leung KC, Tsai CL, Huang CH, Fu LC. A novel deep learning-based system for triage in the emergency department using electronic medical records: retrospective cohort study. J Med Internet Res. 2021;23(12):e27008. doi:10.2196/27008

26. Sánchez-Salmerón R, Gómez-Urquiza JL, Albendín-García L, et al. Machine learning methods applied to triage in emergency services: a systematic review. Int Emerg Nurs. 2022;60:101109. doi:10.1016/j.ienj.2021.101109

27. Dipnall JF, Page R, Du L, et al. Predicting fracture outcomes from clinical registry data using artificial intelligence supplemented models for evidence-informed treatment (PRAISE) study protocol. PLoS One. 2021;16(9):e0257361. doi:10.1371/journal.pone.0257361

28. Kim MW, Jung J, Park SJ, et al. Application of convolutional neural networks for distal radio-ulnar fracture detection on plain radiographs in the emergency room. Clin Exp Emerg Med. 2021;8(2):120–127. doi:10.15441/ceem.20.091

29. Joseph JW, Leventhal EL, Grossestreuer AV, et al. Deep-learning approaches to identify critically ill patients at emergency department triage using limited information. J Am Coll Emerg Physicians Open. 2020;1(5):773–781. doi:10.1002/emp2.12218

30. Miles J, Turner J, Jacques R, Williams J, Mason S. Using machine-learning risk prediction models to triage the acuity of undifferentiated patients entering the emergency care system: a systematic review. Diagn Progn Res. 2020;4:16. doi:10.1186/s41512-020-00084-1

31. Ozkaya E, Topal FE, Bulut T, Gursoy M, Ozuysal M, Karakaya Z. Evaluation of an artificial intelligence system for diagnosing scaphoid fracture on direct radiography. Eur J Trauma Emerg Surg. 2022;48(1):585–592. doi:10.1007/s00068-020-01468-0

32. Weikert T, Noordtzij LA, Bremerich J, et al. Assessment of a deep learning algorithm for the detection of rib fractures on whole-body trauma computed tomography. Korean J Radiol. 2020;21(7):891–899. doi:10.3348/kjr.2019.0653

33. Jalal S, Parker W, Ferguson D, Nicolaou S. Exploring the role of artificial intelligence in an emergency and trauma radiology department. Can Assoc Radiol J. 2021;72(1):167–174. doi:10.1177/0846537120918338

34. Hwang EJ, Nam JG, Lim WH, et al. Deep Learning for Chest Radiograph Diagnosis in the Emergency Department. Radiology. 2019;293(3):573–580. doi:10.1148/radiol.2019191225


36. Landry AP, Ting WKC, Zador Z, Sadeghian A, Cusimano MD. Using artificial neural networks to identify patients with concussion and postconcussion syndrome based on antisaccades. J Neurosurg. 2018;1–8. doi:10.3171/2018.6.JNS18607

Creative Commons License © 2024 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.