Advances in Medical Education and Practice, Volume 10

The association between United States Medical Licensing Examination scores and clinical performance in medical students

Authors Gauer JL, Jackson JB

Received 25 October 2018

Accepted for publication 28 February 2019

Published 26 April 2019 Volume 2019:10 Pages 209–216

DOI https://doi.org/10.2147/AMEP.S192011

Checked for plagiarism Yes

Review by Single anonymous peer review

Editor who approved publication: Dr Md Anwarul Azim Majumder



Jacqueline L Gauer,1 J Brooks Jackson2

1Medical School, University of Minnesota, Minneapolis, MN, USA; 2Roy J. and Lucille A. Carver College of Medicine, University of Iowa, Iowa City, IA, USA

Purpose: United States Medical Licensing Examination (USMLE) Step 1 and Step 2 Clinical Knowledge (CK) scores are frequently used to evaluate applicants to residency programs. Recent literature questions the value of USMLE scores for evaluation of residency applicants, in part due to a lack of evidence supporting a relationship with clinical performance. This study explored the relationship between USMLE scores and medical students’ clinical performance, as measured by the count of honors grades received in core clinical clerkships.
Methods: USMLE Step 1 and Step 2 CK scores and number of honors grades per student in seven core clinical clerkships were obtained from 1,511 medical students who graduated in 2013–2017 from two medical schools. The relationships between variables were analyzed using correlation coefficients, independent-samples t-tests, and hierarchical multiple regression.
Results: Count of honors grades correlated with both Step 1 (R=0.480, P<0.001) and Step 2 CK (R=0.542, P<0.001). After correcting for gender, institution, and test-taking ability (using MCAT scores as a proxy for test-taking ability) in a hierarchical multiple regression model, Step 1 and Step 2 CK scores together explained 22.2% of the variance in count of honors grades.
Conclusion: USMLE Step 1 and Step 2 CK scores moderately correlate with the number of honors grades per student in core clinical clerkships. This relationship is maintained even after correcting for gender, institution, and test-taking ability. These results indicate that USMLE scores have a positive linear association with clinical performance as a medical student.

Keywords: United States Medical Licensing Exam, clinical performance, residency application, correlation, hierarchical multiple regression

Introduction

Successful completion of the United States Medical Licensing Examination (USMLE) is required for licensure to practice as a physician in the United States. Scores on the portions of the exam which are taken during undergraduate medical education, Step 1 and Step 2 Clinical Knowledge (CK), are increasingly used by residency programs to determine whether to offer an interview and/or for determining rank order in the National Resident Matching Program (MATCH) list for a residency position. Typically, higher USMLE scores are viewed favorably by residency program directors and increase a student’s chance of matching with a program, especially for highly competitive programs such as dermatology and orthopedics.1 With undergraduate medical schools increasingly moving to a pass/fail grading system in years 1 and 2, USMLE scores have been weighted more heavily by residency program directors.

Some have argued that such a reliance on USMLE scores is a mistake, since the exams were initially designed to simply set a threshold for the medical knowledge and skills necessary for practice, and the numeric scores were not originally intended to be used in the residency admissions process.2 Furthermore, anecdotal and experiential evidence indicates that an overreliance on USMLE scores in the residency admissions process may have inadvertently led to an unhealthy “Step 1 climate” for undergraduate medical students, decreasing overall student well-being.3,4 On the other hand, the presidents of the organizations that sponsor the USMLE have argued that residency program directors need some way to compare an increasingly large number of applicants from varied institutions and contexts, and that USMLE scores are a valid source for filling that need.5 This debate is compounded by the relatively limited and inconsistent body of literature exploring whether USMLE scores predict actual clinical performance.

Some studies have found evidence for a relationship between USMLE scores and physician behavior. Cuddy and colleagues found a negative association between a physician’s USMLE Step 2 CK scores and their likelihood of receiving a disciplinary action from a US state medical board,6 following a study by Norcini and colleagues that found a negative association between USMLE Step 2 CK scores and patient mortality for physicians practicing in the US who were trained at international medical schools.7 Those studies, however, do not address clinical performance directly. They also only found evidence for the validity of Step 2 CK scores, while Step 1 scores are the ones most commonly relied upon by residency program directors, as not every student will have completed Step 2 CK by the time they enter the MATCH program.

The evidence for the relationship between USMLE scores and clinical performance is less consistent. The authors of one study assert that “a passing step 1 score has little to no predictive value in terms of how good a physician a student will become.”3 Others found no evidence of an association between Step 1 scores and competency performing certain specific clinical procedures such as thoracentesis and central venous catheter insertion.8 However, those individual procedures do not constitute the whole picture of a student’s clinical ability. Studies of Canadian physicians, on the other hand, have shown that low scores on Canada’s physician licensing exam, the Medical Council of Canada Qualifying Examination, correlate with poor performance on peer assessments of a physician’s quality of care,9 as well as lower rates of mammography screening and referral to consultation, and less effective prescription practices.10,11 While these studies show promising evidence for the validity of licensing examinations as a whole, they have not evaluated the USMLE or US physicians specifically. Such studies have also focused on performance as a physician, which is different from performance as a medical student.

Few studies have explored the relationship between licensing exam scores and performance in residency specifically. There is evidence that scores on a prototype version of the USMLE Step 2 Clinical Skills (CS) exam were related to residency directors’ ratings of intern performance.12 However, the implemented version of the USMLE Step 2 CS exam is scored as pass/fail only, and therefore does not have numeric scores that can be considered during the residency application process.

The objective of this study was to add to the literature exploring the validity of licensing examination scores in the US for predicting overall clinical performance during medical training. We pooled data from 1,511 students from 5 years of graduating classes at two different medical schools to help determine whether USMLE scores have value in the residency application process.

We chose to use the number of honors grades a medical student achieves in the core clinical clerkships as our measure of clinical performance. Achieving a grade of honors in a clinical clerkship, at least in theory, should reflect excellent clinical performance on the part of the student. One argument against this approach could be that any relationship between USMLE scores and medical school grades could simply reflect an underlying confounding variable of “test-taking ability.” However, while clerkship honors grades do typically include at least a portion based on exam grades, the majority of clerkship honors grades, at least at the two medical schools under study, are based on evaluations with respect to diagnosis and treatment, clinical observation of student-patient interactions, and observation of clinical skills and professionalism. We also used hierarchical multiple regression to explore the relationship between USMLE scores and clerkship honors grades after correcting for “test-taking ability,” by using Medical College Admission Test (MCAT) scores as a proxy for “test-taking ability.” We hypothesized that we would find an association between USMLE Step 1 and 2 CK scores and the number of honors grades per medical student on core clinical clerkships, both before and after correcting for “test-taking ability.”

Methods

Institutional approval

This study received a determination of Not Human Subjects Research by the Institutional Review Board at the University of Iowa on May 2, 2018. IRB ID: 201805704. This study received a determination of Not Human Subjects Research by the Institutional Review Board at the University of Minnesota on May 16, 2018. IRB ID: STUDY00003484. The IRBs waived the requirement of informed consent for this medical student record review.

Participants

Participants for this study included 1,511 medical students (814 (53.9%) men, 697 (46.1%) women) from the graduating classes of 2013–2017 from two public Midwestern medical schools, Institution A (N=790 (52.3%)) and Institution B (N=721 (47.7%)). Institution A offers students the opportunity to participate in Longitudinal Integrated Clerkships (LICs). Since rotation schedules and grading methods are significantly different in LICs than in the standard block rotation format, students from that institution who had participated in an LIC were excluded from the data. The data from Institution B were missing one USMLE Step 1 score and two Step 2 CK scores. Data from those students were not included in analyses using those exams.

Sources of data

For each student in the sample, we obtained their passing USMLE Step 1 and 2 CK scores and the total number of honors grades that student received in the seven clinical core clerkships that overlapped both institutions (Surgery, Pediatrics, Internal Medicine, Family Medicine, Neurology, Psychiatry, and Obstetrics and Gynecology). The percentage of students receiving honors in each clerkship varies depending on the clerkship and institution, but typically ranges around 15–50%. The specific policy for how honors grades are determined is defined by the institution and the director of the clerkship in question, and often includes factors such as an evaluation of the student’s competency during interactions with patients, case presentations, competency in specific clinical procedures, written coursework, and scores on multiple-choice exams such as the Subject Examinations offered by the National Board of Medical Examiners. For each of the core clinical clerkships under study, the majority of the grade was determined by clinical competencies and professionalism, not by test scores. Still, the inclusion of exam scores in the determination of at least some portion of the honors grades introduces the possibility that a latent confounding variable related to “test-taking ability” may drive any relationship between USMLE scores and honors grades. In order to explore this possibility, we also obtained each student’s Medical College Admission Test (MCAT) combined score (all students in this sample took the version of the MCAT that existed prior to the 2015 revision). We also obtained gender and institution data for each student. The data were provided by a data analyst accessing student records held by the Office of Medical Education at Institution A, and by the registrar at Institution B. All data were provided de-identified.

Analyses

Using Excel 2016 (Microsoft, Redmond, WA), we calculated the average Step 1 and Step 2 CK score for each count of honors grades received, and plotted the results. Using SPSS Statistics v.22 (IBM, Armonk, NY, USA), we calculated Pearson product-moment correlation coefficients for Step 1, Step 2 CK, and the count of honors grades received.
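The two descriptive steps above (mean score per count of honors grades, and the Pearson correlation) can be illustrated with a minimal sketch. The study itself used Excel and SPSS; the Python below is only an illustration, and the function names and the eight toy records are assumptions made up for this example, not study data.

```python
from collections import defaultdict
from math import sqrt


def mean_score_by_honors(honors, scores):
    """Average exam score for each distinct count of honors grades."""
    groups = defaultdict(list)
    for h, s in zip(honors, scores):
        groups[h].append(s)
    return {h: sum(v) / len(v) for h, v in sorted(groups.items())}


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative records (hypothetical), NOT the study data.
honors = [0, 0, 1, 2, 2, 4, 5, 7]
step1 = [215, 225, 228, 230, 240, 245, 250, 262]

means = mean_score_by_honors(honors, step1)  # one mean per honors count
r = pearson_r(honors, step1)                 # correlation across all students
```

The grouped means correspond to the kind of values plotted in Figure 1, while the correlation coefficient is computed on the ungrouped, per-student data.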

To test for differences in gender and institution, we used SPSS to perform independent-samples t-tests for Step 1, Step 2 CK, and count of honors. Since we did find differences in gender and institution, we also used hierarchical multiple regression (HMR) in SPSS to calculate a model predicting count of honors grades based on Step 1 and Step 2 CK scores, while first correcting for gender and institution. Finally, to correct for the possible confounding influence of a latent “test-taking ability” variable, we used MCAT score as a proxy for “test-taking ability” and added that to the HMR model, in order to determine the amount of variance in count of honors grades accounted for by Step 1 and Step 2 CK scores.
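As a rough illustration of what a hierarchical (blockwise) regression reports, the sketch below fits nested ordinary least squares models and records the change in R² as each block of predictors is entered. It is not the SPSS procedure used in the study: the helper names (`solve`, `r_squared`, `hierarchical_r2`) and the ten toy records are hypothetical, and significance tests for the R² changes are omitted.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def hierarchical_r2(stages, y):
    """Return (cumulative R^2, change in R^2) after each block of predictors is entered."""
    entered, previous, out = [], 0.0, []
    for stage in stages:
        entered.extend(stage)
        r2 = r_squared(entered, y)
        out.append((r2, r2 - previous))
        previous = r2
    return out


# Made-up illustrative records (hypothetical), NOT the study data.
gender = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
institution = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
mcat = [28, 31, 30, 33, 29, 34, 32, 35, 30, 36]
step1 = [210, 225, 232, 240, 218, 245, 236, 255, 228, 250]
step2ck = [225, 238, 236, 250, 240, 252, 241, 262, 235, 266]
honors = [0, 2, 1, 4, 1, 5, 2, 6, 2, 7]

# Same entry order as described in the text: demographics, MCAT, Step 1, Step 2 CK.
results = hierarchical_r2([[gender, institution], [mcat], [step1], [step2ck]], honors)
```

Because each stage only adds predictors to the previous model, cumulative R² cannot decrease, and the change at a given stage is the share of variance attributed to that block after everything entered earlier has been accounted for. The order of entry therefore matters when interpreting each increment.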

Results

Correlations

The mean Step 1 and Step 2 CK scores per count of honors grades are shown in Figure 1. Count of clerkship honors grades correlated with both Step 1 (R=0.480, P<0.001) and Step 2 CK (R=0.542, P<0.001). Likewise, mean Step 1 and Step 2 CK scores increased with the number of honors grades, from a mean Step 1 score of 222 for no honors to 260 for 7 honors, and from a mean Step 2 CK score of 233 for no honors to 266 for 7 honors.

Figure 1 Mean United States Medical Licensing Examination (USMLE) Step 1 (A) and Step 2 Clinical Knowledge (CK) (B) Scores per count of honors grades received in core clinical clerkships, for medical students graduating in 2013–2017 from two US Medical Schools (N=1,511).

Group differences

In order to explore the role of possible confounding variables in the relationships between count of honors and Step 1 and Step 2 CK scores, we used independent-samples t-tests to determine whether there were gender and/or institutional differences affecting all three variables. Means and standard deviations for each variable can be found in Table 1.

Table 1 Summary of descriptive statistics for medical students graduating in 2013–2017 from two US Medical Schools (N=1,511)

Independent-samples t-tests found a significant gender difference in favor of males over females on Step 1 (t(1508)=6.64, P<0.001). However, a significant difference was not found between males and females on Step 2 CK (t(1507)=0.34, P=0.74). This pattern is consistent with previous research on gender differences in USMLE scores.13–15 A significant difference was found in favor of females over males on count of clerkship honors grades (t(1450.25)=−4.44, P<0.001). The t-test for count of honors failed Levene’s Test for Equality of Variances, so equal variances were not assumed for that variable.

We also determined, using independent-samples t-tests, that there were significant differences between the students from the two institutions on Step 1 (t(1508)=1.98, P=0.048), Step 2 CK (t(1455.35)=3.15, P=0.002), and count of honors (t(1506.80)=16.44, P<0.001).
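The fractional degrees of freedom reported above (for example 1450.25) are what one would expect when equal variances are not assumed. A minimal sketch, assuming the unequal-variance results correspond to Welch's t-test with Welch-Satterthwaite degrees of freedom (the usual form of SPSS's "equal variances not assumed" output): the group sizes below are the gender counts from the Participants section, but the means and variances are hypothetical placeholders, not values from Table 1.

```python
from math import sqrt


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from group summaries."""
    se1, se2 = var1 / n1, var2 / n2  # squared standard errors of each group mean
    t = (mean1 - mean2) / sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical means and variances; only n1 and n2 come from the text.
t, df = welch_t(mean1=2.0, var1=3.0, n1=814, mean2=2.5, var2=4.5, n2=697)

# With equal variances and equal group sizes, df reduces to n1 + n2 - 2.
_, df_equal = welch_t(0.0, 1.0, 10, 0.0, 1.0, 10)
```

The Welch degrees of freedom always fall between the smaller group size minus one and the pooled value n1 + n2 − 2, which is why the unequal-variance tests report values somewhat below the pooled-test degrees of freedom.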

Hierarchical multiple regression (HMR)

We found that our data met the relevant assumptions of HMR. First, we deemed 1,511 to be an adequate sample size for an analysis including five independent variables.16 As none of the independent variables (institution, gender, MCAT score, Step 1 score, Step 2 CK score) are a combination of other independent variables, the assumption of no singularity was met. Collinearity statistics were all within acceptable limits (the lowest tolerance was 0.60 and the highest variance inflation factor was 1.68), so the assumption of no multicollinearity was deemed to have been met. Extreme outliers were screened for upon initial analysis of the data.
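The two collinearity statistics quoted above are reciprocals of one another for any given predictor (tolerance = 1 − R²ⱼ and VIF = 1 / tolerance, where R²ⱼ comes from regressing predictor j on the other predictors). The short check below, offered only as an illustrative sketch with hypothetical variable names, shows that the reported extremes are mutually consistent after rounding.

```python
reported_vif = 1.68  # highest variance inflation factor reported in the text

# Tolerance implied by that VIF; the lowest tolerance belongs to the same predictor.
implied_tolerance = 1.0 / reported_vif

# Share of that predictor's variance explained by the other four predictors.
implied_r2 = 1.0 - implied_tolerance
```

Rounded to two decimals, the implied tolerance matches the reported lowest tolerance of 0.60, and it implies that roughly 40% of the most collinear predictor's variance is shared with the other predictors.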

Since we found group differences in both gender and institution, and in order to control for a possible confounding factor of “test-taking ability,” we calculated a four-stage HMR model to control for those differences when predicting count of honors grades from Step 1 and Step 2 CK. We entered into the model gender and institution at stage one, MCAT score at stage two, Step 1 score at stage three, and Step 2 CK score at stage four. The regression statistics can be found in Table 2.

Table 2 Summary of hierarchical regression analysis for variables predicting count of honors grades received in core clinical clerkships for medical students graduating in 2013–2017 from two US medical schools (N=1,511)

The HMR analysis showed that at stage one, gender and institution contributed significantly to the regression model, F(2,1507)=140.97, P<0.001, and accounted for 15.8% of the variation in count of honors grades. Introducing MCAT scores in stage two explained an additional 5.4% of the variation in count of honors grades, and this change in R² was significant, F-change(1,1506)=103.58, P<0.001. In stage three, including Step 1 scores in the model explained an additional 19.0% of the variance in count of honors grades, and this change in R² was also significant, F-change(1,1505)=477.66, P<0.001. Finally, including Step 2 CK scores in the model in stage four explained an additional 3.2% of the variance, and this final change in R² was also significant, F-change(1,1504)=85.73, P<0.001.

With all five independent variables included at stage four of the regression model, only MCAT score was not a significant predictor of count of honors grades. The most important predictor appeared to be USMLE Step 1 scores, which uniquely explained 19.0% of the variance in count of honors grades when entered into the regression model. USMLE Step 1 and Step 2 CK scores together explained 22.2% of the variance in count of honors grades.
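The 22.2% figure follows directly from the stage-wise R² changes reported above. The arithmetic is sketched below using only the rounded percentages quoted in the text; the overall total is an implied value derived from those rounded increments, not a figure stated separately in the text.

```python
# Change in R^2 at each stage, as reported in the text (rounded values).
delta_r2 = {
    "gender + institution": 0.158,
    "MCAT": 0.054,
    "Step 1": 0.190,
    "Step 2 CK": 0.032,
}

# Variance attributed to the two USMLE scores after the earlier blocks were entered.
usmle_share = delta_r2["Step 1"] + delta_r2["Step 2 CK"]

# Implied cumulative R^2 of the full four-stage model (subject to rounding).
total_r2 = sum(delta_r2.values())
```

On these rounded figures, the two USMLE scores account for about 22.2% of the variance, and the full model for roughly 43.4%.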

In order to check the robustness of this pattern, we also analyzed the data from each institution separately. The regression statistics for these analyses can be found in Tables S1 and S2 in the supplementary data for this manuscript. When calculating the model described above for each institution separately, both USMLE Step 1 and Step 2 CK scores maintained statistically significant independent effects, and each explained more of the variance in count of honors grades than gender and MCAT score combined.

Discussion

Our data show a moderate positive linear correlation between both USMLE Step 1 and 2 CK scores and the number of honors grades per student achieved in core clinical clerkships. This relationship is apparent upon visual examination of the plots in Figure 1. Part of this correlation is likely due to scores on national shelf exams comprising portions of some honors grades, but exam scores only constituted between 30 and 40% of the basis for the grade for most clerkships, and therefore cannot fully explain this correlation. Furthermore, HMR showed that USMLE scores continue to explain some of the variance in clerkship honors grades, even after correcting for test-taking ability (using MCAT scores as a proxy for test-taking ability). HMR also showed that, even though there were group differences by institution and gender, those differences alone did not explain the relationship between USMLE scores and clerkship honors grades. The gender differences show an additional pattern of interest, wherein males performed better on Step 1, but females had more honors grades on average. Based on this pattern, females with lower Step 1 scores relative to males may actually achieve similar clinical performance. Additional caution may be warranted when comparing Step scores across genders.

While these data do not necessarily predict future clinical competency as a physician, the data do indicate that the USMLE scores are associated with clinical performance as a medical student. These results are consistent with previous findings showing that MCAT scores are predictive of which students are more likely to be offered acceptance to the Alpha Omega Alpha national medical honors society,17 and with the results from analyses of the Canadian medical licensing exam showing a relationship between examination scores and performance.9–11 On the other hand, our data are in contrast to the report by McGaghie et al, which found very weak to no association of USMLE scores with specific clinical procedures such as thoracentesis and central venous catheter insertion.8 However, these procedures constitute very specific and narrow areas of clinical competency and do not represent what many clinicians do in daily practice, nor do they assess a student’s ability to provide an accurate differential diagnosis or treatment plan.

Our results are limited in that they include data on only two institutions. To improve generalizability, data from additional institutions should be examined. Furthermore, our data are observational, and therefore we cannot draw conclusions regarding the cause of the relationships we found. An additional limitation is that honors grades can be used to identify top-performing students, but cannot be used as a means of exploring the relationship between USMLE scores and clinical performance over the entire range of possible performance, as honors grades do not provide data on students with low to moderate performance.

Ultimately, it is very likely that many other competencies and characteristics, besides USMLE scores, are important in becoming an excellent physician. The case for holistic review of the individual has been endorsed by many in the medical education field,18 including the Association of American Medical Colleges.19 Our purpose in this study was not to argue that USMLE scores should be the sole or even necessarily a major factor in the residency match process, but merely to provide additional data to address the specific lack of literature surrounding the relationship between USMLE scores and clinical performance in medical students. Our findings suggest that USMLE scores do indeed have a positive relationship with clinical performance, which should be explored in further study.

Acknowledgments

The authors wish to thank the Medical Education Outcomes Center at the University of Minnesota Medical School and Matthew Edwards, registrar at the University of Iowa Carver College of Medicine, for their support in the data collection portion of this study. The University of Minnesota and the University of Iowa provided funding for the salaries of the researchers.

Author contributions

All authors contributed to data analysis, drafting and revising the article, gave final approval of the version to be published, and agree to be accountable for all aspects of the work.

Disclosure

The authors report no conflicts of interest in this work.

References

1. Gauer JL, Jackson JB. The association of USMLE step 1 and step 2 CK scores with residency match specialty and location. Med Educ Online. 2017;22:1358579. doi:10.1080/10872981.2017.1358579

2. Prober CG, Kolars JC, First LR, Melnick DE. A plea to reassess the role of United States Medical Licensing Examination Step 1 scores in residency selection. Acad Med. 2016;91(1):12–15. doi:10.1097/ACM.0000000000000855

3. Moynahan KF. The current use of United States Medical Licensing Examination Step 1 scores: holistic admissions and student well-being are in the balance. Acad Med. 2018;93(7):963–965. doi:10.1097/ACM.0000000000002101

4. Chen DR, Priest KC, Batten JN, Fragoso LE, Reinfield BI, Laitman BM. Student perspectives on the “Step 1 climate” in preclinical medical education. Acad Med. 2019. Published online ahead of print. doi:10.1097/ACM.0000000000002565

5. Katsufrakis PJ, Chaudhry HJ. Improving residency selection requires close study and better understanding of stakeholder needs. Acad Med. 2019. Published online ahead of print. doi:10.1097/ACM.0000000000002559

6. Cuddy MM, Young A, Gelman A, et al. Exploring the relationship between USMLE performance and disciplinary action in practice: a validity study of score inferences from a licensure examination. Acad Med. 2017;92:1780–1785. doi:10.1097/ACM.0000000000001747

7. Norcini JJ, Boulet JR, Opelek A, Dauphinee WD. The relationship between licensing examination performance and the outcomes of care by international medical school graduates. Acad Med. 2014;89:1157–1162. doi:10.1097/ACM.0000000000000310

8. McGaghie WC, Cohen ER, Wayne DB. Are United States Medical Licensing Exam Step 1 and 2 scores valid measures for postgraduate medical residency selection decisions? Acad Med. 2011;86:48–52. doi:10.1097/ACM.0b013e3181ffacdb

9. Wenghofer E, Klass D, Abrahamowicz M, et al. Doctor scores on national qualifying examinations predict quality of care in future practice. Med Educ. 2009;43:1166–1173. doi:10.1111/j.1365-2923.2009.03534.x

10. Tamblyn R, Abrahamowicz M, Brailovsky C, et al. Association between licensing examination scores and resource use and quality of care in primary care practice. JAMA. 1998;280:989–996.

11. Tamblyn R, Abrahamowicz M, Dauphinee WD, et al. Association between licensure examination scores and practice in primary care. JAMA. 2002;288:3019–3026.

12. Taylor ML, Blue AV, Mainous AG, Geesey ME, Basco WT. The relationship between the National Board of Medical Examiners’ prototype of the Step 2 clinical skills exam and interns’ performance. Acad Med. 2005;80:496–501.

13. Cuddy MM, Swanson DB, Clauser BE. A multilevel analysis of examinee gender and USMLE Step 1 performance. Acad Med. 2008;83(10 Suppl):S58–S62. doi:10.1097/ACM.0b013e318183cd65

14. Cuddy MM, Swanson DB, Dillon GF, Holtman MC, Clauser BE. A multilevel analysis of examinee characteristics and the United States Medical Licensing Examination Step 2 Clinical Knowledge performance: revisiting old findings and asking new questions. Acad Med. 2006;81(Suppl 10):S103–S107. doi:10.1097/00001888-200610001-00026

15. Gauer JL, Jackson JB. Relationships of demographic variables to USMLE physician licensing exam scores: a statistical analysis on five years of medical student data. Adv Med Educ Pract. 2018;9:39–44. doi:10.2147/AMEP.S152684

16. VanVoorhis CRW, Morgan BL. Understanding power and rules of thumb for determining sample sizes. Tutor Quant Methods Psychol. 2007;3(2):43–50. doi:10.20982/tqmp.03.2.p043

17. Gauer JL, Jackson JB. Association between the medical college admission test scores and Alpha Omega Alpha Medical Honors Society membership. Adv Med Educ Pract. 2017;8:627–632. doi:10.2147/AMEP.S145839

18. Conrad SS, Addams AN, Young GH. Holistic review in medical school admissions and selection: a strategic, mission-driven response to shifting societal needs. Acad Med. 2016;91:1472–1474. doi:10.1097/ACM.0000000000001403

19. Association of American Medical Colleges (AAMC). Holistic review homepage. Available from: www.aamc.org/initiatives/holisticreview/. Accessed October 21, 2018.

Supplementary Materials

Table S1 Summary of hierarchical regression analysis for variables predicting count of honors grades received in core clinical clerkships for medical students graduating in 2013–2017 from institution A (N=790)

Table S2 Summary of hierarchical regression analysis for variables predicting count of honors grades received in core clinical clerkships for medical students graduating in 2013–2017 from institution B (N=721)

Creative Commons License © 2019 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.