Clinical Interventions in Aging, Volume 21
Enhancing the Short Physical Performance Battery: Proposing Norm Values for the 4-Meter Walking Test for Multimorbid Older Adults
Authors Labott BK, Belkin V, Wollesen B, Voelcker-Rehage C
Received 3 January 2026
Accepted for publication 1 April 2026
Published 28 May 2026 Volume 2026:21 585052
DOI https://doi.org/10.2147/CIA.S585052
Checked for plagiarism Yes
Review by Single anonymous peer review
Editor who approved publication: Dr Maddalena Illario
Berit K Labott,1,2,* Vera Belkin,1,2,* Bettina Wollesen,2 Claudia Voelcker-Rehage1
1Department of Neuromotor Behavior and Exercise, University of Münster, Münster, Germany; 2Institute of Movement Therapy and Movement-Oriented Prevention and Rehabilitation, German Sport University Cologne, Cologne, Germany
*These authors contributed equally to this work
Correspondence: Claudia Voelcker-Rehage, Department of Neuromotor Behavior and Exercise, University of Münster, Wilhelm-Schickard-Str. 8, Münster, 48149, Germany, Email [email protected]
Background: The Short Physical Performance Battery (SPPB) is a common tool for examining lower extremity physical functioning and respective intervention effects in older adults. It is primarily used in community-dwelling older adults, but also in multimorbid populations such as nursing home residents. However, previous literature reported floor effects (receiving the lowest score) in this vulnerable population, particularly for the scoring of the 4-meter walking subtest.
Aim: This study aimed to propose norm values and a scoring system addressing floor effects, to compare sensitivity to performance changes between the original and revised systems, and to externally validate the revised scoring.
Methods: Data from three randomized controlled trials (PROCARE (n=399, 84.0 ± 7.8 years, 77% female), PROfit (n=97, 82.4 ± 9.8 years, 72% female), and PROGRESS (n=97, 84.6 ± 7.7 years, 75% female)) were re-analyzed. SPPB was administered at baseline and post-intervention. Quartiles for time to complete the 4-meter walking test at baseline of the PROCARE dataset were analyzed, and norm values and a corresponding scoring system (0–4 points) were proposed. Floor effects were analyzed by comparing the number of participants in each scoring category. Sensitivity of the new scoring was evaluated by comparing means and standard deviations of time for completion at baseline and post-intervention. External validation was done using the PROfit and PROGRESS datasets.
Results: Based on the adjusted scoring system, a completion time faster than 5.65 seconds (old: 4.82 seconds) corresponded to the highest score (4 points), and a time exceeding 10.80 seconds (old: 8.70 seconds) to the lowest score (1 point). The proposed scoring system reduced the floor effect from 36% to 21%, and analysis of sensitivity revealed a better fit with time to completion. External validation indicated that the proposed scoring categories appropriately reflected participants’ functional and cognitive characteristics.
Conclusion: The new scoring differentiated walking performance in a multimorbid sample, particularly within the group of low performers, and better displayed changes in performance. Thus, it can contribute to detecting changes due to physiological deterioration, limited mobility, and use of walking aids, which otherwise may be missed in this vulnerable population.
Keywords: walking speed, geriatric assessment, long-term care, physical functional performance, nursing home residents
Introduction
Maintaining physical functioning, especially muscle strength, balance, and mobility, is essential for older adults to remain independent and to perform (instrumental) activities of daily living.1–3 One of the most important physical functions is walking, which requires all components associated with physical functioning.4 However, the ageing process is accompanied by a deterioration of muscle mass and balance, leading to loss of muscle strength and unstable walking patterns, which in turn affect independence.5,6 Therefore, changes in physical functioning should be monitored by a comprehensive assessment, including subdimensions of walking and lower extremity physical functioning.
The Short Physical Performance Battery (SPPB) is a widely used standardized assessment for examining lower extremity physical functioning in healthy older adults, including walking.7–9 It consists of three parts, the assessment of balance, lower extremity strength and walking speed.9 Each part is scored between 0 and 4 points, with a total possible score of between zero (low mobility/functionality) and 12 (full mobility/functionality) on the SPPB scale. The test battery was originally developed and validated for healthy community-dwelling older adults.10–12 It is proposed as a valid, reliable (concurrent validity (physical functioning): r=0.71–0.73; reliability: intraclass correlation coefficients ranging from 0.72 to 0.92, Cronbach’s alpha 0.66–0.76),13 and feasible indicator for lower extremity physical functioning in community-dwelling older adults. The SPPB goes beyond self-reporting to capture functional limitations and has been shown to assess risk of subsequent disability and nursing home admission12 as well as predict mortality.13,14 Besides being used in community-dwelling older adults, it is also administered in more vulnerable groups such as nursing home residents,15,16 where it demonstrates satisfactory validity and reliability (reliability: Cronbach’s alpha 0.863; validity: Spearman correlation (SPPB with Barthel Index) 0.853).17
However, despite these adequate psychometric properties, the SPPB faces significant limitations in this population. Multimorbid nursing home residents frequently exhibit markedly reduced gait performance, leading to pronounced floor effects. These floor effects complicate test administration by limiting score variability, reducing sensitivity to change, and increasing the proportion of participants unable to complete specific test components.18 While the SPPB’s scoring system accommodates those unable to perform its tasks, the inability to complete subtests (within time limits) increases with age,19 and results in a preponderance of 0- and 1-scores that further exacerbate floor effects. For example, in the oldest old (95+ years), 43%–57% were not able to perform the balance test,19,20 4%–33% were not able to perform the 4-meter walking test,19,20 and the five-times-sit-to-stand test could not be performed by 62%–82%.19,20 Particularly in multimorbid or frail older adults, the 0-scoring and the wide time range for the 1-point category present limitations. These floor effects compromise three key measurement properties. First, content validity is limited because extreme items at the lower end of the scale are missing, making it difficult to differentiate between low performers.21 Second, reliability is reduced as participants with the lowest score cannot be distinguished from one another.21 Third, and most critically for intervention research, responsiveness to targeted interventions is limited as small but meaningful changes in low-performing participants cannot be detected.21 Nevertheless, the SPPB is a well-established geriatric assessment and is frequently used in exercise intervention studies for (multimorbid) older adults to determine intervention effects,22 despite a recent review indicating that the SPPB is not sensitive to changes in all older age groups.23
Among the three SPPB subtests, gait speed is widely regarded as being of clinical importance in frail and institutionalized older adults. It has been shown to have strong prognostic value for negative outcomes such as functional decline, hospitalization, and mortality.24–26 Due to its prognostic relevance and higher completion rates compared to the other SPPB subtests, the gait test was deemed the best candidate for further analysis. In contrast, the chair rise test cannot be administered to individuals who cannot stand independently, and the balance test does not provide the continuous raw timing data necessary for refining the norms. Therefore, this study aimed to propose revised normative values for the 4-meter walking test for multimorbid older adults, using a sample of nursing home residents. The aim was to reduce the floor effect in the SPPB 4-meter walking test for this target group, and to improve sensitivity to change. The revised normative values were then validated using two other datasets of nursing home residents. We hypothesize that the walking times in the new scoring system differ from the original values proposed by Guralnik et al7 for healthy, community-dwelling older adults. This could reduce the floor effect for nursing home residents. Also, we hypothesize that the new scoring system is more sensitive to change for the oldest, multimorbid cohort compared to the original scoring system by Guralnik et al9 for healthy older adults. Furthermore, we hypothesize that the validation of the proposed norm values represents the functional and cognitive characteristics of the different samples.
Methods
Study Designs
This brief report presents an analysis of the SPPB data from three datasets: one monocenter (PROGRESS,15 complete dataset) and two multicenter (PROfit,3 partial dataset and PROCARE,27 complete dataset) randomized controlled trials (RCTs) in nursing homes. While our previous publication presented the study protocol for PROGRESS,15 the current work presents the first empirical analysis of SPPB data from this trial combined with data from two additional multicenter RCTs. The studies are registered at DRKS.de (PROGRESS: DRKS00031020, February 23, 2023; PROfit: DRKS00021423, April 16, 2020; PROCARE: DRKS00014957, October 9, 2018). All three studies were conducted in agreement with the principles of the Declaration of Helsinki and the guidelines of Good Clinical Practice and were approved by ethical committees (PROGRESS: Ethics Committee of the Faculty of Psychology and Sports Science at the University of Münster, 2022–40-CVR; PROCARE: Ethics Committee of the Hamburg Chamber of Physicians, PV5762; PROfit: TU Berlin Ethics Committee, GR_14_20191217). All ethical approvals also covered the use of anonymized data for secondary analyses, which includes the analysis of the SPPB data from the three datasets. The data used in the present study were privately owned, and permission was obtained from the owner of the database. The STROBE Statement28 guided this study report.
Study Population
Written informed consent was obtained from all participants prior to enrolment in the study, in accordance with the Declaration of Helsinki. For those lacking the capacity to consent due to cognitive impairment related to their illness, consent was obtained from their legal representatives. Both the participants and their representatives were informed and agreed. The PROCARE study27 (2018–2021) included 664 participants from 48 nursing homes in eight German cities. From the multi-center PROfit study3 (2020–2022) 164 participants from 6 nursing homes in Berlin were integrated. The PROGRESS study15 (2022–2025) included 124 participants from seven nursing homes in Münster, Germany. The inclusion criteria for all projects were: i) willingness to participate; ii) the ability to participate in group activities; and iii) the ability to understand and follow simple instructions. Participants had to be able to sit on a chair or in a wheelchair without assistance.3,15,27 For the PROfit study, only participants with walking ability,3 and for the PROGRESS project, a minimum age of 60 years15 were additional inclusion criteria. At baseline, data from a total number of 593 participants were included.
Measures
Demographics, including age (years), sex, and anthropometrics (body height (m), body mass (kg)), as well as the Barthel Index (scores between 0 and 100)29 and the Montreal Cognitive Assessment (MoCA, scores between 0 and 30),30 were collected at baseline and used to characterize the sample in the current re-analysis.
The Short Physical Performance Battery (SPPB)9 was examined at baseline and post-intervention in each study. The SPPB consists of three subtests (balance, gait, and leg strength), and each subtest is scored between 0 and 4 points. Total SPPB scores range from 0 (low mobility/functionality) to 12 (full mobility/functionality). First, participants are asked to complete the balance subtest comprising three subtasks (standing in closed stance, semi-tandem stance, and tandem stance for up to ten seconds each). The score range is 0–4 points. If participants are unable to hold the closed stance for 10 seconds without support, the balance test ends, and they receive 0 points. If successfully completed, participants receive 1 point and proceed to the semi-tandem stance. If they cannot hold the semi-tandem stance for ten seconds, they receive 0 points for this subtask, and the balance test ends. If successfully completed, participants receive 1 point and continue with the tandem stance. They receive 0 points when they hold it for less than 3 seconds, 1 point when they hold it for at least 3 but less than 10 seconds, and 2 points when they hold it for 10 or more seconds. Walking speed is measured over a 4-meter distance, and the time it takes participants to complete the track is recorded in seconds. Participants stand on a line and are instructed to walk down the track at their usual speed. Here, they were instructed to walk as if they were walking down an aisle in a nursing home and, if necessary, they were allowed to use their mobility aid. Participants receive 4 points for completing the 4-meter walk in less than 4.82 seconds, 3 points for completing it between 4.82 and 6.20 seconds, 2 points for completing it between 6.21 and 8.70 seconds, and 1 point when they need more than 8.70 seconds. Participants who were unable to walk or to complete the walking test received 0 points.
Finally, the five-times-sit-to-stand test is administered to measure leg strength.9 First, participants perform a trial run in which they stand up from a chair with their arms crossed over their chest. If they are unable to do so, the test ends. Those who cannot complete the test or need more than 60 seconds receive 0 points. Participants receive 1 point for times between 16.70 and 60 seconds, 2 points for times between 13.70 and 16.69 seconds, 3 points for times between 11.20 and 13.69 seconds, and 4 points for times of 11.19 seconds or less.
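The original scoring rules described above can be summarised in a short Python sketch. The function names and the use of None for a test that could not be completed are illustrative assumptions and not part of the SPPB protocol; the boundary handling assumes times recorded to two decimals.

```python
from typing import Optional


def score_balance(closed_s: float, semi_tandem_s: float, tandem_s: float) -> int:
    """Balance subtest (0-4 points); arguments are hold times in seconds."""
    if closed_s < 10:
        return 0  # closed stance not held for 10 s: test ends
    if semi_tandem_s < 10:
        return 1  # 1 point for closed stance, 0 for semi-tandem
    if tandem_s < 3:
        return 2  # 1 + 1 + 0
    if tandem_s < 10:
        return 3  # 1 + 1 + 1
    return 4      # 1 + 1 + 2


def score_gait_original(time_s: Optional[float]) -> int:
    """4-meter walk (0-4 points) with the original cut-offs quoted in the text."""
    if time_s is None:
        return 0  # unable to walk or to complete the test
    if time_s < 4.82:
        return 4
    if time_s <= 6.20:
        return 3
    if time_s <= 8.70:
        return 2
    return 1


def score_chair_stand(time_s: Optional[float]) -> int:
    """Five-times-sit-to-stand (0-4 points)."""
    if time_s is None or time_s > 60:
        return 0
    if time_s >= 16.70:
        return 1
    if time_s >= 13.70:
        return 2
    if time_s >= 11.20:
        return 3
    return 4


def sppb_total(balance: int, gait: int, chair: int) -> int:
    """Total SPPB score (0-12) as the sum of the three subtest scores."""
    return balance + gait + chair
```

For instance, a participant who holds all three stances for 10 seconds, walks the track in 9.5 seconds, and cannot complete the chair stand would receive 4 + 1 + 0 = 5 points under this sketch.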
In the three studies, walking speed was measured differently in terms of the number of runs and the measurement technique used. PROGRESS measured it only once, alongside a gait analysis, by using a stopwatch and the eight-meter-long GAITRite® system.15 The GAITRite® speed was only used if an error occurred during the stopwatch measurement. PROfit performed the measurements using a stopwatch,3 and PROCARE recorded two trials, either using a stopwatch or deriving the time from a gait analysis system (GAITRite® or OptoGait®)16 (for further details cf. study protocols3,15,27). Nevertheless, all measurements adhered to the original SPPB administration protocol as described by Guralnik et al,9 and all assessors in the three studies had been trained accordingly.
Interventions
All RCTs implemented a multi-component exercise intervention with slightly different content and foci. PROCARE comprised 32 sessions (twice weekly, 45–60 minutes, 16 weeks in total) targeting physical function, cognition, and psychosocial well-being. PROfit utilized three modified PROCARE interventions that incorporated spatial orientation tasks and location changes (24 sessions, twice a week, 45–60 minutes, 12 weeks in total). PROGRESS combined elements of PROCARE and PROfit by integrating spatial orientation tasks with one location change. Using a cross-over design, four different intervention arms (exercise intervention, guided environmental intervention, non-guided environmental intervention, and physical-activity promoting culture) were conducted over 16 weeks (32–64 sessions, twice a week, 45–60 minutes).
Potential Source of Bias
Due to the integration of three projects (PROCARE, PROfit, and PROGRESS), several potential sources of bias must be acknowledged and addressed.
Selection and Temporal Bias
The three included studies were conducted across different time periods (start PROCARE: 2017, PROfit: 2020, PROGRESS: 2022) and in different cities across Germany.
Measurement and Information Bias
Not all study centers recorded the time of the 4-meter walking test using the same measurement equipment, which potentially led to systematic measurement differences. However, measurements from the GAITRite® and OptoGait® systems have been shown to be highly consistent.31 The reporting and documenting of the use of mobility aids lacked standardization across projects, and 0-scoring versus missing values were handled inconsistently (eg, participants unable to walk were recorded either as 0 or as a missing value).
Data Analysis
To determine the differences among participants in the three included studies, characteristics were tested using a chi-square test for sex or a Levene test for homogeneity of variance, followed by a one-way analysis of variance (ANOVA). If homogeneity of variances was not present, a Welch-ANOVA was used (age, Barthel Index, time to complete the 4-meter walking test). Due to the absence of homogeneity of variances, post-hoc testing included the Games-Howell test. The level of significance was set at p<0.05. Effect sizes were reported as partial eta squared (ηp2) and interpreted as small (0.01), medium (0.06), or large (0.14) effects.32
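As a rough illustration of the Welch-ANOVA named above, the following Python sketch computes the test statistic and its degrees of freedom from raw group data using only the standard library. The function name is an assumption made for this example, and the sketch is not the software used in the study. Obtaining a p-value additionally requires the F distribution, for example its survival function in a statistics package.

```python
from statistics import mean, variance


def welch_anova(*groups):
    """Return (F, df1, df2) for Welch's one-way ANOVA (unequal variances)."""
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    # Each group is weighted by n_i / s_i^2 (sample size over sample variance).
    w = [ni / variance(g) for ni, g in zip(n, groups)]
    w_sum = sum(w)
    weighted_mean = sum(wi * mi for wi, mi in zip(w, m)) / w_sum
    between = sum(wi * (mi - weighted_mean) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2
```

The non-integer second degree of freedom produced by this correction is the reason Welch-ANOVA results are reported with fractional denominator degrees of freedom.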
Development and Validation of the Adapted Scoring System
The presence of floor effects in the 4-meter walking test was evaluated based on the percentage of participants who achieved the lowest possible score when completing the test (1 point). We chose to take the lowest possible score for completion into account, as our goal was to adjust the scoring system for those who can complete the test.
Following the initial scoring system by Guralnik et al9 we determined quartiles for walking speed in the present population of nursing home residents to propose a new scoring system for this specific multimorbid population. As the new cut-off values were based on quartile distributions, the largest dataset (PROCARE, n = 399) was used to develop the adapted 4-point scoring system and to ensure more stable and less sample-specific threshold estimations.
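The quartile-based derivation can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the quartile interpolation rule is the default of Python's statistics.quantiles and may differ from the software used in the study, and the treatment of values lying exactly on a cut-off mirrors the original scoring scheme rather than a rule stated for the revised one.

```python
from statistics import quantiles


def quartile_cutoffs(times_s):
    """Return the three quartile cut-offs (q1, q2, q3) of completion times in seconds."""
    q1, q2, q3 = quantiles(times_s, n=4)
    return q1, q2, q3


def score_gait_quartile(time_s, cutoffs):
    """Score a 4-meter walk time (0-4 points) against quartile cut-offs.

    Faster times earn more points; None (test not completed) scores 0.
    """
    if time_s is None:
        return 0
    q1, q2, q3 = cutoffs
    if time_s < q1:
        return 4  # fastest quarter
    if time_s <= q2:
        return 3
    if time_s <= q3:
        return 2
    return 1      # slowest quarter
```

Cut-offs derived in one dataset can be passed unchanged to score_gait_quartile for another dataset, which corresponds to the external validation step described in this section.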
According to the literature, a floor effect appears when 15–20% or more of the participants score in the lowest category.21,33,34 Although a quartile-based approach categorizes 25% of participants as the lowest, this method was chosen to create population-specific norms that more accurately reflect the functional range observed in multimorbid nursing home residents. This improves the scale’s ability to discriminate within this population.
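The floor-effect criterion can be illustrated with a small Python sketch. The function names and the default threshold of 15% (the lower bound of the range cited above) are assumptions for this example; following the definition used in this study, only participants who completed the test are counted, and the lowest completion score is 1 point.

```python
def floor_share(scores):
    """Share of test completers (score > 0) who received the lowest score of 1."""
    completers = [s for s in scores if s > 0]
    if not completers:
        raise ValueError("no participant completed the test")
    return sum(1 for s in completers if s == 1) / len(completers)


def has_floor_effect(scores, threshold=0.15):
    """Flag a floor effect when the lowest-category share reaches the threshold."""
    return floor_share(scores) >= threshold
```

Applying floor_share to the scores produced by the original and the revised scoring for the same participants gives the two percentages that are compared in the floor-effect analysis.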
The two smaller independent datasets (PROfit and PROGRESS; n = 97 each) were subsequently used for external validation. This approach allowed the adapted scoring system to be tested in samples differing from the development sample with regard to relevant clinical and functional characteristics. The cut-off values derived in the PROCARE-dataset were applied unchanged to the PROfit- and PROGRESS-datasets in order to examine the robustness and transportability of the proposed scoring system across independent samples with differing levels of functional and cognitive impairment. Validation focused on whether the adapted scoring system reduced the pronounced floor effects observed with the original scoring approach while maintaining meaningful differentiation across the functional spectrum.
For the sensitivity analysis comparing our new scoring system with Guralnik et al9 complete cases were used (n = 348). We calculated descriptive statistics (means and standard deviations) for both scoring systems at baseline- and post-intervention. This descriptive comparison aimed to illustrate how each scoring system captures changes in walking performance. Statistical significance testing was not performed, as this analysis focuses on demonstrating the scoring distribution and sensitivity to change rather than testing intervention effects.
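The descriptive comparison can be sketched in Python as follows. The helper names and the data layout, a list of (baseline time, post-intervention time) pairs with None for a missing measurement, are assumptions made for illustration; any scoring function that maps a time in seconds to a score can be passed in.

```python
from statistics import mean, stdev


def complete_cases(pairs):
    """Keep only participants with both a baseline and a post-intervention time."""
    return [(b, p) for b, p in pairs if b is not None and p is not None]


def describe(values):
    """Return (mean, sample standard deviation)."""
    return mean(values), stdev(values)


def summarise_scores(pairs, score_fn):
    """Mean and SD of scores at baseline and post-intervention for complete cases."""
    cases = complete_cases(pairs)
    baseline = [score_fn(b) for b, _ in cases]
    post = [score_fn(p) for _, p in cases]
    return describe(baseline), describe(post)
```

Calling summarise_scores once with the original scoring function and once with the revised one, on the same pairs, yields the figures needed for a side-by-side table of both systems.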
Results
A total of 952 participants were included in the three studies. Of these, 359 had missing data because they did not take part in the test, either because they declined or because they were unable to walk. A total of 593 participants had a timed 4-meter walking test at baseline and were analyzed for revised norm values. Of these participants, 348 completed the 4-meter walking test at both baseline and post-intervention. The participant flow is shown in Figure 1.
Figure 1 Flow chart of participants.
The characteristics of all included participants, both separately for each study and collectively, are presented in Table 1. These include demographic information such as age, sex, and anthropometrics (body height (m), body mass (kg)), Barthel Index (scores between 0–100),29 and Montreal Cognitive Assessment (MoCA, scores between 0 and 30),30 as well as time to complete the 4-m walking test. Post-hoc comparisons revealed that there was a significant difference between the studies, with the PROfit sample being taller compared to the PROGRESS sample. Furthermore, the PROfit sample had significantly higher scores in the Barthel Index compared to both other groups. MoCA scores were significantly higher in the PROfit than in the PROCARE sample (Table 1).
Table 1 Characteristics of the Participants (M ± SD)
Time to Complete the 4-meter Walking Test
Time to complete the 4-meter walking test was on average 8.96 (±5.42) seconds, with the fastest time being 2.20 seconds and the slowest time 46.00 seconds. Participants in the PROCARE project needed on average 9.11 (±5.60) seconds, those in the PROfit project 7.26 (±3.42) seconds, and those in the PROGRESS project 10.03 (±5.95) seconds. The Welch-ANOVA was calculated to check for group differences, as neither normal distribution nor homogeneity of variances was given. It revealed statistically significant differences between groups (F(2, 203.355) = 11.803, p ≤ 0.001, ηp2 = 0.104), and post-hoc testing using the Games-Howell test revealed significantly slower mean walking speed for the PROCARE and PROGRESS samples than for the PROfit sample. The distribution of all completion times of the participants (n = 593) at baseline and the boxplot are displayed in Figure 2A. The baseline and post-intervention completion times are shown in Figure 2B.
Quartiles for Walking Speed in Multimorbid Nursing Home Residents
The quartiles of the PROCARE population for walking speed for the 4-meter walking test and the corresponding scoring, together with the norm values of Guralnik et al,7,9 for the 4-meter walking test are presented in Table 2. The validation and distribution of participants in the categories of the PROfit and PROGRESS cohorts are also displayed in Table 2.
Table 2 Proposed Norm Values for Multimorbid Older Adults Based on the Quartiles of the PROCARE Population Compared to the Original Norm Values Reported by Guralnik et al7,9 for the 4-meter Walking Test and the Distribution of Participants
Comparison of Scoring and Floor Effects
As shown in Table 2, the fitter PROfit cohort has more participants in scoring categories 3 and 4 than in the lower categories. In contrast, the PROGRESS cohort, which is less fit, has more participants in the lower categories than in the higher ones. When comparing the distribution of all participants in each scoring category, as proposed by quartiles of the PROCARE sample, there is a more uniform distribution of approximately 150 (137–157) participants in each scoring category for the new norm values, while in the scoring categories according to Guralnik et al7 over 200 participants fall in the lowest category and fewer than 100 in the highest category (Figure 3). Floor effects, which are highly present when applying the categories according to Guralnik et al,7 could be reduced on average from 36% to 21% by the new approach. Figure 3 presents the flow of scoring points from Guralnik’s scoring to the new proposed scoring system.
Figure 3 Distribution of participants in the scoring categories. (A) Number of participants in each scoring category according to the original SPPB norms reported by Guralnik et al7 and the revised norms proposed in the present study at baseline. (B) Sankey plot for the shift of scores.
Sensitivity Analysis of Both Scoring Systems
Complete baseline and post-intervention data were included in the sensitivity-to-change analysis. A total of 348 cases were analyzed. Comparing mean and standard deviation of both scoring systems for baseline and post measurements revealed that the average of the new scoring better displays the changes in walking time over the four meters (Table 3). Furthermore, participants scored on average higher using the new scoring when compared to the scoring of Guralnik et al.7,9
Table 3 Mean Values and Standard Deviation for the Scores and Time for Completion (M ± SD)
Discussion
This study proposed revised norm values for the 4-meter walking test for multimorbid older adults based on a quartile-derived scoring approach developed using the largest available sample of nursing home residents and validated using two additional independent datasets. The aim was to reduce the floor effect in the SPPB 4-meter walking test for this population and to improve sensitivity to change. The revised scoring system for the 4-meter walking test of the SPPB for multimorbid older adults offered adjusted cut-off values for the four scoring categories, based on quartiles. These differed by 0.83 to 2.10 seconds from the scoring system of Guralnik et al.7,9 Although the floor effects were not completely eliminated, as the normative values were based on quartiles, the number of participants in the lowest category was reduced, and the variance in the sample of older adults was better depicted. Thus, the revised scoring system takes the slower walking speeds in this multimorbid population into account and provides a more sensitive measure of change than the original scale.
The SPPB scoring system for the 4-meter walking test, developed by Guralnik et al,7 was originally designed for community-dwelling older adults to provide information about the physical functioning of this sample. However, it does not account for the physical and cognitive limitations associated with multimorbidity and often results in floor effects in multimorbid populations, rendering it unsuitable for distinguishing performance levels.19 The present sample exemplifies these physical limitations with a mean Barthel Index score of 73.6 points, indicating moderate dependency in daily activities. Cognitive limitations are reflected by a mean MoCA score of 15.3 points in our sample. As assumed, applying the original scoring system to the multimorbid cohort resulted in floor effects. The revised scoring system was validated in two samples of multimorbid older adults with varying functional and cognitive abilities, showing corresponding differences in score distribution.
These cognitive and physical limitations in multimorbid older adults mean that they generally walk slower and are not considered well-functioning. A comprehensive meta-analysis of nursing home populations reported an average walking speed of 0.48 m/s (8.42 seconds for a 4-meter walking test) and a maximum walking speed of 0.67 m/s (5.95 seconds) for nursing home residents.26 More recent individual cohort studies conducted after this meta-analysis have found similar ranges. For example, Keogh et al and Fien et al found that the habitual walking speed for multimorbid older adults, such as nursing home residents, ranges from 0.37 m/s to 0.63 m/s,35,36 which corresponds to 10.81 to 6.35 seconds for the 4-meter walking test. Our finding of an average of 8.96 seconds is consistent with both the meta-analytic estimate and the more recent studies, indicating that our sample is representative of a multimorbid population.
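The conversions between walking speed and completion time used here follow from time = distance / speed over the 4-meter track. A minimal Python sketch (the function names are illustrative):

```python
TRACK_LENGTH_M = 4.0  # length of the SPPB walking track in meters


def time_from_speed(speed_m_per_s: float) -> float:
    """Completion time in seconds for a given walking speed in m/s."""
    return TRACK_LENGTH_M / speed_m_per_s


def speed_from_time(time_s: float) -> float:
    """Walking speed in m/s for a given completion time in seconds."""
    return TRACK_LENGTH_M / time_s
```

For example, time_from_speed(0.37) rounds to 10.81 seconds and time_from_speed(0.63) to 6.35 seconds, while speed_from_time(4.82) rounds to 0.83 m/s, the speed equivalent of the original cut-off for the highest score.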
The original SPPB’s highest score cut-off is 4.82 seconds (0.83 m/s). With 97% of nursing home residents performing below 0.8 m/s,36 Guralnik’s scoring7,9 is thus inappropriate for this population. The findings from the development and validation samples suggest that the newly proposed scoring system may better address this limitation by incorporating slower walking speeds into the scoring categories, resulting in different cut-offs compared to the original scoring system by Guralnik et al.7,9 It also captures nuances among low performers and provides a better separation of multimorbid individuals within the typical nursing home performance range. The SPPB has a long tradition of use in geriatric and gerontological research. It is a widely used geriatric assessment tool relating to health-related outcomes, with higher scores indicating better health status and longer survival.37,38 For example, the highest score cut-off for the 4-meter walking test in the original SPPB (4.82 seconds, 0.83 m/s) approaches the 1.0 m/s threshold below which health-related decline risk is assumed to increase in community-dwelling populations.39 This relationship must be verified using the proposed new scoring norms.
Additionally, despite evidence of limited sensitivity to change over time, the SPPB is frequently used as an outcome measure in exercise intervention studies.23 The enhanced discrimination among lower-performing individuals provided by the proposed scoring system may help address this limitation and may improve the SPPB’s responsiveness to intervention-related changes in multimorbid cohorts. In the exploratory sensitivity analysis, the proposed scoring system demonstrated slightly improved alignment with observed baseline-to-post-intervention changes in the 4-meter walking time compared to Guralnik’s original scoring.7,9
While the reported 0.1 second improvement in mean completion times is modest and must be acknowledged as such, the primary advantage of the proposed scoring system lies in its superior discrimination among lower-performing individuals. This enhanced sensitivity may improve the SPPB’s ability to detect meaningful changes in nursing home residents and multimorbid older adults in general.
Implementing revised scoring in clinical practice may pose practical challenges, including issues of feasibility and comparability across settings and different target groups of healthy or non-healthy older adults.24 Revised cut-offs may reduce direct comparability with data recorded by use of the original scale. Therefore, pragmatic strategies such as dual reporting of original and revised scores during a transition phase may help preserve transparency while enabling improved discrimination in multimorbid nursing home populations.
Our analysis focused on individuals who could complete the gait speed test, excluding those who scored 0 points because they were unable to walk. Although these individuals cannot receive a score higher than zero on the gait speed subtest, the balance subtest could more accurately capture meaningful variation within this population because it evaluates the ability to stand independently and maintain different standing positions. This could allow for the differentiation of functional capacity, even among individuals who are unable to walk.
So far, we have only adjusted one subtest of the SPPB for nursing home residents, the 4-meter walking subtest, by changing the norm values for the scoring but keeping the 5-point scoring (0–4 points). This could create inconsistency within the whole SPPB but as the scoring between 0 and 4 did not change, the internal consistency should not be harmed. Nevertheless, to optimize all subtests for multimorbid older adults, similar adaptations are required for the balance and chair stand test. Population-specific norms for all three components would provide more uniform discrimination across performance levels, potentially improving the SPPB’s sensitivity to intervention effects and functional decline in this population. Critically, future research must examine whether a comprehensively revised SPPB retains the strong prognostic associations with mortality and adverse health outcomes established for Guralnik’s original scoring system. However, the sensitivity analysis was exploratory and aimed to examine alignment with observed changes in performance rather than establishing definitive clinical superiority. In a population with limited functional reserve, even small shifts in categorization can improve the differentiation of degrees of impairment. Nevertheless, further prospective, outcome-based studies are required to determine whether these differences have any meaningful clinical consequences.
The need for population-specific adaptations extends beyond multimorbid older adults. While our revised scoring system addresses floor effects in multimorbid populations, the original SPPB also exhibits ceiling effects in all subtests in high-functioning, community-dwelling adults aged 40 years or older.40 These opposing limitations, floor effects in multimorbid populations and ceiling effects in well-functioning groups, underscore that the SPPB requires scoring systems tailored to different functional levels; a one-size-fits-all approach is not suitable. In addition, ceiling effects should be systematically examined in future studies to determine whether adapted scoring approaches are also needed to improve discrimination at the higher-functioning end of performance.
It is essential to consider the relationship between the SPPB and other geriatric assessments for multimorbid older adults, such as assessments of frailty (eg, Fried phenotype41) and sarcopenia (eg, European Working Group on Sarcopenia in Older People (EWGSOP2) criteria42). Although these represent distinct diagnostic constructs, both substantially overlap with core SPPB domains, particularly gait speed, lower extremity strength, and physical functioning, which are central indicators in frailty41 and sarcopenia frameworks.42 Accordingly, research demonstrates fair to moderate agreement between the SPPB and Fried’s frailty assessment, and the SPPB has been suggested as a suitable screening tool for frailty in community-dwelling older adults.43 The SPPB also shows moderate diagnostic value when compared with a diagnosis of sarcopenia,43 further supporting conceptual overlap. Revised norm values of the SPPB might capture the key functional performance dimensions underlying both conditions while remaining substantially more feasible to apply in the nursing home setting. While frailty and sarcopenia assessments often require specialized equipment, patient-reported information, or muscle mass quantification, the SPPB is much easier to apply. It is brief, safe, and can be conducted by nursing staff. Nonetheless, complementary assessments may be warranted depending on clinical objectives. Frailty diagnostics can offer additional insights into multidimensional vulnerability, including cognitive, psychological, and social domains, which may inform broader care planning. Similarly, sarcopenia assessments focusing on muscle strength or muscle quantity may be warranted when specific interventions such as nutritional supplementation strategies or targeted resistance training are planned.
Clinicians should therefore recognize that different assessment tools may yield diverging diagnostic outcomes and select tools according to the specific clinical question, available resources, and feasibility within the nursing home setting.
The study sample has potential limitations that need to be addressed. Although all assessments adhered to the standardized SPPB administration protocol, minor procedural and contextual differences across the three projects (eg, setting, measurement systems, or environmental conditions) cannot be entirely excluded and may have introduced some heterogeneity. Notably, information on walking aid use was not reported consistently. However, the data from this study reflect the performance of multimorbid older adults living in nursing homes and requiring care. Participants were allowed to perform the test with or without assistive devices to capture the mobility limitations that are characteristic of this vulnerable population. This participant heterogeneity supports our study’s objective of developing a more inclusive scoring system that reflects the variability of multimorbid older adults in the real world. There were significant differences in walking speeds between the three cohorts, with PROfit participants performing faster than those in PROCARE and PROGRESS. Beyond potential differences in physical fitness between cohorts, the faster gait speed observed in the PROfit participants may also be attributable to differences in body height. The difference in physical fitness may be due to the inclusion criteria for PROfit, which required participants to be able to walk. These stricter criteria resulted in a sample that had higher baseline functional scores and better cognitive functioning. These characteristics facilitate faster walking speeds,44 and the better performance was reflected in a higher proportion of participants receiving a high score under the proposed scoring system. By contrast, the inclusion criteria employed by PROGRESS were broader and did not exclude individuals with limited mobility. This means that participants with lower functional and cognitive abilities, who walk more slowly, were included.
The validation with these diverse samples strengthens our findings. The proposed scoring system demonstrated robust discriminative ability across this heterogeneous sample while maintaining sensitivity at lower performance levels, suggesting it is applicable to nursing home residents with varying physical and cognitive functioning levels. These population-specific norms address the floor effects identified in the original SPPB while maintaining clinical utility for multimorbid older adults.
Conclusion
The SPPB is a widely used test for evaluating lower extremity performance in older adults, particularly those living independently in the community. However, the original scoring system for the 4-meter walking test often results in substantial floor effects when applied to frail and multimorbid populations, such as nursing home residents. This limits its usefulness for clinical and research purposes. We developed a new scoring system for the 4-meter walking test that is specifically tailored to this vulnerable group. It is based on quartile-derived normative data from the largest sample (PROCARE) and was externally validated in two independent samples (PROfit and PROGRESS). The adjusted system with wider intervals (4 points: < 5.65 s, 3 points: 5.65–7.58 s, 2 points: 7.59–10.80 s, 1 point: > 10.80 s) appears to reduce floor effects and may provide better discrimination at lower performance levels, while accounting for the use of walking aids. Further validation in additional samples of nursing home residents is needed to confirm the generalizability and clinical utility of this scoring system. Further research is also needed to confirm its sensitivity to longitudinal changes, to evaluate its association with clinically relevant outcomes, and to investigate ceiling effects. Furthermore, the scoring systems for the other two SPPB subtests should be similarly adjusted for this population to enable comprehensive functional assessment. If further validated, the adapted scoring system may assist clinicians, researchers, and policymakers in identifying functional decline more precisely and in evaluating intervention-related changes and rehabilitation programs for this vulnerable population.
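For readers who work with gait speed rather than completion time, the revised time cut-offs can be converted as speed = distance / time. The short sketch below does this, assuming the timed distance is exactly 4 m and that average speed over the whole course is meant. The names are hypothetical, and the resulting speeds are an arithmetic restatement of the cut-offs above, not additional findings.

```python
WALK_DISTANCE_M = 4.0  # assumption: timed distance is exactly 4 m

# Revised time cut-offs in seconds, as stated in the Conclusion.
REVISED_TIME_CUTOFFS_S = (5.65, 7.58, 10.80)


def gait_speed(time_s: float, distance_m: float = WALK_DISTANCE_M) -> float:
    """Average gait speed in m/s for a completed walk."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    return distance_m / time_s


# Speed equivalents of the time cut-offs, rounded to 0.01 m/s.
speed_cutoffs = [round(gait_speed(t), 2) for t in REVISED_TIME_CUTOFFS_S]

if __name__ == "__main__":
    for t, v in zip(REVISED_TIME_CUTOFFS_S, speed_cutoffs):
        print(f"{t:.2f} s corresponds to about {v:.2f} m/s")
```

Under these assumptions, a time below 5.65 s (4 points) corresponds to an average speed above roughly 0.71 m/s, and a time above 10.80 s (1 point) to a speed below roughly 0.37 m/s.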
Statement on Use of AI Tools
The authors used DeepL Write (www.deepl.com) to help express content in a non-native language.
Data Sharing Statement
The data that support the findings of this study are available from C.V.-R. (corresponding author) and B.W. upon reasonable request.
Acknowledgments
We would like to thank all the project members, research group members, technicians, coaches, and student members who helped with the implementation, analysis, and training of the projects. Our thanks also go to all the nursing homes, their residents, and their employees who were involved.
Author Contributions
Berit K. Labott and Vera Belkin shared first authorship. All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.
Funding
The health insurance provider Techniker Krankenkasse (Bramfelder Str. 140, 22305 Hamburg, Germany) is supporting the study. The trial is part of the PROGRESS project. The PROfit and PROCARE projects were also supported by Techniker Krankenkasse. The funding body had no input into the study’s design, the collection, analysis, and interpretation of the data, and the writing of the manuscript. The trial data was evaluated independently of the trial sponsors.
Disclosure
There are no other financial or non-financial competing interests. The authors declare that they have no conflict of interest. The study protocol was not peer-reviewed by the funding body.
References
1. Aartolahti E, Lonnroos E, Hartikainen S, Hakkinen A. Long-term strength and balance training in prevention of decline in muscle strength and mobility in older adults. Aging Clin Exp Res. 2020;32(1):59–14. doi:10.1007/s40520-019-01155-0
2. Frandin K, Gronstedt H, Helbostad JL, et al. Long-Term Effects of Individually Tailored Physical Training and Activity on Physical Function, Well-Being and Cognition in Scandinavian Nursing Home Residents: a Randomized Controlled Trial. Gerontology. 2016;62(6):571–580. doi:10.1159/000443611
3. Wollesen B, Fricke M, Jansen CP, et al. A three-armed cognitive-motor exercise intervention to increase spatial orientation and life-space mobility in nursing home residents: study protocol of a randomized controlled trial in the PROfit project. BMC Geriatr. 2020;20(1):437. doi:10.1186/s12877-020-01840-0
4. Benz T, Lehmann S, Sandor PS, Angst F. Relationship between subjectively-rated and objectively-tested physical function across six different medical diagnoses. J Rehabil Med. 2023;55:jrm9383. doi:10.2340/jrm.v55.9383
5. Tieland M, Trouwborst I, Clark BC. Skeletal muscle performance and ageing. J Cachexia Sarcopenia Muscle. 2018;9(1):3–19. doi:10.1002/jcsm.12238
6. Visser M, Goodpaster BH, Kritchevsky SB, et al. Muscle Mass, Muscle Strength, and Muscle Fat Infiltration as Predictors of Incident Mobility Limitations in Well-Functioning Older Persons. J Gerontol Ser A. 2005;60(3):324–333. doi:10.1093/gerona/60.3.324
7. Guralnik JM, Ferrucci L, Pieper CF, et al. Lower extremity function and subsequent disability: consistency across studies, predictive models, and value of gait speed alone compared with the short physical performance battery. J Gerontol A Biol Sci Med Sci. 2000;55(4):M221–31. doi:10.1093/gerona/55.4.m221
8. Guralnik JM, LaCroix AZ, Abbott RD, et al. Maintaining mobility in late life. I. Demographic characteristics and chronic conditions. Am J Epidemiol. 1993;137(8):845–857. doi:10.1093/oxfordjournals.aje.a116746
9. Guralnik JM, Simonsick EM, Ferrucci L, et al. A short physical performance battery assessing lower extremity function: association with self-reported disability and prediction of mortality and nursing home admission. J Gerontol. 1994;49(2):M85–94. doi:10.1093/geronj/49.2.m85
10. Crombie KM, Leitzelar BN, Almassi NE, Mahoney JE, Koltyn KF. The Feasibility and Effectiveness of a Community-Based Intervention to Reduce Sedentary Behavior in Older Adults. J Appl Gerontol. 2022;41(1):92–102. doi:10.1177/0733464820987919
11. Zhang Y, Zhang Y, Du S, Wang Q, Xia H, Sun R. Exercise interventions for improving physical function, daily living activities and quality of life in community-dwelling frail older adults: a systematic review and meta-analysis of randomized controlled trials. Geriatr Nurs. 2020;41(3):261–273. doi:10.1016/j.gerinurse.2019.10.006
12. Gawel J, Vengrow D, Collins J, Brown S, Buchanan A, Cook C. The short physical performance battery as a predictor for long term disability or institutionalization in the community dwelling population aged 65 years old or older. Phys Ther Rev. 2012;17(1):37–44. doi:10.1179/1743288X11Y.0000000050
13. Freiberger E, de Vreede P, Schoene D, et al. Performance-based physical function in older community-dwelling persons: a systematic review of instruments. Age Ageing. 2012;41(6):712–721. doi:10.1093/ageing/afs099
14. Mijnarends DM, Meijers JM, Halfens RJ, et al. Validity and reliability of tools to measure muscle mass, strength, and physical performance in community-dwelling older people: a systematic review. J Am Med Dir Assoc. 2013;14(3):170–178. doi:10.1016/j.jamda.2012.10.009
15. Belkin V, Janssen TI, Rudisch J, Wollesen B, Voelcker-Rehage C. Prevention in nursing home residents: a cluster randomized controlled trial on the effects of exercise and environmental interventions on physical activity behavior and life space mobility (PROGRESS study protocol). Front Aging. 2025;6. doi:10.3389/fragi.2025.1466315
16. Wollesen B, Schott N, Klotzbier T, et al. Cognitive, physical and emotional determinants of activities of daily living in nursing home residents-a cross-sectional study within the PROCARE-project. Eur Rev Aging Phys Act. 2023;20(1):17. doi:10.1186/s11556-023-00327-2
17. Santamaria-Pelaez M, Gonzalez-Bernal JJ, Da Silva-Gonzalez A, et al. Validity and Reliability of the Short Physical Performance Battery Tool in Institutionalized Spanish Older Adults. Nurs Rep. 2023;13(4):1354–1367. doi:10.3390/nursrep13040114
18. Cress ME, Gondo Y, Davey A, Anderson S, Kim SH, Poon LW. Assessing physical performance in centenarians: norms and an extended scale from the Georgia centenarian study. Curr Gerontol Geriatr Res. 2010;2010. doi:10.1155/2010/310610
19. Melsaeter KN, Tangen GG, Skjellegrind HK, Vereijken B, Strand BH, Thingstad P. Physical performance in older age by sex and educational level: the HUNT Study. BMC Geriatr. 2022;22(1):821. doi:10.1186/s12877-022-03528-z
20. Sverdrup K, Bergh S, Selbaek G, Roen I, Kirkevold O, Tangen GG. Mobility and cognition at admission to the nursing home - a cross-sectional study. BMC Geriatr. 2018;18(1):30. doi:10.1186/s12877-018-0724-4
21. Terwee CB, Bot SD, de Boer MR, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42. doi:10.1016/j.jclinepi.2006.03.012
22. Zhu Y, Zhang Y, Li X, Du Z. Effects of exercise interventions on physical function, cognitive function and quality of life of frail older adults in nursing homes: a systematic review and meta-analysis. Front Psychol. 2025;16. doi:10.3389/fpsyg.2025.1679734
23. Kameniar K, Mackintosh S, Van Kessel G, Kumar S. The Psychometric Properties of the Short Physical Performance Battery to Assess Physical Performance in Older Adults: a Systematic Review. J Geriatr Phys Ther. 2024;47(1):43–54. doi:10.1519/jpt.0000000000000337
24. Proctor E, Silmere H, Raghavan R, et al. Outcomes for implementation research: conceptual distinctions, measurement challenges, and research agenda. Adm Policy Ment Health. 2011;38(2):65–76. doi:10.1007/s10488-010-0319-7
25. Studenski S, Perera S, Patel K, et al. Gait speed and survival in older adults. JAMA. 2011;305(1):50–58. doi:10.1001/jama.2010.1923
26. Kuys SS, Peel NM, Klein K, Slater A, Hubbard RE. Gait speed in ambulant older people in long term care: a systematic review and meta-analysis. J Am Med Dir Assoc. 2014;15(3):194–200. doi:10.1016/j.jamda.2013.10.015
27. Cordes T, Bischoff LL, Schoene D, et al. A multicomponent exercise intervention to improve physical functioning, cognition and psychosocial well-being in elderly nursing home residents: a study protocol of a randomized controlled trial in the PROCARE (prevention and occupational health in long-term care) project. BMC Geriatr. 2019;19(1):369. doi:10.1186/s12877-019-1386-6
28. von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol. 2008;61(4):344–349. doi:10.1016/j.jclinepi.2007.11.008
29. Mahoney FI, Barthel DW. Functional evaluation: The Barthel Index: A simple index of independence useful in scoring improvement in the rehabilitation of the chronically ill. Md State Med J. 1965;14:61–65.
30. Nasreddine ZS, Phillips NA, Bédirian V, et al. The Montreal Cognitive Assessment, MoCA: a Brief Screening Tool For Mild Cognitive Impairment. J Am Geriatr Soc. 2005;53(4):695–699. doi:10.1111/j.1532-5415.2005.53221.x
31. Rudisch J, Jöllenbeck T, Vogt L, et al. Agreement and consistency of five different clinical gait analysis systems in the assessment of spatiotemporal gait parameters. Gait Posture. 2021;85:55–64. doi:10.1016/j.gaitpost.2021.01.013
32. Cohen J. Statistical Power Analysis for the Behavioral Sciences.
33. Blum L, Korner-Bitensky N. Usefulness of the Berg Balance Scale in stroke rehabilitation: a systematic review. Phys Ther. 2008;88(5):559–566. doi:10.2522/ptj.20070205
34. McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res. 1995;4(4):293–307. doi:10.1007/BF01593882
35. Fien S, Henwood T, Climstein M, Rathbone E, Keogh JWL. Gait Speed Characteristics and Their Spatiotemporal Determinants in Nursing Home Residents: a Cross-Sectional Study. J Geriatr Phys Ther. 2019;42(3):E148–E154. doi:10.1519/jpt.0000000000000160
36. Keogh JW, Senior H, Beller EM, Henwood T. Prevalence and Risk Factors for Low Habitual Walking Speed in Nursing Home Residents: an Observational Study. Arch Phys Med Rehabil. 2015;96(11):1993–1999. doi:10.1016/j.apmr.2015.06.021
37. Holanda CMA, Nóbrega PVN, Maciel ÁCC. Physical performance as a predictor of mortality in nursing home residents: a five-year survival analysis. Geriatr Nurs. 2022;47:151–158. doi:10.1016/j.gerinurse.2022.07.002
38. Western MJ, Malkowski OS. Associations of the Short Physical Performance Battery (SPPB) with Adverse Health Outcomes in Older Adults: a 14-Year Follow-Up from the English Longitudinal Study of Ageing (ELSA). Int J Environ Res Public Health. 2022;19(23):16319. doi:10.3390/ijerph192316319
39. Cesari M, Kritchevsky SB, Penninx BWHJ, et al. Prognostic Value of Usual Gait Speed in Well-Functioning Older People—Results from the Health, Aging and Body Composition Study. J Am Geriatr Soc. 2005;53(10):1675–1680. doi:10.1111/j.1532-5415.2005.53501.x
40. Bergland A, Strand BH. Norwegian reference values for the Short Physical Performance Battery (SPPB): the Tromsø Study. BMC Geriatr. 2019;19(1):216. doi:10.1186/s12877-019-1234-8
41. Fried LP, Tangen CM, Walston J, et al. Frailty in Older Adults: evidence for a Phenotype. J Gerontol Ser A. 2001;56(3):M146–M157. doi:10.1093/gerona/56.3.M146
42. Cruz-Jentoft AJ, Bahat G, Bauer J, et al. Sarcopenia: revised European consensus on definition and diagnosis. Age Ageing. 2019;48(1):16–31. doi:10.1093/ageing/afy169
43. Perracini MR, Mello M, de Oliveira Máximo R, et al. Diagnostic Accuracy of the Short Physical Performance Battery for Detecting Frailty in Older People. Phys Ther. 2020;100(1):90–98. doi:10.1093/ptj/pzz154
44. Kasović M, Štefan L, Štefan A. Normative Data for Gait Speed and Height Norm Speed in ≥ 60-Year-Old Men and Women. Clin Interv Aging. 2021;16:225–230. doi:10.2147/CIA.S290071
© 2026 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms and incorporate the Creative Commons Attribution - Non Commercial (unported, 4.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.
