Neuropsychiatric Disease and Treatment, Volume 15

Assessing effectiveness of aripiprazole lauroxil vs placebo for the treatment of schizophrenia using number needed to treat and number needed to harm

Authors Citrome L, Du Y, Weiden PJ

Received 7 March 2019

Accepted for publication 12 August 2019

Published 12 September 2019 Volume 2019:15 Pages 2639–2646


Checked for plagiarism Yes

Review by Single anonymous peer review


Editor who approved publication: Dr Roger Pinder


Leslie Citrome,1 Yangchun Du,2 Peter J Weiden3

1Psychiatry and Behavioral Sciences, New York Medical College, Valhalla, NY, USA; 2Biostatistics, Alkermes, Inc., Waltham, MA, USA; 3Medical Affairs, Alkermes, Inc., Waltham, MA, USA

Correspondence: Leslie Citrome, 11 Medical Park Drive, Suite 106, Pomona, NY 10970, USA
Tel +1 845 362 2081
Email [email protected]

Objective: Schizophrenia clinical trials commonly measure observed changes in Positive and Negative Syndrome Scale (PANSS) total score. However, it is more intuitive to think of response vs nonresponse, a binary outcome. Assessing binary outcomes enables calculation of number needed to treat (NNT) for therapeutic outcomes, number needed to harm (NNH) for adverse outcomes, and likelihood to be helped or harmed (LHH) to demonstrate benefit/risk tradeoffs. Here, NNT, NNH, and LHH were used to evaluate the clinical usefulness of aripiprazole lauroxil in patients with an acute schizophrenia exacerbation.
Methods: Categorical efficacy and tolerability data were taken from the pivotal Phase 3 trial evaluating aripiprazole lauroxil for treatment of an acute exacerbation of schizophrenia. NNT and NNH values, with 95% CIs, were calculated in this post hoc analysis.
Results: Using the intent-to-treat population for the pooled doses of aripiprazole lauroxil (441 mg [n=196] and 882 mg [n=204] q4w), responder rates (≥30% improvement from baseline PANSS total score) were 35.3% for aripiprazole lauroxil arms vs 18.4% for placebo (n=196), yielding a NNT of 6 (95% CI: 5–11). Discontinuation rates due to adverse events (AEs) were higher among patients randomized to placebo than to either aripiprazole lauroxil dose. Akathisia was the only AE with an incidence ≥5% in each aripiprazole lauroxil group and at least twice that of placebo (11.6%, 11.5%, and 4.3% of the patients receiving aripiprazole lauroxil 441 mg, 882 mg, and placebo, respectively), producing a NNH of 14 (95% CI: 9–33) for pooled aripiprazole lauroxil doses vs placebo. Calculating LHH for therapeutic response vs akathisia, aripiprazole lauroxil was 2.3 times more likely to result in a therapeutic response than an incident of akathisia.
Conclusion: Using metrics of NNT, NNH, and LHH, aripiprazole lauroxil was an efficacious and well-tolerated intervention in a pivotal study in patients with an acute schizophrenia exacerbation.

Keywords: aripiprazole lauroxil, long-acting injectable, psychotic disorders, antipsychotic agents, number needed to treat, effect size

Introduction


Aripiprazole lauroxil (AL) is a long-acting injectable formulation of aripiprazole that is FDA-approved for the treatment of schizophrenia in adults based on the results of a 12-week, randomized, double-blind, placebo-controlled study in more than 600 patients with acute exacerbation of their illness.1–4 In the pivotal study,1 AL 441 and 882 mg administered every 4 weeks provided statistically significant and clinically meaningful improvements in schizophrenia symptoms at study endpoint compared with placebo, as measured by Positive and Negative Syndrome Scale (PANSS) total score reductions. In that study, initiation of an AL dosing regimen required oral aripiprazole supplementation for the first 21 days after the initial injection.1

Although the primary efficacy outcome measure in clinical trials for schizophrenia is commonly reported as the change observed in PANSS total score, a point change on a rating scale can be difficult to clinically interpret and subsequently apply to treatment decision-making. Categorical/binary outcomes such as response vs nonresponse are more intuitive. To more fully understand the clinical relevance of statistically significant results of clinical trials, several metrics are available that quantify absolute effect sizes of a given treatment to better inform health care providers of the benefits and risks of that treatment. Number needed to treat (NNT) and number needed to harm (NNH) are measures of effect size and indicate how many patients would need to be treated with one agent instead of a comparator to encounter one additional outcome of interest.5 Lower NNTs are evidenced when there are large differences between the interventions under evaluation. For example, a NNT of 2 would be a very large effect size, as a difference is expected to be encountered after treating just two patients with one intervention vs the other. A NNT of 50 would indicate little difference between the two interventions, as it would require treating 50 patients to expect to observe a difference in the outcome. A negative NNT denotes an advantage for the comparator regarding the potential benefit.
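As a minimal sketch of the arithmetic described above, the reciprocal of the difference in outcome rates can be computed as follows. The function names are hypothetical and chosen for illustration only; the example rates are the pooled responder rates reported in the abstract (35.3% vs 18.4%), and rounding negative values away from zero is an assumption of this sketch.

```python
import math


def number_needed(rate_intervention, rate_comparator):
    """Unrounded NNT (or NNH): reciprocal of the absolute difference in rates."""
    difference = rate_intervention - rate_comparator
    if difference == 0:
        # No difference between interventions: no finite number of patients suffices.
        return math.inf
    return 1.0 / difference


def round_up(value):
    """Round to the next whole number away from zero (assumption for negative values)."""
    if math.isinf(value):
        return value
    return math.ceil(value) if value > 0 else math.floor(value)


# Pooled responder rates from the abstract: 35.3% on medication vs 18.4% on placebo.
nnt_response = number_needed(0.353, 0.184)
print(round_up(nnt_response))  # 6

# A lower rate on the intervention than on the comparator gives a negative value,
# which denotes an advantage for the comparator.
print(number_needed(0.10, 0.20) < 0)  # True
```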

NNH is used when referring to undesirable events associated with treatment. In general, a useful medication is one with a low NNT and a high NNH compared with another intervention. A low NNT and a high NNH would mean an individual patient is more likely to encounter a benefit than a harm. Although these values can vary by treatment indication, a rule of thumb is that single-digit NNTs (i.e., <10) for efficacy measures suggest that the intervention has potentially useful advantages and that double-digit or higher NNHs (i.e., ≥10) for adverse outcomes indicate that the intervention is potentially tolerable.5 A negative NNH denotes an advantage for the study medication regarding the potential harm.

The likelihood to be helped or harmed (LHH) is the ratio of NNH to NNT and, in general, a LHH >1 would mean the likelihood to be helped is greater than the likelihood to be harmed. For a LHH <1, the reverse is true. Choosing which NNH and NNT to use in calculating LHH requires careful consideration so that the outcomes being assessed are well matched and consistent with a patient’s values and preferences.5–7
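The ratio can be illustrated with a short sketch. The function name is hypothetical, and returning None for non-positive inputs is a design choice of this sketch, made because the ratio has no useful interpretation when either value is zero or negative. The example values are the NNT of 6 and NNH of 14 reported in the abstract.

```python
from typing import Optional


def likelihood_helped_or_harmed(nnt: float, nnh: float) -> Optional[float]:
    """LHH = NNH / NNT; None when either input is non-positive."""
    if nnt <= 0 or nnh <= 0:
        return None
    return nnh / nnt


# NNT for response (6) and NNH for akathisia (14), both from the abstract.
lhh = likelihood_helped_or_harmed(nnt=6, nnh=14)
print(round(lhh, 1))  # 2.3
```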

In contrast to NNT, Cohen’s d is a metric that defines effect size for continuous (not categorical) variables, such as the PANSS total score.8 Measured in standard deviation units, Cohen’s d is not clinically intuitive for most practitioners but is commonly described in research reports.
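For readers less familiar with the metric, the following sketch computes Cohen's d as a mean difference divided by a pooled standard deviation. The pooled-variance form is the common textbook definition and is an assumption here, since the article does not state which variant was used; the function name and the sample values are hypothetical and are not trial data.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Mean difference (a minus b) in units of the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)


# Hypothetical improvement scores (points of reduction), for illustration only.
active = [2, 4, 6]
control = [1, 3, 5]
print(cohens_d(active, control))
```

The sign of the result depends on the order of the arguments and on whether the scores are expressed as reductions or as raw changes.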

The aim of this post hoc analysis is to use these metrics (i.e., NNT, NNH, LHH, and Cohen’s d) to assess the evidence base supporting the use of AL as a treatment for schizophrenia in order to place this intervention into clinical perspective.

Methods


The study was conducted in accordance with the Declaration of Helsinki, 1964, and Good Clinical Practice principles (International Conference on Harmonisation, 1997). The study protocol, all amendments, and informed consent documents were approved by a qualified institutional review board at each study site (all institutional review boards are listed in Table S1), and all participants completed written informed consent before participating in any study procedures.

Data sources

Data for this post hoc analysis were collected during the randomized, double-blind, placebo-controlled, Phase 3, pivotal study of AL in patients experiencing an acute exacerbation of schizophrenia (Aristada, Alkermes, Inc., Waltham, MA, USA; registration number: NCT01469039).1–3

Data extraction

Treatment response was evaluated using categorical efficacy outcomes extracted from the dataset using several PANSS thresholds. First, the proportion of patients in each treatment arm meeting the criterion of a ≥20%, ≥30%, ≥40%, or ≥50% reduction from baseline to endpoint on PANSS total score was determined. In addition, the time course of treatment response was evaluated by calculating the proportion of patients in each treatment arm meeting the criterion of a ≥30% reduction on PANSS total score at study days 8, 15, 22, 29, 57, and 85.
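The threshold-based classification described above can be sketched as follows. The function names and the patient values are hypothetical. The `floor` parameter is an assumption of this sketch: the PANSS total score has a minimum of 30, and some analyses subtract that minimum before computing percent change, but the article does not state which convention was used, so the default here applies no correction.

```python
def percent_reduction(baseline, endpoint, floor=0.0):
    """Percent reduction from baseline; set floor=30 to subtract the PANSS minimum."""
    return 100.0 * (baseline - endpoint) / (baseline - floor)


def responder_rates(patients, thresholds=(20, 30, 40, 50), floor=0.0):
    """Proportion of (baseline, endpoint) pairs meeting each reduction threshold."""
    reductions = [percent_reduction(b, e, floor) for b, e in patients]
    return {
        t: sum(r >= t for r in reductions) / len(reductions) for t in thresholds
    }


# Hypothetical (baseline, endpoint) PANSS total scores, for illustration only.
patients = [(100, 85), (100, 65), (90, 45), (95, 95)]
print(responder_rates(patients))
```

Higher thresholds can only retain or lose responders, so the proportions are non-increasing as the threshold rises.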

For safety and tolerability, adverse events (AEs) occurring at any time during the treatment period were recorded. Other safety and tolerability outcomes included the proportion of patients who discontinued due to an AE and due to an AE other than “schizophrenia” or “psychotic disorder.”

Statistical analysis

NNT and NNH, with the respective 95% CIs, for AL (each dose strength and pooled) vs placebo were computed for each outcome.9 When the NNT or NNH estimate was not statistically significant at the P<0.05 threshold (as noted when the 95% CI would contain infinity), the notation “ns” was provided. LHH values were calculated to illustrate potential tradeoffs for efficacy and tolerability outcomes. Formulae used for these calculations are presented below:

  • ARI = (incidence on medication) – (incidence on placebo) = ƒ1–ƒ2
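  • The 95% CI was calculated by ARI ± 1.96 × √[ƒ1(1–ƒ1)/n1 + ƒ2(1–ƒ2)/n2], where n1 and n2 are the numbers of patients in the medication and placebo groups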
  • NNT (or NNH) = 1/ARI, and rounded up to the next highest whole number
  • The CI for the NNT (or NNH) was calculated by taking the reciprocal of the lower and upper bounds of the CI for the ARI
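The steps in this list can be sketched in a few lines. The sketch assumes the usual normal-approximation (Wald) interval for a difference of two proportions; that choice and the function names are assumptions made for illustration, although with the pooled responder rates and sample sizes given in the abstract (35.3% of 400 vs 18.4% of 196) the sketch does reproduce the reported NNT of 6 (95% CI: 5–11). Returning None for the interval when the CI for the ARI includes zero mirrors the "ns" notation.

```python
import math


def ari_with_ci(f1, n1, f2, n2, z=1.96):
    """ARI = f1 - f2 with a Wald 95% CI (assumed form of the interval)."""
    ari = f1 - f2
    se = math.sqrt(f1 * (1 - f1) / n1 + f2 * (1 - f2) / n2)
    return ari, ari - z * se, ari + z * se


def round_up(value):
    """Round to the next whole number away from zero (assumption for negatives)."""
    return math.ceil(value) if value > 0 else math.floor(value)


def nnt_with_ci(f1, n1, f2, n2):
    """Return (NNT, (lower, upper)), or (NNT, None) when not significant."""
    ari, ari_low, ari_high = ari_with_ci(f1, n1, f2, n2)
    nnt = round_up(1.0 / ari) if ari != 0 else None
    if ari_low <= 0 <= ari_high:
        # The CI for the ARI includes zero, so the CI for the NNT contains infinity.
        return nnt, None
    # Reciprocals swap order: the upper ARI bound gives the lower NNT bound.
    bounds = sorted((round_up(1.0 / ari_high), round_up(1.0 / ari_low)), key=abs)
    return nnt, (bounds[0], bounds[1])


# Pooled AL (n = 196 + 204) vs placebo (n = 196), responder rates from the abstract.
print(nnt_with_ci(0.353, 400, 0.184, 196))  # (6, (5, 11))
```

Because the inputs here are rounded percentages rather than patient counts, values near a whole-number boundary could round differently from the published tables.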

Medication effects over time were also assessed using Cohen’s d effect size for the PANSS total score difference of AL vs placebo at study days 8, 15, 22, 29, 57, and 85. Missing PANSS total scores for patients who terminated the study early (before day 85) were imputed using the last observation carried forward.
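Last observation carried forward can be illustrated with a small sketch. The function name and the score series are hypothetical; missing visits are represented as None, and a series with no observed value before a gap is left unfilled, which is a choice of this sketch rather than something the article specifies.

```python
def locf(scores):
    """Fill missing (None) values with the most recent observed value."""
    filled = []
    last_observed = None
    for score in scores:
        if score is not None:
            last_observed = score
        filled.append(last_observed)
    return filled


# Hypothetical PANSS total scores at days 8, 15, 22, 29, 57, and 85
# for a patient who left the study after day 22.
print(locf([95, 88, 80, None, None, None]))  # [95, 88, 80, 80, 80, 80]
```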


Results

Study population

Of 623 patients experiencing an acute exacerbation of schizophrenia randomized to placebo, AL 441 mg, or AL 882 mg every 4 weeks, 622 were analyzed for safety (all patients who received at least one dose of IM study drug) and 596 were analyzed for efficacy (all patients who received at least one dose of IM study drug and had at least one primary efficacy assessment after IM study drug). Overall patient characteristics have been described previously.1 All patients were markedly to severely ill, with mean PANSS total scores of 92.6, 92.0, and 93.9 for the AL 441 mg, AL 882 mg, and placebo groups, respectively.1

Efficacy outcomes

In this post hoc analysis, individual AL doses (441 and 882 mg) evidenced NNT values <10 vs placebo for response, defined as a ≥30% decrease in PANSS total score from baseline (Figure 1; Table S2). When AL results were pooled for both doses, responder rates were 35.3% for AL vs 18.4% for placebo, yielding a NNT of 6 (95% CI: 5–11) at day 85 (Table S2). At the lowest response threshold (i.e., ≥20% reduction in PANSS total score from baseline to endpoint), NNT was more robust (NNT = 4); at higher thresholds of response (≥40% and ≥50%), smaller effect size estimates for NNT of response (10 and 26, respectively) were calculated vs placebo (Figure 1; Table S2). For the pooled AL dose group, the effect size for responders (i.e., PANSS total score reduction ≥30%) became significant as early as day 22 (Figure 2); NNT remained significant and was more robust at later time points (Figure 2).

Figure 1 PANSS responders (defined by PANSS total score reduction thresholds of 20–50%) with AL 441 mg, 882 mg, or AL doses pooled: NNT and 95% CI vs placebo, at endpoint. Note: *Upper bound 95% CI is 216. Abbreviations: AL, aripiprazole lauroxil; CI, confidence interval; NNT, number needed to treat; ns, not significant at the P<0.05 threshold; PANSS, Positive and Negative Syndrome Scale.

Figure 2 PANSS responders (≥30% reduction from baseline PANSS total score) by study day with AL doses pooled: NNT and 95% CI vs placebo by days on therapy. Abbreviations: AL, aripiprazole lauroxil; CI, confidence interval; NNT, number needed to treat; ns, not significant at the P<0.05 threshold; PANSS, Positive and Negative Syndrome Scale.

Cohen’s d

Effect sizes calculated using Cohen’s d, examining the difference between PANSS total score for AL vs placebo at days 8, 15, 22, 29, 57, and 85, paralleled the magnitude of the effect sizes seen with NNT for PANSS response (Table 1). Pooling the two doses of AL, the Cohen’s d for the PANSS total score change vs placebo at day 85 was 0.61 (95% CI: 0.44–0.79), representing a moderate effect size and being comparable to the NNT of 4 observed for responders defined by a ≥20% reduction in PANSS total score from baseline to endpoint.

Table 1 PANSS total score: change from baseline* and Cohen’s d for AL vs placebo (LOCF)

Tolerability outcomes

Study completion rates were higher for patients taking either dose of AL than for placebo. After pooling both doses of AL, the NNT estimate vs placebo for study completion was 6 (95% CI: 4–11) (Table 2). Examining discontinuation due to AEs, rates were higher among patients randomized to placebo, yielding a NNH estimate of −8 (95% CI: −6 to −15) for the pooled doses of AL vs placebo, indicating an advantage for AL in avoiding this outcome. When the AEs of “schizophrenia” or “psychotic disorder” were excluded from the NNH analysis as reasons for discontinuation, AE-related discontinuation rates remained higher for placebo, but the NNH estimates were no longer statistically significant.

Table 2 Rates and NNT vs placebo for study completion and NNH vs placebo for AE-related discontinuation (safety population)

Akathisia was the only AE with an incidence ≥5% in each AL group and at least twice the rate of placebo, producing a NNH of 14 (95% CI: 9–33) for pooled AL doses vs placebo (Table 3). The other AEs with an incidence of ≥2% that occurred more frequently in both AL groups than in the placebo group and yielded statistically significant NNH estimates for pooled doses of AL vs placebo were toothache, blood creatine phosphokinase increased, and weight increased. The AEs of schizophrenia, agitation, and psychotic disorder produced NNH values that were negative and statistically significant (Table 3), suggesting an advantage for the pooled AL dose groups vs placebo in avoiding these outcomes.

Table 3 AE rates (incidence ≥2% in any treatment arm) and NNH vs placebo (safety population)


A limited number of LHH calculations could be done because when the NNT and/or NNH are not statistically significant, LHH cannot be reliably calculated. In addition, when the NNH is a negative number, LHH is rendered meaningless. Nonetheless, using the NNT for response (≥30% reduction from baseline in PANSS total score) for the pooled doses of AL vs placebo and the NNH for akathisia, the LHH was 14/6 or 2.3. Thus, treatment with AL was 2.3 times more likely to result in a therapeutic response than a complaint of akathisia. A caveat, as noted in the discussion, is that in this study, akathisia was not commonly encountered after the second or third injection. Because the rate of AE-related discontinuations was higher for placebo than for AL, it is not possible to contrast the NNT for response with the NNH for discontinuation due to an AE.

Discussion


Although inferential statistics can help determine whether a given result represents a probable outcome (as opposed to one occurring by chance), effect sizes are essential in helping determine if an outcome is clinically important. An outcome may be statistically significant but clinically irrelevant. NNT (and NNH) is a useful way of assessing clinical relevance because it reflects the magnitude of a therapeutic benefit in “patient units” instead of point changes on a rating scale or a Cohen’s d, which measures effect size in standard deviation units. Kraemer and Kupfer10 propose that NNTs of 3 (rounded up from 2.3), 4 (rounded up from 3.6), and 9 (rounded up from 8.9) correspond to a Cohen’s d of 0.8, 0.5, and 0.2, respectively, representing effect sizes that are “large,” “medium,” and “small.” In general, the therapeutic effects of AL vs placebo from this post hoc analysis resulted in single-digit NNT values. Effect size differences from placebo generally became weaker as the threshold for response was increased from ≥20% to ≥50%. A meta-analysis of studies of other atypical antipsychotics reported an overall mean response rate for long-acting injectables of 47% vs 24% for placebo (NNT = 4) based on a ≥20% improvement in PANSS total score;11 thus, AL 441 and 882 mg were comparable with respect to this outcome, with an observed NNT of 4 (95% CI: 3–6). Similar analyses have been published regarding other antipsychotics, including paliperidone palmitate,12,13 iloperidone,14 cariprazine,15 lurasidone,16 and pimavanserin.17
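The correspondence between Cohen's d and NNT quoted above can be checked numerically. The sketch assumes two normally distributed groups with equal variance, for which the probability that a treated patient has a better outcome than a control patient is Φ(d/√2) and NNT = 1/(2Φ(d/√2) − 1); that this is the relationship behind the quoted values is an assumption of the sketch, although it does reproduce 2.3, 3.6, and 8.9. The function name is hypothetical.

```python
import math


def nnt_from_cohens_d(d):
    """NNT implied by Cohen's d under equal-variance normal distributions."""
    # 2 * Phi(d / sqrt(2)) - 1 simplifies to erf(d / 2).
    success_rate_difference = math.erf(d / 2.0)
    if success_rate_difference == 0:
        return math.inf
    return 1.0 / success_rate_difference


for d in (0.8, 0.5, 0.2):
    unrounded = nnt_from_cohens_d(d)
    print(d, round(unrounded, 1), math.ceil(unrounded))
```

This conversion applies to the continuous outcome as a whole; NNT values computed from a specific responder threshold, as in the present analysis, need not match it exactly.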

Individual AE outcome differences for pooled AL dose groups vs placebo generated NNH values in the double digits and produced LHH values that were consistently favorable. Clinically, additional considerations include the time to onset of the AE vs time to onset of a therapeutic response, as well as the severity and duration of the AE. The AE in question may be easily manageable if it is short-lived and not serious. For akathisia, median durations were 13, 15, and 22 days for the placebo, AL 441 mg, and AL 882 mg groups, respectively. Two patients discontinued because of an AE of akathisia, and the majority (>75%) of akathisia events had an onset before the second injection.1,18 No cases of akathisia occurred in the AL 882 mg group beyond 1 month after the first injection.1

In contrast, the comparison of pooled AL dose groups vs placebo produced negative NNH estimates for several individual AEs, suggesting that AL was advantageous over placebo for those outcomes. Three were statistically significant, including AEs coded as schizophrenia (NNH = −16), agitation (NNH = −26), and psychotic disorder (NNH = −30). This makes intuitive sense, as AL would be expected to treat the symptoms of schizophrenia. Moreover, in a previous analysis, AL demonstrated decreases in ratings of agitation and hostility, with the antihostility effect being independent of the general antipsychotic effect.19

A limitation of this analysis is that given the rigid patient selection criteria of a clinical trial, results from this pooled analysis of a 12-week trial may not be generalizable to all patients seen in clinical practice. In addition, the trial was conducted using a fixed-dose design, in contrast to clinical practice, where dose adjustments are generally made based on individual symptom relief and/or the development of tolerability issues. Furthermore, a major limitation of NNT and NNH analyses is that these metrics are limited to dichotomous outcomes. Other effect size measures are necessary when describing continuous outcomes, such as mean changes in PANSS scores or mean changes in fasting plasma glucose levels.20 Cohen’s d addresses this limitation and provides a way to assess effect size of drug-placebo differences for continuous variables.

Conclusion


These descriptive analyses of supportive endpoints using NNT and NNH provide more meaningful clinical information that is simpler to interpret than absolute point changes on clinical rating scales, and they are more intuitive than a comparison of clinically relevant outcomes for AL vs placebo measured via continuous variables (e.g., PANSS total score). The magnitude of the NNT effect sizes for PANSS response paralleled those observed using Cohen’s d when examining change in PANSS total score. Using the metrics of NNT and NNH, AL 441 and 882 mg administered every 4 weeks were efficacious and well-tolerated interventions for the treatment of patients experiencing an acute exacerbation of schizophrenia.

Data sharing statement

The data collected in this study are proprietary to Alkermes, Inc. Alkermes, Inc. is committed to public sharing of data in accordance with applicable regulations and laws.

Acknowledgments


This study was sponsored by Alkermes, Inc. Medical writing and editorial support was provided by John H. Simmons, MD, of Peloton Advantage, LLC, an OPEN Health company (Parsippany, NJ), and funded by Alkermes, Inc. The study sponsor was involved in the design, collection, and analysis of the data. Interpretation of the results was by the authors, and the decision to submit the manuscript for publication was made by the authors.

Author contributions

All authors contributed to data analysis, drafting and revising the article, gave final approval of the version to be published, and agree to be accountable for all aspects of the work.

Disclosure


Dr Leslie Citrome reports personal fees from Alkermes, during the conduct of the study. He reports personal fees as a consultant from Acadia, Alkermes, Allergan, Indivior, Intra-Cellular Therapeutics, Janssen, Lundbeck, Merck, Neurocrine, Noven, Osmotica, Otsuka, Pfizer, Shire, Sunovion, Takeda, Teva, and Vanda. He also reports personal fees as a speaker from Acadia, Alkermes, Allergan, Janssen, Lundbeck, Merck, Neurocrine, Otsuka, Pfizer, Shire, Sunovion, Takeda, Teva, and Vanda. Dr Citrome owns stocks from Bristol-Myers Squibb, Eli Lilly, Johnson & Johnson, Merck, and Pfizer. He also reports royalties from Wiley (Editor-in-Chief, International Journal of Clinical Practice), UpToDate (reviewer), and Springer Healthcare (book), outside the submitted work. Dr Yangchun Du and Peter J Weiden are employees of Alkermes, Inc. The authors report no other conflicts of interest in this work.

References


1. Meltzer HY, Risinger R, Nasrallah HA, et al. A randomized, double-blind, placebo-controlled trial of aripiprazole lauroxil in acute exacerbation of schizophrenia. J Clin Psychiatry. 2015;76(8):1085–1090. doi:10.4088/JCP.14m09741

2. Citrome L, Risinger R, Cutler AJ, et al. Effect of aripiprazole lauroxil in patients with acute schizophrenia as assessed by the positive and negative syndrome scale-supportive analyses from a Phase 3 study. CNS Spectr. 2018;23(4):284–290. doi:10.1017/S1092852917000396

3. Nasrallah HA, Newcomer JW, Risinger R, et al. Effect of aripiprazole lauroxil on metabolic and endocrine profiles and related safety considerations among patients with acute schizophrenia. J Clin Psychiatry. 2016;77(11):1519–1525. doi:10.4088/JCP.15m10467

4. Aristada [package insert]. Waltham, MA: Alkermes, Inc.; 2018.

5. Citrome L, Ketter TA. When does a difference make a difference? Interpretation of number needed to treat, number needed to harm, and likelihood to be helped or harmed. Int J Clin Pract. 2013;67(5):407–411. doi:10.1111/ijcp.12142

6. Citrome L, Kantrowitz J. Antipsychotics for the treatment of schizophrenia: likelihood to be helped or harmed, understanding proximal and distal benefits and risks. Expert Rev Neurother. 2008;8(7):1079–1091. doi:10.1586/14737175.8.7.1079

7. Straus SE. Individualizing treatment decisions. The likelihood of being helped or harmed. Eval Health Prof. 2002;25(2):210–224. doi:10.1177/016327870202500206

8. Citrome L, Magnusson K. Paging Dr Cohen, Paging Dr Cohen…An effect size interpretation is required STAT!: visualising effect size and an interview with Kristoffer Magnusson. Int J Clin Pract. 2014;68(5):533–534. doi:10.1111/ijcp.12435

9. Citrome L. Quantifying risk: the role of absolute and relative measures in interpreting risk of adverse reactions from product labels of antipsychotic medications. Curr Drug Saf. 2009;4(3):229–237.

10. Kraemer HC, Kupfer DJ. Size of treatment effects and their importance to clinical research and practice. Biol Psychiatry. 2006;59(11):990–996. doi:10.1016/j.biopsych.2005.09.014

11. Fusar-Poli P, Kempton MJ, Rosenheck RA. Efficacy and safety of second-generation long-acting injections in schizophrenia: a meta-analysis of randomized-controlled trials. Int Clin Psychopharmacol. 2013;28(2):57–66. doi:10.1097/YIC.0b013e32835b091f

12. Citrome L. Paliperidone palmitate - review of the efficacy, safety and cost of a new second-generation depot antipsychotic medication. Int J Clin Pract. 2010;64(2):216–239. doi:10.1111/j.1742-1241.2009.02240.x

13. Mathews M, Gopal S, Nuamah I, et al. Clinical relevance of paliperidone palmitate 3-monthly in treating schizophrenia. Neuropsychiatr Dis Treat. 2019;15:1365–1379. doi:10.2147/NDT.S197225

14. Citrome L. Iloperidone for schizophrenia: a review of the efficacy and safety profile for this newly commercialised second-generation antipsychotic. Int J Clin Pract. 2009;63(8):1237–1248. doi:10.1111/j.1742-1241.2009.02142.x

15. Citrome L. Cariprazine for the treatment of schizophrenia: a review of this dopamine D3-preferring D3/D2 receptor partial agonist. Clin Schizophr Relat Psychoses. 2016;10(2):109–119. doi:10.3371/1935-1232-10.2.109

16. Citrome L. Schizophrenia relapse, patient considerations, and potential role of lurasidone. Patient Prefer Adherence. 2016;10:1529–1537. doi:10.2147/PPA.S45401

17. Citrome L, Norton JC, Chi-Burris K, Demos G. Pimavanserin for the treatment of Parkinson’s disease psychosis: number needed to treat, number needed to harm, and likelihood to be helped or harmed. CNS Spectr. 2018;23(3):228–238. doi:10.1017/S1092852917000736

18. United States Food and Drug Administration. Aristada (aripiprazole lauroxil) drug approval package; 2015. Available from: Accessed August 21, 2019.

19. Citrome L, Du Y, Risinger R, et al. Effect of aripiprazole lauroxil on agitation and hostility in patients with schizophrenia. Int Clin Psychopharmacol. 2016;31(2):69–75. doi:10.1097/YIC.0000000000000106

20. Citrome L. Relative vs. absolute measures of benefit and risk: what’s the difference? Acta Psychiatr Scand. 2010;121(2):94–102. doi:10.1111/j.1600-0447.2009.01449.x

Creative Commons License This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at and incorporate the Creative Commons Attribution - Non Commercial (unported, v3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.
