Alpha Allocation in Adaptive Clinical Trials: Misconceptions and Scientific Consequences
Technical Foundations: Defining Alpha
In phase 3, pivotal, or adequate and well-controlled superiority trials, the role of type I error is standard: a one-sided allocation of 2.5%. This threshold means the trial has a 2.5% chance of concluding the treatment is superior when it is in fact equal to the control arm. We refer to the type I error threshold, against which a p-value is typically compared, by the Greek letter alpha. A standard fixed sample size trial conducts the primary hypothesis test at a single time point. The structure is simple: if the calculated p-value falls below the alpha threshold of 0.025, superiority is claimed. The probability of this occurring under the null hypothesis, that the treatment and placebo are in fact equal, is the defined type I error of 2.5%.
When trials employ adaptive sample size approaches, with multiple time points evaluating superiority, the alpha level at each time point is adjusted so that the probability of concluding superiority at any time point in the trial is limited to 2.5%. To maintain an overall trial-level 2.5% type I error, each individual test uses an adjusted nominal alpha level. The rule for this adjustment across the multiple time points is commonly referred to as the alpha-spending function.
A source of widespread confusion is the entrenched belief that introducing interim analyses “costs” alpha; that is, the assumption that interim adaptations erode the available alpha and require the sponsor to “pay a penalty.” This notion also feeds the myth that the mere act of “looking at data” at an interim analysis is bad and costs alpha. Scott Berry, in this week's episode of "In the interim…", is explicit in rejecting this characterization: “You haven’t lost anything.” The word “penalty” predominates in the literature and in industry dialogue, but, as Berry argues, it is semantically pejorative and functionally misleading.
Interim Analyses and Alpha: The Statistical Reality
Group sequential adaptive sample size designs often introduce interim analyses at pre-specified enrollment counts, such as after 200 and 300 patients in a trial with a planned maximum sample size of 400 patients. These provide opportunities to assess superiority earlier than the maximum sample size. If superiority analyses are specified at these interim analyses, the technical approach utilizes so-called spending functions or boundaries, which control the overall type I error by assigning explicit, lower, nominal alpha values to each analysis.
In the above illustration, using O’Brien-Fleming group sequential boundaries, the nominal alpha-level at each analysis time point would be:
● Interim at 200 patients; nominal alpha = 0.0031
● Interim at 300 patients; nominal alpha = 0.0092
● Final at 400 patients; nominal alpha = 0.0213
These nominal values are each less than 0.025; they represent the threshold at each look for stopping. The overall type I error for the trial remains 2.5%. The alpha level at the final analysis is 0.0213, less than the 0.025 that would apply with no interim analyses. This smaller nominal alpha at the 400-patient analysis fuels the misconception that the final analysis is “penalized”; Berry states, “You’ve just allocated it over the three analyses. You haven’t lost anything.” Spreading alpha over multiple analyses neither diminishes the overall error rate allowance nor requires any form of compensation beyond careful allocation.
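To make this concrete, here is a minimal Monte Carlo sketch (an illustration, not from the podcast), assuming normally distributed outcomes with unit variance and a one-sided two-sample z-test at each look. It checks that, under the null, the probability of crossing any of the three nominal boundaries above stays near the planned overall 2.5%:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2024)
n_sims = 200_000
looks = [100, 150, 200]                    # patients per arm at 200, 300, 400 total
alphas = [0.0031, 0.0092, 0.0213]          # nominal one-sided alphas from the text
z_bounds = [NormalDist().inv_cdf(1 - a) for a in alphas]

# Simulate per-arm stage sums under the null (no treatment effect),
# then accumulate them into the running totals seen at each look.
stage = np.diff([0] + looks)               # patients added per arm at each stage
trt = rng.normal(0.0, np.sqrt(stage), (n_sims, 3)).cumsum(axis=1)
ctl = rng.normal(0.0, np.sqrt(stage), (n_sims, 3)).cumsum(axis=1)

rejected = np.zeros(n_sims, dtype=bool)
for k in range(3):
    # z statistic at look k: mean difference over its standard error
    z = (trt[:, k] - ctl[:, k]) / looks[k] / np.sqrt(2 / looks[k])
    rejected |= z > z_bounds[k]            # superiority claimed at this look

print(f"overall type I error: {rejected.mean():.4f}")
```

The union of the three crossing events, accounting for the correlation between looks, lands near 0.025: the alpha is allocated across the analyses, not lost.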
Historically, group sequential designs like the one presented here were the only type of adaptive design. In these designs, superiority actions were specified at each of the analyses, so every interim analysis conducted required an alpha allocation. This history leads to the misconception that doing an interim analysis, that is, looking at data, itself causes the need for alpha allocation. This further leads to the idea that any type of interim, even one without a superiority analysis, causes alpha allocation. The natural conclusion is that looking at data must be penalized and is bad. The adherence to the “penalty” vernacular has stifled efficient adaptive clinical trial designs.
Consequence for Power and Sample Size: Quantitative Examples
Consider the example above from the podcast:
● A fixed sample 400-patient trial (single final look) is powered at 85% to detect an effect size of 0.3.
● A group sequential design with interim analyses (using the boundaries above) reduces power from 85% to 84% but reduces the mean sample size from 400 to 313.
This small power reduction is not the result of alpha reduction from interim analyses per se, but from distributing decision-making across smaller, potentially less informative sample sizes earlier in the accrual process. Berry clarifies, “You haven’t lost any alpha. You don’t lose power because you pay a penalty in alpha. It’s because you distributed some of that alpha to smaller sample sizes.”
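A Monte Carlo sketch (mine, not from the podcast) can reproduce both numbers, assuming normal outcomes with unit variance, a standardized effect of 0.3, and a one-sided two-sample z-test. The fixed design's 85% power follows analytically, and the group sequential version distributes alpha across the three looks:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(7)
n_sims = 200_000
looks = [100, 150, 200]                    # per-arm sizes at 200/300/400 total
z_bounds = [NormalDist().inv_cdf(1 - a) for a in (0.0031, 0.0092, 0.0213)]
effect = 0.3

# Fixed 400-patient design: power = P(Z > z_0.025 - effect / sqrt(2/200))
fixed_power = 1 - NormalDist().cdf(
    NormalDist().inv_cdf(0.975) - effect / (2 / 200) ** 0.5)

# Per-arm stage sums under the alternative; treatment mean shifted by the effect.
stage = np.diff([0] + looks)
trt = rng.normal(effect * stage, np.sqrt(stage), (n_sims, 3)).cumsum(axis=1)
ctl = rng.normal(0.0, np.sqrt(stage), (n_sims, 3)).cumsum(axis=1)

win = np.zeros(n_sims, dtype=bool)
n_enrolled = np.full(n_sims, 400)
for k in (2, 1, 0):                        # last-to-first so the earliest stop wins
    z = (trt[:, k] - ctl[:, k]) / looks[k] / np.sqrt(2 / looks[k])
    crossed = z > z_bounds[k]
    win |= crossed
    n_enrolled[crossed] = 2 * looks[k]     # enrollment halts at the crossing look

print(f"fixed power: {fixed_power:.3f}")
print(f"group sequential power: {win.mean():.3f}, mean N: {n_enrolled.mean():.0f}")
```

Power drops only slightly (about 85% to 84%) while the mean sample size falls to roughly 313, matching the figures quoted above.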
Expanding the design above to allow for up to 500 patients, with appropriate reallocation of alpha, can have positive effects. Taking O’Brien-Fleming boundaries at 200, 300, 400, and a final potential sample size of 500 leads to:
● Power increases from 85% to 91%
● The average sample size is 368, still below the 400 patients of the original fixed design
Berry points out that this flexibility results in both higher power and a reduction in the average number of enrolled patients; “By allowing flexibility, my average sample size is smaller than 400. My power is greater, going from 85% to 91%.” These are the operational gains of adaptive design—the “penalty” is a misapplied term. The fear of spending alpha may push one to run the fixed sample size 400-patient trial, when the ability to distribute alpha can create more powerful, more efficient trial designs.
Action versus Observation: When is Alpha Adjustment Required?
Simply viewing interim data does not consume alpha or demand adjustment. It is not inspection of the data, but potential trial actions at each look that may require type I error adjustment. Futility analyses conducted at interim timepoints—where there is no possibility of early declaration of superiority—do not require alpha adjustment. Berry states, “You could do 100 interims for futility in your trial and your final analysis used an alpha of 0.025... because no action we took during the trial increased the probability of making a type I error.”
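A small simulation illustrates the point (an illustrative sketch with an assumed futility rule, not the podcast's design): with futility-only looks and no early superiority claims, the final analysis keeps the full 0.025, and the overall type I error cannot exceed it, because futility stopping can only remove trials that might otherwise have rejected:

```python
import numpy as np

rng = np.random.default_rng(11)
n_sims = 200_000
looks = [100, 150, 200]        # per-arm sizes; futility-only looks at the first two
stage = np.diff([0] + looks)

# Null hypothesis: no treatment effect in either arm.
trt = rng.normal(0.0, np.sqrt(stage), (n_sims, 3)).cumsum(axis=1)
ctl = rng.normal(0.0, np.sqrt(stage), (n_sims, 3)).cumsum(axis=1)
z = [(trt[:, k] - ctl[:, k]) / looks[k] / np.sqrt(2 / looks[k]) for k in range(3)]

# Assumed futility rule: stop (with no superiority claim) if an interim z < 0.
survives = (z[0] >= 0) & (z[1] >= 0)
type1 = survives & (z[2] > 1.95996)        # final look still uses the full 0.025

print(f"type I error with futility-only interims: {type1.mean():.4f}")
```

The simulated rate sits at or below 2.5%: the interim looks took no action that could increase the probability of a false superiority claim, so no alpha adjustment is required.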
Berry discusses the SEPSIS-ACT trial as an example; in a design with more than 20 planned interim analyses, with no superiority stops, the trial maintained a final nominal alpha of 0.025 and was approved under a Special Protocol Assessment after strict regulatory scrutiny. At the interim analyses response adaptive randomization and a potential shift to phase 3 could occur – and these interims required no alpha adjustment.
Adjustment becomes necessary when interim analyses create pathways that can increase type I error probabilities. One example is making a dose selection in a seamless phase 2/3 trial and carrying that data through to the final analysis of the phase 3 portion. Suppose a phase 2 trial enrolls 30:30:30 to placebo and two experimental doses. The best dose is selected, the trial then enrolls 90 per arm on the selected dose and placebo, and the phase 2 data (30 on the selected dose and 30 on placebo) are included in the phase 3 analysis. In such a case, as detailed in the podcast, because data from the selected phase 2 dose and placebo are carried forward and pooled with subsequent phase 3 accrual, the final alpha is adjusted downward (e.g., to 0.01693) to maintain proper type I error control. This is not bad: the power is higher by including the phase 2 data and adjusting alpha than in the stand-alone 90-versus-90 portion of the trial using an unadjusted 0.025. Again, it is an example where allocating alpha to allow inclusion of phase 2 data improves power. It is a good thing to do, certainly not a penalty.
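The inflation, and the fix, can be checked by simulation. The sketch below is illustrative, assuming normal outcomes with unit variance and selection of the dose with the higher phase 2 mean (the selection mechanics are my assumption, not a detail from the podcast); under the null, the unadjusted 0.025 threshold rejects too often, while the adjusted 0.01693 threshold restores a rate near 2.5%:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)
n_sims = 200_000

# Phase 2 under the null: arm means from 30 patients per arm, outcomes N(0, 1).
pbo2 = rng.normal(0.0, 1 / np.sqrt(30), n_sims)
doseA = rng.normal(0.0, 1 / np.sqrt(30), n_sims)
doseB = rng.normal(0.0, 1 / np.sqrt(30), n_sims)
sel2 = np.maximum(doseA, doseB)            # carry the better-looking dose forward

# Phase 3: 90 more per arm on the selected dose and placebo; pool all 120 per arm.
trt3 = rng.normal(0.0, 1 / np.sqrt(90), n_sims)
pbo3 = rng.normal(0.0, 1 / np.sqrt(90), n_sims)
z = ((30 * sel2 + 90 * trt3) / 120
     - (30 * pbo2 + 90 * pbo3) / 120) / np.sqrt(2 / 120)

naive = (z > NormalDist().inv_cdf(0.975)).mean()        # unadjusted alpha = 0.025
adjusted = (z > NormalDist().inv_cdf(1 - 0.01693)).mean()
print(f"unadjusted 0.025 threshold: {naive:.4f} (inflated)")
print(f"adjusted 0.01693 threshold: {adjusted:.4f}")
```

Selecting the best of two doses biases the pooled estimate upward under the null, which is why the final alpha must be allocated downward; the adjusted threshold brings the error rate back to about 2.5%.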
Conclusion
Alpha allocation in adaptive designs is not a penalty; as Berry emphasizes, “You haven’t lost anything.” Observing data at interim analyses, without the possibility of making a claim of superiority, does not consume or require adjustment of alpha. Efficient trial designs benefit from the use, not the avoidance, of pre-specified interim analyses. Embracing well-designed, pre-specified adaptive designs, with careful allocation of the allotted alpha, is essential to advancing efficient clinical trials.