ICH E20 Reactions: Group Sequential Designs
By Kert Viele
Note for this series – The ICH E20 draft guidance on adaptive designs is out. Its mere existence is evidence of the growth of adaptive designs in the past decades, with patients and sponsors benefitting from efficient designs that answer modern research questions. As a guidance for sponsors submitting adaptive designs to regulators, one of its main purposes is to identify potential points of contention, preferred paths to pursue or avoid, and other potential problems. It’s not a guidance on if or when to choose an adaptive design, nor an explainer.
In this series I want to focus on several different adaptive designs and our experience in how and why sponsors choose (or don’t choose!) an adaptive design over a non-adaptive design, explain some of the issues discussed in the ICH E20 draft in more detail, and provide my quick reactions to the draft. In many (most?) places I agree with much of the draft. In other places I may disagree a bit, but that disagreement is down in the weeds. And in a couple of places, I think the draft could be greatly improved.
I’ve tried to maintain a common format to these entries, focusing on the motivation for the adaptive design, a simple case study to show how the design is implemented, quantification of the benefits of the design, a discussion of the risks, and a discussion of the corresponding ICH E20 draft text. I’ve tried to italicize the main points in the explainer, and place the ICH E20 comments together at the end.
Today’s blog, the first in the series, is on group sequential designs.
Why use a group sequential?
Group sequential designs (GSDs) are among the oldest adaptive designs. They are motivated by two related ideas.
First, during design we have limited information about the treatment effect, often pilot or phase 2 data with considerable uncertainty. Suppose your phase 2 was promising, with a meaningful 5-point effect on some scale and a confidence interval of -1 to 11. You determine that 700 patients will provide 90% power for a 5-point effect. But the effect isn’t known to be 5. Suppose 3 is marketable, and 7 is also possible. While a 5-point effect needs 700 patients, a 3-point effect would need almost 1950, and a 7-point effect would only need 350. These are massive differences, all for plausible effects given your past data. If you choose a nonadaptive design with a fixed sample size, you run a large risk of being either underpowered or overpowered: either missing a meaningful effect or expending vastly more resources and time than necessary.
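As a rough check on those numbers, here is a minimal sketch using the standard two-sample normal-approximation sample size formula. The common standard deviation of about 20 points is my assumption, chosen only because it approximately reproduces the sample sizes quoted above; the exact numbers depend on the endpoint and test actually used.

```python
from scipy.stats import norm

def total_sample_size(delta, sd, alpha=0.025, power=0.90):
    """Total N (both arms, 1:1 randomization) for a two-sample comparison
    of means, normal approximation, one-sided alpha."""
    z_alpha = norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)
    n_per_arm = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return 2 * n_per_arm

# An assumed common SD of 20 points roughly reproduces the numbers quoted above.
for delta in (3, 5, 7):
    print(f"effect = {delta}: total N ~ {total_sample_size(delta, sd=20):.0f}")
```

The required sample size scales with 1/delta squared, which is why a plausible range of effects translates into such a wide range of sample sizes.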
Second, when we design a trial for 90% power, we are buying an expensive insurance policy against bad luck. Suppose we run a 90% powered trial and observe exactly the effect used for powering. What is the p-value? It’s not a one-sided 0.025, it’s a one-sided 0.0006. Why so small? When we aim for 90% power, much of that 90% is there to account for the times our observed effect is smaller than the truth. But at least half the time, our observed effect is not smaller than the truth, and then we don’t need that large a sample to obtain convincing evidence.
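The 0.0006 figure follows directly from the usual power calculation: a trial with one-sided alpha = 0.025 and 90% power is sized so that the assumed effect sits (z_alpha + z_beta), about 3.24, standard errors from zero. Observing exactly that effect therefore yields z of about 3.24. A quick check:

```python
from scipy.stats import norm

alpha, power = 0.025, 0.90
# A 90%-powered design places the assumed effect (z_alpha + z_beta) standard errors from zero.
z_observed = norm.ppf(1 - alpha) + norm.ppf(power)   # ~3.24
p_one_sided = 1 - norm.cdf(z_observed)
print(round(z_observed, 2), round(p_one_sided, 4))   # ~3.24, ~0.0006
```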
Together, these suggest using flexible sample sizes, with interim analyses allowing the trial to stop early when the research question has been answered.
What does this look like in practice?
Suppose we are running a trial with a dichotomous endpoint (responses are good). We believe the control rate is about 30% and hope to increase that to 50% with our new therapy. If we were to run an N=200 study (100 per arm), we would achieve about 83% power.
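That power figure is easy to reproduce with the normal approximation for a two-proportion comparison (one-sided alpha = 0.025, unpooled variance); exact or simulation-based calculations will differ slightly.

```python
from scipy.stats import norm

p_ctrl, p_trt, n_per_arm, alpha = 0.30, 0.50, 100, 0.025
# Standard error of the difference in proportions under the alternative.
se = (p_ctrl * (1 - p_ctrl) / n_per_arm + p_trt * (1 - p_trt) / n_per_arm) ** 0.5
z_alpha = norm.ppf(1 - alpha)
power = norm.cdf((p_trt - p_ctrl) / se - z_alpha)
print(f"approximate power: {power:.3f}")   # roughly 0.83-0.84
```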
With a group sequential design, we could place interim analyses at N=100, 120, 140, 160, 180, and 200, with a final analysis at N=220. Some immediate questions. Why did the maximal sample size go up? Generally, group sequential designs with the same maximal sample size have slightly less power than a fixed trial with that maximum. We can either accept the slight power loss (often 1-3%) or slightly increase the maximum N, as here. Why so many interims? They may not all be needed. Generally, more interims are better statistically, but there are diminishing statistical returns for each additional interim, and there are operational costs. In practice we design trials with many interims and later see if we can remove some with minimal statistical cost. We may find that 100, 140, 180, 220 does almost as well (or 100, 160, 220, etc.). Final designs often have 2-5 interims, but there is no required minimum or maximum. “You should have two interims” is an old myth. Maybe that is the right answer, maybe not. Pick the right number of interims for your trial.
At each interim analysis, we see whether the data are sufficiently compelling to stop, or whether the trial should continue. The table below shows the p-value required for success at each interim (you could also be Bayesian and use posterior probabilities). We have chosen O’Brien-Fleming bounds for success, a common choice that maintains most of the power of a fixed trial while providing substantial sample size savings. Importantly, these bounds account for the multiplicity of looking at the data multiple times. Even though we are looking at the data repeatedly, our chance of making a type 1 error, in total, is still a one-sided 2.5% (the per-interim values need not add to 2.5% because the interim results are correlated).
[Table: one-sided p-value thresholds (O’Brien-Fleming success bounds) at each interim analysis]
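To illustrate why bounds of this type control the overall error even though the per-look thresholds do not add to 2.5%, here is a small Monte Carlo sketch. It uses the canonical joint distribution of interim z-statistics and O’Brien-Fleming-type bounds of the form z_k = c / sqrt(t_k), with c calibrated by simulation. The resulting thresholds are illustrative only and will not exactly match the bounds of the design described here.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
looks = np.array([100, 120, 140, 160, 180, 200, 220])   # total N at each analysis
t = looks / looks[-1]                                    # information fractions
n_sims = 200_000

# Under the null, interim z-statistics follow the canonical joint distribution:
# z_k = S(n_k) / sqrt(n_k), where S is a sum of independent unit-variance increments.
gaps = np.diff(np.concatenate(([0], looks)))             # new information between looks
S = (rng.standard_normal((n_sims, len(looks))) * np.sqrt(gaps)).cumsum(axis=1)
z = S / np.sqrt(looks)

# O'Brien-Fleming-type bounds z_k = c / sqrt(t_k): very strict early, close to c at the end.
# Scan c to find the value keeping the overall chance of ever crossing near 2.5%.
for c in (2.00, 2.05, 2.10, 2.15, 2.20):
    bounds = c / np.sqrt(t)
    overall = (z > bounds).any(axis=1).mean()
    first_look_p = 1 - norm.cdf(bounds[0])
    print(f"c={c:.2f}: first-look p threshold={first_look_p:.4f}, "
          f"overall type 1 error={overall:.4f}")
```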
We do not recommend success stopping alone. Many investigational therapies don’t work, and stopping for futility is, in my opinion, the single most important adaptation, allowing patients and resources to be reallocated to more promising investigations. For futility, we use the predictive probability of success at each interim (frequentist methods are also available). Using Bayesian methods, we compute the predictive probability that our trial will be successful in the future, given our current data. If that predictive probability falls below 5%, we stop the trial for futility. The 5% is a sponsor choice. More aggressive futility rules are better at stopping “bad” therapies, but they can mistakenly catch some good therapies with unlucky early data, reducing power. Sponsors need to find a balance that works for them (often 1-10%).
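Here is a minimal sketch of how a predictive probability of success can be computed at an interim, assuming Beta(1,1) priors on each arm and a simple one-sided z-test at the final N=220 analysis. The interim data in the example (15/50 control vs 20/50 treatment responders) are made-up numbers for illustration, not the rule used in the design above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def predictive_prob_success(x_c, n_c, x_t, n_t, n_final_per_arm=110,
                            alpha=0.025, n_draws=20_000):
    """Predictive probability that the final one-sided z-test succeeds,
    given interim data and Beta(1,1) priors on each arm's response rate."""
    m_c = n_final_per_arm - n_c            # control patients still to enroll
    m_t = n_final_per_arm - n_t            # treatment patients still to enroll
    # Draw each arm's response rate from its posterior ...
    p_c = rng.beta(1 + x_c, 1 + n_c - x_c, n_draws)
    p_t = rng.beta(1 + x_t, 1 + n_t - x_t, n_draws)
    # ... then simulate the remaining patients from the posterior predictive.
    fut_c = rng.binomial(m_c, p_c)
    fut_t = rng.binomial(m_t, p_t)
    pc_hat = (x_c + fut_c) / n_final_per_arm
    pt_hat = (x_t + fut_t) / n_final_per_arm
    se = np.sqrt(pc_hat * (1 - pc_hat) / n_final_per_arm
                 + pt_hat * (1 - pt_hat) / n_final_per_arm)
    z = (pt_hat - pc_hat) / np.maximum(se, 1e-12)
    return (z > norm.ppf(1 - alpha)).mean()

# Example: at the N=100 interim, 15/50 control vs 20/50 treatment responders.
print(predictive_prob_success(x_c=15, n_c=50, x_t=20, n_t=50))
```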
To conduct the trial, at each interim analysis we have a third party, separate from the sponsor to maintain blinding and operational secrecy, look at the data to determine if the trial should stop for success or futility. This result is typically reviewed by a DSMB and the sponsor only receives the recommendation “continue” or “stop”. This continues until the trial stops or reaches its maximal sample size.
Quantifying the benefits
As with any trial, we can compute the trial’s power (the probability of declaring efficacy if the therapy works) and type 1 error (the probability of declaring efficacy if the therapy is a null). In an adaptive design the sample size is also random, so we can compute the probabilities the trial will stop for success or futility at any given sample size, and the expected (average) sample size of the trial.
The table below gives the probability of success and futility at each interim, for the rules described above, when the therapy is a null (control rate 30%, therapy rate 30%). For example, in the table we see there is a 0.3% chance the trial stops for success at the N=100 interim, while there is a 58.9% chance the trial stops for futility at the N=100 interim.
[Table: probability of stopping for success or futility at each interim under the null scenario (control 30%, therapy 30%)]
When the therapy is a null, the type 1 error is controlled under 2.5%, and almost 80% of trials stop at one of the first three interims (N=100, 120, 140). The average sample size of the trial is 123 (no trial stops at exactly 123; this is an average over trials stopping at 100, 120, 140, etc.). While the maximal sample size is N=220 (as opposed to N=200 for our fixed trial), we only reach the maximal sample size 3.6% of the time. In repeated use, for null therapies this GSD will save almost 40% of resources compared to repeatedly running a fixed nonadaptive trial.
The next table gives the probability of success and futility at each interim when the therapy is effective (control rate 30%, therapy rate 50%).
[Table: probability of stopping for success or futility at each interim under the alternative scenario (control 30%, therapy 50%)]
The overall power of the trial is 83.2%, slightly higher than but essentially identical to the fixed trial. The trial has a meaningful probability of stopping at each interim analysis, but only a 10.6% chance of needing the maximal N=220 sample size. The average sample size under the alternative hypothesis is just under 150 patients. This GSD saves the sponsor 25% of the patients with repeated use on effective therapies. Note that we stop early for futility about 11% of the time, even in this scenario where the therapy is effective. Most of these stops, but not all, reflect situations where the trial would not ultimately have been successful (i.e., the question is not whether the trial is successful, but whether it is unsuccessful at a smaller sample size or at a larger one).
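Tables like the two above are typically produced by simulating the full design many times under each scenario. Here is a minimal sketch of that kind of simulation. It uses the illustrative O’Brien-Fleming-type success bounds from the earlier sketch and a crude “stop if the treatment is not trending better” rule in place of the predictive-probability futility rule, so its numbers will not reproduce the tables above; it only shows the mechanics of estimating power and expected sample size.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
looks_per_arm = np.array([50, 60, 70, 80, 90, 100, 110])    # N = 100, ..., 220 total
t = looks_per_arm / looks_per_arm[-1]
success_z = 2.10 / np.sqrt(t)   # illustrative O'Brien-Fleming-type bounds (see earlier sketch)
futility_z = 0.0                # crude stand-in for the predictive-probability futility rule

def operating_characteristics(p_ctrl, p_trt, n_sims=50_000):
    n_max = looks_per_arm[-1]
    ctrl = rng.binomial(1, p_ctrl, (n_sims, n_max)).cumsum(axis=1)
    trt  = rng.binomial(1, p_trt,  (n_sims, n_max)).cumsum(axis=1)
    success = np.zeros(n_sims, dtype=bool)
    stop_n  = np.zeros(n_sims, dtype=int)
    active  = np.ones(n_sims, dtype=bool)
    for j, n in enumerate(looks_per_arm):
        pc, pt = ctrl[:, n - 1] / n, trt[:, n - 1] / n
        se = np.sqrt(pc * (1 - pc) / n + pt * (1 - pt) / n)
        z = (pt - pc) / np.maximum(se, 1e-12)
        last = (j == len(looks_per_arm) - 1)
        win = active & (z > success_z[j])                    # cross the success bound
        stop = active & (win | (z < futility_z) | last)      # success, futility, or final look
        success[win] = True
        stop_n[stop] = 2 * n
        active[stop] = False
    return success.mean(), stop_n.mean()

for label, p_trt in (("null (30% vs 30%)", 0.30), ("alternative (30% vs 50%)", 0.50)):
    prob, avg_n = operating_characteristics(0.30, p_trt)
    print(f"{label}: P(success) = {prob:.3f}, average total N = {avg_n:.0f}")
```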
In total, the group sequential design provides the same power and type 1 error as the nonadaptive trial. While the adaptive trial has a small probability of requiring the larger N=220 sample size, repeated use of group sequential designs will save significant resources over nonadaptive trials. With savings of 25-40%, imagine a granting agency currently funding nonadaptive trials. Switching to group sequential trials, when appropriate, would allow the agency to fund 1 additional trial for every 2 it currently funds (NNF = Number Needed to Fund??).
Note if the effect is larger than anticipated, or the therapy is harmful, the trial will stop even earlier. This behavior addresses both the motivations above. If our treatment effect is uncertain, we can design our trial with a large maximal sample size but can be confident that the trial will stop early if a larger effect is present, or simply if we avoid bad luck with our anticipated effect. Similarly, futility stopping allows us to stop null or harmful drugs quickly, allowing patients and sponsor resources to be allocated to more promising therapies.
What are the risks?
The general concerns about group sequential designs center on the reduced information acquired for other, non-primary endpoints, and on the accuracy of the estimated treatment effects.
Group sequential designs typically have lower sample sizes than comparable fixed trials. In the above example, it could be that the data at N=100 provide sufficient evidence to conclude efficacy but are insufficient to establish safety. In that case it may be necessary to place the first interim analysis later to mitigate this risk, or to include safety in the interim analysis rules, requiring demonstrations of both efficacy and safety to stop early. Similarly, if certain subgroups need to be powered, that should be considered as well. We note that safety and subgroups are often underpowered even in phase 3 trials (rare safety events often require post-marketing surveillance). At N=150 (the average sample size for the alternative scenario above), confidence intervals are only 15% larger than confidence intervals at N=200, so care needs to be taken that going to the larger sample size truly provides information that will benefit patients. For example, is knowing the increased risk of a therapy is X% plus or minus 5% meaningfully better than knowing it is X% plus or minus 5.8%? These issues should always be considered; they should be quantified and balanced against the benefits of the design.
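The 15% figure is just the 1/sqrt(N) scaling of standard errors; a quick check:

```python
# Standard errors scale like 1/sqrt(N), so moving from N=200 down to N=150
# widens confidence intervals by a factor of sqrt(200/150).
ratio = (200 / 150) ** 0.5
print(round(ratio, 3), round(5 * ratio, 2))   # ~1.155, so +/-5% becomes roughly +/-5.8%
```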
There are also concerns about biased point estimates arising from group sequential trials. Note that bias is complex to define for group sequential trials. We typically define bias as the difference between the average value of an estimator and its true value, with the average taken over the full distribution of the estimator. In group sequential designs, we may be interested in a narrower question such as “given the trial stops for success at the first interim, what is the bias of the estimator?” This question is clearly relevant, but it also conditions on a severely cherry-picked subset of outcomes. Early interims in a group sequential design require very strong results (in the example we needed p<0.0023 at the first interim). For mediocre true effects, these strong observed results are indeed biased upward when they occur, but they also do not occur very often.
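To make that concrete, here is a small Monte Carlo sketch under made-up numbers (a mediocre 2-point true effect, SD 10, 100 patients per arm at the first look, and the p<0.0023 threshold quoted above). Conditional on an early success stop, the average estimate is well above the 2-point truth, but only a small fraction of trials stop early in the first place.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
# Hypothetical numbers: a mediocre 2-point true effect, SD 10, 100 patients per arm at the look.
true_effect, sd, n_per_arm = 2.0, 10.0, 100
se = sd * np.sqrt(2 / n_per_arm)                 # standard error of the effect estimate
z_bound = norm.ppf(1 - 0.0023)                   # the strict first-interim threshold quoted above
n_sims = 200_000

est = rng.normal(true_effect, se, n_sims)        # interim effect estimates across trials
stopped = est / se > z_bound                     # trials stopping early for success

print(f"P(stop early for success) = {stopped.mean():.1%}")
print(f"mean estimate given an early stop = {est[stopped].mean():.2f} "
      f"(true effect = {true_effect})")
```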
There are frequentist methods for adjusting estimates from group sequential designs for bias. We tend to explore these designs from a Bayesian perspective, in which case the correct point estimate comes from the posterior distribution (which typically shrinks extreme estimates naturally). We also recommend that care be taken in the placement of the first interim. If the resulting design will only stop at implausible values of the parameter, we recommend delaying early stopping until that is no longer true (early futility may still be applicable).
From an operational perspective, interim analyses require third parties to be available to conduct them, and sufficient expertise to maintain secrecy of the results while the trial is ongoing. We recommend working with partners and DSMB members with experience in group sequential designs.
Finally, group sequential designs may be inappropriate for long-delayed endpoints. With delayed endpoints, we may have information on few patients at the interim analyses, lowering or eliminating the benefits of the group sequential design. There are methods for handling delayed endpoints, both within a frequentist paradigm and from a Bayesian perspective (often called “Goldilocks trials”). Regardless of your inferential paradigm, these methods benefit when early endpoints, predictive of the final endpoint, are observed for each patient.
Summary
Group sequential designs mitigate risks related to pretrial uncertainty in the treatment effect and can minimize the required number of patients in a trial by stopping the trial when the research question is answered. This is accomplished through periodic interim analyses, with thresholds chosen to maintain type 1 error control, ideally combined with good futility rules. Compared to fixed, nonadaptive trials, group sequential designs can save 20-40% of the required sample size of a trial while achieving equivalent power and type 1 error. Care must be taken to provide adequate power for safety and secondary questions, to accurately estimate treatment effects and other parameters, and to maintain operational integrity of the trial.
Quick reactions to ICH E20 draft
Group sequential designs are covered in section 4.1 of the draft. Generally, we agree with the text.
The discussion of bias appears to be focused entirely on frequentist methods. Bayesian methods estimate parameters from the posterior distribution, and the text would be improved by discussing this option. Bayesians should take care to avoid situations with early interims where large “prior/data conflict” can occur. If an effect of 2 or more points on a scale is unlikely, having an interim that only stops for effects of 5 or more will generate conclusions that are more “prior driven” than “data driven”. In my opinion this is to be avoided by only including interims that stop at plausible values of the treatment effect.
On a particularly technical note, the text suggests employing non-binding futility rules, which sponsors need not follow during trial implementation. This contrasts with binding futility rules, which must be followed and which may be used in the calculation of type 1 error for the trial (thus allowing slightly less stringent success bounds). The text notes that the flexibility of non-binding rules “…is important because decision-making about whether to stop for futility or continue is usually not an algorithmic process and may need to incorporate additional information beyond the primary efficacy endpoint, such as safety or other efficacy data”. It is unclear to us how a sponsor could actually utilize this flexibility without obtaining information that would run afoul of the operational integrity principles. Can the sponsor (presumably a firewalled subset of the sponsor) view unblinded safety or other efficacy data?