June 19, 2026

Response Adaptive Randomization

In this week's blog, Dr. Kert Viele discusses his thoughts on the RAR literature and the current state of RAR in practical clinical trials. He focuses on general principles that have emerged in the literature, the nuts and bolts of constructing and tuning RAR designs, and operational issues.

By Kert Viele, Ph.D.

1.0 Introduction

Response Adaptive Randomization (RAR) refers to a class of adaptive designs that alter the arm allocation probabilities at a series of interim analyses. At each interim, we apply a function to the current data to obtain a new set of allocation probabilities for the next cohort of patients. This function usually increases allocation to the better performing active arms and decreases allocation to poorly performing active arms. These changes may be continuous, resulting in any set of allocation probabilities. An arm dropping design, which may stop arms but not otherwise alter randomization ratios, is a special case of response adaptive randomization in which allocation to each arm is either “on” or “off”. These methods have been a source of ethical discussion and statistical research dating back to a paper by Thompson in 1933. Substantial research on RAR has occurred in the 21st century, and RAR has been used in multiple small and large clinical trials, including confirmatory trials.

In this blog I discuss my thoughts on the RAR literature (by necessity an incomplete discussion!) and the current state of RAR in practical clinical trials. I will focus on general principles that have emerged in the literature, the nuts and bolts of constructing and tuning RAR designs, and operational issues.

Note that this blog is primarily aimed at individuals designing trials for industry and does not give full attention to material relevant to methodology researchers. It also often refers to specific papers for information rather than repeating that information here.

As always, I am happy to respond to questions and comments at: kert@berryconsultants.net

2.0 General Principles

2.1 The RAR literature is contentious!

There is significant controversy regarding RAR, with many papers arguing vociferously for or against the method. On the one hand this reflects the current statistical thinking, with many individuals strongly for the method as a source of statistical efficiency and ethical advantage, and others equally against the method as an overly complex and risky endeavor. On the other hand, the literature may not be as contradictory as it first appears. Response adaptive randomization is a class of methods, and some versions of RAR perform better than others. Often papers that are “for” or “against” RAR make specific arguments about specific variants of RAR in specific contexts, rather than general statements. They may not conflict as much as they first appear. Thus, readers of the RAR literature should be wary of overgeneralized conclusions. Readers should place more weight on papers which provide a nuanced view and adequately discuss all relevant sides.

2.2 What is the context and the scientific question?

In a clinical trial, we often focus on estimating treatment effects, the difference between an active arm and the control. The trial itself may have two arms (treatment and control) or multiple arms (several active arms and a control). In trials with multiple arms, we may be primarily interested in identifying the best arm and comparing it to control, or we may be interested in comparing all arms to the control. The specific context has a large impact on the usefulness of RAR and guides the choice of the most appropriate RAR variant.

The variance of an estimated treatment effect is generally reduced whenever the sample sizes on the arms being compared are increased. Thus, if we are estimating the difference between an active arm and the control, increasing allocation to that active arm and control will increase our precision. Similarly, decreasing allocation to those arms will reduce precision. RAR attempts to increase precision by increasing these sample sizes.

In trials with a finite maximal sample size, this immediately creates tradeoffs. Any increase in allocation to one arm can only occur through a reduction in allocation to another arm. Thus, RAR will not, in general, benefit all possible comparisons in the trial, but only those comparisons where increases in allocation occur. This immediately focuses our attention on specific contexts, discussed next.

2.2.1 Two-arm trials are problematic for RAR

For a two-arm trial (treatment and control), any increase in allocation to the active arm must simultaneously decrease allocation to the control (or vice versa). Generally, this results in decreased precision for the treatment effect. With equal variances across arms, 50-50 allocation is best. This was noted at least as far back as:

Korn EL, Freidlin B. Outcome-adaptive randomization: is it useful? J Clin Oncol. 2011 Feb 20;29(6):771-6. doi: 10.1200/JCO.2010.31.1423. Epub 2010 Dec 20. PMID: 21172882; PMCID: PMC3056658.

Thus, in the two-arm setting, RAR may be associated with reduced power, which is a negative for patients outside the trial (who need accurate inferences) and sponsors. In contrast, patients within the trial may still benefit, even in two arm settings. Some papers on the ethical aspects of RAR include:

Meurer WJ, Lewis RJ, Berry DA. Adaptive clinical trials: a partial remedy for the therapeutic misconception? JAMA. 2012 Jun 13;307(22):2377-8. doi: 10.1001/jama.2012.4174. PMID: 22692168.

London AJ. Learning health systems, clinical equipoise and the ethics of response adaptive randomisation. J Med Ethics. 2018 Jun;44(6):409-415. doi: 10.1136/medethics-2017-104549. Epub 2017 Nov 24. PMID: 29175968.

Hey SP, Kimmelman J. Are outcome-adaptive allocation trials ethical? Clin Trials. 2015 Apr;12(2):102-6. doi: 10.1177/1740774514563583. Epub 2015 Feb 3. PMID: 25649106; PMCID: PMC4482671.

Many, but certainly not all, of the ethical arguments in these papers involve the two-arm setting, and the resulting tradeoff between treatment of patients within the trial (who may obtain benefit) versus the treatment of patients outside the trial, who may be harmed through poorer and/or slower inferences to the general population.

Most of the practical use of RAR in clinical trials has occurred in the multiple active arm setting, avoiding RAR in two arm settings.  As discussed later, in many multiple arm studies RAR can benefit patients both inside and outside the trial, avoiding the conflict. Keep in mind when you read the literature that many of the “con” articles focus entirely on the two-arm setting, making valid points on two arm RAR but nevertheless points that may not generalize to multiple arm RAR.

One interesting ethical argument that has been made against RAR (as I understand it, I am not an ethicist) is the conundrum that arises from knowing some arms are doing better and still randomizing patients to the weaker performing arms. Essentially, is it ethical to knowingly assign a patient to a likely (but not conclusively) inferior arm? I do understand the discomfort, but two mathematical issues are important to note here. The first is that we simply cannot learn without assigning patients to both arms until a conclusion is made. If we simply assign patients deterministically to the currently better performing arm (even ignoring biases from lack of randomization) we can demonstrate we will make significantly poorer inferences about the treatments under investigation. The second is that, in the multiple arm RAR setting, we can derive that every patient both within and outside the trial has a higher chance of a good result (e.g. surviving if the endpoint is mortality) compared to fixed randomization. Both fixed randomization and RAR assign patients to inferior arms. It feels odd to me that it is more ethical to harm more people in ignorance (fixed randomization) than to harm fewer people knowingly (RAR).

There are some exceptions to the “two arm RAR is worrisome” argument, particularly some interesting recent work by Sofia Villar and colleagues. See for example the comprehensive review article:

Robertson DS, Lee KM, López-Kolkovska BC, Villar SS. Response-adaptive randomization in clinical trials: from myths to practical considerations. Stat Sci. 2023 May;38(2):185-208. doi: 10.1214/22-STS865. PMID: 37324576; PMCID: PMC7614644.

These are very interesting results but, in general, we still avoid RAR in two arm settings at Berry. RAR has operational and regulatory costs, and it’s unclear they are compensated for in two arm settings. We do make some exceptions for situations where the treatment of patients inside the trial becomes more paramount than the treatment of patients outside the trial. An interesting example of this is the PROSPECT trial:

Kneyber MCJ, Cheifetz IM, Asaro LA, Graves TL, Viele K, Natarajan A, Wypij D, Curley MAQ; Pediatric Acute Lung Injury and Sepsis Investigators (PALISI) Network. Protocol for the Prone and Oscillation Pediatric Clinical Trial (PROSpect). Pediatr Crit Care Med. 2024 Sep 1;25(9):e385-e396. doi: 10.1097/PCC.0000000000003541. Epub 2024 May 28. PMID: 38801306; PMCID: PMC11379539.

PROSPECT is a 2x2 factorial design. While in theory this has 4 arms, without a significant interaction the structure of the trial results in “doing 2 arm RAR twice”, once on the rows and once on the columns in the 2x2 table of arms. This choice was made explicitly, with a desire to treat the trial participants, children in respiratory distress, as well as possible while generating evidence for general use.

2.2.2 Control Allocation should be maintained or increased

In a two-arm trial, increases in allocation to the active arm require reduced control allocation. In contrast, when there are multiple active arms, we can increase the allocation to certain active arms while maintaining, or even increasing, the control allocation. This is desirable, as any reduction in control allocation can offset any gains from increased allocation on another arm.

To see this numerically, suppose we have a 500-patient study with control and 4 active arms. A fixed trial might enroll 1:1:1:1:1 (100 per arm). Assuming equal variances for each arm, each estimated treatment effect (at least for continuous outcomes) has a variance of sigma^2 ( (1/100) + (1/100) ) = 0.0200 sigma^2. Suppose we did RAR, increasing the allocation to the best arm but also decreasing allocation to the control, for example to 300 on the best arm and 50 on control. There are almost twice as many patients in the comparison (350 patients for RAR compared to 200 for the fixed trial), but the inferences are still worse. The variance of the estimated treatment effect is now sigma^2 ( (1/300) + (1/50) ) = 0.0233 sigma^2. The variance has grown, not decreased.
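The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is mine, and the third scenario (200 on the best arm with control held at 100) is a hypothetical added for contrast, not a design taken from any paper.

```python
def effect_variance_factor(n_active: int, n_control: int) -> float:
    """Variance of a difference in means, in units of sigma^2 (equal variances assumed)."""
    return 1.0 / n_active + 1.0 / n_control

# Fixed 1:1:1:1:1 allocation in a 500-patient, 5-arm trial.
fixed = effect_variance_factor(100, 100)

# RAR that grows the best arm but lets control shrink to 50.
rar_low_ctrl = effect_variance_factor(300, 50)

# Hypothetical contrast: best arm grows to 200 while control stays at 100.
rar_kept_ctrl = effect_variance_factor(200, 100)

print(f"fixed:                {fixed:.4f} sigma^2")
print(f"RAR, control shrunk:  {rar_low_ctrl:.4f} sigma^2")
print(f"RAR, control kept:    {rar_kept_ctrl:.4f} sigma^2")
```

The control term 1/n_control dominates once control becomes small, which is why shrinking control can undo the gain from a much larger best arm.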

For this reason, more recent RAR trials maintain or increase the control allocation, even if the control is doing worse than the active arms. If the desire is to minimize patient exposure to a poorly performing control arm, this is handled through early stopping of the entire trial for success, after the question is answered, rather than a gradual decrease in control allocation that may jeopardize the trial drawing firm conclusions.

Control arm allocation in multiple arm RAR trials is explored in detail in:

Viele K, Broglio K, McGlothlin A, Saville BR. Comparison of methods for control allocation in multiple arm studies using response adaptive randomization. Clin Trials. 2020 Feb;17(1):52-60. doi: 10.1177/1740774519877836. Epub 2019 Oct 19. PMID: 31630567.

Some of the pitfalls of not maintaining control arm allocation are explored in:

Wathen JK, Thall PF. A simulation study of outcome adaptive randomization in multi-arm clinical trials. Clin Trials. 2017 Oct;14(5):432-440. doi: 10.1177/1740774517692302. Epub 2017 Feb 1. PMID: 28982263; PMCID: PMC5634533.

2.2.3 Are you interested in the best arm, or comparing all arms to control?

RAR is often a good match for trials that strive to identify the best of several active arms and then compare that arm to control. This contrasts with trials that attempt to draw a firm conclusion comparing every active arm in the study to control. With RAR, any increase in allocation to the best arm comes with a decrease in allocation to the non-best arms. When only the best arm is of interest, this tradeoff is desirable. When all arms are of interest, this tradeoff may be quite detrimental, and it might be better to maintain constant allocation but drop arms as their efficacy (or lack thereof) is established. This might lead to a MAMS design or some other arm dropping paradigm. It may also be handled within an RAR framework by changing allocation based on the posterior probability each arm beats placebo rather than the probability each arm is the best. This results in any arm that retains promise continuing to receive some allocation, even if other arms are performing better. One example in the RAR literature is:

Trippa L, Lee EQ, Wen PY, Batchelor TT, Cloughesy T, Parmigiani G, Alexander BM. Bayesian adaptive randomized trial design for patients with recurrent glioblastoma. J Clin Oncol. 2012 Sep 10;30(26):3258-63. doi: 10.1200/JCO.2011.39.8420. Epub 2012 May 29. PMID: 22649140; PMCID: PMC3434985.

2.2.4 Platform trials may have their own unique behavior

In a platform trial, arms enroll until we have answered our questions about them or they reach a maximal sample size. If the platform lacks a maximum sample size (individual arms may be capped, but the overall platform is not), then increasing or decreasing allocation to individual arms may not change their total sample size but instead change the speed at which arms enroll. Arms with high allocation read out quickly; arms with lower allocation read out more slowly. I don’t think this issue has been adequately explored, and it may require additional metrics to explore properly. For example, one platform metric is the time required to identify a good therapy from the collection of possible therapies, which is usually not considered outside of a platform.

2.2.5 Time Trends

An additional issue that arises with RAR, as opposed to fixed allocation trials, is the potential for biases due to time trends. Fixed allocation randomized trials are almost magical in their ability to produce comparable groups. This can break with RAR and may require modeling adjustments.

Suppose I have a control and two active arms, and I run a 600-patient trial. I enroll 1:1:1 throughout the trial. If I think of the “early” period as the first 300 patients in the trial, and the “late” period as the second 300 patients in the trial, then each arm will have 200 patients total, and there will be 100 early and 100 late patients in each arm. Even if there are differences between the early and late periods, those differences are equally placed in all arms. Our arms remain comparable and thus we avoid time-based biases.

In contrast, suppose I perform RAR. I enroll 1:1:1 early in the trial (first 300 patients), update my allocation probabilities at an interim, and then randomize 2:1:3 (i.e. roughly 33% control, 17% arm A, 50% arm B) later in the trial (last 300 patients). We then find arm B looks the best and want to compare it to control.

We will find that control arm has 100 early patients and 100 late patients, so it is split 50-50 between time periods. Arm B, in contrast, has 100 early patients and 150 late patients, so it is split 40-60. If there are systematic differences between early and late patients, this can introduce biases in the estimate of the treatment effect.
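To put a number on this, here is a minimal sketch assuming a purely additive time trend: every late patient's expected outcome is shifted by the same amount in every arm. The function name and the shift of 1.0 are illustrative assumptions.

```python
def naive_bias(ctrl_early_late, active_early_late, late_shift):
    """Bias of the unadjusted difference in means (active minus control)
    when late-period outcomes are shifted by `late_shift` in every arm."""
    ctrl_late_frac = ctrl_early_late[1] / sum(ctrl_early_late)
    active_late_frac = active_early_late[1] / sum(active_early_late)
    return (active_late_frac - ctrl_late_frac) * late_shift

# Fixed 1:1:1 allocation: both arms are split 50-50 across periods.
fixed_bias = naive_bias((100, 100), (100, 100), late_shift=1.0)

# RAR example above: control is 100/100, arm B is 100/150 (40-60).
rar_bias = naive_bias((100, 100), (100, 150), late_shift=1.0)

print(f"fixed allocation bias: {fixed_bias:.2f}")
print(f"RAR bias for arm B:    {rar_bias:.2f}")
```

Under this assumption the bias is the difference in late-period fractions (0.6 versus 0.5) multiplied by the size of the shift, so it vanishes whenever the arms share the same early/late split.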

This is discussed in:

Korn EL, Freidlin B. Time trends with response-adaptive randomization: The inevitability of inefficiency. Clin Trials. 2022 Apr;19(2):158-161. doi: 10.1177/17407745211065762. Epub 2022 Jan 6. PMID: 34991348.

in addition to:

Korn EL, Freidlin B. Outcome-adaptive randomization: is it useful? J Clin Oncol. 2011 Feb 20;29(6):771-6. doi: 10.1200/JCO.2010.31.1423. Epub 2010 Dec 20. PMID: 21172882; PMCID: PMC3056658.

These papers also note that, specific to two arm trials, time trends are difficult to account for because one of the arms is often receiving low allocation. Hence it is difficult to estimate time effects properly with the single high allocation arm remaining while still estimating the overall treatment effect.

The standard solution to this issue in multiple arm trials is to include time as a covariate in the model, where time is a factor variable denoting the time bin the patient was randomized in. Patients randomized prior to the first interim are in time bin 1, those randomized between the first and second interim are in time bin 2, etc. Note that time bins should be defined whenever allocation is changed but could also include additional time breaks. The model is then augmented to

Outcome = Intercept + Treatment Arm + Time Bin + Error

(adjusted to whatever link function you are using).
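As a rough illustration of why the Time Bin term helps, the sketch below uses noise-free cell means generated from an additive model and compares arm B to control in two ways: pooling over time, and comparing within each time bin. All numeric values are assumptions chosen for illustration, and the equal-weight average of within-bin differences is a simplification of mine; an actual analysis would fit the regression model above to patient-level data.

```python
# Assumed additive truth: mean = arm effect + time-bin effect.
ARM_EFFECT = {"control": 0.0, "A": 0.2, "B": 0.5}   # hypothetical values
BIN_EFFECT = {1: 0.0, 2: 1.0}                        # hypothetical drift in bin 2

# Patients per (arm, time bin): 1:1:1 in bin 1, then 2:1:3 in bin 2.
COUNTS = {
    ("control", 1): 100, ("A", 1): 100, ("B", 1): 100,
    ("control", 2): 100, ("A", 2): 50, ("B", 2): 150,
}

def cell_mean(arm, time_bin):
    return ARM_EFFECT[arm] + BIN_EFFECT[time_bin]

def pooled_mean(arm):
    """Mean outcome for an arm, ignoring which time bin patients came from."""
    total = sum(n for (a, _), n in COUNTS.items() if a == arm)
    weighted = sum(n * cell_mean(a, b) for (a, b), n in COUNTS.items() if a == arm)
    return weighted / total

def unadjusted_effect(arm):
    return pooled_mean(arm) - pooled_mean("control")

def bin_adjusted_effect(arm):
    """Average of the within-time-bin differences from control."""
    diffs = [cell_mean(arm, b) - cell_mean("control", b) for b in BIN_EFFECT]
    return sum(diffs) / len(diffs)

print(f"true effect of B:  {ARM_EFFECT['B']:.2f}")
print(f"unadjusted:        {unadjusted_effect('B'):.2f}")
print(f"time-bin adjusted: {bin_adjusted_effect('B'):.2f}")
```

Because the time effect cancels inside each bin, the within-bin comparison recovers the assumed effect, while the pooled comparison absorbs part of the drift.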

When the effect of time is additive, this adjustment avoids biases due to time. In other words, the overall parameters in each arm can change over time, but the treatment effects, on whatever scale you are modeling, must remain constant. This is a common assumption, as evidenced by how many protocols refer to “the treatment effect” as opposed to “the treatment effect as a function of time”, but it is not guaranteed to hold. If the assumption doesn’t hold, problems occur for both RAR and fixed trials. For example, in a fixed trial we maintain comparability among the arms (no systematic biases), but we are not estimating “the treatment effect”. There simply is no single “the treatment effect”; instead, we are estimating some time weighted integral of the treatment effect function, weighted by the observed accrual rate. It is unclear how to interpret such a construction outside of the clinical trial.

These models are discussed in detail in:

Saville BR, Berry DA, Berry NS, Viele K, Berry SM. The Bayesian Time Machine: Accounting for temporal drift in multi-arm platform trials. Clin Trials. 2022 Oct;19(5):490-501. doi: 10.1177/17407745221112013. Epub 2022 Aug 22. PMID: 35993547.

This paper has relevant information on time trends in RAR, although it is focused on nonconcurrent controls in platform trials (which may be thought of as the same issue, just with an extreme randomization ratio that has zero active patients at certain times). More focused discussion of time trends for the non-platform setting may be found in:

Berry LR, Lorenzi E, Berry NS, Crawford AM, Jacko P, Viele K. Effects of Allocation Method and Time Trends on Identification of the Best Arm in Multi-Arm Trials. Statistics in Biopharmaceutical Research 2024;16(4);512-525. doi:10.1080/19466315.2023.2298961.

An additional paper, also in the platform trial space:

Bonnett T, Potter GE, Dodd LE. Examining the bias-efficiency tradeoff from incorporation of nonconcurrent controls in platform trials: A simulation study example from the adaptive COVID-19 treatment trial. Clin Trials. 2025 Aug;22(4):471-481. doi: 10.1177/17407745251313928. Epub 2025 Feb 8. PMID: 39921419.

The authors draw similar conclusions about the required assumptions for time trends but also attempt to connect these assumptions to a real dataset (the ACTT 1-3 data). While we do need to see whether this assumption holds in practical trials, some key features of the ACTT trials complicate the interpretation in this paper. The model above, to be estimated, requires multiple arms to span multiple time bins, which is not the case in the ACTT trials. Thus, the authors explore potential hypothetical situations with overlap, rather than discerning which hypothetical situation occurred in the ACTT trials. In these hypotheticals, the models perform as described above, working well when time is an additive component and poorly when the model is misspecified for the hypothetical.

2.2.6 Difficulties with delayed outcomes

As with any adaptive trial, adaptations are only as useful as the available information. If outcomes are delayed relative to the accrual rate, we may find ourselves with a limited number of complete patients. This will result in high uncertainty at the interim and typically minimal value to RAR. This may be partially mitigated if there are earlier outcomes that are predictive of a final outcome (for example a 6-month knee outcome may be highly predictive of a 2-year knee outcome), where that early information can still be used to compute updated RAR probabilities. If such information is not available, it may simply be that RAR, or any adaptation for that matter, may not be useful.

3.0 The nuts and bolts

Everything above reflects general principles and tradeoffs in selecting and using RAR. In practice, RAR also involves multiple parameters that directly translate the current posterior distribution into allocation probabilities. The most common choices for this “allocation function” involve the probability each active arm is the best active arm (Pr(Max)_a) or the probability each arm beats the control (Pr(PBO)_a). A particularly common choice is simply to assign the control some fixed proportion p_ctrl of the patients and then divide the remaining (1-p_ctrl) of the patients between the active arms in proportion to the values of Pr(Max). Additional bells and whistles can be added such as replacing Pr(Max) with Pr(PBO) in the calculation, raising Pr(Max) to a power before performing the normalization, or thresholding allocation probabilities to zero if they are small (and then renormalizing the remaining active allocation probabilities). In addition, we must select interim timing.
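A minimal sketch of such an allocation function follows. The function name, argument names, and default values are illustrative assumptions, and the inputs are taken to be Pr(Max) values already computed from the posterior (Pr(PBO) values could be passed instead).

```python
def rar_allocation(pr_best, p_ctrl=0.2, power=1.0, threshold=0.0):
    """Map per-arm probabilities (e.g. Pr(Max)) to allocation probabilities.

    Returns (control probability, list of active-arm probabilities).
    """
    # Optionally raise to a power before normalizing (power < 1 dampens the RAR).
    weights = [p ** power for p in pr_best]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("at least one active arm needs positive weight")

    # Split the non-control share among active arms in proportion to the weights.
    active = [(1.0 - p_ctrl) * w / total for w in weights]

    # Threshold small allocations to zero, then renormalize the survivors.
    kept = [a if a >= threshold else 0.0 for a in active]
    kept_total = sum(kept)
    if kept_total <= 0.0:
        raise ValueError("threshold removed every active arm")
    active = [(1.0 - p_ctrl) * a / kept_total for a in kept]
    return p_ctrl, active

ctrl, active = rar_allocation([0.70, 0.25, 0.05], p_ctrl=0.2, threshold=0.05)
print(ctrl, [round(a, 3) for a in active])
```

In this example the third arm's provisional allocation (about 0.04) falls below the threshold, so it receives nothing and its share is redistributed to the other two active arms while control stays fixed at 20%.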

Note that this exploration is often confounded with the principles above. For example, some early papers noted advantages of taking a square or other root of Pr(Max) prior to normalizing probabilities. However, many of these papers also employed features such as decreasing the control arm allocation over time. Since decreasing the control arm allocation can have a detrimental effect on performance, these papers naturally found an advantage to taking a transformation like the square root, which diminishes the RAR. In other situations, where control allocation is maintained, there is often no reason to “diminish” the RAR, and a higher exponent performs well. In practice, while we have explored multiple exponents in the RAR, we tend to find that simply using a power of 1 is a reasonable choice.

3.1 Some literature exploring multiple parameter settings in RAR

Two companion papers which explore several of these design parameters are:

Viele K, Broglio K, McGlothlin A, Saville BR. Comparison of methods for control allocation in multiple arm studies using response adaptive randomization. Clin Trials. 2020 Feb;17(1):52-60. doi: 10.1177/1740774519877836. Epub 2019 Oct 19. PMID: 31630567.

This paper, referenced above, further emphasizes the need for maintaining or increasing control allocation through multiple trial metrics and compares multiple methods for achieving this goal. A second related paper explores a variety of other allocation parameters, including interim timing, thresholding of allocation proportions to 0, and the differences between Pr(Max) and Pr(PBO):

Viele K, Saville BR, McGlothlin A, Broglio K. Comparison of response adaptive randomization features in multiarm clinical trials with control. Pharm Stat. 2020 Sep;19(5):602-612. doi: 10.1002/pst.2015. Epub 2020 Mar 21. PMID: 32198968.

For the situations explored, interim analyses every 20% of the trial data worked well, with Pr(Max) being strongly preferred as well as aggressive thresholding toward 0 for arms with otherwise low allocation probability.

Note that while these papers explored “typical” scenarios, care should be taken to avoid overgeneralizing these results. Each individual trial should be simulated to tune these parameters. Finally, some of these settings are also explored in the paper referenced above for time trends:

Berry LR, Lorenzi E, Berry NS, Crawford AM, Jacko P, Viele K. Effects of Allocation Method and Time Trends on Identification of the Best Arm in Multi-Arm Trials. Statistics in Biopharmaceutical Research 2024;16(4);512-525. doi:10.1080/19466315.2023.2298961.

3.2 Cautionary tales

A nice illustration of how RAR parameter choices can lead to difficulties is found in:

Thall P, Fox P, Wathen J. Statistical controversies in clinical research: scientific and ethical problems with adaptive randomization in comparative clinical trials. Ann Oncol. 2015 Aug;26(8):1621-8. doi: 10.1093/annonc/mdv238. Epub 2015 May 15. PMID: 25979922; PMCID: PMC4511222.

In their paper, RAR is implemented in a binomial setting, with the earliest interim occurring after very few patients have been observed, and with noninformative Beta priors with low values for alpha and beta. In such a setting, we obtain “sticky” RAR probabilities. Suppose for example we are conducting an interim after only 2 patients per arm, and we find one arm has 0/2 responders and the other has 2/2 responders. Note that even with true 50% rates in each arm, this has a 1/8th chance of occurring. With a Beta(0.5,0.5) prior on each arm, we obtain Beta(0.5,2.5) and Beta(2.5,0.5) posterior distributions. We also find a 98% posterior probability that the 2/2 arm is better than the 0/2 arm. This is an overreaction stemming from the fact that the Beta(0.5,0.5) prior tends to assume the rates are extreme (near 0 or near 1), and thus can result in the RAR allocating to only one of the arms, limiting the chance of ever correcting the mistake. This creates many additional negative effects.
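The two numbers in this example can be reproduced approximately with a short Monte Carlo sketch using only the Python standard library. The function name, number of draws, and seed are arbitrary choices of mine.

```python
import random

def prob_first_better(post_a, post_b, draws=200_000, seed=1):
    """Monte Carlo estimate of Pr(rate_a > rate_b) for independent Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_a) > rng.betavariate(*post_b)
        for _ in range(draws)
    )
    return wins / draws

prior = (0.5, 0.5)
post_2_of_2 = (prior[0] + 2, prior[1] + 0)   # 2 responders, 0 non-responders
post_0_of_2 = (prior[0] + 0, prior[1] + 2)   # 0 responders, 2 non-responders

p_better = prob_first_better(post_2_of_2, post_0_of_2)

# Chance of a 2/2 versus 0/2 split (in either direction) when both true rates are 50%.
p_split = 2 * (0.5 ** 2) * (0.5 ** 2)

print(f"Pr(2/2 arm better than 0/2 arm) is about {p_better:.3f}")
print(f"Pr(such a split | true rates 50%) = {p_split}")
```

The estimate should land close to the 98% quoted above, up to simulation error; swapping in a less extreme prior or a longer burn-in is an easy way to see how quickly the overreaction fades.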

Such problems can be identified by properly simulating trials prior to initiation. They are often easily avoided by changing the RAR parameters. For example, here we would recommend a longer burn-in period, thus minimizing the probability of 0 responder or 0 non-responder arms and the resulting model overreaction. We might alternatively choose a different prior if we were to insist on such early interims.

An additional paper provides a historical review of the ECMO trial from the 1980s and the potential difficulties with “play the winner” style RAR rules:

Proschan M, Evans S. Resist the Temptation of Response-Adaptive Randomization. Clin Infect Dis. 2020 Dec 31;71(11):3002-3004. doi: 10.1093/cid/ciaa334. PMID: 32222766; PMCID: PMC7947972.

I agree with many of the points about play the winner rules mentioned in this paper, although I will note that modern RAR trials have little in common with such rules. Thus, this paper might have been titled more effectively to indicate the specific form of RAR being addressed.

4.0 Operational Issues

Any efficiency gains from RAR will only be as good as the data which informs it, so like all adaptive trials it is vital that interims have good (not necessarily perfect) data, and that operational secrecy is maintained, meaning that the allocation probabilities are secret and the arms are blinded to avoid divulging the current allocation parameters.

In terms of blocking, often a “control/active” blocking structure is maintained, where we have blocks of K patients, with C patients allocated to control and the remaining K-C patients allocated at random to the active arms. Each active patient is allocated by a random, independent draw according to the current allocation probabilities.
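One way to sketch this control/active blocking in Python is shown below. The function name, arm labels, and example probabilities are hypothetical, and shuffling the order of patients within the block is my assumption rather than something specified above.

```python
import random

def randomize_block(block_size, n_control, active_probs, rng):
    """Build one block: exactly n_control control slots, with the remaining
    slots drawn independently from the current active-arm probabilities."""
    arms = list(active_probs)
    weights = [active_probs[a] for a in arms]
    block = ["control"] * n_control
    block += rng.choices(arms, weights=weights, k=block_size - n_control)
    rng.shuffle(block)  # assumption: random order within the block
    return block

rng = random.Random(2026)
current_probs = {"A": 0.1, "B": 0.2, "C": 0.7}   # hypothetical RAR output
block = randomize_block(block_size=5, n_control=1, active_probs=current_probs, rng=rng)
print(block)
```

Control allocation is guaranteed within every block (here 1 in 5), while the active slots can deviate from the target probabilities by chance, which is the behavior discussed next.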

The fully independent draws above create the possibility that active arms will randomly have higher or lower allocation, by chance, than the current RAR probabilities. We have explored various mechanisms for forcing the observed allocations to match the desired allocation probabilities. While these are quite effective at controlling the observed allocation frequencies, with modest or larger sample sizes we have seen little gain in operating characteristics from such methods (but generally no harm, either). In very small sample sizes, for example in rare diseases, these methods may be more useful. We have not published this internal research but can discuss it on request.

Additionally, sponsors may want to apply some sort of “minimization-like” algorithm to maintain covariate balance. As above, for large samples randomization alone may perform adequately, and such methods may only add extra complexity. However, when such a method is desired, one possibility is discussed in:

Saville BR, Berry SM. Balanced covariates with response adaptive randomization. Pharm Stat. 2017 May;16(3):210-217. doi: 10.1002/pst.1803. Epub 2017 Mar 6. PMID: 28261972.

5.0 Review paper on RAR, with practical examples

I highly recommend the thorough review paper referenced below:

Robertson DS, Lee KM, López-Kolkovska BC, Villar SS. Response-adaptive randomization in clinical trials: from myths to practical considerations. Stat Sci. 2023 May;38(2):185-208. doi: 10.1214/22-STS865. PMID: 37324576; PMCID: PMC7614644.

In addition to the review paper itself, there are several commentaries attached, one of which has a review of practical trials which have employed RAR:

Berry SM, Viele K. Comment: Response Adaptive Randomization in Practice. Statist. Sci. 38 (2) 229 - 232, May 2023. https://doi.org/10.1214/23-STS865F

6.0 Current thoughts on RAR

Final note…there may be simpler things than RAR.

Personally, while I remain enthusiastic that RAR has strong benefits over fixed trials (within the “find the best among multiple active arms” framework), the results in the following have made me less enthusiastic about choosing RAR in practice, as opposed to a particular version of simpler arm dropping designs:

Berry LR, Lorenzi E, Berry NS, Crawford AM, Jacko P, Viele K. Effects of Allocation Method and Time Trends on Identification of the Best Arm in Multi-Arm Trials. Statistics in Biopharmaceutical Research 2024;16(4);512-525. doi:10.1080/19466315.2023.2298961.

These simpler designs, at least in the examples considered in the paper (don’t overgeneralize!), appear to generate equivalent performance. Additionally, since the allocation frequencies are held constant over time in the arm dropping designs, they do not suffer from any concerns regarding time trends. Operationally, arm dropping may be easier to implement, certainly in an environment where many vendors don’t have extensive experience with RAR.

The key feature of the best-performing arm dropping designs in Berry et al. was that arms were dropped based on the probability each active arm was the best arm, as opposed to more standard methods of arm dropping based on a p-value or the posterior probability each arm beats the control. This difference has a substantial impact on power, with arm dropping based on a p-value performing worse in the “find the best arm” context. As with all adaptive methods, the methods must be tailored to the question. If you are looking for the best arm, adapt to find the best arm. If you are looking to evaluate whether each arm is better than placebo, then adapt on that quantity.

As always, don’t overgeneralize. The paper above doesn’t consider platform trial settings, for example, nor does it necessarily apply to a large number of arms or other differences from the specific scenarios it considers. However, it does indicate that care should be taken in selecting a design, and that simpler adaptive designs may provide benefits similar to RAR. These alternatives should be explored when considering an RAR design.

7.0 Closing

Extensive research continues in response adaptive randomization, and we assume this blog will become dated over time and need to be updated. We look forward to seeing that continued work in this interesting area.
