Blog: Time Trends in Clinical Trials (related to 2023 ASA Biopharm panel)

Blog by: Kert Viele

On Thursday, I will be moderating a panel session (PS3B, 9/28/2023, 4:15-5:30 p.m.) at the ASA Biopharmaceutical Section Regulatory-Industry Statistics Workshop (#biop2023). The panelists will be Franz Koenig, Nick Berry, Liz Lorenzi, and Dan Rubin, talking about time trends in ongoing platform trials.

We’ll be showing real interim analyses from the PRINCIPLE and REMAP-CAP platform trials, illustrating how the trials evolved over time and some of the modeling used in each. The panel will cover:

1) How may different time trends impact analyses of clinical trials, particularly platform trials?

2) What models might properly adjust for time trends, and what are their assumptions?

3) How might we check these assumptions?

4) What can we say specifically about PRINCIPLE and REMAP-CAP?

Don’t expect any final answers; this is a hard problem, and PRINCIPLE and REMAP-CAP are only two trials! But seeing real trial data, especially on a valuable scientific question, is always interesting.

Time trends have become an important “worry point” in adaptive designs that allow different allocation ratios at different points in the trial. It’s worthwhile to distinguish between “additive” time trends and “interactive” time trends. Here, “additive time trends” refers to any time effects that occur equally across arms, for example, a change in disease severity that lowers the mean outcome equally for all arms in the trial. “Interactive time trends,” by contrast, refers to time effects that change the treatment effects over time, for example, a therapy whose advantage over control at one time point differs from its advantage at another. This might reflect a changing trial population, where some therapies are highly effective only within an unknown subset of the entire population, and that subpopulation makes up more or less of the trial population at different times. Trends, additive or interactive, may also occur for completely unknown reasons.

Why do we worry? Let’s start with additive time trends. Here, we usually imagine a model like:

Outcome = Intercept + (other model terms) + Arm Effects + f(Time) + Error

where f(Time) is some unknown function over time that adds to all outcomes, regardless of arm. For “standard” trials with constant allocation ratios over time (e.g., basic 1:1 or 2:1 trials, etc.), the distribution of times within each arm is the same. We expect the same proportion of control patients to be early (or late) in the trial as we do for the treatment patients. Thus, when we estimate treatment differences, for example, by taking a difference in sample means, the f(Time) pieces are equal in expectation across arms and drop out. The difference in means remains an unbiased estimator of the treatment effect. Note that if time is highly predictive, it may still be valuable to model time for its variance reduction, but no time modeling is needed to remove bias.
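To make that concrete, here is a minimal simulation sketch (Python, with illustrative numbers I made up, not data from any real trial) of a 1:1 trial with an additive time trend. Because both arms see the same distribution of enrollment times, the naive difference in means stays centered on the true treatment effect.

```python
import numpy as np

rng = np.random.default_rng(2023)

def fixed_allocation_estimate(n=400, treatment_effect=0.5):
    """1:1 trial throughout, with an additive drift f(Time) common to both arms."""
    time = rng.uniform(0, 1, n)         # enrollment times over the course of the trial
    arm = rng.integers(0, 2, n)         # 0 = control, 1 = treatment, 1:1 for the whole trial
    drift = 1.0 * time                  # additive time trend, identical for every arm
    outcome = 10 + treatment_effect * arm + drift + rng.normal(0, 1, n)
    return outcome[arm == 1].mean() - outcome[arm == 0].mean()

estimates = [fixed_allocation_estimate() for _ in range(2000)]
print(f"mean naive estimate: {np.mean(estimates):.3f} (truth = 0.5)")
```

Averaged over many simulated trials, the naive estimate sits near the true value of 0.5 even though no time term appears in the analysis.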

For trials which alter the allocation ratio over time, we can find that some arms in the trial have more (or fewer) patients early in the trial and fewer (or more) patients late in the trial compared to other arms. In these situations, a straight difference in means creates a biased estimate. The estimate for the arm with a lot of early patients has a large contribution from early time effects, and the estimate for the arm with a lot of late patients has a large contribution from late time effects. The naïve treatment estimate is biased by the difference in time effects between the early and late time periods.
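Continuing the same illustrative sketch, here the allocation ratio flips from control-heavy to treatment-heavy halfway through enrollment, so the treatment arm over-represents late (higher-drift) patients and the naive difference in means is biased upward. The specific ratios and drift are again made up; the point is only the direction and source of the bias.

```python
import numpy as np

rng = np.random.default_rng(2023)

def shifted_allocation_estimate(n=400, treatment_effect=0.5):
    """Allocation is 1:4 (treatment:control) early and 4:1 late, with the same additive drift."""
    time = rng.uniform(0, 1, n)
    p_treat = np.where(time < 0.5, 0.2, 0.8)   # allocation ratio changes at the halfway point
    arm = rng.binomial(1, p_treat)
    drift = 1.0 * time                          # same additive time trend as before
    outcome = 10 + treatment_effect * arm + drift + rng.normal(0, 1, n)
    return outcome[arm == 1].mean() - outcome[arm == 0].mean()

estimates = [shifted_allocation_estimate() for _ in range(2000)]
print(f"mean naive estimate: {np.mean(estimates):.3f} (truth = 0.5)")
```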

This applies to adaptive designs such as response adaptive randomization (which changes allocation ratios at each interim analysis), as well as to platform trials that utilize non-concurrent controls, where the treatment effect estimate is based on patients from the time when both treatment and control were randomizing together, as well as control patients from times prior to the treatment arm entering the trial. The control arm estimate thus contains time effects from both the “concurrent” (with treatment) and “non-concurrent” (prior to treatment) time periods, while the treatment arm only contains patients from the “concurrent” time period. As such, biases can ensue.
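A similar hypothetical sketch for the non-concurrent-control case (purely invented numbers, not PRINCIPLE or REMAP-CAP data): the control arm enrolls over the whole trial, the treatment arm only enters halfway through, and pooling all controls without any time adjustment biases the naive comparison whenever there is drift.

```python
import numpy as np

rng = np.random.default_rng(2023)

def nonconcurrent_control_estimate(n_per_period=200, treatment_effect=0.5):
    """Period 1: control only. Period 2: control and treatment 1:1. Additive drift throughout."""
    t1 = rng.uniform(0.0, 0.5, n_per_period)             # non-concurrent period
    t2 = rng.uniform(0.5, 1.0, n_per_period)             # concurrent period
    arm2 = rng.integers(0, 2, n_per_period)
    y1 = 10 + 1.0 * t1 + rng.normal(0, 1, n_per_period)  # non-concurrent controls
    y2 = 10 + treatment_effect * arm2 + 1.0 * t2 + rng.normal(0, 1, n_per_period)
    control = np.concatenate([y1, y2[arm2 == 0]])         # pools both control periods
    return y2[arm2 == 1].mean() - control.mean()

estimates = [nonconcurrent_control_estimate() for _ in range(2000)]
print(f"mean naive estimate: {np.mean(estimates):.3f} (truth = 0.5)")
```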

Fortunately, this additive time trend bias is correctable. Naïve estimates are not recommended in this setting, and modeling time can remove the biases created by additive time effects. A standard approach is to divide time into “buckets,” defined by the times at which allocation ratios change, whether directly through RAR or through the addition or removal of an arm in a platform trial. When analyzing the data, we place an additive effect for each time bucket into the model (sometimes these are smoothed over time, especially if there are many interim analyses). For additive time trends, this is sufficient to remove the bias. For recent work in this area, see the following twitter/X threads discussing several recent papers.
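As a sketch of the bucket adjustment, applied to the same hypothetical non-concurrent-control setup as above: adding an indicator for the enrollment period to an ordinary least squares fit recovers an approximately unbiased arm effect under an additive trend.

```python
import numpy as np

rng = np.random.default_rng(2023)

def bucket_adjusted_estimate(n_per_period=200, treatment_effect=0.5):
    """Same non-concurrent-control setup, analyzed with an additive time-bucket term."""
    t1 = rng.uniform(0.0, 0.5, n_per_period)
    t2 = rng.uniform(0.5, 1.0, n_per_period)
    arm = np.concatenate([np.zeros(n_per_period), rng.integers(0, 2, n_per_period)])
    bucket2 = np.concatenate([np.zeros(n_per_period), np.ones(n_per_period)])  # period indicator
    drift = 1.0 * np.concatenate([t1, t2])
    y = 10 + treatment_effect * arm + drift + rng.normal(0, 1, 2 * n_per_period)
    X = np.column_stack([np.ones(2 * n_per_period), arm, bucket2])              # intercept + arm + bucket
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]                                                              # adjusted arm effect

estimates = [bucket_adjusted_estimate() for _ in range(2000)]
print(f"mean adjusted estimate: {np.mean(estimates):.3f} (truth = 0.5)")
```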

Thus, for additive time trends, we have an effective solution (kinda… I haven’t even gotten into the issue that additivity on one scale may be non-additive on another scale, for example, relative risks versus log odds). Importantly, the inclusion of a time covariate typically has minimal cost: even if the additive time effects are 0, very little efficiency is lost by modeling them, so including time effects in the model for robustness is recommended.

This solution does not apply to interactive time trends. The reason why isn’t particularly deep. If we are fitting a model with additive time effects and no interactions, and the truth is that time and treatment effect interact, we expect our estimates may be biased and that inferential tests may have poor performance. In fact, it’s unclear to me how to conduct ANY clinical trial in this setting. A standard fixed trial has the wonderful property that it guarantees equal representation of all time buckets in all arms so that our analysis populations are comparable (my colleague Farah Khandwala is presenting a poster generalizing this result in platform trials at ASA Biopharm on Thursday evening, check it out!). But comparability between control and treatment arms doesn’t imply you are estimating something meaningful or generalizable in this setting. If we take a simple difference in overall means in the presence of a time/treatment interaction, our resulting mean difference is an estimate of the weighted average treatment effect over time, with weights corresponding to our observed enrollment into the trial. Is that what we want? If we want to estimate the treatment effect at each time point, we would likely require enormous sample sizes (since we need sufficient enrollment in each unit of time). Additionally, if treatment effects vary over time, how can a trial conducted in 2021-2023 imply a treatment should be given from 2025 onward? Extrapolation might be warranted under some assumptions, but those assumptions haven’t been clearly articulated.
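To see what a naive difference targets under a time-by-treatment interaction, here is one more hypothetical sketch: the treatment effect shrinks over calendar time, and the naive difference in means lands on the average of that effect weighted by when patients actually enrolled.

```python
import numpy as np

rng = np.random.default_rng(2023)

n = 100_000                              # large n only to make the point visible
time = rng.beta(2, 5, n)                 # enrollment concentrated early (illustrative)
arm = rng.integers(0, 2, n)              # 1:1 throughout, so comparability is not the issue
delta = 1.0 - 1.5 * time                 # treatment effect that changes over time
outcome = 10 + arm * delta + rng.normal(0, 1, n)

naive = outcome[arm == 1].mean() - outcome[arm == 0].mean()
enrollment_weighted = delta.mean()       # average effect weighted by observed enrollment
print(f"naive estimate: {naive:.3f}   enrollment-weighted average effect: {enrollment_weighted:.3f}")
```

Run the same design with a different enrollment pattern and the same naive estimator targets a different number, which is exactly why it is unclear what is being estimated in this setting.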

Given the difficulty with time/treatment interaction, it is desirable to know when time trends are additive, since that is the case for which we have an efficient solution. This brings us to the panel, which will show results from PRINCIPLE and REMAP-CAP related to this question. What information is there in the data that might confirm or cast doubt on the additivity assumption? Fair warning: as with many assumption checks, there will be a discussion of how powerful any tests of this assumption might be and of the relative costs/benefits of overreacting to noise. We won’t be able to answer the question, but we hope the discussion will be interesting and will suggest how to make progress on this issue.

For more ASA Biopharm sessions featuring Berry Consultants, click here.
