Title
Accounting for patient heterogeneity in phase II clinical trials
Authors
J. Kyle Wathen, Peter F. Thall, John D. Cook, Elihu H. Estey
Summary
Phase II clinical trials typically are single-arm studies conducted to decide whether an experimental treatment is sufficiently promising, relative to standard treatment, to warrant further investigation.
- Proposes a class of model-based Bayesian designs for single-arm phase II trials with a binary or time-to-event (TTE) outcome and two or more prognostic subgroups.
- Considers treatment-subgroup interactions.
- Early stopping rules are subgroup-specific and allow terminating some subgroups while continuing others.
- Uses informative priors on standard-treatment parameters and subgroup main effects, and non-informative priors on experimental-treatment parameters and treatment-subgroup interactions.
- Gives an algorithm for computing prior hyperparameter values.
- Evaluates the designs by simulation.
Introduction
two subgroups: good (G) and poor (P)
two treatments: standard (S) and experimental (E)
Probability Model
- $K$ prognostic subgroups.
- Treatment indicator $t \in \{S, E\}$; subgroup indicator $Z \in \{0, 1, \ldots, K-1\}$.
- Linear component: $$\eta_{t, Z}(\boldsymbol{\theta}) = \xi + \sum_{k=0}^{K-1}\left\{\beta_{k} + \tau_{k}\, I(t=E)\right\} I(Z=k)$$
- Reference subgroup: $\beta_0 = 0$.
- $\boldsymbol{\theta} = (\xi, \boldsymbol{\beta}, \boldsymbol{\tau})$.
- Historical subgroup effects: $\boldsymbol{\beta} = (\beta_{0}, \ldots, \beta_{K-1})$.
- E-versus-S treatment effect within subgroup $k$: $\boldsymbol{\tau} = (\tau_{0}, \ldots, \tau_{K-1})$.
- The purpose of the trial is to learn about $\boldsymbol{\tau}$.
- The parametrization used here is critical to how well the method performs. It was chosen because it allows borrowing strength across subgroups and reflects the fact that much more is known a priori about the standard treatment.
- Binary case: $\pi_{t, Z}(\boldsymbol{\theta}) = \Pr(Y = 1 \mid t, Z, \boldsymbol{\theta}) = \operatorname{logit}^{-1}\{\eta_{t, Z}(\boldsymbol{\theta})\}$.
- Likelihood (every patient in the trial receives E; see the sketch after this list): $$\mathscr{L}(\mathscr{D}_{n} \mid \boldsymbol{\theta}) = \prod_{i=1}^{n}\left\{\pi_{E, Z_{i}}(\boldsymbol{\theta})\right\}^{Y_{i}}\left\{1-\pi_{E, Z_{i}}(\boldsymbol{\theta})\right\}^{1-Y_{i}}$$
- TTE case: $Y^{0}$ = observed time to event or right censoring; event indicator $\varepsilon = I(Y = Y^{0})$.
- Likelihood: $$\mathscr{L}(\mathscr{D}_{n} \mid \boldsymbol{\theta}) = \prod_{i=1}^{n} f_{E, Z_{i}}(Y_{i}^{0} \mid \boldsymbol{\theta})^{\varepsilon_{i}}\, \mathscr{F}_{E, Z_{i}}(Y_{i}^{0} \mid \boldsymbol{\theta})^{1-\varepsilon_{i}}$$ where $\mathscr{F}$ is the survivor function.
- It is reasonable to assume exponentially distributed event times with $E[Y \mid t, Z, \boldsymbol{\theta}] = \exp\{\eta_{t, Z}(\boldsymbol{\theta})\}$.
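To make the model concrete, here is a minimal sketch of the linear component and the binary-outcome likelihood, assuming NumPy/SciPy; the function names and toy data are mine, not from the paper.

```python
import numpy as np
from scipy.special import expit  # logit^{-1}

def linear_component(xi, beta, tau, treat_is_E, Z):
    """eta_{t,Z}(theta) = xi + beta[Z] + tau[Z] * I(t = E); beta[0] = 0 (reference)."""
    return xi + beta[Z] + (tau[Z] if treat_is_E else 0.0)

def log_lik_binary(xi, beta, tau, Y, Z):
    """Binary-outcome log-likelihood; every patient in the trial receives E."""
    pi = expit(linear_component(xi, beta, tau, True, Z))
    return np.sum(Y * np.log(pi) + (1 - Y) * np.log1p(-pi))

# toy data: K = 2 subgroups, 6 patients
Y = np.array([1, 0, 1, 1, 0, 0])   # binary responses
Z = np.array([0, 0, 0, 1, 1, 1])   # subgroup labels
beta = np.array([0.0, -0.8])       # beta_0 fixed at 0
tau = np.array([0.2, 0.1])         # E-versus-S effects per subgroup
print(log_lik_binary(-0.5, beta, tau, Y, Z))
```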
Establishing priors
$\xi \sim N(\mu_{\xi}, \sigma_{\xi}^{2})$, with $\sigma_{\xi}^{2}$ small (informative).
$\beta_{k} \sim N(\mu_{\beta_{k}}, \sigma_{\beta_{k}}^{2})$, with $\sigma_{\beta_{k}}^{2}$ small (informative).
$\tau_{j} \sim N(\mu_{\tau_{j}}, \sigma_{\tau_{j}}^{2})$, with $\sigma_{\tau_{j}}^{2}$ large (non-informative).
Denote by $\boldsymbol{\Psi} = (\boldsymbol{\Psi}_{\mu}, \boldsymbol{\Psi}_{\sigma^{2}})$ the collection of prior hyperparameters.
In some Bayesian models there is a direct link between the prior hyperparameter values and a putative number of patients on whom the prior is based, i.e. its effective sample size (ESS); e.g. the ESS of a Beta(a, b) prior is a + b.
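As a quick check of that ESS claim (a standard conjugacy fact, added here for clarity, not a statement from the paper): $$\pi \sim \operatorname{Beta}(a, b), \quad y \mid \pi \sim \operatorname{Bin}(n, \pi) \;\Longrightarrow\; \pi \mid y \sim \operatorname{Beta}(a + y,\; b + n - y),$$ so observing $n$ patients increases $a + b$ by exactly $n$, which is why $a + b$ behaves like a prior sample size.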
To calibrate the priors through their ESS, first impose the $K$ constraints $E[\pi_{E, j}(\boldsymbol{\theta})] = E[\pi_{S, j}(\boldsymbol{\theta})]$, so that the prior expected response rate of E equals that of S within each prognostic subgroup. These constraints seem suited to superiority trials and may be too optimistic for non-inferiority trials(?). If so, impose the constraints as $E[\pi_{E, j}(\boldsymbol{\theta})] = E[\pi_{S, j}(\boldsymbol{\theta})] - \epsilon_{j}$, where $\epsilon_{j}$ = 0.05 to 0.10 (related to the desired improvement of 0.15?).
Let $f_{t, j}(p)$, $0 \le p \le 1$, denote the prior density of $\pi_{t, j}(\boldsymbol{\theta})$ induced by the prior on $\boldsymbol{\theta}$.
Given the above constraints on the mean probabilities, determine $\boldsymbol{\Psi}$ by matching $f_{t, j}$ to a $\operatorname{Beta}(a_{t, j}, b_{t, j})$ density, denoted $f^{*}_{t, j}$. Note: $f_{t, j}$ is not itself a Beta density; the Beta serves only as a matching target carrying the desired ESS $a_{t, j} + b_{t, j}$.
Assume estimates of the historical response rate $\hat{\pi}_{S, j}$ and sample size $N_{S, j}$ are available, and let $N_{E, j}$ denote the desired prior ESS for $\pi_{E, j}(\boldsymbol{\theta})$. Given these values, use the following algorithm to determine $\boldsymbol{\Psi}$.
Algorithm for determining prior hyperparameters (a numerical sketch follows this list)
- Step 1: Set $a_{S, j} = N_{S, j}\hat{\pi}_{S, j}$ and $b_{S, j} = N_{S, j}(1 - \hat{\pi}_{S, j})$.
- Step 2: $(\mu_{\xi}, \sigma^{2}_{\xi}) = \arg\min_{(\mu_{\xi}, \sigma^{2}_{\xi})} \int_{0}^{1} |f_{S, 0}(p) - f^{*}_{S, 0}(p)|\, dp$, subject to $E[\pi_{S, 0}(\boldsymbol{\theta})] = \hat{\pi}_{S, 0}$.
- Step 3: For $j = 1, \ldots, K-1$: $(\mu_{\beta_{j}}, \sigma^{2}_{\beta_{j}}) = \arg\min_{(\mu_{\beta_{j}}, \sigma^{2}_{\beta_{j}})} \int_{0}^{1} |f_{S, j}(p) - f^{*}_{S, j}(p)|\, dp$, subject to $E[\pi_{S, j}(\boldsymbol{\theta})] = \hat{\pi}_{S, j}$.
- Step 4: Set $a_{E, j} = N_{E, j}\hat{\pi}_{S, j}$ and $b_{E, j} = N_{E, j}(1 - \hat{\pi}_{S, j})$.
- Step 5: $(\mu_{\tau_{j}}, \sigma^{2}_{\tau_{j}}) = \arg\min_{(\mu_{\tau_{j}}, \sigma^{2}_{\tau_{j}})} \int_{0}^{1} |f_{E, j}(p) - f^{*}_{E, j}(p)|\, dp$, subject to $E[\pi_{E, j}(\boldsymbol{\theta})] = \hat{\pi}_{S, j}$.
- To reflect prior uncertainty about E in Step 5, reasonable values of $N_{E, j}$ are 1/2, 1, or 2, which yield suitably large values of each $\sigma^{2}_{\tau_{j}}$.
- One may instead assume more skeptical or more optimistic prior means for $\pi_{E, j}(\boldsymbol{\theta})$ in Step 5.
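Because $\eta_{t,Z}$ is a sum of independent normals under these priors, each induced prior $f_{t,j}$ is logit-normal with a closed-form density, which makes the minimizations one-dimensional searches. Below is a minimal sketch of Steps 1-2, assuming NumPy/SciPy; the helper names (`induced_pdf`, `match_beta`) and illustrative numbers are mine, not from the paper.

```python
import numpy as np
from scipy import optimize, stats
from scipy.integrate import trapezoid

GRID = np.linspace(1e-4, 1 - 1e-4, 4000)  # grid on (0, 1) for the integrals

def induced_pdf(p, mu, sigma):
    """Density of logit^{-1}(X) for X ~ N(mu, sigma^2), i.e. a logit-normal."""
    return stats.norm.pdf(np.log(p / (1 - p)), mu, sigma) / (p * (1 - p))

def induced_mean(mu, sigma):
    """Prior mean E[logit^{-1}(X)], computed numerically."""
    return trapezoid(GRID * induced_pdf(GRID, mu, sigma), GRID)

def match_beta(a, b, target_mean):
    """Minimize the L1 distance between the induced logit-normal prior and
    Beta(a, b), subject to the induced mean equalling target_mean: search
    over sigma, solving mu from the mean constraint by root finding."""
    beta_pdf = stats.beta.pdf(GRID, a, b)
    best = (None, None, np.inf)
    for sigma in np.linspace(0.05, 3.0, 60):
        mu = optimize.brentq(lambda m: induced_mean(m, sigma) - target_mean,
                             -10.0, 10.0)
        l1 = trapezoid(np.abs(induced_pdf(GRID, mu, sigma) - beta_pdf), GRID)
        if l1 < best[2]:
            best = (mu, sigma, l1)
    return best[0], best[1] ** 2  # (mu, sigma^2)

# Steps 1-2 for subgroup j = 0: pi_hat_{S,0} = 0.30 from N_{S,0} = 50 patients
a, b = 50 * 0.30, 50 * (1 - 0.30)
mu_xi, var_xi = match_beta(a, b, 0.30)
print(mu_xi, var_xi)
```

Under this logit-normal reading, Step 3 reuses `match_beta` with target $\hat{\pi}_{S,j}$: since $\xi + \beta_j$ is again normal, the fitted total mean and variance yield $(\mu_{\beta_j}, \sigma^2_{\beta_j})$ by subtracting $(\mu_\xi, \sigma^2_\xi)$; Step 5 is analogous with the $\operatorname{Beta}(a_{E,j}, b_{E,j})$ target.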
Posteriors
omitted
Decision Criteria
subgroup-specific early stopping rules (a sketch follows this list): stop accrual in subgroup $j$ if $$\lambda\left(\mathscr{D}_{n}, \boldsymbol{\theta}, j, \delta_{j}\right) = \Pr\left\{\pi_{E, j}(\boldsymbol{\theta}) > \pi_{S, j}(\boldsymbol{\theta}) + \delta_{j} \mid \mathscr{D}_{n}\right\} < p_{j}$$
- $p_{j}$ is a fixed lower probability cutoff, usually in the range 0.01-0.10.
- $p_{j}$ may be calibrated to obtain a pre-specified false-negative rate (FNR).
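Given posterior draws of $(\xi, \beta_j, \tau_j)$ from whatever sampler is used, the rule reduces to a Monte Carlo average; a minimal sketch follows (the function name and the default $\delta_j$, $p_j$ values are mine, chosen for illustration):

```python
import numpy as np
from scipy.special import expit

def stop_subgroup(xi_draws, beta_j_draws, tau_j_draws, delta_j=0.15, p_j=0.05):
    """Subgroup-j stopping rule: stop accrual in subgroup j when
    Pr{pi_{E,j} > pi_{S,j} + delta_j | D_n} < p_j, estimated from MCMC draws."""
    pi_S = expit(xi_draws + beta_j_draws)
    pi_E = expit(xi_draws + beta_j_draws + tau_j_draws)
    lam = np.mean(pi_E > pi_S + delta_j)  # posterior probability estimate
    return lam < p_j, lam
```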
Simulation
compare the following methods:
- S-TI: the new method (subgroup model with treatment-subgroup interactions)
- S-NTI: subgroup model with no treatment-subgroup interactions
- NS: no subgroups (patients treated as homogeneous)
- SEP: separate trials run within each subgroup
Discussion
Flexibility
- S-TI has much more desirable properties in the presence of subgroup-treatment interactions.
- Otherwise, the simpler methods S-NTI or SEP are preferable because they are less complex.
- imbalanced subgroups.
- The focus is on binary data, but similar procedures can be used for TTE data, using the likelihood in the Probability Model section:
- use the mean TTE $\hat{\mu}_{S, j}$ in place of $\hat{\pi}_{S, j}$;
- in the algorithm, use an Inverse-Gamma$(a, b)$ instead of a Beta$(a, b)$ as the matching target (a sketch follows).
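For that TTE variant, the prior on the mean $\exp(\eta)$ induced by a normal $\eta$ is log-normal, so the same one-dimensional matching idea applies with an Inverse-Gamma target. Here is a sketch of the modified distance only, assuming SciPy; how to set the Inverse-Gamma hyperparameters from $\hat{\mu}_{S, j}$ and the desired ESS is left open, as in these notes.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

def l1_lognormal_vs_invgamma(mu, sigma, a, b, upper=200.0, n=4000):
    """L1 distance between the prior on exp(eta) induced by
    eta ~ N(mu, sigma^2) (a log-normal) and an Inverse-Gamma(a, b) target."""
    x = np.linspace(1e-3, upper, n)
    ln = stats.lognorm.pdf(x, s=sigma, scale=np.exp(mu))
    ig = stats.invgamma.pdf(x, a, scale=b)
    return trapezoid(np.abs(ln - ig), x)
```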
Reference
Wathen JK, Thall PF, Cook JD, Estey EH. Accounting for patient heterogeneity in phase II clinical trials. Statistics in Medicine, 2008.
Thoughts
To generalize the binary outcome to a continuous outcome.
To generalize the binary outcome to an ordinal outcome.
How to convert a continuous outcome to a multilevel (ordinal) outcome:
- choose cutoff values that turn the continuous variable into an ordinal outcome;
- clustering based on the 1-dim value, maximizing between-group distance (something similar to PCA), contrast;
- supposed to be a monotone-transformation-free distance;
- choice of the number of ordinal levels $K$.
Gamma outcome / log-normal outcome:
- do exactly the same thing as in Wathen's paper;
- historical data -> ESS -> parameters $(a, b)$ or $(\mu, \sigma^2)$.
Questions:
- Why model a continuous response? Even with a continuous response, do we still need to pre-specify cutoff values?
- Model checking/diagnostics for a normal outcome?
- Goodness of fit (GoF)?