Paper 1

Title

Accounting for patient heterogeneity in phase II clinical trials

Authors

J. Kyle Wathen, Peter F. Thall, John D. Cook, Elihu H. Estey

Summary

  • Phase II clinical trials typically are single-arm studies conducted to decide whether an experimental treatment is sufficiently promising, relative to standard treatment, to warrant further investigation.

  • A class of model-based Bayesian designs for single-arm phase II trials with a binary or time-to-event (TTE) outcome and two or more prognostic subgroups.

  • consider treatment-subgroup interactions.

  • early stopping rules are subgroup specific and allow the possibility of terminating some subgroups while continuing others.

  • informative priors on standard treatment parameters and subgroup main effects + non-informative priors for experimental treatment parameters and treatment-subgroup interactions.

  • algorithm for computing prior hyperparameter values.

  • simulation

Introduction

  • two subgroups: good (G) and poor (P)

  • two treatments: standard (S) and experimental (E)

Probability Model

  • K subgroups.
  • $t \in \{S, E\}$, $Z \in \{0, 1, \ldots, K-1\}$
  • Linear component: $$\eta_{t, Z}(\boldsymbol{\theta}) = \xi + \sum_{k=0}^{K-1}\left\{\beta_{k} + \tau_{k} I(t=E)\right\} I(Z=k)$$
  • reference group: $\beta_0 = 0$
  • $\boldsymbol{\theta} = (\xi, \boldsymbol{\beta}, \boldsymbol{\tau})$
  • historical effects: $\boldsymbol{\beta} = (\beta_{0}, \ldots, \beta_{K-1})$
  • E-versus-S treatment effect within subgroup $k$: $\boldsymbol{\tau} = (\tau_{0}, \ldots, \tau_{K-1})$
  • The purpose of the trial is to learn about $\tau$
  • The parametrization used here is critical to how well the method performs. It was chosen because it allows borrowing strength across subgroups and reflects the fact that much more is known a priori about the standard treatment.
  • binary case: $\pi_{t, Z}(\boldsymbol{\theta}) = P(Y = 1 \mid t, Z, \boldsymbol{\theta}) = \operatorname{logit}^{-1}(\eta_{t, Z}(\boldsymbol{\theta}))$
  • Likelihood: $\mathscr{L}(\mathscr{D}_{n} \mid \boldsymbol{\theta}) = \prod_{i=1}^{n}\left\{\pi_{E, Z_{i}}(\boldsymbol{\theta})\right\}^{Y_{i}}\left\{1-\pi_{E, Z_{i}}(\boldsymbol{\theta})\right\}^{1-Y_{i}}$
  • time to event or right censoring: $Y^0$
  • censoring indicator: $\epsilon = I(Y = Y^0)$
  • Likelihood: $\mathscr{L}(\mathscr{D}_{n} \mid \boldsymbol{\theta}) = \prod_{i=1}^{n} f_{E, Z_{i}}(Y_{i}^{0} \mid \boldsymbol{\theta})^{\epsilon_{i}}\, \mathscr{F}_{E, Z_{i}}(Y_{i}^{0} \mid \boldsymbol{\theta})^{1-\epsilon_{i}}$
  • Reasonable to assume exponentially distributed event times with $E[Y \mid t, Z, \boldsymbol{\theta}] = \exp(\eta_{t, Z}(\boldsymbol{\theta}))$
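A minimal numerical sketch of the linear component and the binary-case response probability; the parameter values are illustrative, not from the paper:

```python
import numpy as np

def eta(t, Z, xi, beta, tau):
    """Linear component: eta_{t,Z} = xi + beta_Z + tau_Z * I(t == 'E')."""
    return xi + beta[Z] + (tau[Z] if t == "E" else 0.0)

def pi(t, Z, xi, beta, tau):
    """Response probability: pi_{t,Z} = inverse-logit of eta_{t,Z}."""
    return 1.0 / (1.0 + np.exp(-eta(t, Z, xi, beta, tau)))

# Illustrative values for K = 2 subgroups; beta_0 = 0 fixes the
# reference group, and tau_k is the E-vs-S effect in subgroup k.
xi, beta, tau = -0.5, [0.0, -1.0], [0.3, 0.8]
p_S0 = pi("S", 0, xi, beta, tau)  # standard treatment, subgroup 0
p_E0 = pi("E", 0, xi, beta, tau)  # experimental treatment, subgroup 0
```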

Establishing priors.

  • $\xi \sim N(\mu_{\xi}, \sigma_{\xi}^2)$; $\sigma_{\xi}^2$ needs to be small.

  • $\beta_k \sim N(\mu_{\beta_k}, \sigma_{\beta_k}^2)$; $\sigma_{\beta_k}^2$ needs to be small.

  • $\tau_j \sim N(\mu_{\tau_j}, \sigma_{\tau_j}^2)$; $\sigma_{\tau_j}^2$ needs to be large.

  • Denote by $\Psi = (\Psi_{\mu}, \Psi_{\sigma^2})$ the collection of prior hyperparameters.

  • In some Bayesian models, there is a direct link between the prior hyperparameter values and a putative number of patients on whom the prior is based, i.e. its effective sample size (ESS). e.g. ESS for Beta(a, b) is a+b.

  • To do this, first impose the K constraints $E[\pi_{E, j}(\boldsymbol{\theta})] = E[\pi_{S, j}(\boldsymbol{\theta})]$, so that the prior expected response rate of E equals that of S within each prognostic subgroup. Are these constraints appropriate only for superiority trials, and too optimistic for non-inferiority trials? If so, impose them instead as $E[\pi_{E, j}(\boldsymbol{\theta})] = E[\pi_{S, j}(\boldsymbol{\theta})] - \epsilon_j$, where $\epsilon_j$ = 0.05 to 0.10 (related to the 0.15 improvement desired?)

  • Let $f_{t, j}(p)$, $0 \le p \le 1$, denote the prior density on $\pi_{t, j}(\boldsymbol{\theta})$ induced by the prior on $\boldsymbol{\theta}$.

  • Given the above constraints on the mean probability, determine $\Psi$ by matching moments of $f_{t, j}$ to $\operatorname{Beta}(a_{t, j}, b_{t, j})$, whose pdf is denoted $f^{*}_{t, j}$. Note: $f_{t, j}$ is not a Beta density.

  • Assume the historical response-rate estimates $\hat{\pi}_{S, j}$ and sample sizes $N_{S, j}$ are available. Let $N_{E, j}$ denote the desired prior ESS for $\pi_{E, j}(\boldsymbol{\theta})$. Given these values, use the following algorithm to determine $\Psi$.

Algorithm for determining prior hyperparameters

  • Step 1: set $a_{S, j} = N_{S, j}\hat{\pi}_{S, j}$ and $b_{S, j} = N_{S, j}(1 - \hat{\pi}_{S, j})$.
  • Step 2: $(\mu_{\xi}, \sigma_{\xi}^2) = \arg\min_{(\mu_{\xi}, \sigma_{\xi}^2)} \int_{0}^{1} |f_{S, 0}(p) - f^{*}_{S, 0}(p)|\, dp$, subject to $E[\pi_{S, 0}(\boldsymbol{\theta})] = \hat{\pi}_{S, 0}$.
  • Step 3: for each $j = 1, \ldots, K-1$, $(\mu_{\beta_j}, \sigma_{\beta_j}^2) = \arg\min_{(\mu_{\beta_j}, \sigma_{\beta_j}^2)} \int_{0}^{1} |f_{S, j}(p) - f^{*}_{S, j}(p)|\, dp$, subject to $E[\pi_{S, j}(\boldsymbol{\theta})] = \hat{\pi}_{S, j}$.
  • Step 4: set $a_{E, j} = N_{E, j}\hat{\pi}_{S, j}$ and $b_{E, j} = N_{E, j}(1 - \hat{\pi}_{S, j})$.
  • Step 5: $(\mu_{\tau_j}, \sigma_{\tau_j}^2) = \arg\min_{(\mu_{\tau_j}, \sigma_{\tau_j}^2)} \int_{0}^{1} |f_{E, j}(p) - f^{*}_{E, j}(p)|\, dp$, subject to $E[\pi_{E, j}(\boldsymbol{\theta})] = \hat{\pi}_{S, j}$.
  • To reflect prior uncertainty about E in step 5, reasonable values for $N_{E, j}$ are 1/2, 1, or 2, which yield suitably large values for each $\sigma_{\tau_j}^2$.
  • Alternatively, one may assume either more skeptical or more optimistic prior means for $\pi_{E, j}(\boldsymbol{\theta})$ in step 5.
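Steps 1-2 can be sketched numerically. With $\beta_0 = 0$, the prior induced on $\pi_{S,0}(\boldsymbol{\theta}) = \operatorname{logit}^{-1}(\xi)$ is logit-normal, so the $L_1$ matching can be done on a grid. The soft penalty for the mean constraint and all numerical values below are assumptions for illustration, not the paper's implementation:

```python
import numpy as np
from scipy import optimize, stats

# Step 1: Beta hyperparameters for S in the reference subgroup,
# from hypothetical historical values N_S0 and pi_hat_S0.
N_S0, pi_hat_S0 = 100, 0.40
a_S0, b_S0 = N_S0 * pi_hat_S0, N_S0 * (1 - pi_hat_S0)

# Step 2: choose (mu_xi, sigma_xi) so that the logit-normal density
# induced on pi_{S,0} = inv-logit(xi) is close in L1 distance to the
# Beta(a_S0, b_S0) density; a quadratic penalty softly enforces the
# prior-mean constraint E[pi_{S,0}] = pi_hat_S0.
p = np.linspace(0.001, 0.999, 999)

def induced_pdf(p, mu, sigma):
    z = np.log(p / (1 - p))                       # logit(p)
    return stats.norm.pdf(z, mu, sigma) / (p * (1 - p))

def objective(x):
    mu, log_sigma = x
    f = induced_pdf(p, mu, np.exp(log_sigma))
    l1 = np.trapz(np.abs(f - stats.beta.pdf(p, a_S0, b_S0)), p)
    mean = np.trapz(p * f, p)
    return l1 + 100.0 * (mean - pi_hat_S0) ** 2   # soft mean constraint

res = optimize.minimize(objective, x0=[np.log(0.4 / 0.6), np.log(0.5)],
                        method="Nelder-Mead")
mu_xi, sigma_xi = res.x[0], np.exp(res.x[1])
```

Steps 3 and 5 would repeat the same matching for the remaining subgroups, holding the already-fitted hyperparameters fixed.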

Posteriors

omitted

Decision Criteria

Subgroup-specific early stopping rule: terminate accrual in subgroup $j$ when $$\lambda\left(\mathscr{D}_{n}, \boldsymbol{\theta}, j, \delta_{j}\right)=\operatorname{Pr}\left\{\pi_{E, j}(\boldsymbol{\theta})>\pi_{S, j}(\boldsymbol{\theta})+\delta_{j} \mid \mathscr{D}_{n}\right\}<p_{j}$$

  • $p_j$ is a fixed lower probability cut-off, usually in the range 0.01 to 0.10
  • may be calibrated to obtain a pre-specified false-negative rate (FNR).
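Given posterior draws of $\pi_{E,j}(\boldsymbol{\theta})$ and $\pi_{S,j}(\boldsymbol{\theta})$, the rule is easy to evaluate by Monte Carlo. `stop_subgroup` and the Beta draws below are hypothetical stand-ins for real MCMC output:

```python
import numpy as np

def stop_subgroup(pi_E_draws, pi_S_draws, delta_j=0.0, p_j=0.05):
    """Stop accrual in subgroup j when
    Pr(pi_{E,j} > pi_{S,j} + delta_j | data) < p_j."""
    prob = np.mean(pi_E_draws > pi_S_draws + delta_j)
    return prob < p_j

rng = np.random.default_rng(0)
# Hypothetical posterior draws: E looks clearly worse than S here.
pi_S = rng.beta(40, 60, size=10_000)  # concentrated near 0.40
pi_E = rng.beta(5, 20, size=10_000)   # concentrated near 0.20
should_stop = stop_subgroup(pi_E, pi_S)  # True: terminate this subgroup
```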

Simulation

compare the following methods

  • S-TI (subgroups with treatment-subgroup interactions; the new method)
  • S-NTI (subgroups without treatment-subgroup interactions)
  • NS (no subgroups)
  • SEP (separate trials per subgroup)

Discussion

Flexibility

  • S-TI has much more desirable properties in the presence of treatment-subgroup interactions.
  • Otherwise, the simpler methods S-NTI or SEP are preferable because of their lower complexity.
  • imbalanced subgroups.
  • The focus is on binary data, but similar procedures apply to TTE data via the likelihood in the Probability Model section.

Use the mean TTE $\hat{\mu}_{S, j}$ in place of $\hat{\pi}_{S, j}$.

In the algorithm, use Inverse-Gamma(a, b) instead of Beta(a, b).
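A possible TTE analogue of step 1, under the assumption that an Inverse-Gamma(a, b) prior on the mean event time has mean $b/(a-1)$ and ESS roughly $a$; this parameterization and the numbers are illustrative assumptions, not from the paper:

```python
# Hypothetical historical values: N_Sj patients, mean TTE mu_hat_Sj (months).
N_Sj, mu_hat_Sj = 50, 12.0

# Match the Inverse-Gamma prior mean b / (a - 1) to mu_hat_Sj, taking
# the prior ESS to be roughly a = N_Sj (an assumption for illustration).
a_Sj = N_Sj
b_Sj = mu_hat_Sj * (a_Sj - 1)
prior_mean = b_Sj / (a_Sj - 1)  # equals mu_hat_Sj by construction
```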

Thoughts

To generalize the binary outcome to a continuous outcome:

  • generalize binary outcome to ordinal outcome.

  • how to convert a continuous outcome to a multilevel (ordinal) outcome

  • cutoff values to turn a continuous variable into an ordinal outcome

  • clustering based on a 1-dim value, maximizing between-group distance.

  • (something similar to PCA), contrast

  • supposed to be a monotone-transformation-free distance.

  • choice of the number of ordinal levels $K$.
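One way the cutoff idea above could be sketched: equal-frequency (quantile) cutoffs, which are invariant under monotone transformations of the outcome. `to_ordinal` is a hypothetical helper, not something from the paper:

```python
import numpy as np

def to_ordinal(y, K):
    """Discretize a continuous outcome into K ordinal levels using
    empirical quantile cutoffs (equal-frequency bins)."""
    cuts = np.quantile(y, np.linspace(0, 1, K + 1)[1:-1])
    return np.searchsorted(cuts, y, side="right")

rng = np.random.default_rng(1)
y = rng.lognormal(mean=0.0, sigma=1.0, size=1000)
z = to_ordinal(y, K=4)  # levels 0..3, roughly 250 observations each
```

Because quantile ranks are preserved by any monotone transformation, the same levels result whether one bins $y$ or, say, $\log y$.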

gamma outcome. / log-normal outcome.

do exactly the same thing as Wathen’s paper.

  • historical data -> ESS -> parameters $(a, b)$ or $(\mu, \sigma^2)$

Question:

Why model a continuous response? Even with a continuous response, do we still need to pre-specify cutoff values?

model checking/diagnostics for a Normal outcome?

goodness of fit (GoF)?