Published On: 10/19/2021
by Alex Breskin
One Model to Rule Them All: Using a Single Model to Control for Confounding and Informative Censoring
Introduction
Study designs for estimating causal effects are numerous. Depending on the design, it is often necessary to address several sources of bias, such as baseline and time-varying confounding, informative censoring, and selection bias, among others. Designs like the treatment decision design [1], new user design [2], and prevalent new user design [3] each address these biases in different ways and require seemingly different analytic approaches to yield unbiased estimates from their resulting data.
Recently, the ‘clone-censor-weight’ approach [4–6] has become a popular way to estimate the effects of sustained or dynamic treatment regimens. However, this approach, and the way of thinking it entails (which involves conceptualizing a ‘target trial’ and adapting it to the observational setting [7]), is more general, and nearly all studies can be thought of in this way. Here, we show that a standard study of a point treatment can be thought of as a clone-censor-weight design, and we show how confounding and informative censoring can be addressed with a single nuisance model.
The Setup
Consider a study of a binary baseline treatment, \(A\), on a time-to-event, \(T\). Patients may be censored prior to experiencing the event, and the time of censoring is \(C\). A patient’s observed follow-up time is \(\tilde{T}=\min(T,C)\). In addition, a set of baseline covariates sufficient to control for confounding and informative censoring is collected, denoted \(W\). Finally, we define \(\Delta=I(C>\tilde{T})\), which is an indicator that a patient was not censored at their observed follow-up time (and therefore had the event). A subject’s observed data therefore consist of \(\{A, \tilde{T}, W, \Delta\}\).
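To make the notation concrete, here is a minimal sketch in base R of how the observed follow-up time and the event indicator relate to the latent event and censoring times. The vectors `event_time` and `cens_time` hold made-up values for illustration only; they are not part of the original analysis.

```r
# Made-up latent times for three hypothetical patients
event_time <- c(0.3, 1.2, 0.8)   # T
cens_time  <- c(0.5, 0.4, 0.9)   # C

# Observed follow-up time: whichever comes first
T_tilde <- pmin(event_time, cens_time)

# Delta = 1 if the patient was not censored at T_tilde (i.e., had the event)
Delta <- as.integer(cens_time > T_tilde)

data.frame(T_tilde, Delta)
```

In real data only `T_tilde` and `Delta` are observed; the latent times appear here just to show how the two quantities are defined.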
One estimator for the counterfactual cumulative incidence of the outcome under treatment level \(A=a\) is [8]:
\[ \hat{Pr}(T(a)<t)=\frac{1}{n}\sum_{i=1}^n{\frac{\Delta_i I(\tilde{T}_i<t) I(A_i=a)}{\hat{Pr}(\Delta=1 \mid W_i,A_i,T_i)\,\hat{Pr}(A=a \mid W_i)}}, \]
where \(T(a)\) is the time of the event had, possibly counter to fact, a subject received treatment level \(A=a\), \(n\) is the total population size, and each of the probabilities in the denominator is modeled appropriately, e.g., with a Cox proportional hazards model for the censoring model and logistic regression for the treatment model.
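As a rough illustration of the arithmetic in the estimator above, the following base R sketch evaluates it for four hypothetical subjects. The function name `ipw_risk`, its arguments, and all numbers are assumptions introduced here for illustration. In a real analysis the two probability vectors would come from fitted censoring and treatment models rather than being typed in by hand.

```r
# Hypothetical helper: IPW cumulative incidence at `time` under treatment `a`.
#   p_uncens: estimated Pr(Delta = 1 | W, A, T) for each subject
#   p_treat:  estimated Pr(A = a | W) for each subject
ipw_risk <- function(time, a, T_tilde, Delta, A, p_uncens, p_treat) {
  numerator <- Delta * (T_tilde < time) * (A == a)
  mean(numerator / (p_uncens * p_treat))
}

# Toy data with made-up nuisance probabilities
T_tilde  <- c(0.2, 0.4, 0.3, 0.6)
Delta    <- c(1, 0, 1, 1)
A        <- c(1, 1, 0, 1)
p_uncens <- c(0.8, 0.7, 0.9, 0.5)
p_treat  <- c(0.5, 0.5, 0.5, 0.5)

risk_treated <- ipw_risk(time = 0.5, a = 1, T_tilde, Delta, A, p_uncens, p_treat)
risk_treated
```

Only the first subject contributes to the sum: the second is censored, the third received the other treatment, and the fourth had the event after time 0.5. The estimate is therefore (1 / (0.8 × 0.5)) / 4 = 0.625.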
Data Generation
Here, we generate a simple dataset for demonstration.
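library(dplyr)  # provides tibble() and the %>% pipe used later

# Inverse logit
expit <- function(p){
  exp(p)/(1+exp(p))
}

n <- 10000
dat <- tibble(
  id = 1:n,
  W = runif(n),                          # baseline covariate
  A = rbinom(n, 1, expit(W)),            # treatment depends on W
  T0 = rexp(n, rate = 0.5 + 2*W),        # potential event time if untreated
  T1 = rexp(n, rate = 1 + 2*W),          # potential event time if treated
  T = A*T1 + (1-A)*T0,                   # realized event time
  C = rexp(n, rate = .5 + .55*A + .5*W)  # censoring time depends on A and W
)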
Note that our true causal risk difference is 11.93%.
Typical Study Design and Analysis
Using the causalRisk package, we can easily implement the estimator described above to get the unadjusted and adjusted cumulative incidence curves:
library(causalRisk)

# Unadjusted: no covariates in the treatment or censoring models
mod_unadj <- specify_models(identify_treatment(A),
                            identify_outcome(T),
                            identify_censoring(C))
# Adjusted: treatment and censoring models both condition on W
mod_adj <- specify_models(identify_treatment(A, ~W),
                          identify_outcome(T),
                          identify_censoring(C, ~W))
fit_unadj <- estimate_ipwrisk(dat, mod_unadj, times = seq(0, 0.5, by = 0.01), labels = "Unadjusted, Standard")
fit_adj <- estimate_ipwrisk(dat, mod_adj, times = seq(0, 0.5, by = 0.01), labels = "Adjusted, Standard")
make_table1(fit_adj, side.by.side = TRUE)
plot(fit_unadj, fit_adj)
make_table2(fit_unadj, fit_adj, risk_time = 0.5)
Clone-Censor-Weight Design with Single Model
While the previously described analysis seems to work fine, it is limited by the fact that treatment must occur at a single point in time. The clone-censor-weight design relaxes this restriction by allowing for sustained treatments or dynamic treatment regimens. This is accomplished by a 3-step process:
1. ‘Clone’ each patient once for each treatment regimen of interest.
2. ‘Censor’ each clone when their person-time is no longer consistent with the corresponding treatment regimen.
3. ‘Weight’ the remaining person-time by the inverse probability of being censored.
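For a point treatment, the three steps can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the implementation used in this post. The toy cohort, the column names, and the values in `p1` (standing in for an estimate of \(\Pr(A=1 \mid W)\)) are all made up, and ordinary censoring during follow-up is ignored.

```r
# Toy cohort; p1 is a made-up stand-in for an estimate of Pr(A = 1 | W)
patients <- data.frame(id = 1:3,
                       A  = c(1, 0, 1),
                       p1 = c(0.6, 0.4, 0.5))

# Step 1 (clone): one copy of each patient per regimen (Cartesian product)
regimens <- data.frame(regimen = c(0, 1))
clones <- merge(patients, regimens, by = NULL)

# Step 2 (censor): with a point treatment, a clone deviates from its regimen
# at baseline whenever the observed treatment differs from the assigned one
clones$censored <- clones$A != clones$regimen

# Step 3 (weight): inverse probability of remaining uncensored, which here is
# the probability of receiving the regimen's treatment given covariates
p_uncens <- ifelse(clones$regimen == 1, clones$p1, 1 - clones$p1)
clones$weight <- ifelse(clones$censored, 0, 1 / p_uncens)

clones <- clones[order(clones$id, clones$regimen), ]
clones
```

In this special case the step 3 weight for an uncensored clone is exactly the usual inverse probability of treatment weight, which hints at why artificial censoring and baseline treatment can be handled together.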
This approach is quite general and can easily accommodate simple study designs like the one used above. One complicating factor, however, is the need to handle baseline as well as time-varying treatment. The 3-step process does not seem to have any way of dealing with baseline treatment, for instance through inverse probability of treatment weights. Doing so would require, within each set of ‘clones’, further dividing the clones by baseline treatment and applying both treatment and censoring weights.
It turns out that a single Cox proportional hazards model can be used to handle baseline and time-varying treatments. This is accomplished by ensuring that all patients contribute at least some person-time (so patients who are on the ‘wrong’ treatment at baseline are given some tiny amount of person-time) and specifying the censoring model flexibly enough to act as if it were in fact two separate models: one for treatment and one for censoring.
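One way to see why this can work is sketched below, assuming that clones on the ‘wrong’ baseline treatment are artificially censored within a tiny window \((0,\epsilon]\) (the symbol \(\epsilon\) is introduced here for illustration only). The probability that a clone assigned to regimen \(a\) remains uncensored through time \(t\) then factors into a baseline piece and a follow-up piece:
\[ \Pr(\text{uncensored through } t \mid W) = \underbrace{\Pr(A=a \mid W)}_{\text{survives } (0,\epsilon]} \times \underbrace{\Pr(C>t \mid W, A=a)}_{\text{survives } (\epsilon,t]}. \]
This is the same product that appears in the denominator of the estimator in the setup. Letting the covariate effects differ between the first interval and the rest of follow-up allows a single censoring model to represent both factors.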
Here, we demonstrate how this works.
# Clones assigned to the 'treated' regimen
dat2_treat <- dat %>%
  group_by(id) %>%
  # Patients untreated at baseline deviate immediately: tiny censoring time
  mutate(C2 = ifelse(A == 0, runif(n, min = 1e-8, max = 1e-7), C)) %>%
  # Two rows per patient: a tiny first interval (t_ind = 1), then the rest
  slice(rep(1, 2)) %>%
  mutate(t_ind = ifelse(row_number() == 1, 1, 0),
         end = ifelse(row_number() == 1, 1e-7, C2),
         start = ifelse(row_number() == 1, 0, lag(end))) %>%
  filter(start != end) %>%
  mutate(treat = 1) %>%
  # Flag each patient's last row
  mutate(del = ifelse(row_number() == n(), 1, 0)) %>%
  # Truncate follow-up at time 0.5
  mutate(end = ifelse(end > .5, .5, end),
         del = ifelse(end >= .5, 0, del)) %>%
  ungroup()

# Clones assigned to the 'untreated' regimen (mirror image of the above)
dat2_notreat <- dat %>%
  group_by(id) %>%
  mutate(C2 = ifelse(A == 1, runif(n, min = 1e-8, max = 1e-7), C)) %>%
  slice(rep(1, 2)) %>%
  mutate(t_ind = ifelse(row_number() == 1, 1, 0),
         end = ifelse(row_number() == 1, 1e-7, C2),
         start = ifelse(row_number() == 1, 0, lag(end))) %>%
  filter(start != end) %>%
  mutate(treat = 0) %>%
  mutate(del = ifelse(row_number() == n(), 1, 0)) %>%
  mutate(end = ifelse(end > .5, .5, end),
         del = ifelse(end >= .5, 0, del)) %>%
  ungroup()

dat2 <- bind_rows(dat2_treat, dat2_notreat)

# Single censoring model; the W:t_ind interaction lets the effect of W
# differ between the tiny first interval and the rest of follow-up
mod_ccw <- specify_models(identify_treatment(treat),
                          identify_outcome(T),
                          identify_censoring(C2, ~W + W:t_ind),
                          identify_interval(start, end),
                          identify_subject(id))
fit_ccw <- estimate_ipwrisk(dat2, mod_ccw, times = seq(0, 0.5, by = 0.01), labels = "Adjusted, Clone-Censor-Weight")
plot(fit_ccw, fit_adj)
make_table2(fit_ccw, fit_adj, risk_time = 0.5)
Conclusion
As you can see, besides a bit of numerical noise, the results from the two approaches are essentially the same! From this simple example, we can see how the clone-censor-weight design may be able to provide a general framework for the types of studies typically encountered in epidemiology.
References