Title: | Survival Analysis of Interval-Censored Data |
---|---|
Description: | Survival analysis of interval-censored data with proportional hazards, and an explicit smooth estimate of the baseline log-hazard with P-splines. |
Authors: | Hein Putter [aut, cre], Paul Eilers [aut] |
Maintainer: | Hein Putter <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.0 |
Built: | 2024-11-19 04:26:17 UTC |
Source: | https://github.com/cran/icpack |
Compute a B-spline basis
bbase(x, xl = min(x), xr = max(x), nseg = 10, deg = 3)
bbase(x, xl = min(x), xr = max(x), nseg = 10, deg = 3)
x |
The vector of values for which the basis is to be evaluated |
xl |
The left boundary of the domain |
xr |
The right boundary of the domain |
nseg |
The number of inter-knot segments on the domain |
deg |
The degree of the B-splines (2 means quadratic, 3 means cubic, and so on) |
A matrix containing the basis
x = runif(100) B = bbase(x, 0, 1, 20, 3)
x = runif(100) B = bbase(x, 0, 1, 20, 3)
Interval-censored drug users data
data(drugusers)
data(drugusers)
Data from a cohort of 940 injecting drug users attending a hospital detoxification unit in Barcelona, Spain. Time is months between initiation of intravenous drug use and HIV seroconversion. A dataframe with five columns:
left
Last negative HIV test (0 if first HIV test was positive)
right
First positive HIV test (Inf
if last HIV test was negative)
period
Period of initiation of drug use, factor with levels "1972-1980", "1981-1985", "1986-1991", "1992-1997"
gender
Gender, factor with levels "male" and "female"
age
Age at initiation of drug use (years)
Gomez G, Calle ML, Egea JM & Muga R (2000). Risk of HIV infection as a function of the duration of intravenous drug use: a non-parametric Bayesian approach. Stat Med; 19:2641–2656.
Perform the E-step in the EM algorithm
Estep(H, Ic, R1, dead)
Estep(H, Ic, R1, dead)
H |
Hazards per individual (in columns) |
Ic |
Censoring interval per individual, coded as 0/1 (in columns) |
R1 |
Left truncation interval per individual, coded as 0/1 (in columns) |
dead |
Boolean vector (TRUE is event, FALSE is right censored) |
A list with two matrices
Y |
Expected probability of event per bin per subject |
R |
Expected probability of at risk per bin per subject |
Taken from mstate
fillplot(x, y1, y2, col)
fillplot(x, y1, y2, col)
x |
Points on the x-axis |
y1 |
First set of points on y-axis |
y2 |
Second set of points on y-axis |
col |
The color to fill space with |
Nothing
Fit proportional hazard model with smooth baseline hazard and (optional) interval censoring
fitit( Y, R, dead, X, B, Ic, R1, cbx, Pdiff, Pridge, lambda, nit = 50, tol = 1e-06, tollam = 0.01, update_lambda = FALSE, ic_update = TRUE, monitor = FALSE )
fitit( Y, R, dead, X, B, Ic, R1, cbx, Pdiff, Pridge, lambda, nit = 50, tol = 1e-06, tollam = 0.01, update_lambda = FALSE, ic_update = TRUE, monitor = FALSE )
Y |
Events (matrix, number of bins by subjects) |
R |
Risk sets (matrix, number of bins by subjects) |
dead |
(Boolean vector, TRUE if event, FALSE if right censored) |
X |
Covariates (matrix, number of covariates (+1) by subjects) |
B |
B-spline basis matrix |
Ic |
Censoring interval per individual, coded as 0/1 (in columns) |
R1 |
Left truncation interval per individual, coded as 0/1 (in columns) |
cbx |
Vector of starting values |
Pdiff |
B-spline part of penalty matrix |
Pridge |
Ridge part of penalty matrix (for intercept) |
lambda |
Smoothing parameter (number) |
nit |
Maximum number of iterations (integer) |
tol |
Tolerance for final fit |
tollam |
Tolerance for switching to lambda update |
update_lambda |
Automatic update of lambda (Boolean) |
ic_update |
Update risk and event probabilities (Boolean) |
monitor |
Monitor convergence (Boolean) |
A list with items
cbx |
Vector of |
ll |
Poisson GLM log-likelihood |
lambda |
Final tuning parameter |
pen |
Penalty part of penalized log-likelihood |
ed |
Effetive dimension of the baseline hazard |
nit1 |
Number of iterations used in first phase |
nit |
Total number of iterations used (first plus second phase) |
tollam |
Tolerance used for switching to lambda update |
Get and check input of icfit
get_input_icfit(formula, data, entry)
get_input_icfit(formula, data, entry)
formula |
A formula object with response of the left of a ~ operator and terms on the right. The response must be a survival object as returned by the ‘Surv' function, with type either right’, 'counting' or 'interval2' |
data |
A data frame in which to interpret the variable names in the 'formula' |
entry |
When appropriate, a vector of entry (left truncation) times, or a string indicating the column name in 'data' containing entry times; only used if Surv object is of type 'interval2' |
A list with items
Ymat |
Matrix (number of subjects x 3) containing entry, left and right hand of intervals |
X |
Matrix (number of subjects x number of covariates + 1) with design matrix of covariates |
Fit a proportional hazards model with baseline hazard modeled by P-splines
icfit( formula, data, entry, lambda = 10, nt = 100, tmax, nseg = 20, bdeg = 3, pord = 2, nit = 50, tol = 1e-06, tollam = 0.01, kappa = 1e-06, update_lambda = TRUE, ic_update = TRUE, monitor = FALSE )
icfit( formula, data, entry, lambda = 10, nt = 100, tmax, nseg = 20, bdeg = 3, pord = 2, nit = 50, tol = 1e-06, tollam = 0.01, kappa = 1e-06, update_lambda = TRUE, ic_update = TRUE, monitor = FALSE )
formula |
A formula object with response of the left of a ~ operator and covariate terms on the right. The response must be a survival object as returned by the ‘Surv' function, with type either right’, 'counting' or 'interval2' |
data |
A data frame in which to interpret the variable names in the 'formula' |
entry |
When appropriate, a vector of entry (left truncation) times, or a string indicating the column name in 'data' containing entry times; only used if Surv object is of type 'interval2' |
lambda |
Starting value of penalty tuning parameter |
nt |
The number of time bins |
tmax |
The end of time domain (default 1.01 times largest observation) |
nseg |
The number of B-spline segments |
bdeg |
The degree of the B-splines |
pord |
The order of the differences used in the penalty |
nit |
Maximum number of iterations (integer) |
tol |
Tolerance for final fit |
tollam |
Tolerance for switching to lambda update |
kappa |
Ridge parameter (number) |
update_lambda |
Automatic update of lambda (Boolean) |
ic_update |
Update risk and event probabilities (Boolean) |
monitor |
Monitor convergence (Boolean) |
An object of class 'icfit'
# Fit proportional hazards model to interval-censored data icfit(Surv(left, right, type='interval2') ~ period + gender + age, data=drugusers) # Fit proportional hazards model to right-censored data icfit(Surv(time, d) ~ Diameter + FIGO + Karnofsky, data = Ova)
# Fit proportional hazards model to interval-censored data icfit(Surv(left, right, type='interval2') ~ period + gender + age, data=drugusers) # Fit proportional hazards model to right-censored data icfit(Surv(time, d) ~ Diameter + FIGO + Karnofsky, data = Ova)
Compute the information matrix
InfoMatrix(object, initres)
InfoMatrix(object, initres)
object |
Fit obtained from |
initres |
Result from |
Three information matrices are computed. One is Ifull
which interprets
the imputed R
and Y
data from object
as actual observations.
Iloss
gives the loss of information due to imputation. The sum of both matrices
is the true information matrix.
A list with three items
Itrue |
Total of |
Ifull |
Full Fisher information matrix |
Iloss |
Loss of information due to intervals (missing event times) |
Generate a discrete IC object
init(Times, X, nt, tmax, nseg = 20, bdeg = 3, pord = 2, kappa = 1e-06)
init(Times, X, nt, tmax, nseg = 20, bdeg = 3, pord = 2, kappa = 1e-06)
Times |
The (possibly interval censored) survival data, in a matrix |
X |
The design matrix containing covariates |
nt |
The number of bins for discretization |
tmax |
The end of time domain (default 1.01 times largest observation) |
nseg |
The number of B-spline segments |
bdeg |
The degree of the B-splines |
pord |
The order of the differences used in the penalty |
kappa |
Ridge parameter (number) |
A list with items
data |
List containing the original data as well as the binned data |
bins |
List with information on bins used |
basis |
List containing the B-spline matrix |
start |
List containing information on starting values |
penalty |
List containing Pdiff and Pridge |
control |
List with information on control of B-spline basis |
Function for fitting proportioal hazard model with baseline hazard
Mstep(Y, R, X, B, Pen, lambda, cbx)
Mstep(Y, R, X, B, Pen, lambda, cbx)
Y |
Expected events (matrix) |
R |
Expected risk sets (matrix) |
X |
Covariates (matrix) |
B |
B-spline basis |
Pen |
Penalty matrix |
lambda |
Smoothing parameter (number) |
cbx |
Current coefficient estimates |
An object with fields:
H
= hazards (matrix),
cbx
= coefficient estimates (vector),
lambda
= proposal for new lambda,
ed
= effective dimension,
G
= G matrix,
ll
= log-likelihood,
pen
= penalized part of log-likelihood,
Mpen
= penalized M matrix
Ovarian cancer data
data(Ova)
data(Ova)
A dataframe with five columns:
Diameter
FIGO
Karnofsky
time
d
death
tba
Plot method for an object of class 'icfit'
## S3 method for class 'icfit' plot( x, type = c("hazard", "cumhazard", "survival", "probability"), conf.int = TRUE, ylim = NULL, title = NULL, xlab = NULL, ylab = NULL, fill = TRUE, fillcol = "lightgrey", ... )
## S3 method for class 'icfit' plot( x, type = c("hazard", "cumhazard", "survival", "probability"), conf.int = TRUE, ylim = NULL, title = NULL, xlab = NULL, ylab = NULL, fill = TRUE, fillcol = "lightgrey", ... )
x |
The object of class 'icfit' to be plotted |
type |
Type of plot. Accepted choices: 'hazard' (default), 'cumhazard', 'survival' or 'cumprob' |
conf.int |
If 'TRUE' a 100*(1 - alpha) percent confidence interval is plotted |
ylim |
The y-limits for the plot |
title |
Optional title string |
xlab |
Text for x-label |
ylab |
Text for y-label |
fill |
Fill area between lower and upper |
fillcol |
The color for filling (default 'lightgrey') |
... |
Other arguments to plot (except ‘type', which is set to ’l') |
A ggplot grob, containing the plot. Use print()
or plot()
to show it
Multiple objects can be combined by using functions in the package gridExtra
.
icf <- icfit(Surv(left, right, type='interval2') ~ period + gender + age, data = drugusers) plot(icf)
icf <- icfit(Surv(left, right, type='interval2') ~ period + gender + age, data = drugusers) plot(icf)
Plot method for an object of class 'predict.icfit'
## S3 method for class 'predict.icfit' plot( x, type = c("hazard", "cumhazard", "survival", "probability"), conf.int = TRUE, fill = TRUE, fillcol = "lightgrey", ylim = NULL, title = NULL, xlab = NULL, ylab = NULL, selection = NULL, nrow = NULL, ncol = NULL, do_plot = TRUE, ... )
## S3 method for class 'predict.icfit' plot( x, type = c("hazard", "cumhazard", "survival", "probability"), conf.int = TRUE, fill = TRUE, fillcol = "lightgrey", ylim = NULL, title = NULL, xlab = NULL, ylab = NULL, selection = NULL, nrow = NULL, ncol = NULL, do_plot = TRUE, ... )
x |
The object of class 'predict.icfit' to be plotted |
type |
Type of plot. Accepted choices: 'hazard' (default), 'cumhazard', 'survival' or 'probability' |
conf.int |
If 'TRUE' a 100*(1 - alpha) percent confidence interval is plotted |
fill |
Fill area between lower and upper |
fillcol |
The color for filling (default 'lightgrey') |
ylim |
The y-limits for the plot |
title |
Optional title string, or, if x is a list, obtained from 'predict.icfit' using 'newdata', a vector of title strings |
xlab |
Text for x-label |
ylab |
Text for y-label |
selection |
If x is a list, obtained from 'predict.icfit' using 'newdata', then a vector containing the subset of list elements to be plotted, default is to plot all elements of the list |
nrow |
If x is a list, obtained from 'predict.icfit' using 'newdata', then a number specifying the number of rows to plot; default the square root of the number of list elements to be plotted |
ncol |
If x is a list, obtained from 'predict.icfit' using 'newdata', then a number specifying the number of columns to plot; default the square root of the number of list elements to be plotted |
do_plot |
Boolean indicating whether or not to actually plot (default is TRUE) |
... |
other graphical parameters to be passed on |
A ggplot grob, containing the plot. Use print()
or plot()
to show it
Multiple objects can be combined by using functions in the package gridExtra
.
icf <- icfit(Surv(left, right, type='interval2') ~ period + gender + age, data=drugusers) pred_icf <- predict(icf) plot(pred_icf) library(ggplot2) plot(icf) + xlim(0, 200) + ylim(0, 0.05) ndata <- drugusers[1:4, ] pred_nd_icf <- predict(icf, newdata=ndata) plot(pred_nd_icf) # plot all four plot(pred_nd_icf[[2]]) # plot only the second plot(pred_nd_icf, type = "cumhazard") # plot four cumulative hazard curves plot(pred_nd_icf[[3]], type = "prob", ylim = c(0, 1)) # plot probability curve for nr 3 plot(pred_nd_icf[[4]], type = "surv", ylim = c(0, 1)) # plot survival curve for nr 4
icf <- icfit(Surv(left, right, type='interval2') ~ period + gender + age, data=drugusers) pred_icf <- predict(icf) plot(pred_icf) library(ggplot2) plot(icf) + xlim(0, 200) + ylim(0, 0.05) ndata <- drugusers[1:4, ] pred_nd_icf <- predict(icf, newdata=ndata) plot(pred_nd_icf) # plot all four plot(pred_nd_icf[[2]]) # plot only the second plot(pred_nd_icf, type = "cumhazard") # plot four cumulative hazard curves plot(pred_nd_icf[[3]], type = "prob", ylim = c(0, 1)) # plot probability curve for nr 3 plot(pred_nd_icf[[4]], type = "surv", ylim = c(0, 1)) # plot survival curve for nr 4
Predict method for an object of class 'icfit'
## S3 method for class 'icfit' predict(object, newdata, nstep = 500, alpha = 0.05, ...)
## S3 method for class 'icfit' predict(object, newdata, nstep = 500, alpha = 0.05, ...)
object |
The object of class 'icfit' for which a prediction is to be made |
newdata |
A data frame containing covariate information for a new subject |
nstep |
Number of time steps used for calculating cumulative hazards (default is 500) |
alpha |
The alpha level for the (1-alpha)*100 percent confidence interval |
... |
Any other arguments |
An object of class 'predict.icfit', which is a data frame with time points and hazard, cumulative hazard and survival at those time points, along with standard errors and pointwise lower and upper confidence bounds, or a list of such data frames for each subject represented in 'newdata'
icf <- icfit(Surv(left, right, type='interval2') ~ period + gender + age, data=drugusers) pred_icf <- predict(icf) head(pred_icf) ndata <- drugusers[1:4, ] pred_nd_icf <- predict(icf, newdata=ndata) lapply(pred_nd_icf, head)
icf <- icfit(Surv(left, right, type='interval2') ~ period + gender + age, data=drugusers) pred_icf <- predict(icf) head(pred_icf) ndata <- drugusers[1:4, ] pred_nd_icf <- predict(icf, newdata=ndata) lapply(pred_nd_icf, head)
Print method for an object of class 'icfit'
## S3 method for class 'icfit' print(x, digits = max(1L, getOption("digits") - 3L), alpha = 0.05, ...)
## S3 method for class 'icfit' print(x, digits = max(1L, getOption("digits") - 3L), alpha = 0.05, ...)
x |
The object of class 'icfit' to be printed |
digits |
Number of digits to be printed |
alpha |
Alpha level to be used of confidence interval ((1-alpha) * 100 percent) |
... |
Further arguments to print |
No return value
Plot probabilities as a raster'
rasterplot( icf, type = c("both", "R", "Y"), sel = NULL, label = NULL, show_label = FALSE, pow = 0.2, order = TRUE, do_plot = TRUE )
rasterplot( icf, type = c("both", "R", "Y"), sel = NULL, label = NULL, show_label = FALSE, pow = 0.2, order = TRUE, do_plot = TRUE )
icf |
an object of class 'icfit' |
type |
a string giving the type of the plot. Accepted choices: 'R' (risk probabilities) and 'Y' (event probabilities) |
sel |
a vector of integers for selection of subject (rows of the matrix) |
label |
character vector containing labels for the individuals to be plotted in selection |
show_label |
Boolean, whether or not to show the labels |
pow |
a number, giving he power to which the probabilities will be raised, to improve the clarity of the plot |
order |
Boolean, default (TRUE) is to order according to first positive in Y, then first zero in Y, then first zero in R; if FALSE order of occurrence in data is used |
do_plot |
Boolean, default (TRUE) shows the plot, if FALSE object is returned but not plotted |
a ggplot object (Grob)
icf <- icfit(Surv(left, right, type='interval2') ~ period + gender + age, data=drugusers) rasterplot(icf) rasterplot(icf, type = 'R') rasterplot(icf, type = 'Y') rasterplot(icf, pow = 0.05) # very small power basically shows 0/1 sel <- c( 11, 18, # right-censored, event in (L, \infty) 1:2, # event in (0, R) 115, 133 # event in (L, R) ) rasterplot(icf, sel = sel) rasterplot(icf, sel = sel, label = c("e", "p", "g", "c", "m", "n"), show_label = TRUE) rasterplot(icf, sel = sel, label = c("e", "p", "g", "c", "m", "n"), show_label = TRUE, type = 'Y')
icf <- icfit(Surv(left, right, type='interval2') ~ period + gender + age, data=drugusers) rasterplot(icf) rasterplot(icf, type = 'R') rasterplot(icf, type = 'Y') rasterplot(icf, pow = 0.05) # very small power basically shows 0/1 sel <- c( 11, 18, # right-censored, event in (L, \infty) 1:2, # event in (0, R) 115, 133 # event in (L, R) ) rasterplot(icf, sel = sel) rasterplot(icf, sel = sel, label = c("e", "p", "g", "c", "m", "n"), show_label = TRUE) rasterplot(icf, sel = sel, label = c("e", "p", "g", "c", "m", "n"), show_label = TRUE, type = 'Y')
Summary method for an object of class 'icfit'
## S3 method for class 'icfit' summary( object, lvl = 1, digits = max(1L, getOption("digits") - 3L), alpha = 0.05, ... )
## S3 method for class 'icfit' summary( object, lvl = 1, digits = max(1L, getOption("digits") - 3L), alpha = 0.05, ... )
object |
Object of class 'icfit' |
lvl |
Describes the level of output |
digits |
Number of digits to be printed |
alpha |
Alpha level to be used of confidence interval ((1-alpha) * 100 percent) |
... |
Other arguments to summary |
None (invisible NULL
)
icf <- icfit(Surv(left, right, type='interval2') ~ period + gender + age, data=drugusers) summary(icf) summary(icf, lvl=0) # same as print(icf) summary(icf, lvl=1) # extra information on iterations and computation time
icf <- icfit(Surv(left, right, type='interval2') ~ period + gender + age, data=drugusers) summary(icf) summary(icf, lvl=0) # same as print(icf) summary(icf, lvl=1) # extra information on iterations and computation time
Summary method for an object of class 'predict.icfit'
## S3 method for class 'predict.icfit' summary(object, times, ...)
## S3 method for class 'predict.icfit' summary(object, times, ...)
object |
Object of class 'predict.icfit' |
times |
The time points at which to summarize the predicted hazards, cumulative hazards and survival probabilities, with associated standard errors and confidence intervals |
... |
Other arguments to plot |
A data frame (if object was a data frame) or a list of data frames (if object was a list of data frames) with hazards etc linearly interpolated between the time points used in the predict function
icf <- icfit(Surv(left, right, type='interval2') ~ period + gender + age, data=drugusers) pred_icf <- predict(icf) summary(pred_icf, times=c(0, 30, 183, 365))
icf <- icfit(Surv(left, right, type='interval2') ~ period + gender + age, data=drugusers) pred_icf <- predict(icf) summary(pred_icf, times=c(0, 30, 183, 365))