% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/emfrail.R
\name{emfrail}
\alias{emfrail}
\title{Fitting semi-parametric shared frailty models with the EM algorithm}
\usage{
emfrail(formula, data, distribution = emfrail_dist(),
  control = emfrail_control(), model = FALSE, model.matrix = FALSE, ...)
}
\arguments{
\item{formula}{A formula that contains on the left hand side an object of the type \code{Surv}
and on the right hand side a \code{+cluster(id)} statement. Two special statments may also be used:
\code{+strata()} for specifying a grouping column that will represent different strata and
\code{+terminal()}}

\item{data}{A \code{data.frame} in which the formula argument can be evaluated}

\item{distribution}{An object as created by \code{\link{emfrail_dist}}}

\item{control}{An object as created by \code{\link{emfrail_control}}}

\item{model}{Logical. Should the model frame be returned?}

\item{model.matrix}{Logical. Should the model matrix be returned?}

\item{...}{Other arguments, currently used to warn about deprecated argument names}
}
\value{
An object of class \code{emfrail} that contains the following fields:
\item{coefficients}{A named vector of the estimated regression coefficients}
\item{hazard}{The breslow estimate of the baseline hazard at each event time point, in chronological order}
\item{var}{The variance-covariance matrix corresponding to the coefficients and hazard, assuming \eqn{\theta} constant}
\item{var_adj}{The variance-covariance matrx corresponding to the
coefficients and hazard, adjusted for the estimation of theta}
\item{logtheta}{The logarithm of the point estimate of \eqn{\theta}. For the gamma and
PVF family of distributions, this is the inverse of the estimated frailty variance.}
\item{var_logtheta}{The variance of the estimated logarithm of \eqn{\theta}}
\item{ci_logtheta}{The likelihood-based 95\% confidence interval for the logarithm of \eqn{\theta}}
\item{frail}{The posterior (empirical Bayes) estimates of the frailty for each cluster}
\item{residuals}{A list with two elements, cluster which is a vector that the sum of the
cumulative hazards from each cluster for a frailty value of 1, and
individual, which is a vector that contains the cumulative hazard corresponding to each row of the data,
multiplied by the corresponding frailty estimate}
\item{tev}{The time points of the events in the data set, this is the same length as hazard}
\item{nevents_id}{The number of events for each cluster}
\item{loglik}{A vector of length two with the log-likelihood of the starting Cox model
and the maximized log-likelihood}
\item{ca_test}{The results of the Commenges-Andersen test for heterogeneity}
\item{cens_test}{The results of the test for dependence between a recurrent event and a terminal event,
if the \code{+terminal()} statement is specified and the frailty distribution is gamma}
\item{formula}{The original formula argument}
\item{distribution}{The original distribution argument}
\item{control}{The original control argument}
\item{mf}{The \code{model.frame}, if \code{model = TRUE}}
\item{mm}{The \code{model.matrix}, if \code{model.matrix = TRUE}}
}
\description{
Fitting semi-parametric shared frailty models with the EM algorithm
}
\details{
The \code{emfrail} function fits shared frailty models for processes which have intensity
\deqn{\lambda(t) = z \lambda_0(t) \exp(\beta' \mathbf{x})}
with a non-parametric (Breslow) baseline intensity \eqn{\lambda_0(t)}. The outcome
(left hand side of the \code{formula}) must be a \code{Surv} object.

If the object is \code{Surv(tstop, status)} then the usual failure time data is represented.
Gap-times between recurrent events are represented in the same way.
If the left hand side of the formula is created as \code{Surv(tstart, tstop, status)}, this may represent a number of things:
(a) recurrent events episodes in calendar time where a recurrent event episode starts at \code{tstart} and ends at \code{tstop}
(b) failure time data with time-dependent covariates where \code{tstop} is the time of a change in covariates or censoring
(\code{status = 0}) or an event time (\code{status = 1}) or (c) clustered failure time with left truncation, where
\code{tstart} is the individual's left truncation time. Unlike regular Cox models, a major distinction is that in case (c) the
distribution of the frailty must be considered conditional on survival up to the left truncation time.

The \code{+cluster()} statement specified the column that determines the grouping (the observations that share the same frailty).
The \code{+strata()} statement specifies a column that determines different strata, for which different baseline hazards are calculated.
The \code{+terminal} specifies a column that contains an indicator for dependent censoring, and then performs a score test

The \code{distribution} argument must be generated by a call to \code{\link{emfrail_dist}}. This determines the
frailty distribution, which may be one of gamma, positive stable or PVF (power-variance-function), and the starting
value for the maximum likelihood estimation. The PVF family
also includes a tuning parameter that differentiates between inverse Gaussian and compound Poisson distributions.

The \code{control} argument must be generated by a call to \code{\link{emfrail_control}}. Several parameters
may be adjusted that control the precision of the convergenge criteria or supress the calculation of different
quantities.
}
\note{
Several options in the \code{control} arguemnt shorten the running time for \code{emfrail} significantly.
These are disabling the adjustemnt of the standard errors (\code{se_adj = FALSE}), disabling the likelihood-based confidence intervals (\code{lik_ci = FALSE}) or
disabling the score test for heterogeneity (\code{ca_test = FALSE}).

The algorithm is detailed in the package vignette. For the gamma frailty,
the results should be identical with those from \code{coxph} with \code{ties = "breslow"}.
}
\examples{

m_gamma <- emfrail(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                   data =  rats)

# Inverse Gaussian distribution
m_ig <- emfrail(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                data =  rats,
                distribution = emfrail_dist(dist = "pvf"))

# for the PVF distribution with m = 0.75
m_pvf <- emfrail(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                 data =  rats,
                 distribution = emfrail_dist(dist = "pvf", pvfm = 0.75))

# for the positive stable distribution
m_ps <- emfrail(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                data =  rats,
                distribution = emfrail_dist(dist = "stable"))
\dontrun{
# Compare marginal log-likelihoods
models <- list(m_gamma, m_ig, m_pvf, m_ps)

models
logliks <- lapply(models, logLik)

names(logliks) <- lapply(models,
                         function(x) with(x$distribution,
                                          ifelse(dist == "pvf",
                                                 paste(dist, "/", pvfm),
                                                 dist))
)

logliks
}

# Stratified analysis
\dontrun{
  m_strat <- emfrail(formula = Surv(time, status) ~  rx + strata(sex) + cluster(litter),
                     data =  rats)
}


# Draw the profile log-likelihood
\dontrun{
  fr_var <- seq(from = 0.01, to = 1.4, length.out = 20)

  # For gamma the variance is 1/theta (see parametrizations)
  pll_gamma <- emfrail_pll(formula = Surv(time, status) ~  rx + sex + cluster(litter),
                           data =  rats,
                           values = 1/fr_var )
  plot(fr_var, pll_gamma,
       type = "l",
       xlab = "Frailty variance",
       ylab = "Profile log-likelihood")


  # Recurrent events
  mod_rec <- emfrail(Surv(start, stop, status) ~ treatment + cluster(id), bladder1)
  # The warnings appear from the Surv object, they also appear in coxph.

  plot(mod_rec, type = "hist")
}

# Left truncation
\dontrun{
  # We simulate some data with truncation times
  set.seed(2018)
  nclus <- 300
  nind <- 5
  x <- sample(c(0,1), nind * nclus, TRUE)
  u <- rep(rgamma(nclus,1,1), each = 3)

  stime <- rexp(nind * nclus, rate = u * exp(0.5 * x))

  status <- ifelse(stime > 5, 0, 1)
  stime[status == 0] <- 5

  # truncate uniform between 0 and 2
  ltime <- runif(nind * nclus, min = 0, max = 2)

  d <- data.frame(id = rep(1:nclus, each = nind),
                  x = x,
                  stime = stime,
                  u = u,
                  ltime = ltime,
                  status = status)
  d_left <- d[d$stime > d$ltime,]

  mod <- emfrail(Surv(stime, status)~ x + cluster(id), d)
  # This model ignores the left truncation, 0.378 frailty variance:
  mod_1 <- emfrail(Surv(stime, status)~ x + cluster(id), d_left)

  # This model takes left truncation into account,
 # but it considers the distribution of the frailty unconditional on the truncation
 mod_2 <- emfrail(Surv(ltime, stime, status)~ x + cluster(id), d_left)

  # This is identical with:
  mod_cox <- coxph(Surv(ltime, stime, status)~ x + frailty(id), data = d_left)


  # The correct thing is to consider the distribution of the frailty given the truncation
  mod_3 <- emfrail(Surv(ltime, stime, status)~ x + cluster(id), d_left,
                   distribution = emfrail_dist(left_truncation = TRUE))

  summary(mod_1)
  summary(mod_2)
  summary(mod_3)
}
}
\seealso{
\code{\link{plot.emfrail}} and \code{\link{autoplot.emfrail}} for plot functions directly available, \code{\link{emfrail_pll}} for calculating \eqn{\widehat{L}(\theta)} at specific values of \eqn{\theta},
\code{\link{summary.emfrail}} for transforming the \code{emfrail} object into a more human-readable format and for
visualizing the frailty (empirical Bayes) estimates,
\code{\link{predict.emfrail}} for calculating and visalizing conditional and marginal survival and cumulative
hazard curves. \code{\link{residuals.emfrail}} for extracting martingale residuals and \code{\link{logLik.emfrail}} for extracting
the log-likelihood of the fitted model.
}
