% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cv.sail.R
\name{cv.sail}
\alias{cv.sail}
\title{Cross-validation for sail}
\usage{
cv.sail(x, y, e, ..., weights, lambda = NULL, type.measure = c("mse",
  "deviance", "class", "auc", "mae"), nfolds = 10, foldid,
  grouped = TRUE, keep = FALSE, parallel = FALSE)
}
\arguments{
\item{x}{input matrix of dimension \code{n x p}, where \code{n} is the number
of subjects and \code{p} is the number of X variables. Each row is an
observation vector. Can be a high-dimensional (\code{n < p}) matrix. Can be a
user-defined design matrix of main effects only (without intercept) if
\code{expand=FALSE}.}

\item{y}{response variable. For \code{family="gaussian"}, it should be a
one-column matrix or numeric vector. For \code{family="binomial"}, it should
be a one-column matrix or numeric vector with -1 for failure and 1 for
success.}

\item{e}{exposure or environment vector. Must be a numeric vector. Factors
must be converted to numeric.}

\item{...}{other arguments that can be passed to \code{\link{sail}}.}

\item{weights}{observation weights. Default is 1 for each observation.
Currently NOT IMPLEMENTED.}

\item{lambda}{optional user-supplied lambda sequence; default is NULL, in
which case \code{\link{sail}} chooses its own sequence.}

\item{type.measure}{loss to use for cross-validation. Currently only 3
options are implemented. The default is \code{type.measure="deviance"}, which
uses squared error for gaussian models (and is equivalent to
\code{type.measure="mse"} there). \code{type.measure="mae"} (mean absolute
error) can also be used; it measures the absolute deviation of the fitted
mean from the response (\eqn{|y-\hat{y}|}).}

\item{nfolds}{number of folds. Although \code{nfolds} can be as large as the
sample size (leave-one-out CV), it is not recommended for large datasets.
Smallest value allowable is \code{nfolds=3}. Default: 10}

\item{foldid}{an optional vector of values between 1 and \code{nfolds}
identifying which fold each observation is in. If supplied, \code{nfolds} can
be missing. Useful when the second tuning parameter \eqn{\alpha} is also being
tuned (see Details).}

\item{grouped}{This is an experimental argument, with default \code{TRUE},
and can be ignored by most users. This refers to computing \code{nfolds}
separate statistics, and then using their mean and estimated standard error
to describe the CV curve. If \code{grouped=FALSE}, an error matrix is built
up at the observation level from the predictions of the \code{nfolds}
fits, and then summarized (does not apply to \code{type.measure="auc"}).
Default: TRUE.}

\item{keep}{If \code{keep=TRUE}, a \emph{prevalidated} array is returned
containing fitted values for each observation and each value of
\code{lambda}. These fits are computed with the observation and the rest of
its fold omitted. The \code{foldid} vector is also returned.
Default: FALSE.}

\item{parallel}{If \code{TRUE}, use parallel \code{foreach} to fit each fold.
A parallel backend must be registered beforehand, e.g., using the
\code{\link[doParallel]{registerDoParallel}} function from the
\code{doParallel} package. See the example below for details. Default: FALSE.}
}
\value{
An object of class \code{"cv.sail"} is returned, which is a list with
  the ingredients of the cross-validation fit. \describe{
  \item{lambda}{the values of converged \code{lambda} used in the fits.}
  \item{cvm}{the mean cross-validated error, a vector of length
  \code{length(lambda)}.}
  \item{cvsd}{estimate of the standard error of \code{cvm}.}
  \item{cvup}{upper curve = \code{cvm+cvsd}.}
  \item{cvlo}{lower curve = \code{cvm-cvsd}.}
  \item{nzero}{number of non-zero coefficients at each \code{lambda}. This is
  the total number of non-zero main effects and interactions. Note that when
  \code{expand=TRUE}, a variable is counted only once in \code{nzero}, i.e.,
  if a variable is expanded to three columns, it is counted once even if all
  three coefficients are estimated to be non-zero.}
  \item{name}{a text string indicating the type of measure (for plotting
  purposes).}
  \item{sail.fit}{a fitted \code{sail} object for the full data.}
  \item{lambda.min}{value of \code{lambda} that gives the minimum
  \code{cvm}.}
  \item{lambda.1se}{largest value of \code{lambda} such that the error is
  within 1 standard error of the minimum.}
  \item{fit.preval}{if \code{keep=TRUE}, the array of prevalidated fits. Some
  entries can be \code{NA} if that and subsequent values of \code{lambda} are
  not reached for that fold.}
  \item{foldid}{if \code{keep=TRUE}, the fold assignments used.}}
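
  As a rough illustration of how \code{cvm}, \code{cvsd}, \code{lambda.min}
  and \code{lambda.1se} relate, the base-R sketch below summarises a toy
  matrix of per-fold errors (rows are folds, columns are \code{lambda}
  values). It assumes unweighted folds; the package's own computation, which
  follows \code{glmnet}, may weight folds differently, so results from
  \code{cv.sail} need not match exactly. The helper \code{summarise_cv} is
  hypothetical and is not part of sail.

```r
# Hypothetical helper, not part of sail: summarise a matrix of per-fold
# CV errors (rows = folds, columns = lambda values, largest lambda first).
summarise_cv <- function(cvraw, lambda) {
  nfolds <- nrow(cvraw)
  cvm <- colMeans(cvraw)                      # mean CV error per lambda
  cvsd <- apply(cvraw, 2, sd) / sqrt(nfolds)  # standard error of cvm
  idmin <- which.min(cvm)
  # lambda.1se: largest lambda whose mean error is within one standard
  # error of the minimum mean error
  within_1se <- cvm <= cvm[idmin] + cvsd[idmin]
  list(cvm = cvm, cvsd = cvsd, cvup = cvm + cvsd, cvlo = cvm - cvsd,
       lambda.min = lambda[idmin], lambda.1se = max(lambda[within_1se]))
}

# Toy example: 3 folds, 4 lambda values
lambda <- c(1, 0.5, 0.25, 0.1)
cvraw <- rbind(c(4, 2, 1.5, 1.6),
               c(5, 3, 2.0, 1.9),
               c(6, 4, 2.5, 2.2))
out <- summarise_cv(cvraw, lambda)
out$lambda.min  # 0.1
out$lambda.1se  # 0.25
```

  Here the smallest \code{lambda} minimises \code{cvm}, but the mean error at
  \code{lambda = 0.25} lies within one standard error of that minimum, so the
  sparser model is selected by the 1-SE rule.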
}
\description{
Does k-fold cross-validation for sail and determines the optimal
  tuning parameter \eqn{\lambda}.
}
\details{
The function runs \code{\link{sail}} \code{nfolds}+1 times: the
  first to get the \code{lambda} sequence, and then the remainder to compute
  the fit with each of the folds omitted. Note that a new lambda sequence is
  computed for each of the folds, and the \code{predict} method is then used
  to get the solution path at each value of the original lambda sequence. The
  error is accumulated, and the mean error and its standard error across the
  folds are computed. Note that \code{cv.sail} does NOT search for values of
  \code{alpha}. A specific value should be supplied; otherwise
  \code{alpha=0.5} is used by default. Users who would like to
  cross-validate \code{alpha} as well should call \code{cv.sail} with a
  pre-computed \code{foldid} vector, and then pass this same vector to
  separate calls to \code{cv.sail} with different values of \code{alpha}.
  Note also that the results of \code{cv.sail} are random, since the folds
  are selected at random. Users can reduce this randomness by running
  \code{cv.sail} many times and averaging the error curves.
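
  A pre-computed \code{foldid} vector for the \code{alpha}-tuning strategy
  described above can be built once in base R and reused. In the sketch
  below, \code{make_foldid} is a hypothetical helper (not part of sail) that
  assigns observations to folds of nearly equal size. The commented
  \code{cv.sail} calls are not run; they assume \code{x}, \code{y} and
  \code{e} are already defined and that \code{alpha} is forwarded to
  \code{\link{sail}} through \code{...}.

```r
# Hypothetical helper, not part of sail: randomly assign n observations
# to nfolds folds of (nearly) equal size.
make_foldid <- function(n, nfolds = 10) {
  sample(rep(seq_len(nfolds), length.out = n))
}

set.seed(1)
foldid <- make_foldid(n = 100, nfolds = 5)
table(foldid)  # 20 observations in each of the 5 folds

# Reusing the same foldid for every alpha keeps the CV errors comparable.
# Not run here; assumes sail is installed and x, y, e are defined:
# fits <- lapply(c(0.1, 0.5, 0.9), function(a)
#   cv.sail(x = x, y = y, e = e, alpha = a, foldid = foldid))
# sapply(fits, function(f) min(f$cvm))
```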
}
\note{
The skeleton of this function and its documentation were taken directly
  from the \code{glmnet} package. See References for details.
}
\examples{
# cubic B-spline basis used to expand each variable
f.basis <- function(i) splines::bs(i, degree = 3)
data("sailsim")
# Fit the folds in parallel on 2 workers (requires the doParallel package)
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
cvfit <- cv.sail(x = sailsim$x, y = sailsim$y, e = sailsim$e,
                 parallel = TRUE, nlambda = 10,
                 maxit = 25, basis = f.basis,
                 nfolds = 3, dfmax = 5)
stopCluster(cl)
# plot cross validated curve
plot(cvfit)
# solution at lambda.min
coef(cvfit, s = "lambda.min")
# solution at lambda.1se
coef(cvfit, s = "lambda.1se")
# non-zero coefficients at lambda.min
predict(cvfit, s = "lambda.min", type = "nonzero")


}
\references{
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010).
  Regularization Paths for Generalized Linear Models via Coordinate Descent.
  Journal of Statistical Software, 33(1), 1-22.
  \url{http://www.jstatsoft.org/v33/i01/}.

Bhatnagar SR, Yang Y, Greenwood CMT (2018+). Sparse additive interaction
  models with the strong heredity property. Preprint.
}
\seealso{
\code{\link[splines]{bs}} \code{\link{sail}}
}
