% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/logit.spls.stab.R
\name{logit.spls.stab}
\alias{logit.spls.stab}
\title{Stability selection procedure to estimate probabilities of selection of 
covariates for the LOGIT-SPLS method}
\usage{
logit.spls.stab(
  X,
  Y,
  lambda.ridge.range,
  lambda.l1.range,
  ncomp.range,
  adapt = TRUE,
  maxIter = 100,
  svd.decompose = TRUE,
  ncores = 1,
  nresamp = 100,
  center.X = TRUE,
  scale.X = FALSE,
  weighted.center = TRUE,
  seed = NULL,
  verbose = TRUE
)
}
\arguments{
\item{X}{a (n x p) data matrix of predictors. \code{X} must be a matrix. 
Each row corresponds to an observation and each column to a 
predictor variable.}

\item{Y}{a vector of length n of binary responses. \code{Y} must be a 
vector or a one column matrix. It contains the response variable for 
each observation. \code{Y} should take values in \{0,1\}.}

\item{lambda.ridge.range}{a vector of positive real values. 
\code{lambda.ridge} is the Ridge regularization parameter for the 
RIRLS algorithm (see details). The optimal value will be chosen among
\code{lambda.ridge.range}.}

\item{lambda.l1.range}{a vector of positive real values in [0,1]. 
\code{lambda.l1} is the sparse penalty parameter for the dimension 
reduction step by sparse PLS (see details). The optimal value will be 
chosen among \code{lambda.l1.range}.}

\item{ncomp.range}{a vector of positive integers. \code{ncomp} is the 
number of PLS components. The optimal value will be chosen 
among \code{ncomp.range}.}

\item{adapt}{a boolean value indicating whether the sparse PLS selection 
step should be adaptive or not (see details).}

\item{maxIter}{a positive integer, the maximal number of iterations in the 
RIRLS algorithm (see details).}

\item{svd.decompose}{a boolean parameter. \code{svd.decompose} indicates 
whether or not the predictor matrix \code{X} should be decomposed by 
SVD (singular value decomposition) for the RIRLS step (see details).}

\item{ncores}{a positive integer indicating the number of cores that the 
procedure is allowed to use for parallel computation (see details).}

\item{nresamp}{a positive integer, the number of resamplings of the data 
used to estimate the probability of selection of each covariate (default 
is 100).}

\item{center.X}{a boolean value indicating whether the data matrix 
\code{X} should be centered or not.}

\item{scale.X}{a boolean value indicating whether the data matrix 
\code{X} should be scaled or not in the SPLS step 
(\code{scale.X=TRUE} implies \code{center.X=TRUE}).}

\item{weighted.center}{a boolean value indicating whether the centering 
should take into account the weighted l2 metric or not in the SPLS step.}

\item{seed}{a positive integer value (default is \code{NULL}). If 
non-\code{NULL}, the seed for pseudo-random number generation is set 
accordingly.}

\item{verbose}{a boolean parameter indicating the verbosity.}
}
\value{
An object with the following attributes
\item{q.Lambda}{A table with values of q.Lambda (cf. Durif 
et al. (2018) for the notation), i.e. the average number of covariates
selected over the entire grid of candidate hyper-parameter values,
for increasing sizes of the hyper-parameter grid.}
\item{probs.lambda}{A table with the estimated probability of selection of 
each covariate, depending on the candidate values of the hyper-parameters.}
\item{p}{An integer indicating the number of covariates in the 
model.}
}
\description{
The function \code{logit.spls.stab} trains a logit-SPLS model for each 
candidate value \code{(ncomp, lambda.l1, lambda.ridge)} of the 
hyper-parameters on multiple sub-samplings of the data. The stability 
selection procedure selects the covariates that are selected by most of the 
models over the grid of hyper-parameters, following the procedure described 
in Durif et al. (2018). Candidate values for \code{ncomp}, \code{lambda.l1} 
and \code{lambda.ridge} are respectively given by 
the input arguments \code{ncomp.range}, \code{lambda.l1.range} 
and \code{lambda.ridge.range}.
}
\details{
The columns of the data matrix \code{X} need not be standardized, 
since standardizing is performed by the function \code{logit.spls.stab} 
as a preliminary step. 

The procedure is described in Durif et al. (2018). The stability selection 
procedure can be summarized as follows (cf. Meinshausen and Buhlmann, 2010).

(i) For each candidate value \code{(ncomp, lambda.l1, lambda.ridge)} of the 
hyper-parameters, a logit-SPLS model is trained on \code{nresamp} 
resamplings of the data. Then, for each triplet 
\code{(ncomp, lambda.l1, lambda.ridge)}, the probability that a covariate 
(i.e. a column of \code{X}) is selected is estimated over the resamplings.

(ii) Finally, the set of "stable selected" variables corresponds to the 
set of covariates that were selected by most of the trained models over the 
grid of candidate hyper-parameter values.

This function achieves the first step (i) of the stability selection 
procedure. The second step (ii) is achieved by the function 
\code{\link{stability.selection}}.

This procedure uses \code{mclapply} from the \code{parallel} package, 
available on GNU/Linux and macOS. Users of Microsoft Windows can refer to 
the README file in the source to be able to use an mclapply-type function.
}
\examples{
\dontrun{
### load plsgenomics library
library(plsgenomics)

### generating data
n <- 100
p <- 100
sample1 <- sample.bin(n=n, p=p, kstar=10, lstar=2, 
                      beta.min=0.25, beta.max=0.75, mean.H=0.2, 
                      sigma.H=10, sigma.F=5)

X <- sample1$X
Y <- sample1$Y

### pertinent covariates id
sample1$sel

### hyper-parameters values to test
lambda.l1.range <- seq(0.05,0.95,by=0.1) # between 0 and 1
ncomp.range <- 1:10
# log-spaced range between 0.01 and 1000 for lambda.ridge.range
logspace <- function(d1, d2, n) exp(log(10)*seq(d1, d2, length.out=n))
lambda.ridge.range <- signif(logspace(d1=-2, d2=3, n=21), digits=3)

### tuning the hyper-parameters
stab1 <- logit.spls.stab(X=X, Y=Y, lambda.ridge.range=lambda.ridge.range, 
                         lambda.l1.range=lambda.l1.range, 
                         ncomp.range=ncomp.range, 
                         adapt=TRUE, maxIter=100, svd.decompose=TRUE, 
                         ncores=1, nresamp=100)
                       
str(stab1)
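
### inspect the components listed in the Value section
### (assumption: the returned object is a list, so `$` access works;
### the column layout of the two tables is not assumed here)
head(stab1$q.Lambda)
dim(stab1$probs.lambda)
stab1$p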

### heatmap of estimated probabilities
stability.selection.heatmap(stab1)

### selected covariates
stability.selection(stab1, piThreshold=0.6, rhoError=10)
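
### toy illustration of step (i) from the Details section, in base R only
### (a sketch with made-up selection indicators for a single hyper-parameter
### triplet, not the package internals): rows are resamplings, columns are
### covariates, and 1 means that the covariate was selected
sel.toy <- matrix(c(1, 0, 1,
                    1, 0, 0,
                    1, 1, 0,
                    1, 0, 1), nrow=4, byrow=TRUE)
# estimated selection probability of each covariate: the proportion of
# resamplings in which it was selected
prob.toy <- colMeans(sel.toy)
prob.toy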
}

}
\references{
Durif, G., Modolo, L., Michaelsson, J., Mold, J.E., Lambert-Lacroix, S., 
Picard, F., 2018. High dimensional classification with combined 
adaptive sparse PLS and logistic regression. Bioinformatics 34, 
485--493. \doi{10.1093/bioinformatics/btx571}.
Available at \url{http://arxiv.org/abs/1502.05933}.

Meinshausen, N., Buhlmann, P., 2010. Stability Selection. Journal of the 
Royal Statistical Society: Series B (Statistical Methodology) 
72, no. 4, 417--473.
}
\seealso{
\code{\link{logit.spls}}, \code{\link{stability.selection}}, 
\code{\link{stability.selection.heatmap}}
}
\author{
Ghislain Durif (\url{https://gdurif.perso.math.cnrs.fr/}).
}
