% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ExperimentSetup.R
\name{extract_experimental_setup}
\alias{extract_experimental_setup}
\title{Parse experimental design}
\usage{
extract_experimental_setup(
  experimental_design,
  file_dir,
  message_indent = 0L,
  verbose = TRUE
)
}
\arguments{
\item{experimental_design}{(\strong{required}) Defines the experimental
design, e.g. \code{cv(bt(fs,20)+mb,3,2)+ev} for 3-fold cross-validation
repeated 2 times, with feature selection nested on 20 bootstraps followed by
model building, and with external validation. The basic workflow components are:
\itemize{
\item \code{fs}: (required) feature selection step.
\item \code{mb}: (required) model building step.
\item \code{ev}: (optional) external validation. Note that internal validation
is always conducted if the subsampling methods create any validation data
sets.
}

The different components are linked using \code{+}.

Different subsampling methods can be used in conjunction with the basic
workflow components:
\itemize{
\item \code{bs(x,n)}: (stratified) .632 bootstrap, with \code{n} the number of
bootstraps. In contrast to \code{bt}, feature pre-processing parameters are
determined and hyperparameters are optimised on individual bootstraps.
\item \code{bt(x,n)}: (stratified) .632 bootstrap, with \code{n} the number of
bootstraps. Unlike \code{bs} and other subsampling methods, no separate
pre-processing parameters or optimised hyperparameters will be determined
for each bootstrap.
\item \code{cv(x,n,p)}: (stratified) \code{n}-fold cross-validation, repeated \code{p} times.
Pre-processing parameters are determined for each iteration.
\item \code{lv(x)}: leave-one-out cross-validation. Pre-processing parameters are
determined for each iteration.
\item \code{ip(x)}: imbalance partitioning for addressing class imbalances in the
data set. Pre-processing parameters are determined for each partition. The
number of partitions generated depends on the imbalance correction method
(see the \code{imbalance_correction_method} parameter). Imbalance partitioning
does not generate validation sets.
}

As shown in the example above, subsampling methods can be nested.

The simplest valid experimental design is \code{fs+mb}, which corresponds to a
TRIPOD type 1a analysis. Type 1b analyses are only possible using
bootstraps, e.g. \code{bt(fs+mb,100)}. Type 2a analyses can be conducted using
cross-validation, e.g. \code{cv(bt(fs,100)+mb,10,1)}. Depending on the origin of
the external validation data, designs such as \code{fs+mb+ev} or
\code{cv(bt(fs,100)+mb,10,1)+ev} constitute type 2b or type 3 analyses. Type 4
analyses can be done by obtaining one or more \code{familiarModel} objects from
others and applying them to your own data set.

Alternatively, the \code{experimental_design} parameter may be used to provide a
path to a file containing iterations, which is named \verb{####_iterations.RDS}
by convention. This path can be relative to the directory of the current
experiment (\code{experiment_dir}), or an absolute path. The absolute path may
thus also point to a file from a different experiment.}

\item{message_indent}{Spacing inserted before messages.}

\item{verbose}{Sets verbosity.}
}
\value{
A \code{data.table} with subsampler information at different levels of the
experimental design.
}
\description{
Parse experimental design
}
\details{
This function converts the \code{experimental_design} string into the
\code{data.table} described under Value.
}
\keyword{internal}
