% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/main.R
\name{select_variable}
\alias{select_variable}
\title{Select variables for ensemble learning architecture}
\usage{
select_variable(
  train_num,
  test_num = NULL,
  train_batchID = NULL,
  test_batchID = NULL,
  selectVar_corType = c("cor", "pcor"),
  selectVar_corMethod = c("spearman", "pearson"),
  selectVar_minNum = 5,
  selectVar_maxNum = 10,
  selectVar_batchWise = FALSE,
  coerce_numeric = FALSE
)
}
\arguments{
\item{train_num}{a numeric data.frame including \strong{only} the metabolite values of training samples (can be quality control samples). Information such as injection order or well position must be excluded. Row: sample. Column: metabolite variable. See Examples.}

\item{test_num}{an optional numeric data.frame including the metabolite values of test samples (can be subject samples). If provided, the column names of \code{test_num} should correspond to the column names of \code{train_num}. Row: sample. Column: metabolite variable. If \code{NULL}, the variables will be selected based on \code{train_num} only. See Examples.}

\item{train_batchID}{\code{NULL} or a vector corresponding to \code{train_num} to specify the batch of each sample. Ignored if \code{selectVar_batchWise = FALSE}. See Examples.}

\item{test_batchID}{\code{NULL} or a vector corresponding to \code{test_num} to specify the batch of each sample. Ignored if \code{selectVar_batchWise = FALSE}. See Examples.}

\item{selectVar_corType}{a character string indicating whether correlation (\code{"cor"}, default) or partial correlation (\code{"pcor"}) is to be used. Can be abbreviated. See Details. \strong{Note}: computing partial correlations on a large dataset can be very time-consuming.}

\item{selectVar_corMethod}{a character string indicating which correlation coefficient is to be computed. One of \code{"spearman"} (default) or \code{"pearson"}. Can be abbreviated. See Details.}

\item{selectVar_minNum}{an integer specifying the minimum number of selected variables. If \code{NULL}, no minimum is imposed, but at least 1 variable is selected. See Details. Default: 5.}

\item{selectVar_maxNum}{an integer specifying the maximum number of selected variables. If \code{NULL}, no maximum is imposed, but at most \code{ncol(train_num) - 1} variables are selected. See Details. Default: 10.}

\item{selectVar_batchWise}{(advanced) logical. Specifies whether the variable selection should be performed separately for each batch. Default: \code{FALSE}. \strong{Note}: if \code{TRUE}, the batch ID of each sample is required. Batch-wise variable selection is provided for data requiring special processing (for example, data with strong batch effects). In most cases, batch-wise variable selection is not necessary, and setting \code{TRUE} might make the algorithm less robust. See Details.}

\item{coerce_numeric}{logical. If \code{TRUE}, values in \code{train_num} and \code{test_num} will be coerced to numeric before the computation. Columns that cannot be coerced will be removed (with warnings). See Examples. Default: \code{FALSE}.}
}
\value{
If \code{selectVar_batchWise = FALSE}, the function returns a list of length one containing the selected variables computed on the whole dataset.

If \code{selectVar_batchWise = TRUE}, a list containing the selected variables computed on different batches is returned. The length of the returned list equals the number of batches specified by \code{test_batchID} and/or \code{train_batchID}.
}
\description{
This function provides an advanced option to select metabolite variables from external dataset(s). The selected variables (as a list) can be further passed to argument \code{selectVar_external} in function \code{\link{run_TIGER}} for a customised data correction.
}
\details{
See \code{\link{run_TIGER}}.
}
\examples{

data(FF4_qc) # load demo dataset

# QC as training samples; QC1, QC2 and QC3 as test samples:
train_samples <- FF4_qc[FF4_qc$sampleType == "QC",]
test_samples  <- FF4_qc[FF4_qc$sampleType != "QC",]

# Only numeric data of metabolite variables are allowed:
train_num <- train_samples[-c(1:5)]
test_num  <- test_samples[-c(1:5)]

# If the selection is performed on the whole dataset:
# based on training samples only:
selected_var_1 <- select_variable(train_num = train_num,
                                  test_num  = NULL,
                                  selectVar_batchWise = FALSE)

# also consider test samples:
selected_var_2 <- select_variable(train_num = train_num,
                                  test_num  = test_num,
                                  selectVar_batchWise = FALSE)

# If the selection is based on different batches:
# (If selectVar_batchWise = TRUE, batch IDs are required.)
selected_var_3 <- select_variable(train_num = train_num,
                                  test_num  = NULL,
                                  train_batchID = train_samples$plateID,
                                  test_batchID  = NULL,
                                  selectVar_batchWise = TRUE)
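
# A quick check of the return structure described in Value (a sketch):
# the whole-dataset result should be a list of length one, and the
# batch-wise result should hold one element per batch.
length(selected_var_1)
length(selected_var_3) == length(unique(train_samples$plateID))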

# If coerce_numeric = TRUE,
# columns that cannot be coerced to numeric will be removed (with warnings).
# (In this example, the injection order and well position columns are excluded
# because we don't want to calculate the correlations between metabolites and
# injection order/well position.)
selected_var_4 <- select_variable(train_num = train_samples[-c(4,5)],
                                  train_batchID = train_samples$plateID,
                                  selectVar_batchWise = TRUE,
                                  coerce_numeric = TRUE)
identical(selected_var_3, selected_var_4)  # identical to selected_var_3

\dontrun{

# will throw errors if input data have non-numeric columns
# and coerce_numeric = FALSE:

selected_var_5 <- select_variable(train_num = train_samples[-c(4,5)],
                                  coerce_numeric = FALSE)
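
# A sketch of passing the selected variables on to run_TIGER() through
# selectVar_external, as mentioned in Description. The column names
# "sampleID", "injectionOrder" and "wellPosition" are assumptions about
# the demo dataset, and the object name corrected is hypothetical.
# See ?run_TIGER for the authoritative argument list.
corrected <- run_TIGER(test_samples   = test_samples,
                       train_samples  = train_samples,
                       col_sampleID   = "sampleID",        # assumed column name
                       col_sampleType = "sampleType",
                       col_batchID    = "plateID",
                       col_order      = "injectionOrder",  # assumed column name
                       col_position   = "wellPosition",    # assumed column name
                       selectVar_external = selected_var_2)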
}
}
