% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/gp.R
\name{gp}
\alias{gp}
\title{Gaussian process emulator construction}
\usage{
gp(
  X,
  Y,
  struc = NULL,
  name = "sexp",
  lengthscale = rep(0.1, ncol(X)),
  bounds = NULL,
  prior = "ref",
  nugget_est = FALSE,
  nugget = ifelse(nugget_est, 0.01, 1e-08),
  scale_est = TRUE,
  scale = 1,
  training = TRUE,
  verb = TRUE,
  internal_input_idx = NULL,
  linked_idx = NULL
)
}
\arguments{
\item{X}{a matrix where each row is an input data point and each column is an input dimension.}

\item{Y}{a matrix with only one column and each row being an output data point.}

\item{struc}{an object produced by \code{\link[=kernel]{kernel()}} that gives user-defined GP specifications. When \code{struc = NULL},
the GP specifications are automatically generated using information provided in \code{name}, \code{lengthscale},
\code{nugget_est}, \code{nugget}, \code{scale_est}, \code{scale}, and \code{internal_input_idx}. Defaults to \code{NULL}.}

\item{name}{kernel function to be used. Either \code{"sexp"} for squared exponential kernel or
\code{"matern2.5"} for Matérn-2.5 kernel. Defaults to \code{"sexp"}. This argument is only used when \code{struc = NULL}.}

\item{lengthscale}{initial values of lengthscales in the kernel function. It can be a single numeric value or a vector:
\itemize{
\item if it is a single numeric value, it is assumed that kernel functions across input dimensions share the same lengthscale;
\item if it is a vector (which must have a length of \code{ncol(X)}), it is assumed that kernel functions across input dimensions have different lengthscales.
}

Defaults to a vector of \code{0.1}. This argument is only used when \code{struc = NULL}.}

\item{bounds}{the lower and upper bounds of lengthscales in the kernel function. It is a vector of length two where the first element is
the lower bound and the second element is the upper bound. The bounds will be applied to all lengthscales in the kernel function. Defaults
to \code{NULL} where no bounds are specified for the lengthscales. This argument is only used when \code{struc = NULL}.}

\item{prior}{prior to be used for maximum a posteriori (MAP) estimation of the lengthscales and nugget of the GP: gamma prior (\code{"ga"}), inverse gamma prior (\code{"inv_ga"}),
or jointly robust prior (\code{"ref"}). Defaults to \code{"ref"}. This argument is only used when \code{struc = NULL}. See the reference below for the jointly
robust prior.}

\item{nugget_est}{a bool indicating if the nugget term is to be estimated:
\enumerate{
\item \code{FALSE}: the nugget term is fixed to \code{nugget}.
\item \code{TRUE}: the nugget term will be estimated.
}

Defaults to \code{FALSE}. This argument is only used when \code{struc = NULL}.}

\item{nugget}{the initial nugget value. If \code{nugget_est = FALSE}, the assigned value is fixed during the training.
Set \code{nugget} to a small value (e.g., \code{1e-8}) and \code{nugget_est} to \code{FALSE} for deterministic emulations where the emulator
interpolates the training data points. Set \code{nugget} to a reasonably larger value and \code{nugget_est} to \code{TRUE} for stochastic
emulations where the computer model outputs are assumed to follow a homogeneous Gaussian distribution. Defaults to \code{1e-8} if \code{nugget_est = FALSE} and
\code{0.01} if \code{nugget_est = TRUE}. This argument is only used when \code{struc = NULL}.}

\item{scale_est}{a bool indicating if the variance is to be estimated:
\enumerate{
\item \code{FALSE}: the variance is fixed to \code{scale}.
\item \code{TRUE}: the variance term will be estimated.
}

Defaults to \code{TRUE}. This argument is only used when \code{struc = NULL}.}

\item{scale}{the initial variance value. If \code{scale_est = FALSE}, the assigned value is fixed during the training.
Defaults to \code{1}. This argument is only used when \code{struc = NULL}.}

\item{training}{a bool indicating if the initialized GP emulator will be trained.
When set to \code{FALSE}, \code{\link[=gp]{gp()}} returns an untrained GP emulator, to which one can apply \code{\link[=summary]{summary()}} to inspect its specifications
(especially when a customized \code{struc} is provided) or apply \code{\link[=predict]{predict()}} to check its emulation performance before the training. Defaults to \code{TRUE}.}

\item{verb}{a bool indicating if the trace information on GP emulator construction and training will be printed during the function execution.
Defaults to \code{TRUE}.}

\item{internal_input_idx}{the column indices of \code{X} that are generated by the linked emulators in the preceding layers.
Set \code{internal_input_idx = NULL} if the GP emulator is in the first layer of a system or all columns in \code{X} are
generated by the linked emulators in the preceding layers. Defaults to \code{NULL}. This argument is only used when \code{struc = NULL}.}

\item{linked_idx}{either a vector or a list of vectors:
\itemize{
\item If \code{linked_idx} is a vector, it gives indices of columns in the pooled output matrix (formed by column-combined outputs of all
emulators in the feeding layer) that feed into the GP emulator. If the GP emulator is in the first layer of a linked emulator system,
the vector gives the column indices of the global input (formed by column-combining all input matrices of emulators in the first layer)
that the GP emulator will use. The length of the vector must equal the length of \code{internal_input_idx} when \code{internal_input_idx} is not \code{NULL}.
\item When the GP emulator is not in the first layer of a linked emulator system, \code{linked_idx} can be a list that gives the information on connections
between the GP emulator and emulators in all preceding layers. The length of the list should equal the number of layers before
the GP emulator. Each element of the list is a vector that gives indices of columns in the pooled output matrix (formed by column-combined outputs
of all emulators) in the corresponding layer that feed into the GP emulator. If the GP emulator has no connections to any emulator in a certain layer,
set \code{NULL} in the corresponding position of the list. The order of input dimensions in \code{X[,internal_input_idx]} should be consistent with \code{linked_idx}.
For example, a GP emulator in the second layer that is fed by output dimensions 1 and 3 of the emulators in layer 1 should have \code{linked_idx = list(c(1, 3))}.
In addition, the first and second columns of \code{X[,internal_input_idx]} should correspond to the output dimensions 1 and 3 from layer 1.
}

Set \code{linked_idx = NULL} if the GP emulator will not be used for linked emulations. However, if this is no longer the case, one can use \code{\link[=set_linked_idx]{set_linked_idx()}}
to add linking information to the GP emulator. Defaults to \code{NULL}.}
}
\value{
An S3 object of class \code{gp} that contains four slots:
\itemize{
\item \code{data}: a list that contains two elements, \code{X} and \code{Y}, which are the training input and output data respectively.
\item \code{constructor_obj}: a 'python' object that stores the information of the constructed GP emulator.
\item \code{container_obj}: a 'python' object that stores the information for the linked emulation.
\item \code{emulator_obj}: a 'python' object that stores the information for the predictions from the GP emulator.
}

The returned \code{gp} object can be used by
\itemize{
\item \code{\link[=predict]{predict()}} for GP predictions.
\item \code{\link[=validate]{validate()}} for LOO and OOS validations.
\item \code{\link[=plot]{plot()}} for validation plots.
\item \code{\link[=lgp]{lgp()}} for linked (D)GP emulator constructions.
\item \code{\link[=design]{design()}} for sequential designs.
}
}
\description{
This function builds and trains a GP emulator.
}
\details{
See further examples and tutorials at \url{https://mingdeyu.github.io/dgpsi-R/}.
}
\note{
Any R vector passed as \code{X} or \code{Y} is treated as a column vector and automatically converted into a single-column
R matrix. Thus, if \code{X} is a single data point with multiple dimensions, it must be given as a matrix.
}
\examples{
\dontrun{
# load the package and the Python env
library(dgpsi)
init_py()

# construct a step function
f <- function(x) {
  if (x < 0.5) return(-1)
  if (x >= 0.5) return(1)
}

# generate training data
X <- seq(0, 1, length.out = 10)
Y <- sapply(X, f)
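
# A small sketch of the input convention described in the note: a vector such
# as X is treated as a single-column matrix (10 data points, 1 dimension),
# which is equivalent to supplying
X_mat <- matrix(X, ncol = 1)
# A single data point with several dimensions must instead be supplied as a
# one-row matrix. The 3-dimensional point below is hypothetical and is shown
# only to illustrate the shape; it is not used in the rest of this example.
x_single <- matrix(c(0.1, 0.5, 0.9), nrow = 1)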

# training
m <- gp(X, Y)
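
# A minimal sketch, assuming one wants to check the specifications before
# training: with training = FALSE, gp() returns an untrained emulator that
# can be inspected with summary(). The name m0 is introduced for illustration.
m0 <- gp(X, Y, training = FALSE)
summary(m0)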

# summarizing
summary(m)
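
# A sketch of non-default specifications, using only arguments documented
# above: Matern-2.5 kernel, a single shared initial lengthscale, bounded
# lengthscales, and an estimated nugget. The numeric values are illustrative
# assumptions, not recommendations for this data set.
m_alt <- gp(X, Y, name = "matern2.5", lengthscale = 0.2,
            bounds = c(0.01, 10), nugget_est = TRUE, verb = FALSE)
summary(m_alt)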

# LOO cross validation
m <- validate(m)
plot(m)

# prediction
test_x <- seq(0, 1, length.out = 200)
m <- predict(m, x = test_x)

# OOS validation
validate_x <- sample(test_x, 10)
validate_y <- sapply(validate_x, f)
plot(m, validate_x, validate_y)

# write and read the constructed emulator
write(m, 'step_gp')
m <- read('step_gp')
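
# A sketch of the linked_idx convention described in the arguments, under the
# assumption that the emulators will later be combined into a linked system.
# The data X2 and Y2 are hypothetical and serve only to show the shapes.
# First-layer emulator that uses column 1 of the global input:
m1 <- gp(X, Y, linked_idx = c(1))
# Second-layer emulator fed by output dimensions 1 and 3 of layer 1. With
# internal_input_idx = NULL, all columns of X2 are taken to come from the
# preceding layer, ordered consistently with linked_idx.
X2 <- cbind(runif(10), runif(10))
Y2 <- rowSums(X2)
m2 <- gp(X2, Y2, linked_idx = list(c(1, 3)))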
}

}
\references{
Gu, M. (2019). Jointly robust prior for Gaussian stochastic process in emulation, calibration and variable selection. \emph{Bayesian Analysis}, \strong{14(3)}, 857-885.
}
