% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/outOfSample.R
\name{buildEvalSets}
\alias{buildEvalSets}
\title{Build set carve-up for out-of-sample evaluation.}
\usage{
buildEvalSets(nRows, ..., dframe = NULL, y = NULL, splitFunction = NULL,
  nSplits = 3)
}
\arguments{
\item{nRows}{scalar, >=1, number of rows to sample from.}

\item{...}{no additional arguments, declared to force named binding of later arguments.}

\item{dframe}{(optional) original data.frame, passed to user splitFunction.}

\item{y}{(optional) numeric vector, outcome variable (possibly to stratify on), passed to user splitFunction.}

\item{splitFunction}{(optional) function taking arguments nRows, nSplits, dframe, and y; returning a user-desired split.}

\item{nSplits}{integer, target number of splits.}
}
\value{
list of lists where the app portions of the sub-lists form a disjoint carve-up of seq_len(nRows) and each sub-list has a train portion disjoint from its app portion.
}
\description{
Return a carve-up of seq_len(nRows).  Very useful for any sort of
nested model situation (such as data prep, stacking, or super-learning).
}
\details{
Also sets attribute "splitmethod" on the return value, describing how the split was performed.
attr(returnValue,'splitmethod') is one of: 'notsplit' (data was not split; corner cases
like single-row data sets), 'oneway' (leave-one-out holdout), 'kwaycross' (a simple
partition), 'userfunction' (user-supplied function was actually used), or a user-specified attribute.
Any user-desired properties (such as stratification on y, or preservation of groups designated by
original data row numbers) may not apply unless you see that 'userfunction' has been
used.

The intent is that the user splitFunction only needs to handle "easy cases"
and maintain user invariants. If the user splitFunction returns NULL,
throws an error, or returns an unacceptable carve-up, then vtreat::buildEvalSets
returns its own eval set plan.  The signature of splitFunction should
be splitFunction(nRows,nSplits,dframe,y) where nSplits is the number of
pieces we want in the carve-up, nRows is the number of rows to split,
dframe is the original data.frame (useful for any group control variables),
and y is a numeric vector representing outcome (useful for outcome stratification).

Note that buildEvalSets may not always return a partition, for example on
one-row data frames or when the user split function chooses to make rows eligible for
application a different number of times.
}
\examples{

# use
buildEvalSets(200)

# longer example
# helper fns
# fit models using experiment plan to estimate out of sample behavior
fitModelAndApply <- function(trainData,applicationData) {
   model <- lm(y~x,data=trainData)
   predict(model,newdata=applicationData)
}
simulateOutOfSampleTrainEval <- function(d,fitApplyFn) {
   eSets <- buildEvalSets(nrow(d))
   evals <- lapply(eSets, 
      function(ei) { fitApplyFn(d[ei$train,],d[ei$app,]) })
   pred <- numeric(nrow(d))
   for(eii in seq_along(eSets)) {
     pred[eSets[[eii]]$app] <- evals[[eii]]
   }
   pred
}

# run the experiment
set.seed(2352356)
# example data
d <- data.frame(x=rnorm(5),y=rnorm(5),
        outOfSampleEst=NA,inSampleEst=NA)
        
# fit model on all data
d$inSampleEst <- fitModelAndApply(d,d)
# compute in-sample R^2 (above zero, falsely shows a 
#   relation until we adjust for degrees of freedom)
1-sum((d$y-d$inSampleEst)^2)/sum((d$y-mean(d$y))^2)

d$outOfSampleEst <- simulateOutOfSampleTrainEval(d,fitModelAndApply)
# compute out-of-sample R^2 (not positive,
#  evidence of no relation)
1-sum((d$y-d$outOfSampleEst)^2)/sum((d$y-mean(d$y))^2)
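
# user-supplied splitFunction: a minimal sketch, not part of vtreat.
# roundRobinSplit is a hypothetical helper written only for this example;
# it uses the documented signature (nRows,nSplits,dframe,y), ignores
# dframe and y, and deals rows out to nSplits groups in turn.
roundRobinSplit <- function(nRows,nSplits,dframe,y) {
   if((nSplits<2) || (nRows<nSplits)) {
      # decline the hard cases: returning NULL asks buildEvalSets
      # to fall back to its own eval set plan
      return(NULL)
   }
   group <- rep_len(seq_len(nSplits),nRows)
   lapply(seq_len(nSplits),
      function(k) { list(train=which(group!=k),app=which(group==k)) })
}
eSetsUser <- buildEvalSets(10,splitFunction=roundRobinSplit)
# inspect how the split was made; the user plan was used only if
# this reports 'userfunction' (or an attribute set by the user function)
attr(eSetsUser,'splitmethod')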

}
\seealso{
\code{\link{kWayCrossValidation}}, \code{\link{kWayStratifiedY}}, and \code{\link{makekWayCrossValidationGroupedByColumn}}
}
