% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/bayesRegress.R
\name{bayesRegressAssign}
\alias{bayesRegressAssign}
\title{Regression for one or more samples, given some training data.}
\usage{
bayesRegressAssign(
  dfTrain,
  dfValid,
  targetCol,
  selectedFeatureNames = c(),
  shiftAmount = 0.1,
  retainMinValues = 2,
  doEcdf = FALSE,
  online = 0,
  simple = FALSE,
  useParallel = NULL,
  numBuckets = ceiling(log2(nrow(df))),
  sampleFromAllBuckets = TRUE,
  regressor = NULL
)
}
\arguments{
\item{dfTrain}{data.frame that holds the training data.}

\item{dfValid}{data.frame that holds the validation samples, for each of which
a probability is sought. The convention is that if you attempt to assign a
probability to a numeric value, it ought to be found in the target column of
this data frame (otherwise, the target column is not required in it).}

\item{targetCol}{character, the name of the targeted feature, i.e., the feature to
assign a probability to.}

\item{selectedFeatureNames}{character, defaults to an empty vector, in which
case all available features are used. Use this to select subsets of features
and to order features.}

\item{shiftAmount}{numeric an offset value used to increase any one
probability (factor) in the full built equation.}

\item{retainMinValues}{integer, the minimum number of data points to require
when segmenting the data feature by feature.}

\item{doEcdf}{default FALSE a boolean to indicate whether to use the
empirical CDF to return a probability when inferencing a continuous
feature.}

\item{online}{default 0, integer to indicate how many rows should be used for
inferencing. If zero, only the initially given data.frame dfTrain is used.
If > 0, each inferenced sample is appended to it and the resulting data.frame
is truncated to this number of rows. Use an integer large enough (i.e., the
sum of training and validation rows) to keep all samples during inferencing.
A smaller number, e.g., the number of rows in dfTrain, keeps the amount of
data restricted by discarding older rows. A number larger than the rows in
dfTrain is also fine; dfTrain will grow to it and then discard rows.}

\item{simple}{default FALSE, boolean to indicate whether or not to use simple
Bayesian inferencing instead of full. This is faster, but the results are less
accurate. If true, uses \code{mmb::bayesRegressSimple()}. Otherwise, uses
\code{mmb::bayesRegress()}.}

\item{useParallel}{boolean DEFAULT NULL this is forwarded to the underlying
function \code{mmb::bayesRegress()} (only in simple=FALSE mode).}

\item{numBuckets}{integer, the number of buckets to use for discretization.
Buckets are built in an equidistant manner, not as quantiles (i.e., one
bucket likely holds a different number of values than another).}

\item{sampleFromAllBuckets}{default TRUE boolean to indicate how to
obtain values for regression from the buckets. If true, then takes
values from those buckets with a non-zero probability, and according
to their probability. If false, selects all values from the bucket
with the highest probability.}

\item{regressor}{Function that is given the collected values for
regression and thus finally used to select a most likely value. Defaults
to the built-in estimator for the empirical PDF and returns its argmax.
However, any other function can be used, too, such as min, max, median,
average etc. You may also use this function to obtain the raw values
for further processing.}
}
\description{
This method uses full-dependency (\code{simple=F}) Bayesian
inferencing to do a regression for the target feature for all of the
samples given in \code{dfValid}. Assigns a regression value using either
\code{mmb::bayesRegress()} (full) or \code{mmb::bayesRegressSimple()}
(if \code{simple=T}).
}
\examples{
\donttest{
df <- iris[, ]
set.seed(84735)
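# Shuffle the row names so that training and validation rows mix all species.
rn <- base::sample(rownames(df), 150)
dfTrain <- df[rn[1:120], ]
dfValid <- df[rn[121:150], ]
res <- mmb::bayesRegressAssign(
  dfTrain, dfValid[, !(colnames(dfValid) \%in\% "Sepal.Length")],
  "Sepal.Length", sampleFromAllBuckets = TRUE, doEcdf = TRUE)
# Compare the regressed values against the held-out target values.
cov(res, dfValid$Sepal.Length)^2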
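
# A further sketch, not a benchmark: the regressor argument accepts any
# function of the collected values (the argument description names min, max,
# median etc.), and simple = TRUE switches to simple Bayesian inferencing.
# The names featCols, resMedian and resSimple are only illustrative, and the
# error computation assumes that a numeric vector with one value per row of
# dfValid is returned, as the cov() call above suggests.
featCols <- setdiff(colnames(dfValid), "Sepal.Length")
resMedian <- mmb::bayesRegressAssign(
  dfTrain, dfValid[, featCols], "Sepal.Length",
  regressor = stats::median)
resSimple <- mmb::bayesRegressAssign(
  dfTrain, dfValid[, featCols], "Sepal.Length",
  simple = TRUE, regressor = stats::median)
# Mean absolute error of each variant against the held-out target values.
mean(abs(resMedian - dfValid$Sepal.Length))
mean(abs(resSimple - dfValid$Sepal.Length))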
}
}
\seealso{
\code{mmb::bayesRegress()} (full) or \code{mmb::bayesRegressSimple()}
if \code{simple=T}. It mostly forwards the given arguments to these functions,
and you will find good documentation there.
}
\author{
Sebastian Hönel \href{mailto:sebastian.honel@lnu.se}{sebastian.honel@lnu.se}
}
\keyword{full-dependency}
\keyword{regression}
