% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/stan_biglm.R, R/stan_biglm.fit.R
\name{stan_biglm}
\alias{stan_biglm}
\alias{stan_biglm.fit}
\title{Bayesian regularized linear but big models via Stan}
\usage{
stan_biglm(biglm, xbar, ybar, s_y, ...,
  prior = R2(stop("'location' must be specified")),
  prior_intercept = NULL, prior_PD = FALSE, algorithm = c("sampling",
  "meanfield", "fullrank"), adapt_delta = NULL)

stan_biglm.fit(b, R, SSR, N, xbar, ybar, s_y, has_intercept = TRUE, ...,
  prior = R2(stop("'location' must be specified")),
  prior_intercept = NULL, prior_PD = FALSE, algorithm = c("sampling",
  "meanfield", "fullrank"), adapt_delta = NULL)
}
\arguments{
\item{biglm}{The list output by \code{\link[biglm]{biglm}} in the \pkg{biglm}
package.}

\item{xbar}{A numeric vector of column means in the implicit design matrix 
excluding the intercept for the observations included in the model.}

\item{ybar}{A numeric scalar indicating the mean of the outcome for the
observations included in the model.}

\item{s_y}{A numeric scalar indicating the unbiased sample standard deviation
of the outcome for the observations included in the model.}

\item{...}{Further arguments passed to the function in the \pkg{rstan} 
package (\code{\link[rstan]{sampling}}, \code{\link[rstan]{vb}}, or 
\code{\link[rstan]{optimizing}}), corresponding to the estimation method 
named by \code{algorithm}. For example, if \code{algorithm} is
\code{"sampling"} it is possible to specify \code{iter}, \code{chains},
\code{cores}, \code{refresh}, etc.}

\item{prior}{Must be a call to \code{\link{R2}} with its \code{location}
argument specified or \code{NULL}, which would indicate a standard uniform
prior for the \eqn{R^2}.}

\item{prior_intercept}{Either \code{NULL} (the default) or a call to
\code{\link{normal}}. If a \code{\link{normal}} prior is specified
without a \code{scale}, then the standard deviation is taken to be
the marginal standard deviation of the outcome divided by the square
root of the sample size, which is legitimate because the marginal
standard deviation of the outcome is a primitive parameter being
estimated.

\strong{Note:} If using a dense representation of the design matrix
---i.e., if the \code{sparse} argument is left at its default value of
\code{FALSE}--- then the prior distribution for the intercept is set so it
applies to the value \emph{when all predictors are centered}. If you prefer
to specify a prior on the intercept without the predictors being
auto-centered, then you have to omit the intercept from the
\code{\link[stats]{formula}} and include a column of ones as a predictor,
in which case some element of \code{prior} specifies the prior on it,
rather than \code{prior_intercept}. Regardless of how
\code{prior_intercept} is specified, the reported \emph{estimates} of the
intercept always correspond to a parameterization without centered
predictors (i.e., same as in \code{glm}).}

\item{prior_PD}{A logical scalar (defaulting to \code{FALSE}) indicating
whether to draw from the prior predictive distribution instead of
conditioning on the outcome.}

\item{algorithm}{A string (possibly abbreviated) indicating the 
estimation approach to use. Can be \code{"sampling"} for MCMC (the
default), \code{"optimizing"} for optimization, \code{"meanfield"} for
variational inference with independent normal distributions, or
\code{"fullrank"} for variational inference with a multivariate normal
distribution. See \code{\link{rstanarm-package}} for more details on the
estimation algorithms. NOTE: not all fitting functions support all four
algorithms.}

\item{adapt_delta}{Only relevant if \code{algorithm="sampling"}. See 
the \link{adapt_delta} help page for details.}

\item{b}{A numeric vector of OLS coefficients, excluding the intercept.}

\item{R}{A square upper-triangular matrix from the QR decomposition of the 
design matrix, excluding the intercept.}

\item{SSR}{A numeric scalar indicating the sum of squared residuals for OLS.}

\item{N}{An integer scalar indicating the number of included observations.}

\item{has_intercept}{A logical scalar indicating whether to add an intercept 
to the model when estimating it.}
}
\value{
The output of both \code{stan_biglm} and \code{stan_biglm.fit} is an
  object of \code{\link[rstan]{stanfit-class}} rather than
  \code{\link{stanreg-objects}}, which is more limited and less convenient
  but necessitated by the fact that \code{stan_biglm} does not bring the full
  design matrix into memory. Without the full design matrix, some of the
  elements of a \code{\link{stanreg-objects}} object cannot be calculated,
  such as residuals. Thus, the functions in the \pkg{rstanarm} package that
  input \code{\link{stanreg-objects}}, such as 
  \code{\link{posterior_predict}}, cannot be used.
}
\description{
\if{html}{\figure{stanlogo.png}{options: width="25px" alt="http://mc-stan.org/about/logo/"}}
This is the same model as \code{\link{stan_lm}}, but it uses the
output from \code{\link[biglm]{biglm}} in the \pkg{biglm} package so that
estimation can proceed when the data are too large to fit in memory.
}
\details{
The \code{stan_biglm} function is intended to be used in the same 
  circumstances as the \code{\link[biglm]{biglm}} function in the \pkg{biglm}
  package but with an informative prior on the \eqn{R^2} of the regression. 
  Like \code{\link[biglm]{biglm}}, the memory required to estimate the model 
  depends largely on the number of predictors rather than the number of 
  observations. However, \code{stan_biglm} and \code{stan_biglm.fit} have 
  additional required arguments that are not necessary in 
  \code{\link[biglm]{biglm}}, namely \code{xbar}, \code{ybar}, and \code{s_y}.
  If any observations have any missing values on any of the predictors or the 
  outcome, such observations do not contribute to these statistics.
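
  As a minimal sketch, assuming the data arrive as a list of data frames, 
  these statistics can be accumulated from running sums so that only one 
  chunk needs to be in memory at a time. The object \code{chunks} and the 
  split of \code{mtcars} into two pieces are hypothetical and serve only as 
  an illustration; they are not part of \pkg{rstanarm} or \pkg{biglm}.
  \preformatted{
vars <- c("wt", "qsec", "am")
chunks <- split(mtcars, rep(1:2, each = 16))     # stand-in for chunked data
n <- 0; sum_x <- 0; sum_y <- 0; sum_yy <- 0
for (chunk in chunks) {
  ok <- complete.cases(chunk[, c("mpg", vars)])  # skip incomplete rows
  n <- n + sum(ok)
  sum_x <- sum_x + colSums(chunk[ok, vars])
  sum_y <- sum_y + sum(chunk$mpg[ok])
  sum_yy <- sum_yy + sum(chunk$mpg[ok]^2)
}
xbar <- sum_x / n                                # column means of predictors
ybar <- sum_y / n                                # mean of the outcome
s_y <- sqrt((sum_yy - n * ybar^2) / (n - 1))     # unbiased sample SD
  }
  The sum-of-squares shortcut used for \code{s_y} can lose precision when the 
  mean of the outcome is very large relative to its spread, in which case a 
  numerically stabler updating scheme may be preferable.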
}
\examples{
# create inputs
ols <- lm(mpg ~ wt + qsec + am, data = mtcars, # all rows are complete so ...
          na.action = na.exclude)              # not necessary in this case
b <- coef(ols)[-1]
R <- qr.R(ols$qr)[-1,-1]
SSR <- crossprod(ols$residuals)[1]
not_NA <- !is.na(fitted(ols))
N <- sum(not_NA)
xbar <- colMeans(mtcars[not_NA,c("wt", "qsec", "am")])
y <- mtcars$mpg[not_NA]
ybar <- mean(y)
s_y <- sd(y)
post <- stan_biglm.fit(b, R, SSR, N, xbar, ybar, s_y, prior = R2(.75),
                       # the next line is only to make the example go fast
                       chains = 1, iter = 500, seed = 12345)
cbind(lm = b, stan_lm = rstan::get_posterior_mean(post)[13:15,]) # shrunk
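
# A hedged sketch of the same fit through stan_biglm itself, assuming the
# biglm package is installed. Splitting mtcars into two chunks is only an
# illustration of data that arrive in pieces; xbar, ybar, and s_y are the
# statistics computed above.
if (requireNamespace("biglm", quietly = TRUE)) {
  fit <- biglm::biglm(mpg ~ wt + qsec + am, data = mtcars[1:16, ])
  fit <- update(fit, mtcars[17:32, ])  # fold in the second chunk
  post2 <- stan_biglm(fit, xbar, ybar, s_y, prior = R2(.75),
                      # as above, these settings only make the example go fast
                      chains = 1, iter = 500, seed = 12345)
}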
}
