% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/estimate.R
\name{estimate}
\alias{estimate}
\title{GGM: Estimation}
\usage{
estimate(
  Y,
  formula = NULL,
  type = "continuous",
  mixed_type = NULL,
  analytic = FALSE,
  prior_sd = 0.25,
  iter = 5000,
  impute = FALSE,
  progress = TRUE,
  seed = 1,
  ...
)
}
\arguments{
\item{Y}{Matrix (or data frame) of dimensions \emph{n} (observations) by \emph{p} (variables).}

\item{formula}{An object of class \code{\link[stats]{formula}}. This allows for including
control variables in the model (e.g., \code{~ gender}). See the details for further information.}

\item{type}{Character string. The type of data in \code{Y}. The options include \code{"continuous"},
\code{"binary"}, \code{"ordinal"}, or \code{"mixed"}. Note that \code{"mixed"} can be used for data with only
ordinal variables. See the details for further information.}

\item{mixed_type}{Numeric vector. An indicator of length \emph{p} specifying which variables should be treated as ranks
(1 for rank and 0 to assume normality). The default is currently to treat all integer variables as ranks
when \code{type = "mixed"} and \code{NULL} otherwise. See the details for further information.}

\item{analytic}{Logical. Should the analytic solution be computed (default is \code{FALSE})?}

\item{prior_sd}{Scale of the prior distribution, approximately the standard deviation of a beta distribution
(defaults to 0.25).}

\item{iter}{Number of iterations (posterior samples; defaults to 5000).}

\item{impute}{Logical. Should the missing values (\code{NA})
be imputed during model fitting (defaults to \code{FALSE})?}

\item{progress}{Logical. Should a progress bar be included (defaults to \code{TRUE})?}

\item{seed}{An integer for the random seed.}

\item{...}{Currently ignored.}
}
\value{
The returned object of class \code{estimate} contains information used for printing
and plotting the results. For users of \strong{BGGM}, the most useful objects are:

\itemize{

\item \code{pcor_mat} Partial correlation matrix (posterior mean).

\item \code{post_samp} An object containing the posterior samples.

}
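
For reference, partial correlations follow from a precision (inverse covariance) matrix \eqn{\Theta}
as \eqn{\rho_{ij} = -\Theta_{ij} / \sqrt{\Theta_{ii} \Theta_{jj}}}. The base R sketch below applies this
standard relation to a made-up correlation matrix \code{S} (an illustrative assumption). It is not the
internal code of \strong{BGGM}, which reports the posterior mean of the partial correlations in \code{pcor_mat}.

\preformatted{
# made-up correlation matrix for three variables (illustrative only)
S <- matrix(c(1.0, 0.5, 0.3,
              0.5, 1.0, 0.4,
              0.3, 0.4, 1.0), nrow = 3, ncol = 3)

# precision matrix (inverse of S)
Theta <- solve(S)

# rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj):
# cov2cor() does the scaling and the minus sign flips the direction
pcor <- -cov2cor(Theta)

# the diagonal is not a partial correlation; it is set to zero here
diag(pcor) <- 0
}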
}
\description{
Estimate the conditional (in)dependence structure, either with an analytic solution or by efficiently
sampling from the posterior distribution. These methods were introduced in \insertCite{Williams2019;textual}{BGGM}.
The graph is selected with \code{\link{select.estimate}} and then plotted with \code{\link{plot.select}}.
}
\details{
The default is to draw samples from the posterior distribution (\code{analytic = FALSE}). The samples are
required for computing edge differences (see \code{\link{ggm_compare_estimate}}), the Bayesian R2 introduced in
\insertCite{gelman_r2_2019;textual}{BGGM} (see \code{\link{predictability}}), etc. If the goal is
\emph{only} to determine the non-zero effects, this can be accomplished by setting \code{analytic = TRUE}.
This is particularly useful when a fast solution is needed (see the examples in \code{\link{ggm_compare_ppc}}).
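
A minimal sketch contrasting the two options. It assumes the first five \code{ptsd} items (a subset
chosen only to keep the run short) and \code{iter = 250} for demonstrative purposes; the object
names are illustrative.

\preformatted{
# subset of the ptsd data (five items, for speed)
Y <- ptsd[, 1:5]

# posterior samples (default): needed for, e.g., Bayesian R2
fit_samples <- estimate(Y, iter = 250, progress = FALSE)
r2 <- predictability(fit_samples)

# analytic solution: fast, but only for determining the non-zero effects
fit_analytic <- estimate(Y, analytic = TRUE)
E <- select(fit_analytic)
}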

\strong{Controlling for Variables}:

When controlling for variables, it is assumed that \code{Y} includes \emph{only}
the nodes in the GGM and the control variables. Internally, \emph{only} the predictors
that are included in \code{formula} are removed from \code{Y}. This is not the behavior of, say,
\code{\link{lm}}, but was adopted so that users do not have to write out each variable that
should be included in the GGM.
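
A minimal sketch of this behavior. The control variable \code{gender} is hypothetical: it is
simulated here for illustration and is not part of \code{ptsd}. Because \code{gender} appears in
\code{formula}, it is removed from \code{Y}, and the five remaining items are the nodes of the GGM.

\preformatted{
# five ptsd items plus a simulated (hypothetical) control variable
set.seed(1)
Y <- data.frame(ptsd[, 1:5],
                gender = rbinom(nrow(ptsd), size = 1, prob = 0.5))

# "gender" is in formula, so only it is removed from Y
fit <- estimate(Y, formula = ~ gender, iter = 250)

# select the graph for the five items
E <- select(fit)
}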

\strong{Mixed Type}:

 The term "mixed" is somewhat of a misnomer, because the method can be used for data including \emph{only}
 continuous or \emph{only} discrete variables. It is based on the extended rank likelihood, which requires sampling
 the ranks for each variable (i.e., the data are not merely transformed to ranks). This is computationally
 expensive when there are many levels. For example, with continuous data, there are as many ranks
 as data points!

 The option \code{mixed_type} allows the user to determine which variables should be treated as ranks;
 the "empirical" distribution is used otherwise \insertCite{hoff2007extending}{BGGM}. This is
 accomplished by specifying an indicator vector of length \emph{p}. A one indicates to use the ranks,
 whereas a zero indicates to "ignore" that variable. By default, all integer variables are treated as ranks.
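
A minimal sketch of specifying \code{mixed_type}, assuming the first five \code{ptsd} items and an
arbitrary split (first three as ranks, last two not) chosen only for illustration.

\preformatted{
# subset of the ptsd data (five items, for speed)
Y <- ptsd[, 1:5]

# indicator of length p: 1 = treat as ranks, 0 = do not sample the ranks
mixed_type <- c(1, 1, 1, 0, 0)

fit <- estimate(Y, type = "mixed", mixed_type = mixed_type, iter = 250)
}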

\strong{Dealing with Errors}:

An error is most likely to arise when \code{type = "ordinal"}. There are two common errors (although both are still rare):

\itemize{

\item The first is due to sampling the thresholds, especially when the data are heavily skewed.
      This can result in an ill-defined matrix. If this occurs, we recommend first
      decreasing \code{prior_sd} (i.e., a more informative prior). If that does not work,
      change the data type to \code{type = "mixed"}, which estimates a copula GGM
      (this method can be used for data containing \strong{only} ordinal variables). This should
      work without a problem.

\item  The second is due to how the ordinal data are categorized. For example, if the error states
       that the index is out of bounds, this indicates that the first category is a zero. This is not allowed, as
       the first category must be one. This is addressed by adding one (e.g., \code{Y + 1}) to the data matrix.

}
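
A minimal sketch of the remedies above, assuming the first five \code{ptsd} items (which include zeros)
and \code{prior_sd = 0.1} as an arbitrary, more informative value chosen for illustration.

\preformatted{
# subset of the ptsd data (five items, for speed)
Y <- ptsd[, 1:5]

# the first category must be one, so shift data that start at zero
if (min(Y, na.rm = TRUE) == 0) Y <- Y + 1

# remedy 1: a more informative prior (smaller prior_sd)
fit <- estimate(Y, type = "ordinal", prior_sd = 0.1, iter = 250)

# remedy 2: a copula GGM, which also works with only ordinal variables
fit <- estimate(Y, type = "mixed", iter = 250)
}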

\strong{Imputing Missing Values}:

Missing values are imputed with the approach described in \insertCite{hoff2009first;textual}{BGGM}.
The basic idea is to impute the missing values from their respective posterior predictive distribution,
given the observed data, as the model is being estimated. Note that the default is \code{impute = FALSE},
in which case list-wise deletion is performed with \code{na.omit} when there are missing values.
Setting \code{impute = TRUE} has no effect when there are no missing values.
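
A minimal sketch of imputation during model fitting. The missing values are introduced artificially
(ten entries in the first item) for illustration; they are not part of \code{ptsd}.

\preformatted{
# subset of the ptsd data (five items, for speed)
Y <- ptsd[, 1:5]

# introduce missing values for illustration
set.seed(1)
Y[sample(nrow(Y), 10), 1] <- NA

# impute during model fitting instead of list-wise deletion
fit <- estimate(Y, impute = TRUE, iter = 250)
}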
}
\note{
\strong{Posterior Uncertainty}:

A key feature of \bold{BGGM} is that there is a posterior distribution for each partial correlation.
This readily allows for visualizing uncertainty in the estimates. This feature works
with all data types and is accomplished by plotting the summary of the \code{estimate} object
(i.e., \code{plot(summary(fit))}). Several examples are provided below.



\strong{Interpretation of Conditional (In)dependence Models for Latent Data}:

See \code{\link{BGGM-package}} for details about interpreting GGMs based on latent data
(i.e., all data types besides \code{"continuous"}).
}
\examples{
\donttest{
# note: iter = 250 for demonstrative purposes

#########################################
### example 1: continuous and ordinal ###
#########################################
# data
Y <- ptsd

# continuous

# fit model
fit <- estimate(Y, type = "continuous",
                iter = 250)

# summarize the partial correlations
summ <- summary(fit)

# plot the summary
plt_summ <- plot(summ)

# select the graph
E <- select(fit)

# plot the selected graph
plt_E <- plot(E)


# ordinal

# fit model (note + 1, due to zeros)
fit <- estimate(Y + 1, type = "ordinal",
                iter = 250)

# summarize the partial correlations
summ <- summary(fit)

# plot the summary
plt_summ <- plot(summ)

# select the graph
E <- select(fit)

# plot the selected graph
plt_E <- plot(E)

##################################
## example 2: analytic solution ##
##################################
# (only continuous)

# data
Y <- ptsd

# fit model
fit <- estimate(Y, analytic = TRUE)

# summarize the partial correlations
summ <- summary(fit)

# plot summary
plt_summ <- plot(summ)

# select graph
E <- select(fit)

# plot the selected graph
plt_E <- plot(E)

}

}
\references{
\insertAllCited{}
}
