% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/summarize.R
\name{summarize}
\alias{summarize}
\title{Compute summary statistics}
\usage{
summarize(
  formula,
  data,
  na.action = stats::na.pass,
  na.rm = FALSE,
  level = 0.95,
  columns = c("observed", "missing", "pc.missing", "mean", "sd", "min", "q1", "median",
    "q3", "max", "correlation"),
  FUN = NULL,
  which = NULL,
  skip.reference = TRUE,
  digits = NULL,
  ...
)
}
\arguments{
\item{formula}{[formula] the outcome(s) on the left-hand side and the grouping variables on the right-hand side.
E.g. \code{Y1+Y2 ~ Gender + Gene} will compute the summary statistics for Y1 and for Y2 for each gender and gene.
Passed to the \code{stats::aggregate} function.}

\item{data}{[data.frame] dataset containing the observations.}

\item{na.action}{[function] a function which indicates what should happen when the data contain 'NA' values.
Passed to the \code{stats::aggregate} function.}

\item{na.rm}{[logical] should the missing values be omitted when computing the summary statistics?}

\item{level}{[numeric,0-1] the confidence level of the confidence intervals.}

\item{columns}{[character vector] names of the summary statistics to keep in the output.
Can be any of, or a combination of:\itemize{
\item \code{"observed"}: number of observations with a measurement.
\item \code{"missing"}: number of missing observations.
When specifying a grouping variable, it will also attempt to count missing rows in the dataset.
\item \code{"pc.missing"}: percentage of missing observations.
\item \code{"mean"}, \code{"mean.lower"}, \code{"mean.upper"}: mean with its confidence interval.
\item \code{"median"}, \code{"median.lower"}, \code{"median.upper"}: median with its confidence interval.
\item \code{"sd"}: standard deviation.
\item \code{"q1"}, \code{"q3"}, \code{"IQR"}: 1st and 3rd quartile, interquartile range.
\item \code{"min"}, \code{"max"}: minimum and maximum observation.
\item \code{"predict.lower"}, \code{"predict.upper"}: prediction interval for normally distributed outcome.
\item \code{"correlation"}: correlation matrix between the outcomes (when feasible, see the Details section).
}}

\item{FUN}{[function] user-defined function for computing summary statistics.
It should take a vector as an argument and output a named single value or a named vector.}

\item{which}{deprecated, use the argument columns instead.}

\item{skip.reference}{[logical] should the summary statistics for the reference level of categorical variables be omitted?}

\item{digits}{[integer, >=0] the minimum number of significant digits to be used to display the results. Passed to \code{print.data.frame}.}

\item{...}{additional arguments passed to the function given in argument \code{FUN}.}
}
\value{
A data frame containing summary statistics (in columns) for each outcome and value of the grouping variables (rows). It has an attribute \code{"correlation"} when it was possible to compute the correlation matrix for each outcome with respect to the grouping variable.
}
\description{
Compute summary statistics for multiple variables and/or multiple groups and save them in a data frame.
}
\details{
This function is essentially an interface to the \code{stats::aggregate} function. \cr
\bold{WARNING:} it has the same name as a function from the dplyr package. If you have already loaded dplyr, you should use \code{:::} to call summarize, i.e. use \code{LMMstar:::summarize}.

Confidence intervals (CI) and prediction intervals (PI) for the mean are computed via \code{stats::t.test}.
Confidence intervals (CI) for the median are computed via \code{asht::medianTest}.

Correlation can be assessed when both a grouping variable and an ordering variable are given in the formula interface, e.g. \code{Y ~ time|id}.
}
\examples{
## simulate data in the wide format
set.seed(10)
d <- sampleRem(1e2, n.times = 3)
d$treat <- sample(LETTERS[1:3], NROW(d), replace = TRUE, prob = c(0.3, 0.3, 0.4))

## add a missing value
d2 <- d
d2[1,"Y2"] <- NA

## run summarize
summarize(Y1 ~ 1, data = d)
summarize(Y1 ~ 1, data = d, FUN = quantile, probs = c(0.25,0.75))
summarize(Y1+Y2 ~ X1, data = d)
summarize(treat ~ 1, skip.reference = FALSE, data = d)

summarize(Y1 ~ X1, data = d2)
summarize(Y1+Y2 ~ X1, data = d2, na.rm = TRUE)
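
## keep only selected statistics (sketch based on the 'columns' argument)
## here: number of observations and the mean with its confidence interval
s.cols <- summarize(Y1 ~ X1, data = d,
                    columns = c("observed", "mean", "mean.lower", "mean.upper"))
s.cols

## user-defined statistic via FUN
## 'cv' is a hypothetical helper (not part of the package): it takes a vector
## and returns a named single value, as required by the argument FUN
cv <- function(x) c(cv = sd(x) / mean(x))
summarize(Y1 ~ X1, data = d, FUN = cv)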

## long format
dL <- reshape(d, idvar = "id", direction = "long",
             v.names = "Y", varying = c("Y1","Y2","Y3"))
summarize(Y ~ time + X1, data = dL)

## compute correlations (single time variable)
e.S <- summarize(Y ~ time + X1 | id, data = dL, na.rm = TRUE)
e.S
attr(e.S, "correlation")

## compute correlations (composite time variable)
dL$time2 <- dL$time == 2
dL$time3 <- dL$time == 3
e.S <- summarize(Y ~ time2 + time3 + X1 | id, data = dL, na.rm = TRUE)
e.S
attr(e.S, "correlation")
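
## what a t-test based confidence interval for the mean looks like (base R sketch)
## assumption: 'y' is hypothetical simulated data, unrelated to the datasets above
y <- rnorm(50, mean = 1, sd = 2)
ci.ttest <- as.numeric(stats::t.test(y, conf.level = 0.95)$conf.int)
## same interval by hand: mean +/- t-quantile * standard error
ci.manual <- mean(y) + c(-1, 1) * qt(0.975, df = length(y) - 1) * sd(y) / sqrt(length(y))
all.equal(ci.ttest, ci.manual)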
}
\keyword{utilities}
