% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/plotenvelope.R
\name{plotenvelope}
\alias{plotenvelope}
\title{Diagnostic Plots for a Fitted Object with Simulation Envelopes}
\usage{
plotenvelope(
  y,
  which = 1:2,
  sim.method = "refit",
  n.sim = 199,
  conf.level = 0.95,
  type = "st",
  transform = NULL,
  main = c("Residuals vs Fitted Values", "Normal Quantile Plot", "Scale-Location Plot"),
  xlab = c("Fitted values", "Theoretical Quantiles", "Fitted Values"),
  ylab = c("Residuals", "Residuals", expression(sqrt("|Residuals|"))),
  col = par("col"),
  line.col = if (add.smooth) c("slateblue4", "olivedrab", "slateblue4") else
    rep("olivedrab", 3),
  envelope.col = adjustcolor(line.col, 0.1),
  add.smooth = TRUE,
  plot.it = TRUE,
  fitMin = if (inherits(y, "glm") | inherits(y, "manyglm")) -6 else -Inf,
  ...
)
}
\arguments{
\item{y}{is \emph{any} object that responds to \code{residuals}, \code{predict} and 
(if \code{sim.method="refit"}) \code{simulate} and \code{update}.}

\item{which}{a vector specifying the diagnostic plots to construct: \enumerate{
\item{residual vs fits, with a smoother}
\item{normal quantile plot}
\item{scale-location plot, with smoother}
}
These are the first three options in \code{\link{plot.lm}} and a little is borrowed 
from that code. A global envelope is included with each plot around where we expect to see
the data (or the smoother, if requested, for plots 1 and 3) when model assumptions are satisfied.
If not fully captured by the global envelope, there is some evidence that the model assumptions are not satisfied.}

\item{sim.method}{How should residuals be simulated? The default for most objects is \code{"refit"}, which constructs new responses (via \code{\link{simulate}}),
refits the model (via \code{\link{update}}), then recomputes residuals, often known as a \emph{parametric bootstrap}.
This is computationally intensive but gives a robust answer. This is the only suitable option if
residuals are not expected to be normal when assumptions are satisfied (like when using \code{\link[stats]{glm}} with a non-Gaussian family). 
Alternatively, \code{"norm"} simulates from a normal distribution, matches means and variances (and covariances for multivariate responses) to the observed residuals.
The \code{"stand.norm"} option sets means to zero and variances to one, which is appropriate when residuals
should be standard normal when assumptions are satisfied (as for any object fitted using the \code{mvabund}
package, for example). These options are faster but more approximate than \code{"refit"}, in fact \code{"stand.norm"} is 
used as the default for \code{\link[mvabund]{manyglm}} objects, for computational reasons.}

\item{n.sim}{the number of simulated sets of residuals to be generated, to which
the observed residuals will be compared. The default is 199 datasets.}

\item{conf.level}{the confidence level to use in constructing the envelope.}

\item{type}{the type of global envelope to construct, see 
\code{\link[GET]{global_envelope_test}} for details. Default \code{"st"} uses 
studentized envelope tests to protect for unequal variance, which has performed well
in simulations in this context.}

\item{transform}{a character vector pointing to a function that should be applied to both
axes of the normal quantile plot. The most common use is to set \code{transform="pnorm"}
for a PP-plot.}

\item{main}{the plot title (if a plot is produced). A vector of three titles, one for each plot.
If only one value is given that will be used for all plots.}

\item{xlab}{\code{x} axis label (if a plot is produced). A vector of three labels, one for each plot.
If only one value is given that will be used for all plots.}

\item{ylab}{\code{y} axis label (if a plot is produced). A vector of three labels, one for each plot.
If only one value is given that will be used for all plots.}

\item{col}{color of points}

\item{line.col}{a vector of length three containing the colors of the lines on the three diagnostic plots.
Defaults to "slateblue4" for smoothers and to "olivedrab" otherwise. Because it's cool.}

\item{envelope.col}{color of the global envelope around the expected trend. All data points should always stay within this envelope
(and will for a proportion \code{conf.level} of datasets satisfying model assumptions). If a smoother has been used on
the residual vs fits or scale-location plot, the envelope is constructed around the smoother, that is, the smoother should always stay
within the envelope.}

\item{add.smooth}{logical defaults to \code{TRUE}, which adds a smoother to residual vs fits and
scale-location plots, and computes a global envelope around the \emph{smoother} rather than the data (\code{add.smooth=FALSE}). No smoother is added to the normal quantile plot.}

\item{plot.it}{logical. Should the result be plotted? If not, a list of analysis outputs is returned, see \emph{Value}.}

\item{fitMin}{the minimum fitted value to use in plots, any fitted value less than this will be truncated to
\code{fitMin}. This is useful for generalised linear models, where small fitted values correspond to
predictions that are numerically zero. The default is to set \code{fitMin} to \code{-6} for GLMs, otherwise no truncation (\code{-Inf}).}

\item{...}{further arguments sent through to \code{plot}}
}
\value{
Up to three diagnostic plots with simulation envelopes are returned, and additionally a list of 
three objects used in plotting, for plots 1-3 respectively. Each is a list with five components:\describe{
\item{\code{x}}{\emph{X}-values used for the envelope. In plots 1 and 3 this is the fitted values, or if \code{add.smooth=TRUE}, this is 500 equally spaced points covering
the range of fitted values. For plot 2, this is sorted normal quantiles corresponding to observed data.}
\item{\code{y}}{The \emph{Y}-values used for the envelope. In plots 1 and 3 this is the residuals, or with \code{add.smooth=TRUE}, this is the values of the smoother corresponding to \code{x}. For plot 2,
this is the sorted residuals.}
\item{\code{lo}}{The lower bound on the global envelope, for each value of \code{x}.}
\item{\code{hi}}{The upper bound on the global envelope, for each value of \code{x}.}
\item{\code{p.value}}{A \emph{P}-value for the test that observed smoother or data are consistent with what
would be expected if the fitted model were correct, computed in \code{\link[GET]{global_envelope_test}}.} 
}
}
\description{
Produces diagnostic plots of a fitted model \code{y}, and
adds global envelopes constructed by simulating new residuals, to see how
departures from expected trends compare to what might be expected if the 
fitted model were correct. Global envelopes are constructed using the
\code{GET} package (Myllymäki et al 2017) for simultaneous control of error rates over the
whole plot, by simulating new responses from the fitted model then recomputing residuals
(which can be computationally intensive), or alternatively, by simulating residuals directly from the (multivariate) normal distribution
(faster, but not always a smart move). Options for diagnostic plots to construct are a residual vs fits,
a normal quantile plot, or scale-location plot, along the lines of \code{\link{plot.lm}}.
}
\details{
A challenge when interpreting diagnostic plots is understanding the extent to which
deviations from the expected pattern could be due to random noise (sampling variation)
rather than actual assumption violations. This function is intended to assess this, 
by simulating multiple realizations of residuals (and fitted values) in situations where 
assumptions are satisfied, and plotting a global simulation envelope around these at level \code{conf.level}.

This function can take any fitted model, and construct any of three diagnostic plots, as determined by \code{which}:
\enumerate{
\item Residual vs fits plot (optionally, with a smoother)
\item Normal quantile plot
\item Scale-Location plot (optionally, with smoother)
}
and see if the trend is behaving as expected if the model were true. As long as 
the fitted model responds to \code{\link{residuals}} and \code{\link{predict}} 
(and when \code{sim.method="refit"}, \code{\link{simulate}} and \code{\link{update}}) then a simulation envelope
will be constructed for each plot.

Simulation envelopes are global, constructed using the \code{\link[GET]{GET-package}}, meaning that
(for example) a 95\% global envelope on a quantile plot should contain \emph{all} residuals for 95\% of datasets
that satisfy model assumptions. So if \emph{any} data points lie outside the
quantile plot's envelope we have evidence that assumptions of the fitted model are not satisfied. 
The \code{\link[GET]{GET-package}} was originally constructed to place envelopes around functions, motivated by
the problem of diagnostic testing of spatial processes (Myllymäki et al 2017), but it can equally
well be applied here, by treating the set of residuals (ordered according to the x-axis) as point-wise evaluations of a function.
For residual vs fits and scale-location plots, if \code{do.smooth=TRUE}, global envelopes are constructed for
the \emph{smoother}, not for the data, hence we are looking to see if the smoother
is wholly contained within the envelope. The smoother is constructed using \code{\link[mgcv]{gam}} from the \code{mgcv} package.

Note that the global envelopes only tell you if there is evidence of violation of model
assumptions -- they do not tell you whether the violations are large enough to worry about. For example, in linear models,
violations of normality are usually much less important than violations of linearity or equal variance. And in all cases,
modest violations tend to only have modest effects on the validity of inferences, so you need to think about how big
observed violations are rather than just thinking about whether or not anything is outside its simulation envelope.

The method used to simulate data for the global envelopes is controlled by \code{sim.method}.
The default method (\code{sim.method="refit"}) uses a \emph{parametric bootstrap} approach: simulate 
new responses from the fitted model, refit the model and recompute residuals and fitted values. 
This directly assesses whether trends in observed trends depart from what would be expected if the fitted model
were correct, without any further assumptions. For complex models or large datasets this would however be super-slow.
A fast, more approximate alternative (\code{sim.method="norm"}) is to simulate new (multivariate) normal residuals 
repeatedly and use these to assess whether trends in the observed data depart from what would be expected
for independent (multivariate) normal residuals. If residuals are expected to be standard
normal, a more refined check is to simulate from the standard normal using (\code{sim.method="stand.norm"}).
This might for example be useful when diagnosing a model fitted using the \code{mvabund} package (Wang et al. 2012),
since this calculates Dunn-Smyth residuals (Dunn & Smyth 1996), which are approximately standard normal when assumptions are satisfied.  
If \code{y} is a \code{glm} with non-Gaussian family then residuals will not be normal and \code{"refit"} is the
only appropriate option, hence other choices will be overridden with a warning reporting that this
has been done. 

Note that for Multivariate Linear Models (\code{mlm}), \code{\link{cresiduals}} and \code{\link{cpredict}} 
are used to construct residuals and fitted values (respectively) from the \emph{full conditional models}
(that is, models constructed by regressing each response against all other responses
together with predictors). This is done because full conditionals are diagnostic of joint 
distributions, so \emph{any} violation of multivariate normality is expressed as a violation of 
linear model assumptions on full conditionals. Results for all responses are overlaid on a single plot,
future versions of this function may have an overlay option to separately plot each response.

The simulated data and subsequent analysis are also used to obtain a \emph{P}-value 
for the test that model assumptions are correct, for each plot. This tests if sample residuals or their smoothers
are unusually far from the values expected of them if model assumptions were satisfied. For details see
\code{\link[GET]{global_envelope_test}}.
}
\examples{
# fit a Poisson regression to random data:
y = rpois(50,lambda=1)
x = 1:50
rpois_glm = glm(y~x,family=poisson())
plotenvelope(rpois_glm,n.sim=99)

# Fit a multivariate linear model to the iris dataset:
data(iris)
Y = with(iris, cbind(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width))
iris_mlm=lm(Y~Species,data=iris)
# check normality assumption:
plotenvelope(iris_mlm,n.sim=99,which=2)

# A few more plots, with envelopes around data not smoothers:
\donttest{plotenvelope(iris_mlm, which=1:3, add.smooth=FALSE)
# Note violation on the scale/location plot.}

}
\references{
Dunn, P. K., & Smyth, G. K. (1996), Randomized quantile residuals. J. Comp. Graphical Stat. 5, 236-244. doi:10.1080/10618600.1996.10474708

Myllymäki, M., Mrkvička, T., Grabarnik, P., Seijo, H. and Hahn, U. (2017), Global envelope tests for spatial processes. J. R. Stat. Soc. B, 79: 381-404. doi:10.1111/rssb.12172

Wang, Y. I., Naumann, U., Wright, S. T., & Warton, D. I. (2012), \code{mvabund} – an \code{R} package for model‐based analysis of multivariate abundance data. Methods in Ecology and Evolution, 3, 471-474. doi:10.1111/j.2041-210X.2012.00190.x

Warton DI (2022) Eco-Stats - Data Analysis in Ecology, from \emph{t}-tests to multivariate abundances. Springer, ISBN 978-3-030-88442-0
}
\seealso{
\code{\link{cpredict}}, \code{\link{cresiduals}}, \code{\link{qqenvelope}}
}
\author{
David Warton <david.warton@unsw.edu.au>
}
