% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/irs.R
\name{irs}
\alias{irs}
\title{Select an independent random sample (IRS)}
\usage{
irs(
  sframe,
  n_base,
  stratum_var = NULL,
  seltype = NULL,
  caty_var = NULL,
  caty_n = NULL,
  aux_var = NULL,
  legacy_var = NULL,
  legacy_sites = NULL,
  legacy_stratum_var = NULL,
  legacy_caty_var = NULL,
  legacy_aux_var = NULL,
  mindis = NULL,
  maxtry = 10,
  n_over = NULL,
  n_near = NULL,
  wgt_units = NULL,
  pt_density = NULL,
  DesignID = "Site",
  SiteBegin = 1,
  sep = "-",
  projcrs_check = TRUE
)
}
\arguments{
\item{sframe}{The sampling frame as an \code{sf} object. The coordinate
system for \code{sframe} must be projected (not geographic). If m or z values
are in \code{sframe}'s geometry, they are silently dropped (i.e., only x-coordinates
and y-coordinates are preserved).}

\item{n_base}{The base sample size required. If the sampling design is unstratified,
this is a single numeric value. If the sampling design is stratified, this is a named
vector or list whose names represent each stratum and whose values represent each
stratum's sample size. These names must match the values of the stratification
variable represented by \code{stratum_var}. Legacy sites are considered part
of the base sample, so the value for \code{n_base} should be equal to the number
of legacy sites plus the number of desired non-legacy sites.}

\item{stratum_var}{A character string containing the name of the column from
\code{sframe} that identifies stratum membership for each element in \code{sframe}.
If \code{stratum_var} equals \code{NULL}, the sampling design is unstratified and all elements in \code{sframe}
are eligible to be selected in the sample. The default is \code{NULL}.}

\item{seltype}{A character string or vector indicating the inclusion probability type,
which must be one of the following: \code{"equal"} for equal inclusion probabilities;
\code{"unequal"} for unequal inclusion probabilities according to a categorical
variable specified by \code{caty_var}; and \code{"proportional"} for inclusion
probabilities proportional to a positive auxiliary variable specified by
\code{aux_var}. If the sampling design is unstratified, \code{seltype} is a single
character string. If the sampling design is stratified, \code{seltype} is a named vector
whose names represent each stratum and whose values represent each stratum's
inclusion probability type. \code{seltype}'s default value tries to match the
intended inclusion probability type: If \code{caty_var} and \code{aux_var} are
not specified, \code{seltype} is \code{"equal"}; if \code{caty_var} is specified,
\code{seltype} is \code{"unequal"}; and if \code{aux_var} is specified, \code{seltype}
is \code{"proportional"}.}

\item{caty_var}{A character string containing the name of the column from
\code{sframe} that represents the unequal probability variable.}

\item{caty_n}{A named numeric vector (or list) indicating the expected sample size for each
level of \code{caty_var}, the unequal probability variable. If the sampling design
is unstratified, \code{caty_n} is a named vector whose names represent each
level of \code{caty_var} and whose values represent each level's expected
sample size. The sum of \code{caty_n} must equal \code{n_base}. If the sampling design
is stratified and the expected sample sizes are the same among strata, \code{caty_n} is
a named vector whose names represent each
level of \code{caty_var} and whose values represent each level's expected
sample size -- these expected sample sizes are applied to all strata. The sum of
\code{caty_n} must equal each stratum's value in \code{n_base}.
If the sampling design is stratified and the expected sample sizes differ among strata,
\code{caty_n} is a list where each element is named as a stratum in \code{n_base}.
Each stratum's list element is a named vector whose
names represent each level of \code{caty_var} and whose values represent each
level's expected sample size (within the stratum). The sum of the values in each stratum's
list element must equal that stratum's value in \code{n_base}.}

\item{aux_var}{A character string containing the name of the column from
\code{sframe} that represents the proportional (to size) inclusion probability
variable (auxiliary variable). This auxiliary variable must be positive, and the resulting
inclusion probabilities are proportional to the values of the auxiliary variable.
Larger values of the auxiliary variable result in higher inclusion probabilities.}

\item{legacy_var}{This argument can be used instead of \code{legacy_sites}
when \code{sframe} has \code{POINT} or \code{MULTIPOINT}
geometry (i.e., a finite sampling frame).
When \code{legacy_var} is used, it is a character string containing the name of the column
from \code{sframe} that represents whether each site is a legacy site. For
legacy sites, the values of the \code{legacy_var} column must contain character strings that
act as a legacy site identifier. For non-legacy sites, the values of the
\code{legacy_var} column must be \code{NA}. Using this approach,
\code{legacy_stratum_var}, \code{legacy_caty_var}, and \code{legacy_aux_var}
are not required and should not be used (because \code{legacy_var} represents a column
in \code{sframe}). spsurvey assumes that the legacy sites were selected from
a previous sampling design that
incorporated randomness into site selection.}

\item{legacy_sites}{An sf object with a \code{POINT} or \code{MULTIPOINT}
geometry representing the legacy sites. spsurvey assumes that
the legacy sites were selected from a previous sampling design that
incorporated randomness into site selection. If m or z values
are in \code{legacy_sites}' geometry, they are silently dropped (i.e., only x-coordinates
and y-coordinates are preserved).}

\item{legacy_stratum_var}{A character string containing the name of the column from
\code{legacy_sites} that identifies stratum membership for each element of \code{legacy_sites}.
This argument is required when the sampling design is stratified, and its levels
must be contained in the levels of the \code{stratum_var} variable. The default value of \code{legacy_stratum_var}
is \code{stratum_var}, so \code{legacy_stratum_var} need only be specified explicitly when
the name of the stratification variable in \code{legacy_sites} differs from \code{stratum_var}.}

\item{legacy_caty_var}{A character string containing the name of the column from
\code{legacy_sites} that identifies the unequal probability variable for each element of \code{legacy_sites}.
This argument is required when the sampling design uses unequal selection probabilities, and its categories
must be contained in the levels of the \code{caty_var} variable. The default value of \code{legacy_caty_var}
is \code{caty_var}, so \code{legacy_caty_var} need only be specified explicitly when
the name of the unequal probability variable in \code{legacy_sites} differs from \code{caty_var}.}

\item{legacy_aux_var}{A character string containing the name of the column from
\code{legacy_sites} that identifies the proportional probability variable for each element of \code{legacy_sites}.
This argument is required when the sampling design uses proportional selection probabilities, and the values of the
\code{legacy_aux_var} variable must be positive. The
default value of \code{legacy_aux_var} is \code{aux_var}, so \code{legacy_aux_var} need only be specified explicitly
when the name of the proportional probability variable in \code{legacy_sites} differs from \code{aux_var}.}

\item{mindis}{A numeric value indicating the desired minimum distance between sampled
sites. If the sampling design is stratified and \code{mindis} is a numeric value, the minimum
distance is applied to all strata. If the sampling design is stratified and different minimum distances
are desired among strata, then \code{mindis}
is a list whose names match the names of \code{n_base} and whose values
are the minimum distance for the corresponding stratum. If a minimum distance is not desired
for a particular stratum, then the corresponding value in \code{mindis} should be \code{0} or
\code{NULL} (which is equivalent to \code{0}).
The units of \code{mindis} must match the units of \code{sframe}. A warning is returned if the
minimum distance could not be reached after \code{maxtry} attempts. If legacy sites are used, the minimum distance
requirement (and subsequent warning if \code{maxtry} attempts are reached) is enforced for all base sites
that are not legacy sites; that is, the minimum distance is enforced for these sites
by comparing distances against all base sites (legacy and non-legacy).}

\item{maxtry}{The maximum number of attempts to apply the minimum distance algorithm to obtain
the desired minimum distance between sites. Each iteration takes roughly as long as the
standard GRTS algorithm. Successive iterations will always contain at least as many
sites satisfying the minimum distance requirement as the previous iteration. The algorithm stops
when the minimum distance requirement is met or there are \code{maxtry} iterations.
The default number of maximum iterations is \code{10}.}

\item{n_over}{The number of reverse hierarchically ordered (rho) replacement sites.
 If the sampling design is unstratified, then
 \code{n_over} is an integer specifying the number of rho replacement sites desired.
If the sampling design is stratified,
 then \code{n_over} is a vector (or list) whose names match the names of \code{n_base} and
 whose values indicate the number of rho replacement sites for each stratum.
 If replacement sites are not desired for a particular stratum, then the corresponding
 value in \code{n_over} should be \code{0} or \code{NULL} (which is equivalent to \code{0}).
 If the sampling design is stratified but the number of \code{n_over} sites is the same in each
 stratum, \code{n_over} can be a vector which is used for each stratum. Note that if the
 sampling design has unequal selection probabilities (\code{seltype = "unequal"}), then \code{n_over} sites
 are given the same proportion of \code{caty_n} values as \code{n_base}.}

\item{n_near}{The number of nearest neighbor (nn) replacement sites.
If the sampling design is unstratified, \code{n_near} is an integer from \code{1}
to \code{10} specifying the number of
nn replacement sites to be selected for each base site. If the sampling design
is stratified but the same number of nn replacement sites is desired
for each stratum, \code{n_near} is an integer from \code{1}
to \code{10} specifying the number of
nn replacement sites to be selected for each base site. If the sampling design is
stratified and a different number of nn replacement sites is
desired for each stratum, \code{n_near} is a vector (or list) whose names represent strata and whose
values are integers from \code{1}
to \code{10} specifying the number of
nn replacement sites to be selected for each base site in the stratum. If replacement sites
are not desired for a particular stratum, then the corresponding value in \code{n_near}
should be \code{0} or \code{NULL} (which is equivalent to \code{0}). For
infinite sampling frames, the distance between a site and its nn
depends on \code{pt_density}. The larger \code{pt_density}, the closer the nearest neighbors.}

\item{wgt_units}{The units used to compute the design weights. These
units must be standard units as defined by the \code{set_units()} function in
the units package. The default units match the units of the sf object.}

\item{pt_density}{A positive integer controlling the density of the GRTS approximation
for infinite sampling frames. The GRTS approximation for infinite sampling
frames vastly improves computational efficiency by generating many finite points and
selecting a sample from the points. \code{pt_density} represents the density
of finite points per unit to use in the approximation. More specifically,
for each stratum, the number of points used in the approximation equals
\code{pt_density * (n_base + n_over)}. A larger value of \code{pt_density}
means a closer approximation to the infinite sampling frame but less
computational efficiency. The default value of \code{pt_density} is \code{10}. Note that
when used with \code{caty_n}, the unequal inclusion probabilities generated from
this approach are also approximations.}

\item{DesignID}{A character string indicating the naming structure for each
site's identifier selected in the sample, which is matched with \code{SiteBegin} and
included as a variable in the
sf object in the function's output. The default is \code{"Site"}.}

\item{SiteBegin}{An integer indicating the first number to use to match
with \code{DesignID} while creating each site's identifier selected in the sample.
Successive sites are given successive integers. The default starting number
is \code{1} and the number of digits is equal to the number of digits in
\code{n_base + n_over}.
For example, if \code{n_base} is 50 and \code{n_over} is 0, then the default
site identifiers are \code{Site-01} to \code{Site-50}.}

\item{sep}{A character string that acts as a separator between
\code{DesignID} and \code{SiteBegin}. The default is \code{"-"}.}

\item{projcrs_check}{A check for whether the coordinates are projected. If \code{TRUE},
an error is returned if coordinates are not projected (i.e., they are geographic or NA). If \code{FALSE}, the
check is not performed, which means that the crs in \code{sframe} (and \code{legacy_sites} if provided) can be projected, geographic, or NA.}
}
\value{
The sampling design sites and additional information about the sampling design. More specifically, a list with five elements:
  \itemize{
    \item \code{sites_legacy} An sf object containing legacy sites. This is
      \code{NULL} if legacy sites were not included in the sample.
    \item \code{sites_base} An sf object containing the base sites. This is \code{NULL}
    if \code{n_base} equals the number of legacy sites.
    \item \code{sites_over} An sf object containing the reverse hierarchically
      ordered replacement sites. This is \code{NULL} if no reverse hierarchically
      ordered replacement sites were included in the sample.
    \item \code{sites_near} An sf object containing the nearest neighbor
      replacement sites. This is \code{NULL} if no nearest neighbor replacement
      sites were included in the sample.
    \item \code{design} A list documenting the specifications of this sampling design.
      This can be checked to verify your sampling design ran as intended.
      \itemize{
        \item \code{Call} The original function call.
        \item \code{stratum} The unique strata. This equals \code{"None"} if
          the sampling design was unstratified.
        \item \code{n_base} The base sample size per stratum.
        \item \code{seltype} The selection type per stratum.
        \item \code{caty_n} The expected sample sizes for each level of the
          unequal probability grouping variable per stratum. This equals
          \code{NULL} when \code{seltype} is not \code{"unequal"}.
        \item \code{legacy} A logical variable indicating whether legacy sites
          were included in the sample.
        \item \code{mindis} The minimum distance requirement desired. This
          is \code{NULL} when no minimum distance requirement was applied.
        \item \code{n_over} The reverse hierarchically ordered replacement
          site sample sizes per stratum. If \code{seltype} is \code{unequal},
          this represents the expected sample sizes. This is \code{NULL}
          when no reverse hierarchically ordered replacement sites were selected.
        \item \code{n_near} The number of nearest neighbor replacement sites
          desired. This is \code{NULL} when no nearest neighbor replacement
          sites were selected.
      }
  }
  When non-\code{NULL}, the \code{sites_legacy}, \code{sites_base},
  \code{sites_over}, and \code{sites_near} objects contain the original columns
  in \code{sframe} and include a few additional columns. These additional columns
  are
  \itemize{
    \item \code{siteID} A site identifier (as named using the \code{DesignID}
      and \code{SiteBegin} arguments to \code{irs()}).
    \item \code{siteuse} Whether the site is a legacy site (\code{Legacy}), base
      site (\code{Base}), reverse hierarchically ordered replacement site
      (\code{Over}), or nearest neighbor replacement site (\code{Near}).
    \item \code{replsite} The replacement site ordering. \code{replsite} is
      \code{None} if the site is not a replacement site, \code{Next} if it is
      the next reverse hierarchically ordered replacement site to use, or
      \code{Near_}, where the word following \code{_} indicates the ordering of sites closest to
      the originally sampled site.
    \item \code{lon_WGS84} Longitude coordinates using the WGS84 coordinate
      system (EPSG:4326). Only given if coordinates are projected.
    \item \code{lat_WGS84} Latitude coordinates using the WGS84 coordinate
      system (EPSG:4326). Only given if coordinates are projected.
    \item \code{X} Longitude coordinates using the provided coordinate
      system. Only given if coordinates are not projected (i.e., they are geographic or NA).
    \item \code{Y} Latitude coordinates using the provided coordinate
      system. Only given if coordinates are not projected (i.e., they are geographic or NA).
    \item \code{stratum} A stratum indicator. \code{stratum} is \code{None}
      if the sampling design was unstratified. If the sampling design was stratified,
      \code{stratum} indicates the stratum.
    \item \code{wgt} The design weight.
    \item \code{ip} The site's original inclusion probability (the reciprocal
      of \code{wgt}).
    \item \code{caty} An unequal probability grouping indicator. \code{caty}
      is \code{None} if the sampling design did not use unequal inclusion probabilities.
      If the sampling design did use unequal inclusion probabilities, \code{caty}
      indicates the unequal probability level.
    \item \code{aux} The auxiliary proportional probability variable. This
      column is only returned if \code{seltype} was \code{proportional} in the
      original sampling design.
  }
  If any columns in \code{sframe} contain these names, those columns
  from \code{sframe} will be automatically prefixed with \code{sframe_}
  in the \code{sites} object.
}
\description{
Select a sample that is not spatially balanced from a point (finite), linear / linestring (infinite),
or areal / polygon (infinite) sampling frame using the Independent Random Sampling (IRS) algorithm.
The IRS algorithm accommodates unstratified and
stratified sampling designs and allows for equal inclusion probabilities, unequal
inclusion probabilities according to a categorical variable, and inclusion
probabilities proportional to a positive auxiliary variable. Several additional
sampling options are included, such as including legacy (historical) sites,
requiring a minimum distance between sites, and selecting replacement sites.
}
\details{
\code{n_base} is the number of sites used to calculate
  the design weights, which is typically the number of sites used in an analysis. When a panel sampling design is implemented, \code{n_base} is typically the
  number of sites in all panels that will be sampled in the same temporal period --
  \code{n_base} is not the total number of sites in all panels. The sum of \code{n_base} and
  \code{n_over} is equal to the total number of sites to be visited for all panels plus
  any replacement sites that may be required.
}
\examples{
\dontrun{
# Unstratified sample with equal inclusion probabilities
samp <- irs(NE_Lakes, n_base = 100)
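# A minimal sketch of inclusion probabilities proportional to an auxiliary
# variable, following the aux_var description. Assumption: NE_Lakes has a
# positive numeric column named "AREA"; substitute a column from your own
# sampling frame if it does not.
samp_prop <- irs(NE_Lakes, n_base = 30, aux_var = "AREA")
# seltype is documented to default to "proportional" when aux_var is given
samp_prop$design$seltype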
# Stratified sample; the names of n_base match the values of ELEV_CAT
strata_n <- c(low = 25, high = 30)
samp_strat <- irs(NE_Lakes, n_base = strata_n, stratum_var = "ELEV_CAT")
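# A sketch of unequal inclusion probabilities using caty_var and caty_n.
# Assumption: NE_Lakes has a categorical column "AREA_CAT" with levels
# "small" and "large"; these names are illustrative.
caty_n <- c(small = 10, large = 30) # values sum to n_base
samp_caty <- irs(NE_Lakes, n_base = 40, caty_var = "AREA_CAT", caty_n = caty_n)
# Stratified design whose expected sample sizes differ among strata:
# one named vector per stratum, each summing to that stratum's n_base
caty_n_strat <- list(
  low = c(small = 10, large = 15),
  high = c(small = 10, large = 20)
)
samp_caty_strat <- irs(
  NE_Lakes,
  n_base = c(low = 25, high = 30),
  stratum_var = "ELEV_CAT",
  caty_var = "AREA_CAT",
  caty_n = caty_n_strat
)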
# Unstratified sample with 5 replacement sites
samp_over <- irs(NE_Lakes, n_base = 30, n_over = 5)
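# Sketches of the remaining sampling options, each under a stated assumption.
# Minimum distance between sites; assumes the units of NE_Lakes are meters
samp_mindis <- irs(NE_Lakes, n_base = 30, mindis = 1600)
# One nearest neighbor replacement site for each base site
samp_near <- irs(NE_Lakes, n_base = 30, n_near = 1)
# Legacy sites flagged in the frame; assumes a "LEGACY" column that holds a
# site identifier for legacy sites and NA for all other sites
samp_legacy <- irs(NE_Lakes, n_base = 50, legacy_var = "LEGACY")
# Inspect pieces of the returned list described in the Value section
samp_near$sites_near$replsite
samp_legacy$design$legacy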
}
}
\seealso{
\describe{
    \item{\code{\link{grts}}}{ to select a sample that is spatially balanced}
 }
}
\author{
Tony Olsen \email{olsen.tony@epa.gov}
}
