% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/prob_sup.R
\name{prob_sup}
\alias{prob_sup}
\title{Probabilities of superior performance and stability}
\usage{
prob_sup(
  data,
  trait,
  gen,
  env,
  reg = NULL,
  mod.output,
  int,
  increase = TRUE,
  save.df = FALSE,
  interactive = FALSE,
  verbose = FALSE
)
}
\arguments{
\item{data}{A data frame containing the phenotypic data}

\item{trait, gen, env}{Strings. The names of the columns that correspond to
the variable, genotype and environment information, respectively}

\item{reg}{A string or NULL. If the dataset has information about regions,
\code{reg} will be a string with the name of the column that corresponds to the
region information. Otherwise, \code{reg = NULL} (default).}

\item{mod.output}{An object from the \code{\link[=extr_outs]{extr_outs()}} function}

\item{int}{A number representing the selection intensity
(between 0 and 1)}

\item{increase}{Logical. Indicates the direction of the selection.
\code{TRUE} (default) for increasing the trait value, \code{FALSE} otherwise.}

\item{save.df}{Logical. Should the data frames be saved in the working directory?
\code{TRUE} for saving, \code{FALSE} (default) otherwise.}

\item{interactive}{Logical. Should ggplots be converted into interactive plots?
If \code{TRUE}, the function loads the \code{plotly} package and uses the \code{\link[plotly:ggplotly]{plotly::ggplotly()}}
command.}

\item{verbose}{A logical value. If \code{TRUE}, the function will indicate the
completed steps. Defaults to \code{FALSE}.}
}
\value{
The function returns two lists, one with the \code{marginal} probabilities, and
another with the \code{conditional} probabilities.

The \code{marginal} list has:
\itemize{
\item \code{df} : A list of data frames containing the calculated probabilities:
\itemize{
\item \code{perfo}: the probabilities of superior performance
\item \code{pair_perfo}: the pairwise probabilities of superior performance
\item \code{stabi}: the probabilities of superior stability. When \code{reg} is not
\code{NULL}, \code{stabi} is divided into \code{stabi_gl} for the stability across environments, and
\code{stabi_gm} for the stability across regions
\item \code{pair_stabi}: the pairwise probabilities of superior stability, which
is also divided into \code{pair_stabi_gl} and \code{pair_stabi_gm} when \code{reg} is not \code{NULL}
\item \code{joint_prob}: the joint probabilities of superior performance and stability
}
\item \code{plot} : A list of ggplots illustrating the outputs:
\itemize{
\item \code{g_hpd}: a caterpillar plot representing the marginal genotypic value of
each genotype, and their respective highest posterior density interval (95\% represented by the
thick line, and 97.5\% represented by the thin line)
\item \code{perfo}: a bar plot illustrating the probabilities of superior performance
\item \code{pair_perfo}: a heatmap representing the pairwise probability of superior
performance (the probability of genotypes on the \emph{x}-axis being superior
to those on the \emph{y}-axis)
\item \code{stabi}: a bar plot with the probabilities of superior stability. Like the data frames,
when \code{reg} is not \code{NULL}, two different plots are generated, one for the stability across
environments (\code{stabi_gl}), and another for the stability across regions (\code{stabi_gm})
\item \code{pair_stabi}: a heatmap with the pairwise probabilities of superior stability
(also divided into two different plots when \code{reg} is not \code{NULL}). This plot represents
the probability of genotypes on the \emph{x}-axis being superior
to those on the \emph{y}-axis
\item \code{joint_prob}: a plot with the probabilities of superior performance,
probabilities of superior stability and the joint probabilities of superior
performance and stability.
}
}

The \code{conditional} list has:
\itemize{
\item \code{df} : A list with:
\itemize{
\item \code{perfo}: a data frame containing the probabilities of superior performance
within environments. It also has the probabilities of superior performance within regions
if \code{reg} is not \code{NULL}.
\item \code{pair_perfo}: a list with the pairwise probabilities of superior performance
within environments. If \code{reg} is not \code{NULL}, two lists are generated.
}
\item \code{plot} : A list with:
\itemize{
\item \code{perfo}: a heatmap with the probabilities of superior performance within
environments. If \code{reg} is not \code{NULL}, there will be two heatmaps: \code{perfo_env}
for the probabilities of superior performance within environments, and \code{perfo_reg}
for the same probabilities within regions.
\item \code{pair_perfo}: a list of heatmaps representing the pairwise probability of superior
performance within environments. If \code{reg} is not \code{NULL}, there will be two lists: \code{pair_perfo_env}
for the probabilities of superior performance within environments, and \code{pair_perfo_reg}
for the same probabilities within regions. The interpretation is the same as in the
\code{pair_perfo} in the \code{marginal} list: the probability of genotypes on the \emph{x}-axis being superior
to those on the \emph{y}-axis.
}
}
}
\description{
This function estimates the probabilities of superior performance and stability
across environments (\code{marginal} output). It also computes the probabilities
of superior performance within environments (\code{conditional} output).
}
\details{
Probabilities quantify the risk of recommending a selection candidate for a target
population of environments or for a specific environment. The function \code{prob_sup()}
computes the probabilities of superior performance and the probabilities of superior stability:

\itemize{\item Probability of superior performance}

Let \eqn{\Omega} represent the subset of selected genotypes based on their
performance across environments. A given genotype \eqn{j} will belong to \eqn{\Omega}
if its genotypic marginal value (\eqn{\hat{g}_j}) is high or low enough compared to
its peers. \code{prob_sup()} leverages the Monte Carlo discretized sampling
from the posterior distribution to emulate the occurrence of \eqn{S} trials. Then,
the probability of the \eqn{j^{th}} genotype belonging to \eqn{\Omega} is the
ratio of success (\eqn{\hat{g}_j \in \Omega}) events and the total number of sampled events,
as follows:

\deqn{Pr(\hat{g}_j \in \Omega \vert y) = \frac{1}{S}\sum_{s=1}^S{I(\hat{g}_j^{(s)} \in \Omega \vert y)}}

where \eqn{S} is the total number of samples (\eqn{s = 1, 2, ..., S}),
and \eqn{I(\hat{g}_j^{(s)} \in \Omega \vert y)} is an indicator variable that can assume
two values: (1) if \eqn{\hat{g}_j^{(s)} \in \Omega} in the \eqn{s^{th}} sample,
and (0) otherwise. \eqn{S} depends on the number of iterations and chains
previously set in \code{\link[=bayes_met]{bayes_met()}}.
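
A minimal sketch of this Monte Carlo estimate in base R, using simulated draws rather than
a fitted model. The objects \code{g_draws}, \code{n_sel} and \code{in_omega} are hypothetical names, and the
rule that \eqn{\Omega} holds the top \code{ceiling(int * J)} genotypes of each sample is an assumption made
for illustration, not necessarily what \code{prob_sup()} does internally:

\preformatted{
# Simulated stand-in for posterior draws: S samples (rows) x J genotypes (columns)
set.seed(1)
S <- 4000
J <- 10
true_g <- seq(-2, 2, length.out = J)
g_draws <- matrix(rnorm(S * J, mean = rep(true_g, each = S), sd = 1),
                  nrow = S, ncol = J)
colnames(g_draws) <- paste0("G", seq_len(J))

int <- 0.2                  # selection intensity
n_sel <- ceiling(int * J)   # assumed size of Omega

# Indicator: TRUE when genotype j ranks among the top n_sel in sample s
in_omega <- t(apply(g_draws, 1, function(g) rank(-g) <= n_sel))

# Proportion of samples in which each genotype belongs to Omega
prob_perfo <- colMeans(in_omega)
round(prob_perfo, 3)
}

Under the same assumptions, selecting for lower trait values (\code{increase = FALSE}) would
amount to ranking on \code{g} instead of \code{-g}.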

Similarly, the conditional probability of superior performance can be computed for
individual environments. Let \eqn{\Omega_k} represent the subset of superior
genotypes in the \eqn{k^{th}} environment, so that the probability of the
\eqn{j^{th}} genotype belonging to \eqn{\Omega_k} can be calculated as follows:

\deqn{Pr(\hat{g}_{jk} \in \Omega_k \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{jk}^{(s)} \in \Omega_k \vert y)}

where \eqn{I(\hat{g}_{jk}^{(s)} \in \Omega_k \vert y)} is an indicator variable
mapping success (1) if \eqn{\hat{g}_{jk}^{(s)}} exists in \eqn{\Omega_k}, and
failure (0) otherwise, and \eqn{\hat{g}_{jk}^{(s)} = \hat{g}_j^{(s)} + \widehat{ge}_{jk}^{(s)}}.
Note that when computing conditional probabilities (i.e., conditional to the
\eqn{k^{th}} environment or mega-environment), we are accounting for
the interaction of the \eqn{j^{th}} genotype with the \eqn{k^{th}}
environment.
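
The conditional calculation can be sketched in the same way. The example below is only
illustrative: it assumes simulated draws of the main effects (\code{g_draws}) and of the interaction
effects (\code{ge_draws}), both hypothetical objects, and the same top-\code{n_sel} rule for
\eqn{\Omega_k} as an assumption:

\preformatted{
set.seed(2)
S <- 2000
J <- 6
K <- 3
g_draws  <- matrix(rnorm(S * J), S, J)            # draws of g_j
ge_draws <- array(rnorm(S * J * K), c(S, J, K))   # draws of ge_jk
n_sel <- ceiling(0.2 * J)                         # assumed size of Omega_k

# For each environment k: g_jk = g_j + ge_jk, then rank within each sample
prob_cond <- sapply(seq_len(K), function(k) {
  g_jk <- g_draws + ge_draws[, , k]
  colMeans(t(apply(g_jk, 1, function(g) rank(-g) <= n_sel)))
})
dimnames(prob_cond) <- list(paste0("G", seq_len(J)), paste0("E", seq_len(K)))
round(prob_cond, 3)   # genotypes in rows, environments in columns
}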

The pairwise probabilities of superior performance can also be calculated across
or within environments. This metric assesses the probability of the \eqn{j^{th}}
genotype being superior to another experimental genotype or a commercial check.
The calculations are as follows, across and within environments, respectively:

\deqn{Pr(\hat{g}_{j} > \hat{g}_{j^\prime} \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{j}^{(s)} > \hat{g}_{j^\prime}^{(s)} \vert y)}

or

\deqn{Pr(\hat{g}_{jk} > \hat{g}_{j^\prime k} \vert y) = \frac{1}{S} \sum_{s=1}^S I(\hat{g}_{jk}^{(s)} > \hat{g}_{j^\prime k}^{(s)} \vert y)}

These equations assume that the selection direction is positive. If
\code{increase = FALSE}, \eqn{>} is simply replaced by \eqn{<}.
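
As a hedged illustration of the pairwise comparison across environments, the sketch below
uses simulated draws for two candidates and a check. The object names (\code{g_draws},
\code{pair_prob}) and the simulated means are hypothetical:

\preformatted{
set.seed(3)
S <- 4000
g_draws <- cbind(G1    = rnorm(S, mean = 0,   sd = 1),
                 G2    = rnorm(S, mean = 0.5, sd = 1),
                 Check = rnorm(S, mean = 1,   sd = 1))
gen <- colnames(g_draws)

# pair_prob[a, b] estimates Pr(g_a > g_b | y) as the share of samples where a beats b
pair_prob <- sapply(gen, function(b) {
  sapply(gen, function(a) mean(g_draws[, a] > g_draws[, b]))
})
round(pair_prob, 3)
}

In this sketch the row genotype is the one tested for superiority over the column genotype.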

\itemize{\item Probability of superior stability}

Probabilities of superior performance highlight experimental genotypes with
high agronomic stability. For ecological stability (invariance), the probability
of superior stability is more adequate. Making a direct analogy with the
method of Shukla (1972), a stable genotype is the one that has a low variance
of the GEI (genotype-by-environment interaction) effects \eqn{[var(\widehat{ge})]}.
Using the same probability principles previously described, the probability
of superior stability is given as follows:

\deqn{Pr[var(\widehat{ge}_{jk}) \in \Omega \vert y] = \frac{1}{S} \sum_{s=1}^S I[var(\widehat{ge}_{jk}^{(s)}) \in \Omega \vert y]}

where \eqn{I[var(\widehat{ge}_{jk}^{(s)}) \in \Omega \vert y]} indicates if
\eqn{var(\widehat{ge}_{jk}^{(s)})} exists in \eqn{\Omega} (1) or not (0).
Pairwise probabilities of superior stability are also possible in this context:

\deqn{Pr[var(\widehat{ge}_{jk}) < var(\widehat{ge}_{j^\prime k}) \vert y] = \frac{1}{S} \sum_{s=1}^S I[var(\widehat{ge}_{jk}^{(s)}) < var(\widehat{ge}_{j^\prime k}^{(s)}) \vert y]}

Note that \eqn{j} will be superior to \eqn{j^\prime} if it has a \strong{lower}
variance of the genotype-by-environment interaction effect. This is true regardless
of whether \code{increase} is set to \code{TRUE} or \code{FALSE}.
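
A minimal sketch of the stability calculations, again with simulated draws. The array
\code{ge_draws}, the interaction standard deviations in \code{sd_ge} and the top-\code{n_sel} rule for
\eqn{\Omega} are assumptions made for illustration only:

\preformatted{
set.seed(4)
S <- 2000
J <- 5
K <- 8
sd_ge <- c(0.2, 0.5, 1, 1.5, 2)   # hypothetical GEI standard deviation per genotype
ge_draws <- array(rnorm(S * J * K, sd = rep(rep(sd_ge, each = S), times = K)),
                  c(S, J, K))

# Variance of the GEI effects across environments, per sample and genotype (S x J)
var_ge <- apply(ge_draws, c(1, 2), var)

# Lower variance is better, whatever the selection direction of the trait
n_sel <- ceiling(0.2 * J)
prob_stabi <- colMeans(t(apply(var_ge, 1, function(v) rank(v) <= n_sel)))

# Pairwise: share of samples in which genotype 1 is more stable than genotype 5
pair_stabi_1_5 <- mean(var_ge[, 1] < var_ge[, 5])
}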

The joint probability of independent events is the product of the individual probabilities.
The estimated genotypic main effects and the variances of GEI effects are independent
by design, thus the joint probability of superior performance and stability is as follows:

\deqn{Pr[\hat{g}_j \in \Omega \cap var(\widehat{ge}_{jk}) \in \Omega] = Pr(\hat{g}_j \in \Omega) \times Pr[var(\widehat{ge}_{jk}) \in \Omega]}
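
As a purely illustrative numerical example (the values are hypothetical and not taken from any
data set), a genotype with a probability of superior performance of 0.80 and a probability of
superior stability of 0.50 would have a joint probability of:

\deqn{0.80 \times 0.50 = 0.40}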

The estimation of these probabilities is closely related to some key questions that
constantly arise in plant breeding:
\itemize{
\item \strong{What is the risk of recommending a selection candidate for a target population of environments?}
\item \strong{What is the probability of a given selection candidate having good performance if
recommended to a target population of environments? And for a specific environment?}
\item \strong{What is the probability of a given selection candidate having better performance
than a cultivar check in the target population of environments? And in specific environments?}
\item \strong{How probable is it that a given selection candidate performs similarly across environments?}
\item \strong{What are the chances that a given selection candidate is more stable
than a cultivar check in the target population of environments?}
\item \strong{What is the probability that a given selection candidate has a
superior and invariable performance across environments?}
}

More details about the usage of \code{prob_sup()}, as well as the other functions of
the \code{ProbBreed} package, can be found at \url{https://saulo-chaves.github.io/ProbBreed_site/}.
}
\examples{
\donttest{
mod = bayes_met(data = soy,
                gen = "Gen",
                env = "Env",
                repl = NULL,
                reg = "Reg",
                res.het = FALSE,
                trait = "Y",
                iter = 2000, cores = 1, chains = 4)

outs = extr_outs(data = soy, trait = "Y", gen = "Gen", model = mod,
                 effects = c('l','g','gl','m','gm'),
                 nenv = length(unique(soy$Env)),
                 probs = c(0.05, 0.95),
                 check.stan.diag = TRUE,
                 verbose = FALSE)

results = prob_sup(data = soy,
                   trait = "Y",
                   gen = "Gen",
                   env = "Env",
                   mod.output = outs,
                   reg = 'Reg',
                   int = .2,
                   increase = TRUE,
                   save.df = FALSE,
                   interactive = FALSE,
                   verbose = FALSE)
}

}
\references{
Dias, K. O. G., Santos, J. P. R., Krause, M. D., Piepho, H.-P., Guimarães, L. J. M.,
Pastina, M. M., and Garcia, A. A. F. (2022). Leveraging probability concepts
for cultivar recommendation in multi-environment trials. \emph{Theoretical and
Applied Genetics}, 135(4):1385-1399. \doi{10.1007/s00122-022-04041-y}

Shukla, G. K. (1972). Some statistical aspects of partitioning genotype environmental
components of variability. \emph{Heredity}, 29:237-245. \doi{10.1038/hdy.1972.87}
}
