% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dis.kstest.r
\name{dis.kstest}
\alias{dis.kstest}
\title{The Monte Carlo estimate for the p-value of a discrete KS Test}
\usage{
dis.kstest(x, nsim = 100, bootstrap = TRUE, distri = "Poisson",
  r = NULL, p = NULL, alpha1 = NULL, alpha2 = NULL, n = NULL,
  lowerbound = 0.01, upperbound = 10000, parallel = FALSE)
}
\arguments{
\item{x}{A vector of count data. Should be non-negative integers. Non-integer elements of \code{x} are automatically
rounded up to the next integer.}

\item{nsim}{The number of bootstrapped or simulated samples generated to compute the p-value. A non-integer value is
automatically rounded up to the next integer. Should be greater than 30. Default is 100.}

\item{bootstrap}{Whether to generate bootstrapped samples or not. See Details. 'TRUE' or any numeric non-zero value indicates
the generation of bootstrapped samples. The default is 'TRUE'.}

\item{distri}{The distribution used as the null hypothesis. Can be one of \{'poisson','nb','bb',
'bnb','zip','zinb','zibb','zibnb','ph','nbh','bbh','bnbh'\}, which correspond to the Poisson, negative binomial, beta binomial
and beta negative binomial distributions and their zero-inflated as well as hurdle versions, respectively. Default is 'Poisson'.}

\item{r}{An initial value for the number of successes before which m failures are observed, where m is an element of \code{x}.
Must be a positive number, but not required to be an integer.}

\item{p}{An initial value for the probability of success. Should be a value within (0,1).}

\item{alpha1}{An initial value for the first shape parameter of the beta distribution. Should be a positive number.}

\item{alpha2}{An initial value for the second shape parameter of the beta distribution. Should be a positive number.}

\item{n}{An initial value of the number of trials. Must be a positive number, but not required to be an integer.}

\item{lowerbound}{The lower search bound used in the optimization of the likelihood function. Should be a small positive number.
The default is 1e-2.}

\item{upperbound}{The upper search bound used in the optimization of the likelihood function. Should be a large positive number.
The default is 1e4.}

\item{parallel}{Whether to use multiple threads to parallelize computation. Default is FALSE. Please be aware that the
program may take longer to execute with \code{parallel=FALSE}.}
}
\value{
An object of class 'dis.kstest' including the following elements:
\itemize{
  \item{x: \code{x} used in computation.}
  \item{nsim: nsim used in computation.}
  \item{bootstrap: bootstrap used in computation.}
  \item{distri: distri used in computation.}
  \item{lowerbound: lowerbound used in computation.}
  \item{upperbound: upperbound used in computation.}
  \item{mle_new: A matrix of the maximum likelihood estimates of unknown parameters under the null distribution, using
  \code{nsim} bootstrapped or simulated samples.}
  \item{mle_ori: A row vector of the maximum likelihood estimates of unknown parameters under the null distribution, using the
  original data \code{x}.}
  \item{pvalue: Monte Carlo p-value of the one-sample KS test.}
  \item{N: length of \code{x}.}
  \item{r: initial value of r used in computation.}
  \item{p: initial value of p used in computation.}
  \item{alpha1: initial value of alpha1 used in computation.}
  \item{alpha2: initial value of alpha2 used in computation.}
  \item{n: initial value of n used in computation.}
}
}
\description{
Computes the Monte Carlo estimate for the p-value of a discrete one-sample Kolmogorov-Smirnov (KS) Test for Poisson, negative
binomial, beta binomial, beta negative binomial distributions and their zero-inflated as well as hurdle versions.
}
\details{
If \code{nsim}, \code{bootstrap}, or \code{distri} has length greater than 1, only the first element is used.
For the other arguments except \code{x}, the first valid value is used if the input is not \code{NULL}; otherwise
naive sample estimates are fed into the algorithm. Only the initial values of parameters that appear in the null
distribution \code{distri} are needed. For example, with \code{distri='poisson'}, the user may provide a value for \code{lambda} but
need not provide one for \code{r} or \code{p}, although supplying them does not disturb the algorithm.

If the output p-value is less than a user-specified significance level, \code{x} is very likely drawn from a distribution other
than \code{distri}, given the current data. If the p-values of more than one distribution are greater than the pre-specified
significance level, the user may follow up with a likelihood ratio test to select a 'better' distribution.

The methodology of computing Monte Carlo p-value is taken from Aldirawi et al. (2019). With \code{bootstrap=TRUE}, \code{nsim}
bootstrapped samples will be generated by resampling \code{x} with replacement. Otherwise, \code{nsim} samples are
simulated from the null distribution using the maximum likelihood estimates from the original data \code{x}. Maximum
likelihood estimates are then computed for each of the \code{nsim} bootstrapped or simulated samples, and from these estimates
\code{nsim} new samples are generated under the null distribution. A KS statistic is calculated for each new sample, and the
Monte Carlo p-value is obtained by comparing these \code{nsim} KS statistics with the statistic of the original data \code{x}.
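
The procedure described above can be sketched for the Poisson case in base R. This is an illustration under stated
assumptions, not the package's implementation: the helper names \code{ks_stat_pois} and \code{mc_pvalue_pois} are
hypothetical, and resampling with \code{replace = TRUE}, comparing each new sample with the null CDF at its generating
estimate, and counting ties with \code{>=} are choices made for this sketch only.
\preformatted{
# Discrete KS statistic of a sample against a Poisson(lambda) null (hypothetical helper)
ks_stat_pois <- function(x, lambda) {
  k <- 0:max(x)                      # the supremum is attained on 0..max(x)
  max(abs(ecdf(x)(k) - ppois(k, lambda)))
}

# Monte Carlo p-value following the steps in the text (hypothetical helper)
mc_pvalue_pois <- function(x, nsim = 100, bootstrap = TRUE) {
  n <- length(x)
  lambda_ori <- mean(x)              # Poisson MLE from the original data
  d_ori <- ks_stat_pois(x, lambda_ori)
  d_new <- numeric(nsim)
  for (i in seq_len(nsim)) {
    # Resample x (replace = TRUE is an assumption) or simulate from the fitted null
    s <- if (bootstrap) sample(x, n, replace = TRUE) else rpois(n, lambda_ori)
    lambda_i <- mean(s)              # MLE of the bootstrapped or simulated sample
    y <- rpois(n, lambda_i)          # new sample generated under the null
    d_new[i] <- ks_stat_pois(y, lambda_i)
  }
  mean(d_new >= d_ori)               # share of simulated statistics at least as large
}

set.seed(1)
mc_pvalue_pois(rpois(200, lambda = 5), nsim = 100)
}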

During the computation of maximum likelihood estimates, the negative log-likelihood function is minimized via the base R
function \code{optim}, with the search interval determined by \code{lowerbound} and \code{upperbound}, except that the
optimization of \code{p} takes \code{1-lowerbound} as the upper search bound.
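
As a rough illustration of such a bounded fit, the sketch below minimizes a negative binomial negative log-likelihood
with \code{optim}. The source does not state which \code{optim} method is used, so \code{method = "L-BFGS-B"}, the
starting values and the toy data are all assumptions of this sketch.
\preformatted{
set.seed(1)
x <- rnbinom(200, size = 3, prob = 0.4)   # toy negative binomial counts
lowerbound <- 1e-2
upperbound <- 1e4

# Negative log-likelihood in (r, p); dnbinom() counts failures before `size` successes
negloglik <- function(par) {
  -sum(dnbinom(x, size = par[1], prob = par[2], log = TRUE))
}

# r is searched on [lowerbound, upperbound], p on [lowerbound, 1 - lowerbound]
fit <- optim(par = c(1, 0.5), fn = negloglik, method = "L-BFGS-B",
             lower = c(lowerbound, lowerbound),
             upper = c(upperbound, 1 - lowerbound))
fit$par   # estimates of r and p
}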

To accelerate the whole process, the algorithm can run in parallel via the packages \code{foreach} and
\code{doParallel}.
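
For readers unfamiliar with these packages, a minimal sketch of the general \code{foreach} and \code{doParallel}
pattern follows. It is a generic illustration, not the package's internal code; the number of workers and the toy loop
body are assumptions.
\preformatted{
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(2)   # two workers; adjust to the machine
registerDoParallel(cl)

# Each iteration simulates one sample and returns its mean; results are combined with c()
sims <- foreach(i = 1:4, .combine = c) \%dopar\% {
  mean(rpois(50, lambda = 5))
}

parallel::stopCluster(cl)        # always release the workers
length(sims)
}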
}
\section{Reference}{

\itemize{
 \item{H. Aldirawi, J. Yang, A. A. Metwally (2019). Identifying Appropriate Probabilistic Models for Sparse Discrete Omics Data,
 accepted for publication in 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI).}
 \item{T. Wolodzko (2019). extraDistr: Additional Univariate and Multivariate Distributions, R package version 1.8.11,
  https://CRAN.R-project.org/package=extraDistr.}
 \item{R. Calaway, Microsoft Corporation, S. Weston, D. Tenenbaum (2017). doParallel: Foreach Parallel Adaptor for the 'parallel'
 Package, R package version 1.0.11, https://CRAN.R-project.org/package=doParallel.}
 \item{R. Calaway, Microsoft, S. Weston (2017). foreach: Provides Foreach Looping Construct for R, R package version 1.4.4,
  https://CRAN.R-project.org/package=foreach.}
}
}

\examples{

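# Simulate a zero-inflated Poisson sample of size 300
set.seed(2001)
temp1 <- sample.zi(N = 300, phi = 0.3, distri = 'poisson', lambda = 5)
# Monte Carlo p-values of the KS test under four candidate null distributions
dis.kstest(temp1, nsim = 100, bootstrap = TRUE, distri = 'Poisson')$pvalue
dis.kstest(temp1, nsim = 100, bootstrap = TRUE, distri = 'nb')$pvalue
dis.kstest(temp1, nsim = 100, bootstrap = TRUE, distri = 'zip')$pvalue
dis.kstest(temp1, nsim = 100, bootstrap = TRUE, distri = 'zinb')$pvalue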
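
# Further sketch (not part of the original examples): keep the whole returned
# object and inspect elements listed under Value.
fit_zip <- dis.kstest(temp1, nsim = 100, bootstrap = TRUE, distri = 'zip')
fit_zip$mle_ori   # maximum likelihood estimates from the original data
fit_zip$pvalue    # Monte Carlo p-value
# Simulate from the fitted null instead of bootstrapping, supplying initial
# values for r and p (the values 2 and 0.5 are arbitrary illustrations).
dis.kstest(temp1, nsim = 100, bootstrap = FALSE, distri = 'nb', r = 2, p = 0.5)$pvalue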

}
\seealso{
\code{\link{model.lrt}}
}
