\name{generate.data}
\alias{generate.data}
\title{Generate random data from an oncogenetic tree}
\description{
  Generates random event occurrence data based on an oncogenetic
  tree model.
}
\usage{
generate.data(N, otree, with.errors=TRUE,
          edge.weights=if (with.errors) "estimated" else "observed",
          method=c("S","D1","D2"))
}
\arguments{
  \item{N}{The required sample size.}
  \item{otree}{An object of the class \code{oncotree}.}
  \item{with.errors}{A logical value specifying whether false
  positive and false negative errors should be applied.}
  \item{edge.weights}{A choice of whether the observed or estimated
  edge transition probabilities should be used in the calculation
  of probabilities. See \code{\link{oncotree.fit}} for an explanation
  of the difference. By default, the estimated edge transition
  probabilities are used if \code{with.errors=TRUE} and the observed
  ones if \code{with.errors=FALSE}.}
  \item{method}{Simulation method; see Details for an explanation of the options.}
} 
\details{
  There are three choices for the method of simulation; the best choice depends
  on the size of the tree, required sample size, and whether errors are needed.
  
  Method \dQuote{S} generates the data based on the conditional probability definition
  of the oncogenetic tree, and then \sQuote{corrupts} the resulting sample by 
  introducing random errors. This method is applicable in all circumstances, but can
  be slower than other methods if \code{N} is large and \code{with.errors=FALSE} 
  is used.
  
  Method \dQuote{D1} calculates the joint distribution generated by the
  tree exactly (using \code{\link{distribution.oncotree}}),
  and the observations are generated by sampling this distribution. Thus if 
  \code{with.errors=TRUE} and the tree is large, this method might fail due 
  to the exponential growth in the number of potential outcomes. On the 
  other hand, for a moderately sized tree and a large desired sample size
  \code{N} this is the most efficient method.
  
  Method \dQuote{D2} calculates the joint distribution generated by the tree without
  false positives/negatives, samples from it, and then \sQuote{corrupts} the 
  resulting sample. If \code{with.errors=FALSE} is used then this method is
  equivalent to method \dQuote{D1}.
  
}
\value{
  A data set with \code{N} rows, where each row is an independent observation.
}
\author{ Aniko Szabo }
\seealso{\code{\link{oncotree.fit}}}
\examples{
   data(ov.cgh)
   ov.tree <- oncotree.fit(ov.cgh)
   
   set.seed(7365)
   rd <- generate.data(200, ov.tree, with.errors=TRUE)
   
   #compare timing of methods
   system.time(generate.data(20, ov.tree, with.errors=TRUE, method="S"))
   system.time(generate.data(20, ov.tree, with.errors=TRUE, method="D1"))
   system.time(generate.data(20, ov.tree, with.errors=TRUE, method="D2"))
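   
   #illustrative sketch, not part of the original examples: error-free data.
   #With with.errors=FALSE the usage above makes edge.weights default to
   #"observed", and Details describes "D1" and "D2" as equivalent here.
   #The object name rd.clean is only a placeholder for this illustration.
   rd.clean <- generate.data(50, ov.tree, with.errors=FALSE, method="D1")
   
   #each row should be one simulated observation, so 50 rows are expected
   dim(rd.clean)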

}
\keyword{datagen}
\keyword{models}
