% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Data_Gen.R
\name{Data_Gen}
\alias{Data_Gen}
\title{Generation of Artificial Data}
\usage{
Data_Gen(
  X,
  alpha,
  beta,
  theta,
  a,
  sigma_e,
  e_distr = "normal",
  num_pi,
  delta,
  linearY,
  typeY
)
}
\arguments{
\item{X}{An n x p matrix of true covariates, where n is the sample size and
p is the number of covariates. Users can customize the data structure and
distribution.}

\item{alpha}{A vector of parameters describing the relationship between the
treatment model and the covariates. \code{alpha} should have the same length as
\code{beta}. Components that are nonzero in both \code{alpha} and \code{beta}
correspond to Xc (covariates associated with both outcome and treatment).
Components that are zero in \code{alpha} but nonzero in \code{beta} correspond
to Xp (covariates associated with the outcome only), and components that are
nonzero in \code{alpha} but zero in \code{beta} correspond to Xi (covariates
associated with the treatment only).
For example, if \code{alpha = c(2,2,0,0,1,1)} and \code{beta = c(3,3,1,1,0,0)}, then
the first two components are Xc, the middle two are Xp, and the last two are Xi.}

\item{beta}{A vector of parameters describing the relationship between the
outcome and the covariates. \code{beta} should have the same length as
\code{alpha}. Together, the nonzero patterns of \code{alpha} and \code{beta}
determine which covariates are Xc (nonzero in both), Xp (nonzero in \code{beta}
only), and Xi (nonzero in \code{alpha} only); see \code{alpha} for details and
an example.}

\item{theta}{A scalar parameter linking the outcome and the treatment.}

\item{a}{A weight on \code{cov_e} in the measurement error model
\code{W = cov_e*a + X + e}, where W denotes the observed covariates with
measurement error, X the true covariates, and e a noise term with covariance
matrix \code{cov_e}.}

\item{sigma_e}{The common value of the diagonal entries of the covariance
matrix \code{cov_e} in the measurement error model.}

\item{e_distr}{Distribution of the noise term e in the classical measurement
error model. \code{"normal"} gives a normal distribution with mean zero and a
covariance matrix whose diagonal entries equal \code{sigma_e}. A numeric scalar
\code{v} gives a t-distribution with \code{v} degrees of freedom.}

\item{num_pi}{Setting for the misclassification probabilities, either 1 or 2.
\code{num_pi = 1} sets pi_01 equal to pi_10, whereas \code{num_pi = 2} lets
pi_01 differ from pi_10.}

\item{delta}{The parameter that determines the number of subjects whose
treatment is measured with error. \code{delta = 1} gives equal numbers of
treatments with and without measurement error. \code{delta = 0.5} is a
suggested value, since it yields fewer treatments measured with error.}

\item{linearY}{A logical value that determines the relationship between the
outcome and the covariates. \code{linearY = TRUE} gives a linear relationship
with parameter vector \code{beta}; \code{linearY = FALSE} gives a nonlinear
relationship in which a sine function is applied to Xc and an exponential
function to Xp.}

\item{typeY}{The distribution of the outcome, chosen from the exponential
family: \code{"binary"} gives binary random variables, \code{"pois"} gives
Poisson random variables, and \code{"cont"} gives normally distributed random
variables.}
}
\value{
\item{Data}{An n x (p+2) matrix of the original data without measurement error,
where n is the sample size. The first p columns are the covariates, ordered as
Xc (covariates associated with both treatment and outcome),
Xp (covariates associated with the outcome only),
Xi (covariates associated with the treatment only), and
Xs (covariates independent of both outcome and treatment);
the second-to-last column is the treatment and the last column is the outcome.}

\item{Error_Data}{An n x (p+2) matrix of the data with measurement error in the
covariates and the treatment, with the same column layout as \code{Data}.}

\item{Pi}{An n x 2 matrix whose columns contain the two misclassification
probabilities pi_10 = P(Observed Treatment = 1 | Actual Treatment = 0) and
pi_01 = P(Observed Treatment = 0 | Actual Treatment = 1).}

\item{cov_e}{The covariance matrix of the noise term e in the measurement error
model.}
}
\description{
Generates artificial data under specific, commonly used settings, including
potential outcomes from the exponential family, error-prone treatments, and
error-prone covariates. Users can specify the magnitude of measurement error
and the relationships among the outcome, the treatment, and the covariates.
}
\examples{
##### Example 1: A multivariate normal continuous X with linear normal Y #####

## Generate a multivariate normal X matrix (n = 60, p = 120)
## mvrnorm() comes from the MASS package
mean_x <- 0; sig_x <- 1; rho <- 0
Sigma_x <- matrix(rho * sig_x^2, nrow = 120, ncol = 120)
diag(Sigma_x) <- sig_x^2
Mean_x <- rep(mean_x, 120)
X <- as.matrix(MASS::mvrnorm(n = 60, mu = Mean_x, Sigma = Sigma_x, empirical = FALSE))

## Data generation settings
## alpha: the Xc coefficients are 0.2, 0.2 and the Xi coefficients are 0.3, 0.3,
## so there are 2 Xc and 2 Xi
## beta: the Xc coefficients are 2, 2 and the Xp coefficients are 2, 2,
## so there are 2 Xc and 2 Xp
## e_distr = 10: t-distributed measurement error with 10 degrees of freedom
Data_fun <- Data_Gen(X, alpha = c(0.2, 0.2, 0, 0, 0.3, 0.3), beta = c(2, 2, 2, 2, 0, 0),
                     theta = 2, a = 2, sigma_e = 0.75, e_distr = 10, num_pi = 1,
                     delta = 0.8, linearY = TRUE, typeY = "cont")
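
## Illustration (a hypothetical helper, not part of Data_Gen): label each
## position of alpha and beta using the rule described under the alpha
## argument. Positions that are zero in both vectors are labelled "Xs",
## matching the Xs group listed under Value (an assumption of this sketch).
classify_covariates <- function(alpha, beta) {
  stopifnot(length(alpha) == length(beta))
  ifelse(alpha != 0 & beta != 0, "Xc",           # nonzero in both
         ifelse(beta != 0, "Xp",                  # nonzero in beta only
                ifelse(alpha != 0, "Xi", "Xs")))  # nonzero in alpha only, or neither
}
classify_covariates(alpha = c(0.2, 0.2, 0, 0, 0.3, 0.3),
                    beta  = c(2, 2, 2, 2, 0, 0))
## [1] "Xc" "Xc" "Xp" "Xp" "Xi" "Xi"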

##### Example 2: A uniform X with nonlinear binary Y #####

## Generate a uniform X matrix (n = 50, p = 120), then standardize its columns
n <- 50; p <- 120
X <- matrix(NA, n, p)
for (i in 1:p) { X[, i] <- sample(runif(n, -1, 1), n, replace = TRUE) }
X <- scale(X)

## Data generation settings
## alpha: the Xc coefficient is 0.1 and the Xi coefficient is 0.3,
## so there is 1 Xc and 1 Xi
## beta: the Xc coefficient is 2 and the Xp coefficient is 3,
## so there is 1 Xc and 1 Xp
## The remaining arguments are set as follows
Data_fun <- Data_Gen(X, alpha = c(0.1, 0, 0.3), beta = c(2, 3, 0),
                     theta = 1, a = 2, sigma_e = 0.5, e_distr = "normal", num_pi = 2,
                     delta = 0.5, linearY = FALSE, typeY = "binary")
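
## Illustration (not the internal code of Data_Gen): how the two
## misclassification probabilities reported in Pi act on a binary treatment.
## The true treatment T_true and the values of pi_10 and pi_01 below are
## hypothetical; with num_pi = 1 the two probabilities would be equal.
set.seed(1)
n_sim <- 10000
T_true <- rbinom(n_sim, size = 1, prob = 0.5)
pi_10 <- 0.1  # P(Observed Treatment = 1 | Actual Treatment = 0)
pi_01 <- 0.2  # P(Observed Treatment = 0 | Actual Treatment = 1)
## Each subject's treatment is flipped with probability pi_10 or pi_01
flip <- rbinom(n_sim, size = 1, prob = ifelse(T_true == 0, pi_10, pi_01)) == 1
T_obs <- ifelse(flip, 1 - T_true, T_true)
## Empirical misclassification rates, which should be close to pi_10 and pi_01
mean(T_obs[T_true == 0] == 1)
mean(T_obs[T_true == 1] == 0)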

}
