# Package: mvnimpute v0.0.0.9000

## Updating notes: 12.08.2020:

The data structure used in multiple imputation has been updated, and the functions have been renamed to reflect what they do. The updates are described in more detail below:

# `data.generation` function

This function requires several arguments including:

* `n`: number of observations/sample size of the generated data.
* `p`: number of variables/dimension of the generated data.
* `mu`: specified mean vector of the generated data.
* `Sigma`: specified covariance matrix of the generated data.
* `miss.pos`: indexes of the variables that have missing values.
* `miss.percent`: missing percentage for each variable that has missing values.
* `miss.type`: missing data mechanisms. Options include the default `"MCAR"` for *missing completely at random* (MCAR) and `"MAR"` for *missing at random* (MAR).
* `censor.pos`: indexes of the variables that have censored values.
* `censor.percent1`: percentage of the data that are not missing. This value is used to determine the lower limit of the censoring interval.
* `censor.percent2`: percentage of the data that are not missing. This value is used to determine the upper limit of the censoring interval.
* `censor.type`: type of censoring. Options include the default `"interval"` for interval censoring, `"right"` for right censoring, and `"left"` for left censoring.

This function generates multivariate normal data with missing and censored values. It can generate MCAR or MAR missing values, together with interval-, right- or left-censored values. Each data point is represented by a pair of values, and the pairs are stored in a list of two elements: the first element holds the first value of every pair and the second element holds the second. Additionally, the function generates a corresponding **data type indicator matrix**, which the `visual.plot` function uses for visualization.

* First, we generate the multivariate normal data. Then, for each data point:
  + if the data are **observed**, the two values of the pair are equal to each other and to the corresponding value of the generated multivariate normal data. The corresponding entries in the data type indicator matrix are set to 1.
  + if the data are **missing**, we treat them as a special case of interval-censored values, with censoring interval $(-\infty, \infty)$. In this program, we take $-10^{10}$ and $10^{10}$ as proxies for $-\infty$ and $\infty$, respectively.
  + if the data are **censored**, we treat all three censoring scenarios as "interval censoring". For left- and right-censored values, one limit of the censoring interval is infinite ($-10^{10}$ for left censoring, $10^{10}$ for right censoring), while for interval censoring both limits are finite. We select an interval from the specified variable(s) in the generated data according to `censor.percent1` and `censor.percent2`, and set the value pairs accordingly. For example, to generate right-censored values, only the lower limit of the interval should be finite, which means that `censor.percent1` should fall between 0 and 1, while the upper limit governed by `censor.percent2` is infinite ($10^{10}$). We then store the lower limit of the interval in the first element of the list and $10^{10}$ in the second element.
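
The pair representation described above can be illustrated with a small hand-built example. This is only a sketch in base R and does not call `data.generation`; the object names (`inf.proxy`, `full`, `lower`, `upper`), the toy dimensions and the interval width of 0.5 are assumptions made for illustration, and the sentinel `1e10` follows the $10^{10}$ proxy mentioned in the text.

```r
# Hypothetical sentinel standing in for infinity (the 10^10 proxy from the text).
inf.proxy <- 1e10

set.seed(1)
# Toy "complete" data: 5 observations of 2 variables.
full <- matrix(rnorm(10), nrow = 5, ncol = 2)

# Both list elements start as copies of the data,
# so every value is initially observed (lower == upper).
lower <- full
upper <- full

# Row 1, variable 1: missing, i.e. censored on (-Inf, Inf).
lower[1, 1] <- -inf.proxy
upper[1, 1] <-  inf.proxy

# Row 2, variable 2: right censored (only the lower limit is finite).
upper[2, 2] <- inf.proxy

# Row 3, variable 2: left censored (only the upper limit is finite).
lower[3, 2] <- -inf.proxy

# Row 4, variable 1: interval censored with two finite limits.
lower[4, 1] <- full[4, 1] - 0.5
upper[4, 1] <- full[4, 1] + 0.5

# List of two elements holding the value pairs.
incomplete.data <- list(lower, upper)

# Logical matrix flagging the observed entries (pairs that are equal).
observed <- incomplete.data[[1]] == incomplete.data[[2]]
```

Row 5 is left untouched, so both of its pairs stay equal and both entries are flagged as observed.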
  
# `visual.plot` function

This function requires two arguments:

* `data.indicator`: the data type indicator generated by the `data.generation` function or manually created by the user based on any data.
* `title`: title of the generated plot, default is set to "Summary plot".

# `single.imputation` function

This function implements stochastic regression single imputation to fill in the incomplete data: missing values are generated from the normal distribution with complete-case (CC) parameters, and censored values are generated from truncated normal distributions with CC parameters. Its only required input is the list that contains the data, and it returns a complete data matrix. The `multiple.imputation` function calls this function to prepare a complete data set for multiple imputation, so there is no need to run it separately.
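
The idea of drawing censored values from a truncated normal distribution can be sketched in base R. The example below is a deliberately simplified, univariate illustration: it uses the complete-case mean and standard deviation of a single variable instead of a regression on the other variables, and it is not the package's implementation. The helper `draw.truncnorm`, the toy data and the `1e10` sentinel are assumptions.

```r
# Draw one value from N(mean, sd^2) truncated to (lower, upper)
# using inverse-CDF sampling. With lower = -1e10 and upper = 1e10
# this reduces to an ordinary normal draw, which covers missing values.
draw.truncnorm <- function(mean, sd, lower, upper) {
  p.lo <- pnorm(lower, mean, sd)
  p.up <- pnorm(upper, mean, sd)
  qnorm(runif(1, p.lo, p.up), mean, sd)
}

set.seed(2)
x <- rnorm(200, mean = 1, sd = 2)
lower <- x
upper <- x
lower[1] <- -1e10; upper[1] <- 1e10  # missing
upper[2] <- 1e10                     # right censored at x[2]
lower[3] <- 0;     upper[3] <- 1     # interval censored (hypothetical limits)

# Complete cases are the entries whose pair values coincide.
cc <- lower == upper
cc.mean <- mean(x[cc])
cc.sd   <- sd(x[cc])

# Replace every incomplete entry with a draw that respects its interval.
imputed <- lower
for (i in which(!cc)) {
  imputed[i] <- draw.truncnorm(cc.mean, cc.sd, lower[i], upper[i])
}
```

Inverse-CDF sampling is used here because it needs no extra packages. It can lose accuracy when the interval lies far in the tail of the distribution, so a dedicated truncated normal sampler may be preferable in practice.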

# `multiple.imputation` function

This function implements the multiple imputation algorithm. It requires five arguments:

* `data`: the list created by the `data.generation` function or created by the users according to their data.
* `prior.param`: list of prior parameter specifications; please refer to the vignette for the specific format.
* `starting.values`: list of starting values used to initialize multiple imputation; please refer to the package vignette for the specific format.
* `iter`: number of rounds for running multiple imputation.
* `details`: Boolean variable indicating whether to print the running status.

To make the correct imputation decision, the function tests the data type of each value. Calling the input data list `incomplete.data`, the tests for the $i$-th observation of the $j$-th variable can be summarized as follows:

* If `incomplete.data[[1]][i, j] == incomplete.data[[2]][i, j]`, the value is observed and no imputation is performed.
* If `incomplete.data[[1]][i, j] == -10e10 & incomplete.data[[2]][i, j] == 10e10`, the value is missing, and the imputed value is generated from the normal distribution with the correct parameters.
* If `(incomplete.data[[1]][i, j] != incomplete.data[[2]][i, j]) & (incomplete.data[[1]][i, j] >= -10e10 & incomplete.data[[2]][i, j] <= 10e10)`, the value is interval censored, and the imputed value is generated from the truncated normal distribution truncated on both sides.
* If one of the two values is infinite while the other is finite, the value is left/right censored, and the imputed value is generated from the truncated normal distribution truncated on one side.
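
The tests above can be sketched as a small standalone classifier. This is an illustration in base R, not code taken from the package: the function name `classify.type`, the type labels and the `inf.proxy` argument are assumptions. The sentinel is passed as an argument and defaults to `10e10`, the literal used in the conditions above.

```r
# Classify every entry of the two pair matrices as
# "observed", "missing", "left", "right" or "interval".
classify.type <- function(lower, upper, inf.proxy = 10e10) {
  # Start from "interval" and overwrite the special cases.
  type <- matrix("interval", nrow = nrow(lower), ncol = ncol(lower))
  lo.inf <- lower <= -inf.proxy  # lower limit is (proxy) -Inf
  up.inf <- upper >=  inf.proxy  # upper limit is (proxy) +Inf
  type[lo.inf & !up.inf] <- "left"     # only the upper limit is finite
  type[!lo.inf & up.inf] <- "right"    # only the lower limit is finite
  type[lo.inf & up.inf]  <- "missing"  # both limits are infinite
  type[lower == upper]   <- "observed" # pair values coincide
  type
}

# Toy input: one variable, five observations, one of each type.
lower <- matrix(c(0.3, -10e10, -10e10, 1.2,   0.1), nrow = 5)
upper <- matrix(c(0.3,  10e10,  2.0,   10e10, 0.9), nrow = 5)
types <- classify.type(lower, upper)
```

For this toy input, the five rows are classified as observed, missing, left censored, right censored and interval censored, in that order.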

The results of the function include:

* `simulated.mu`: simulated mean vector across iterations.
* `simulated.sig`: simulated variance for each variable across iterations.
* `simulated.cov`: simulated covariance matrix across iterations.
* `imputa.dat`: imputed data set across iterations.
* `conditional.params`: regression parameters computed by the sweep operator across iterations.

**NOTE: The censoring types are all Type I censoring.**

## Updating notes: 01.07.2021:
* Updated the README.md file: added the instruction section subtitles "first-time users" and "Users who have already installed the package".
* Set default argument values for the `data.generation` function.


