% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clean.R
\name{clean}
\alias{clean}
\title{Dataframe cleaning for missing data handling}
\usage{
clean(
  X,
  var_remove = NULL,
  var_removal_threshold = 0.5,
  ind_removal_threshold = 1,
  missingness_coding = NA
)
}
\arguments{
\item{X}{Original dataframe with samples in rows and variables in columns}

\item{var_remove}{Variables to remove (e.g. ID). Specify as a character vector, e.g. c('ID', 'character_variable')}

\item{var_removal_threshold}{Variable removal threshold between 0 and 1 (default 0.5). Variables (columns) with a missingness fraction above this threshold will be removed during the cleaning process}

\item{ind_removal_threshold}{Individual removal threshold between 0 and 1 (default 1). Individuals (rows) with a missingness fraction above this threshold will be removed during the cleaning process}

\item{missingness_coding}{Non-NA coding of missing values in the original dataframe that should be converted to NA (e.g. -9). Accepts a single value (missingness_coding = -9) or multiple values (missingness_coding = c(-9, -99, -999))}
}
\value{
Cleaned dataframe with missing values coded as NA and with rows/columns above the pre-specified missingness thresholds removed
}
\description{
\code{\link{clean}} converts non-standard missing value codes to NA, converts variable types, and removes rows and columns
whose missingness exceeds pre-specified thresholds
}
\details{
Missing data imputation performs better on a clean, filtered dataframe. Variables and samples with very high
missingness fractions negatively impact most imputation algorithms. This function cleans the original
dataframe by removing rows (samples) and columns (variables) above pre-specified missingness thresholds. The function
also converts any user-specified, non-standard missing value codes (e.g. -9) to NA. Note that all factor variables will
be coerced to numeric variables.
}
\examples{
# basic settings
cleaned <- clean(clindata_miss, missingness_coding = -9)

# setting stricter removal thresholds than the defaults
cleaned <- clean(clindata_miss,
                 var_removal_threshold = 0.10,
                 ind_removal_threshold = 0.9,
                 missingness_coding = -9)
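
# a minimal sketch in base R (not necessarily how clean() works internally):
# inspect the missingness fractions that the two removal thresholds refer to.
# The toy dataframe and its column names are hypothetical, for illustration only
toy <- data.frame(a = c(1, NA, 3, 4),
                  b = c(NA, NA, NA, 8),
                  c = c(1, 2, 3, 4))
var_missingness <- colMeans(is.na(toy))  # fraction missing per variable (column)
ind_missingness <- rowMeans(is.na(toy))  # fraction missing per individual (row)
names(var_missingness)[var_missingness > 0.5]  # variables above a 0.5 threshold
which(ind_missingness > 0.5)                   # rows above a 0.5 threshold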

}
