% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cleanup_data.R
\name{cleanup_data}
\alias{cleanup_data}
\title{Transforms 13C breath data into a clean format for fitting}
\usage{
cleanup_data(data, ...)
}
\arguments{
\item{data}{\itemize{
  \item{A data frame, array or tibble with at least two numeric columns,
   optionally named \code{minute} and \code{pdr}, to fit
   a single 13C record.}
 \item{A data frame or tibble with three columns named \code{patient_id},
   \code{minute} and \code{pdr}. }
 \item{A matrix that can be converted to one of the above.}
 \item{A list of data frames/tibbles that are concatenated. When the list has 
   named elements, the names are converted to group labels. When the list elements
   are not named, group name \code{A} is used for all items.}
 \item{A structure of class \code{\link{breathtest_data}}, as imported from
   a file with \code{\link{read_any_breathtest}}.}
 \item{A list of class \code{breathtest_data_list}, as generated
   by read functions such as \code{\link{read_breathid_xml}}.}
}}

\item{...}{Optional arguments.
\describe{
  \item{use_filename_as_patient_id}{Always use the file name instead of the
  patient name. Use this when patient ids are not unique.}
}}
}
\value{
A tibble with 4 columns: \code{patient_id}, \code{group}, \code{minute} and
\code{pdr}. Column \code{patient_id} is filled with the dummy entry
\code{pat_a} if the input data set has no \code{patient_id} column.
A column \code{group} is required in the input data if the patients are from
different treatment groups or have within-subject repeats, e.g. in a crossover
design; a patient can therefore have records in multiple groups.
A dummy group name "A" is added if the input data set has no group column.
If \code{group} is present, it is a hint to the analysis functions to do a
post-hoc breakdown or to use it as a grouping variable in population-based methods.

Columns \code{minute} and \code{pdr} are the same as given on input, but negative
minute values are removed, and an entry at 0 minutes is shifted to 0.01 minutes 
because most fit methods cannot handle the singularity at t=0.

An error is raised if dummy columns \code{patient_id} and \code{group} cannot be 
added in a unique way, i.e. when multiple values for a given minute cannot be 
disambiguated.

Comments are persistent; multiple comments are concatenated with newline separators.
}
\description{
Accepts various data formats of ungrouped or grouped 13C breath 
test time series, and transforms these into a data frame that can be used by
all fitting functions, e.g. \code{\link{nls_fit}}.
If in doubt, pass the data frame through \code{cleanup_data} before forwarding it 
to a fitting function. If the function cannot repair the format, it gives better
error messages than the \code{xxx_fit} functions.
}
\examples{
options(digits = 4)
# Fully manual: build a two-column data frame by hand
minute = seq(0,30, by = 10)
data1 = data.frame(minute, 
   pdr = exp_beta(minute, dose = 100, m = 30,  k = 0.01, beta = 2))
# Two columns with data at t = 0
data1
# Four columns with data at t = 0.01
cleanup_data(data1)
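
# A minimal sketch of the cleaning rules described in the Value section,
# using made-up numbers (data3 and data4 are hypothetical names):
# the row with a negative minute should be dropped, and minute 0 moved to 0.01
data3 = data.frame(minute = c(-5, 0, 10, 20), pdr = c(0, 0, 5.1, 8.3))
cleanup_data(data3)
# Duplicate minute values without patient_id or group cannot be
# disambiguated, so an error is expected here; try() keeps the example running
data4 = data.frame(minute = c(0, 10, 10), pdr = c(0, 5.1, 6.2))
try(cleanup_data(data4))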

# Results from simulate_breathtest_data can be passed directly to cleanup_data
cleanup_data(simulate_breathtest_data(3))
# .. which implicitly does
cleanup_data(simulate_breathtest_data(3)$data)

# Use simulated data
data2 = list(
  Z = simulate_breathtest_data(seed = 10)$data,
  Y = simulate_breathtest_data(seed = 11)$data)
d = cleanup_data(data2)
str(d)
unique(d$patient_id)
unique(d$group)
# "Z" "Y"

# Mix multiple input formats
f1 = btcore_file("350_20043_0_GER.txt")
f2 = btcore_file("IrisMulti.TXT")
f3 = btcore_file("IrisCSV.TXT")
# With a named list, the name is used as a group parameter
data = list(A = read_breathid(f1), B = read_iris(f2), C = read_iris_csv(f3)) 
d = cleanup_data(data)
str(d)
unique(d$patient_id)
# "350_20043_0_GER" "1871960"         "123456"
# File name is used as patient name if none is available
unique(d$group)
# "A" "B" "C"
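
# A hedged sketch of the optional argument use_filename_as_patient_id:
# assuming it behaves as documented in the Arguments section, the file name
# should replace the patient name stored in the file
# (d_file is a hypothetical name used for illustration only)
d_file = cleanup_data(read_iris(f2), use_filename_as_patient_id = TRUE)
unique(d_file$patient_id)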
}
