% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/StatTools.R
\name{getClusters}
\alias{getClusters}
\title{getClusters}
\usage{
getClusters(data, method = "hca", ...)
}
\arguments{
\item{data}{Matrix of bucket integrations: each row is a spectrum and each column holds the 
integrated area of one bucket.}

\item{method}{Clustering method applied to the buckets: either 'corr' for 'correlation' or 'hca' for 
'hierarchical clustering analysis' (the default).}

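\item{...}{Additional parameters, depending on the chosen method:
\itemize{
  \item \code{corr}: \code{cval} (correlation threshold), \code{dC} (tolerance interval of the correlation threshold), \code{ncpu}
  \item \code{hca}: \code{vcutusr} (cut value specified by the user)
}}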
}
\value{
\code{getClusters} returns a list containing the following components:
\itemize{
  \item \code{vstats} Matrix of the statistics used to find the best value of the criterion.
  \item \code{clusters} List of the ppm values belonging to each cluster; the length of the list equals the number of clusters.
  \item \code{clustertab} Association matrix giving, for each cluster (column 2), the corresponding buckets (column 1).
  \item \code{params} List of the parameters of the chosen method with which the clustering was performed.
  \item \code{vcrit} Value of the criterion (best or user-defined), i.e. the correlation threshold for the 'corr' method or the cut value for the 'hca' method.
  \item \code{indxopt} Index within the \code{vstats} matrix corresponding to the criterion value (\code{vcrit}).
}
}
\description{
Starting from the data matrix generated by integrating all bucket zones (columns) for each 
spectrum (rows), we can take advantage of the concentration variability of each compound across 
a series of samples by clustering the buckets according to the significant correlations that link 
them together. The bucket clustering is based either on a lower threshold 
applied to the correlations or on a cut value applied to the hierarchical tree of the variables 
(buckets) generated by a Hierarchical Clustering Analysis (HCA).
}
\details{
At the bucketing step, we chose intelligent bucketing, which means 
that each bucket exactly matches one resonance peak. As a result, the buckets 
have a strong chemical meaning, since resonance peaks are the fingerprints of 
chemical compounds. However, assigning a chemical compound generally requires several 
resonance peaks in 1D 1H-NMR metabolic profiling. To generate relevant 
clusters (i.e. clusters possibly matching chemical compounds), two approaches 
have been implemented:
\itemize{
  \item Bucket clustering based on a lower threshold applied to the correlations
  \itemize{
      \item In this approach, an appropriate correlation threshold is applied to the correlation 
matrix before its decomposition into clusters. The result can be improved by searching for 
a trade-off over a tolerance interval around the correlation threshold: starting from a fixed 
correlation threshold (cval), the clustering is computed for the three values (cval-dC, cval, cval+dC), 
where dC is the tolerance interval of the correlation threshold. The three resulting sets of 
clusters are merged according to the following rules: 1) if a large cluster is 
broken, the two resulting clusters are kept; 2) if a small cluster disappears, the initial 
cluster is kept. In general, a tolerance interval (dC) between 
0.002 and 0.01 gives a good trade-off.
  }
  \item Bucket clustering based on a hierarchical tree of the variables (buckets) generated 
by a Hierarchical Clustering Analysis (HCA)
  \itemize{
      \item In this approach, a Hierarchical Clustering Analysis (HCA, \code{\link[stats]{hclust}}) 
is applied to the data after computing a distance matrix ("euclidean" by default). The tree 
resulting from \code{\link[stats]{hclust}} is then cut (\code{\link[stats]{cutree}}) 
into several groups by specifying the cut height(s). To find the best cut value, the cut height 
is chosen 1) by testing several equally spaced values within a given range of cut heights, then 
2) by keeping the one that gives the most clusters while including the most bucket variables. 
Otherwise, a cut value has to be specified by the user (vcutusr).
  }
}
}
\examples{
 \donttest{
  data_dir <- system.file("extra", package = "Rnmr1D")
  cmdfile <- file.path(data_dir, "NP_macro_cmd.txt")
  samplefile <- file.path(data_dir, "Samples.txt")
  out <- Rnmr1D::doProcessing(data_dir, cmdfile=cmdfile, 
                                samplefile=samplefile, ncpu=2)
  outMat <- getBucketsDataset(out, norm_meth='CSN')
  clustcorr <- getClusters(outMat, method='corr', cval=0, dC=0.003, ncpu=2)
  clusthca <- getClusters(outMat, method='hca', vcutusr=0)
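
  # A minimal base-R sketch of the hca idea described in Details, run on
  # synthetic data. It only illustrates the dist / hclust / cutree chain and is
  # not the implementation used by getClusters: the toy matrix, the average
  # linkage and the cut height h = 2 are assumptions made for this example.
  set.seed(1)
  base1 <- rnorm(20); base2 <- rnorm(20)
  # 20 spectra (rows) x 4 buckets (columns); b1/b2 and b3/b4 co-vary by design
  toy <- cbind(b1 = base1, b2 = base1 + rnorm(20, sd = 0.05),
               b3 = base2, b4 = base2 + rnorm(20, sd = 0.05))
  # The buckets are the variables to cluster, hence the transpose before dist()
  tree <- stats::hclust(stats::dist(t(toy), method = "euclidean"),
                        method = "average")
  # Cut the tree at a fixed height, playing the role of a user-supplied cut value
  groups <- stats::cutree(tree, h = 2)
  # Bucket names grouped by cluster label
  split(names(groups), groups)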
 }
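
 # A minimal base-R sketch of the corr idea described in Details, run on
 # synthetic data. Assumptions made for this example (not taken from the
 # package code): two buckets are linked when their absolute correlation is at
 # least the threshold, and a cluster is a connected group of linked buckets.
 # Single-linkage clustering on 1 - abs(r), cut at 1 - threshold, yields those
 # connected groups. The tolerance step (cval-dC, cval, cval+dC) is not shown.
 set.seed(2)
 s1 <- rnorm(30); s2 <- rnorm(30)
 # 30 spectra (rows) x 4 buckets (columns); b1/b2 and b3/b4 co-vary by design
 toy2 <- cbind(b1 = s1, b2 = s1 + rnorm(30, sd = 0.05),
               b3 = s2, b4 = s2 + rnorm(30, sd = 0.05))
 thr <- 0.9  # hypothetical correlation threshold, analogous to cval
 d <- stats::as.dist(1 - abs(stats::cor(toy2)))
 grp <- stats::cutree(stats::hclust(d, method = "single"), h = 1 - thr)
 # Bucket names grouped by cluster label
 split(names(grp), grp)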
}
\references{
{
  Jacob D., Deborde C. and Moing A. (2013) An efficient spectra processing method for metabolite identification from 1H-NMR metabolomics data.
  Analytical and Bioanalytical Chemistry, 405(15), 5049-5061. doi: 10.1007/s00216-013-6852-y
 }
}
