% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/step_kmedoids.R
\name{step_kmedoids}
\alias{step_kmedoids}
\alias{tunable.step_kmedoids}
\title{K-Medoids Clustering Variable Selection}
\usage{
step_kmedoids(
  recipe,
  ...,
  k = 5,
  center = TRUE,
  scale = TRUE,
  method = c("pam", "clara"),
  metric = "euclidean",
  optimize = FALSE,
  num_samp = 50,
  samp_size = 40 + 2 * k,
  replace = TRUE,
  prefix = "KMedoids",
  role = "predictor",
  skip = FALSE,
  id = recipes::rand_id("kmedoids")
)

tunable.step_kmedoids(x, ...)
}
\arguments{
\item{recipe}{\link[recipes]{recipe} object to which the step will be added.}

\item{...}{one or more selector functions to choose which variables will be
partitioned into clusters.  See \code{\link[recipes]{selections}} for
more details.  These are not currently used by the \code{tidy} method.}

\item{k}{number of clusters into which the variables are partitioned.  The
value of \code{k} is constrained to be between 1 and one less than the
number of original variables.}

\item{center, scale}{logicals indicating whether to center the original
variables at their means and scale them by their median absolute deviations
prior to cluster partitioning, or functions or names of functions for the
centering and scaling.  The centering and scaling are used for clustering
only and are not applied to the selected variables.}

\item{method}{character string specifying one of the clustering methods
provided by the \pkg{cluster} package.  The \code{clara} (clustering
large applications) method is an extension of \code{pam} (partitioning
around medoids) designed to handle large datasets.}

\item{metric}{character string specifying the distance metric for calculating
dissimilarities between observations as \code{"euclidean"},
\code{"manhattan"}, or \code{"jaccard"} (\code{clara} only).}

\item{optimize}{logical indicator or 0:5 integer level specifying
optimization for the \code{\link[cluster]{pam}} clustering method.}

\item{num_samp}{number of sub-datasets to sample for the
\code{\link[cluster]{clara}} clustering method.}

\item{samp_size}{number of cases to include in each sub-dataset.}

\item{replace}{logical indicating whether to replace the original variables
with the selected cluster medoids.}

\item{prefix}{if the original variables are not replaced, the selected
variables are added to the dataset with the character string prefix added
to their names; otherwise, the original variable names are retained.}

\item{role}{analysis role that added step variables should be assigned.  By
default, they are designated as model predictors.}

\item{skip}{logical indicating whether to skip the step when the recipe is
baked.  While all operations are baked when \code{\link[recipes]{prep}} is
run, some operations may not be applicable to new data (e.g. processing
outcome variables).  Care should be taken when using \code{skip = TRUE} as
it may affect the computations for subsequent operations.}

\item{id}{unique character string to identify the step.}

\item{x}{\code{step_kmedoids} object.}
}
\value{
Function \code{step_kmedoids} creates a new step whose class is of
the same name and inherits from \code{\link{step_sbf}}, adds it to the
sequence of existing steps (if any) in the recipe, and returns the updated
recipe.  For the \code{tidy} method, a tibble with columns \code{terms}
(selectors or variables selected), \code{cluster} (cluster assignments),
\code{selected} (logical indicator of selected cluster medoids),
\code{silhouette} (silhouette values), and \code{name} (names of the
selected variables).
}
\description{
Creates a \emph{specification} of a recipe step that will partition numeric
variables according to k-medoids clustering and select the cluster medoids.
}
\details{
K-medoids clustering partitions variables into k groups such that the
dissimilarity between the variables and their assigned cluster medoids is
minimized.  Cluster medoids are then returned as a set of k variables.
}
\examples{
library(recipes)

rec <- recipe(rating ~ ., data = attitude)
kmedoids_rec <- rec \%>\%
  step_kmedoids(all_predictors(), k = 3)
kmedoids_prep <- prep(kmedoids_rec, training = attitude)
kmedoids_data <- bake(kmedoids_prep, attitude)

pairs(kmedoids_data, lower.panel = NULL)

tidy(kmedoids_rec, number = 1)
tidy(kmedoids_prep, number = 1)
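
## A minimal sketch of the idea described in Details, calling the cluster
## package directly.  It is an illustration under stated assumptions
## (default scale() standardization, Euclidean distances) and is not
## necessarily the computation performed internally by step_kmedoids.
library(cluster)

x <- scale(attitude[, -1])   # predictors only; column 1 is the rating outcome
fit <- pam(t(x), k = 3)      # transpose so that variables are clustered
fit$clustering               # cluster assignment of each variable
medoid_vars <- colnames(x)[fit$id.med]
medoid_vars                  # one medoid variable per cluster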

}
\references{
Kaufman, L., & Rousseeuw, P. J. (1990). \emph{Finding groups in data: An
introduction to cluster analysis}. Wiley.

Reynolds, A., Richards, G., de la Iglesia, B., & Rayward-Smith, V. (2006).
Clustering rules: A comparison of partitioning and hierarchical clustering
algorithms. \emph{Journal of Mathematical Modelling and Algorithms},
\emph{5}, 475-504.
}
\seealso{
\code{\link[cluster]{pam}}, \code{\link[cluster]{clara}},
\code{\link[recipes]{recipe}}, \code{\link[recipes]{prep}},
\code{\link[recipes]{bake}}
}
