% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/duplicate-detect.R
\name{duplicate_detect}
\alias{duplicate_detect}
\title{Detect duplicate values}
\usage{
duplicate_detect(x, numeric_only = TRUE, colname_end = "dup")
}
\arguments{
\item{x}{Vector or data frame.}

\item{numeric_only}{Boolean. If \code{TRUE} (the default) and if \code{x} is a data
frame, the function will only test numeric columns and string columns
coercible to numeric. \emph{Note}: Setting it to \code{FALSE} can lead to
coercion issues, so use this option with care.}

\item{colname_end}{String. Name ending of the Boolean test result columns.
Default is \code{"dup"}.}
}
\value{
A tibble (data frame):
\itemize{
\item If \code{x} is a vector, there are two columns: the input \code{value} and the
Boolean \code{has_duplicates}.
\item If \code{x} is a data frame, the output tibble has (some of) the columns from
\code{x} and, to the right of each of these columns, the corresponding Boolean
test result column, whose name ends on the value of \code{colname_end}.
}

The tibble has the \code{scr_dup_detect} class, which is recognized by the
\code{audit()} generic.
}
\description{
For every value in a vector or data frame, \code{duplicate_detect()}
tests whether at least one other value is identical to it. Test results are
presented next to every value.

By default, only numeric columns and string columns coercible to numeric
are tested (if \code{x} is a data frame). Any other columns are silently
dropped.

This function is a blunt tool designed for initial data checking. Don't put
too much weight on its results.

For summary statistics, call \code{audit()} on the results.
}
\details{
This function is not very informative with many input values that
only have a few characters each. Many of them may have duplicates just by
chance. For example, in R's built-in \code{iris} data set, 99\% of values have
duplicates.

In general, the fewer values and the more characters per value there are,
the more informative \code{duplicate_detect()}'s results will be.
}
\section{Summaries with \code{audit()}}{
 There is an S3 method for the \code{audit()}
generic, so you can call \code{audit()} following \code{duplicate_detect()}. It
returns a tibble with these columns:
\itemize{
\item \code{variable}: The original data frame's variables with at least one
"duplicated" value: one that has at least one duplicate anywhere else in
the data frame. For a vector, \code{x}.
\item \code{n_duplicated}: Number of "duplicated" values of that variable: those
that have at least one duplicate anywhere in the data frame.
\item \code{n_total}: Total number of values of that variable.
\item \code{dup_rate}: Rate of "duplicated" values of that variable.
}

The final row, \code{.total}, summarizes across all other rows: It adds up the
\code{n_duplicated} and \code{n_total} columns, and calculates the average of the
\code{dup_rate} column.
}

\examples{
# Find duplicate values in a data frame...
duplicate_detect(x = pigs4)

# ...or in a single vector:
duplicate_detect(x = pigs4$snout)

# Summary statistics with `audit()`:
pigs4 \%>\%
  duplicate_detect() \%>\%
  audit()
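
# How a `.total` row relates to the other rows,
# sketched in base R with made-up numbers. These
# are NOT actual `audit()` results, and computing
# `dup_rate` as `n_duplicated / n_total` is an
# assumption made for illustration only:
demo_summary <- data.frame(
  variable = c("a", "b"),
  n_duplicated = c(2, 6),
  n_total = c(4, 8)
)
demo_summary$dup_rate <-
  demo_summary$n_duplicated / demo_summary$n_total

# Counts are added up, whereas rates are averaged:
sum(demo_summary$n_duplicated)
sum(demo_summary$n_total)
mean(demo_summary$dup_rate)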

# If there are many values and/or few
# characters per value, `duplicate_detect()`
# can be misleading:
iris \%>\%
  duplicate_detect()

iris \%>\%
  duplicate_detect() \%>\%
  audit()
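
# A rough base R sketch of the underlying idea
# for a single vector. This is an approximation
# for illustration, not necessarily how
# `duplicate_detect()` is implemented. The vector
# `x_demo` is made up. A value counts as having
# duplicates if it occurs more than once:
x_demo <- c("4.74", "5.23", "4.74", "6.77")
duplicated(x_demo) | duplicated(x_demo, fromLast = TRUE)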
}
\seealso{
\code{duplicate_count()} provides a frequency table.
}
