% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/estimationCKT.kernel.R
\name{CKT.kernel}
\alias{CKT.kernel}
\title{Estimation of conditional Kendall's tau using kernel smoothing}
\usage{
CKT.kernel(
  observedX1,
  observedX2,
  observedZ,
  newZ,
  h,
  kernel.name = "Epa",
  methodCV = "Kfolds",
  Kfolds = 5,
  nPairs = 10 * length(observedX1),
  typeEstCKT = "wdm",
  progressBar = TRUE
)
}
\arguments{
\item{observedX1}{a vector of n observations of the first variable}

\item{observedX2}{a vector of n observations of the second variable}

\item{observedZ}{a vector of n observations of the conditioning variable,
or a matrix with n rows of observations of the conditioning vector}

\item{newZ}{the new values of the conditioning variable \eqn{Z} at which
the conditional Kendall's tau should be estimated.}

\item{h}{the bandwidth used for kernel smoothing.
If this is a vector, then cross-validation is used following the method
given by argument \code{methodCV} to choose the best bandwidth
before doing the estimation.}

\item{kernel.name}{name of the kernel used for smoothing.
Possible choices are \code{"Gaussian"} (Gaussian kernel)
and \code{"Epa"} (Epanechnikov kernel).}

\item{methodCV}{method used for the cross-validation.
Possible choices are \code{"leave-one-out"} and \code{"Kfolds"}.}

\item{Kfolds}{number of subsamples used,
if \code{methodCV = "Kfolds"}.}

\item{nPairs}{number of pairs used in the cross-validation criterion,
if \code{methodCV = "leave-one-out"}.}

\item{typeEstCKT}{type of estimation of the conditional Kendall's tau.
Possible choices are \itemize{
  \item \code{1} and \code{3} produce biased estimators.
  \code{2} does not attain the full range \eqn{[-1,1]}.
  Therefore, these three choices are not recommended for applications on real data.

  \item \code{4} is an improved version of \code{1,2,3} that has less bias
  and attains the full range \eqn{[-1,1]}.

  \item \code{"wdm"} is the default version and produces the same results
  as \code{4} when there are no ties in the data.
}}

\item{progressBar}{if \code{TRUE}, a progress bar is displayed for each value of \code{h}
to show the progress of the computation.}
}
\value{
a list with two components
\itemize{
   \item \code{estimatedCKT} the vector of size \code{NROW(newZ)}
   containing the values of the estimated conditional Kendall's tau.

   \item \code{finalh} the bandwidth \code{h} that was finally used
   for kernel smoothing (either the one specified by the user
   or the one chosen by cross-validation if multiple bandwidths were given).
}
}
\description{
Let \eqn{X_1} and \eqn{X_2} be two random variables.
The goal of this function is to estimate the conditional Kendall's tau
(a dependence measure) between \eqn{X_1} and \eqn{X_2} given \eqn{Z=z}
for a conditioning variable \eqn{Z}.
Conditional Kendall's tau between \eqn{X_1} and \eqn{X_2} given \eqn{Z=z}
is defined as:
\deqn{P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) > 0 | Z_1 = Z_2 = z)}
\deqn{- P( (X_{1,1} - X_{2,1})(X_{1,2} - X_{2,2}) < 0 | Z_1 = Z_2 = z),}
where \eqn{(X_{1,1}, X_{1,2}, Z_1)} and \eqn{(X_{2,1}, X_{2,2}, Z_2)}
are two independent and identically distributed copies of \eqn{(X_1, X_2, Z)}.
For this, a kernel-based estimator is used, as described in
Derumigny & Fermanian (2019).
}
\details{
\strong{Choice of the bandwidth \code{h}}.
The bandwidth must be chosen carefully.
In the univariate case, the default kernel (Epanechnikov kernel) is supported
on \eqn{[-1,1]}, so for a bandwidth \code{h}, estimation of conditional Kendall's
tau at \eqn{Z=z} will only use points for which \eqn{Z_i \in [z \pm h]}.
As usual in nonparametric estimation, \code{h} should not be too small
(to avoid too large a variance) and should not be too large
(to avoid too large a bias).

We recommend that for each \eqn{z} for which the conditional Kendall's tau
\eqn{\tau_{X_1, X_2 | Z=z}} is estimated, the set
\eqn{\{i: Z_i \in [z \pm h] \}}
should contain at least 20 points and not more than 30\% of the points of
the whole dataset.
Note that for consistent estimation, as the sample size \eqn{n} tends
to infinity, \code{h} should tend to \eqn{0} while the size of the set
\eqn{\{i: Z_i \in [z \pm h]\}} should also tend to infinity.
Indeed, the conditioning points should be closer and closer to the point of interest \eqn{z}
(small \code{h}) and more and more numerous (\code{h} tending to 0 slowly enough).
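
As a minimal sketch of how the recommendation on the window size could be
checked in the univariate case, the following base R code counts the
observations falling in the window \eqn{[z \pm h]}. The simulated \code{Z}
and the values of \code{z} and \code{h} are illustrative assumptions,
not recommended settings.
\preformatted{
# Illustrative data: any numeric vector of observations of Z could be used
set.seed(1)
Z <- rnorm(n = 800, mean = 5, sd = 2)

# Hypothetical point of interest and candidate bandwidth
z <- 5
h <- 0.5

# Number and proportion of observations Z_i lying in [z - h, z + h]
nWindow <- sum(abs(Z - z) <= h)
propWindow <- nWindow / length(Z)

# The recommendation above asks for at least 20 points
# and at most 30 percent of the sample
c(count = nWindow, proportion = propWindow)
}
If the count is below 20 or the proportion is above 0.30 for some \eqn{z}
of interest, another bandwidth may be worth considering.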

In the multivariate case, similar recommendations can be made.
Because of the curse of dimensionality, a larger sample will be necessary to
reach the same level of precision as in the univariate case.
}
\examples{
# We simulate from a conditional copula
set.seed(1)
N = 800
Z = rnorm(n = N, mean = 5, sd = 2)
conditionalTau = -0.9 + 1.8 * pnorm(Z, mean = 5, sd = 2)
simCopula = VineCopula::BiCopSim(N=N , family = 1,
    par = VineCopula::BiCopTau2Par(1 , conditionalTau ))
X1 = qnorm(simCopula[,1])
X2 = qnorm(simCopula[,2])

newZ = seq(2,10,by = 0.1)
estimatedCKT_kernel <- CKT.kernel(
   observedX1 = X1, observedX2 = X2, observedZ = Z,
   newZ = newZ, h = 0.1, kernel.name = "Epa")$estimatedCKT

# Comparison between true Kendall's tau (in black)
# and estimated Kendall's tau (in red)
trueConditionalTau = -0.9 + 1.8 * pnorm(newZ, mean = 5, sd = 2)
plot(newZ, trueConditionalTau , col="black",
   type = "l", ylim = c(-1, 1))
lines(newZ, estimatedCKT_kernel, col = "red")
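
# Sketch of bandwidth selection by cross-validation: as documented for the
# argument h, passing a vector of candidate bandwidths triggers
# cross-validation (here with the defaults methodCV = "Kfolds", Kfolds = 5).
# The candidate grid below is an illustrative assumption, not a recommendation.
\donttest{
resultCV <- CKT.kernel(
   observedX1 = X1, observedX2 = X2, observedZ = Z,
   newZ = newZ, h = c(0.2, 0.5, 1), kernel.name = "Epa",
   progressBar = FALSE)

# Bandwidth retained among the candidates
print(resultCV$finalh)

# Estimate obtained with the selected bandwidth (in blue)
lines(newZ, resultCV$estimatedCKT, col = "blue")
}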

}
\references{
Derumigny, A., & Fermanian, J. D. (2019).
On kernel-based estimation of conditional Kendall’s tau:
finite-distance bounds and asymptotic behavior.
Dependence Modeling, 7(1), 292-321.
\doi{10.1515/demo-2019-0016}
}
\seealso{
\code{\link{CKT.estimate}} for other estimators
of conditional Kendall's tau.
\code{\link{CKTmatrix.kernel}} for a generalization of this function
when the conditioned vector is of dimension \code{d}
instead of dimension \code{2} here.

See \code{\link{CKT.hCV.l1out}} for manual selection of the bandwidth \code{h}
by leave-one-out or K-folds cross-validation.
}
