% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/kanji2.R
\name{kanjidist}
\alias{kanjidist}
\title{Compute distance between two kanjivec objects based on hierarchical optimal transport}
\usage{
kanjidist(
  k1,
  k2,
  compo_seg_depth1 = 3,
  compo_seg_depth2 = 3,
  p = 1,
  C = 0.2,
  type = c("rtt", "unbalanced", "balanced"),
  size = 48,
  lwd = 2.5,
  verbose = FALSE
)
}
\arguments{
\item{k1, k2}{two objects of type \code{kanjivec}.}

\item{compo_seg_depth1, compo_seg_depth2}{two integers \eqn{\geq 1}, specifying for each kanji the
deepest level included for component matching. If 1, only the kanji itself is used.}

\item{p}{the order of the Wasserstein distance used for matching components. All distances and
the penalty (if any) are taken
to the \code{p}-th power (which is compensated by taking the \code{p}-th root after summation).}

\item{C}{the penalty for extra mass if \code{type} is \code{"rtt"} or \code{"unbalanced"}, i.e.
we add \code{C^p} per unit of extra mass (before applying the \code{p}-th root).}

\item{type}{the type of Wasserstein distance used for matching components based on bitmaps drawn
from the stroke information in \code{k1} and \code{k2}. \code{"unbalanced"} means the pixel values
in the two images are interpreted as mass. The total masses can be very different. Extra mass can
be disposed of at cost \code{C^p} per unit. \code{"rtt"} is computationally the same, but the final
distance is divided by the maximum of the total ink of the two kanji, raised to the power
\eqn{1/p}. \code{"balanced"}
means the pixel values are normalized so that both images have the same total mass 1. Everything has
to be transported, i.e.\ disposal of mass is not allowed.}

\item{size}{side length of the bitmaps used for matching components.}

\item{lwd}{line width for drawing the components in these bitmaps.}

\item{verbose}{logical. Whether to print detailed information on the cost for all pairs of
components and the final matching.}
}
\value{
The kanji distance, a non-negative number.
}
\description{
The kanji distance is based on matching hierarchical component structures in a
nesting-free way across all levels. The cost for matching individual components is a
cost for registering the components (i.e.\ aligning their position, scale and aspect
ratio) plus the (relative unbalanced) Wasserstein distance between the registered components.
}
\details{
For the precise definition and details see the reference below. Parameter \code{C}
corresponds to \eqn{b/2^{1/p}} in the paper.
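
As a worked instance of this correspondence (the symbol \eqn{b} is the paper's; the
arithmetic is only illustrative), solving for \eqn{b} gives
\deqn{b = 2^{1/p} C,}
so the defaults \code{p = 1} and \code{C = 0.2} correspond to \eqn{b = 0.4}.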
}
\section{Warning}{


\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}}\cr
The interface and details of this function will change in the future. Currently only a minimal
set of parameters can be passed. The other parameters are fixed exactly as in the
"prototype distance" (4.1) of the reference below for better or worse.\cr
Exact matches of components tend to be rather strongly favored (if the KanjiVG elements
agree, this can overrule the unbalanced Wasserstein distance), and the penalties for
translation/scaling/distortion of components are somewhat mild.\cr
The computation time is rather high (depending on the settings and the kanji, up to several
seconds per kanji pair). This can be alleviated somewhat by keeping the \code{compo_seg_depth}
parameters at 3 or lower and setting \code{size = 32} (which goes well with \code{lwd=1.8}).\cr
Future versions will use a much faster line-based optimal transport algorithm and further
speed-ups.
}

\examples{
if (requireNamespace("ROI.plugin.glpk")) {
  kanjidist(fivebetas[[4]], fivebetas[[5]])
  \donttest{kanjidist(fivebetas[[4]], fivebetas[[5]], verbose=TRUE)}
  # faster and similar:
  kanjidist(fivebetas[[4]], fivebetas[[5]], compo_seg_depth1=2, compo_seg_depth2=2, 
            size=32, lwd=1.8, verbose=TRUE) 
  # slower and similar:
  \donttest{kanjidist(fivebetas[[4]], fivebetas[[5]], size=64, lwd=3.2, verbose=TRUE)}
} 
}
\references{
Dominic Schuhmacher (2023).\cr
Distance maps between Japanese kanji characters based on hierarchical optimal transport.\cr
ArXiv Preprint, \doi{10.48550/arXiv.2304.02493}
}
\seealso{
\code{\link{kanjidistmat}}, \code{\link{kmatdist}}
}
