% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mcmc-kernels.R
\name{mcmc_simple_step_size_adaptation}
\alias{mcmc_simple_step_size_adaptation}
\title{Adapts the inner kernel's \code{step_size} based on \code{log_accept_prob}.}
\usage{
mcmc_simple_step_size_adaptation(
  inner_kernel,
  num_adaptation_steps,
  target_accept_prob = 0.75,
  adaptation_rate = 0.01,
  step_size_setter_fn = NULL,
  step_size_getter_fn = NULL,
  log_accept_prob_getter_fn = NULL,
  validate_args = FALSE,
  name = NULL
)
}
\arguments{
\item{inner_kernel}{\code{TransitionKernel}-like object.}

\item{num_adaptation_steps}{Scalar \code{integer} \code{Tensor} number of initial steps
during which to adjust the step size. This may be greater than, less than, or
equal to the number of burnin steps.}

\item{target_accept_prob}{A floating point \code{Tensor} representing the desired
acceptance probability. Must be a positive number less than 1. This can
either be a scalar, or have shape \code{list(num_chains)}. Default value: \code{0.75}
(the center of the asymptotically optimal range for HMC).}

\item{adaptation_rate}{\code{Tensor} representing the amount by which to scale the
current \code{step_size}. Default value: \code{0.01}.}

\item{step_size_setter_fn}{A function with the signature
\verb{(kernel_results, new_step_size) -> new_kernel_results} where
\code{kernel_results} are the results of the \code{inner_kernel}, \code{new_step_size}
is a \code{Tensor} or a nested collection of \code{Tensor}s with the same
structure as returned by the \code{step_size_getter_fn}, and
\code{new_kernel_results} are a copy of \code{kernel_results} with the step
size(s) set.}

\item{step_size_getter_fn}{A function with the signature
\code{(kernel_results) -> step_size} where \code{kernel_results} are the results
of the \code{inner_kernel}, and \code{step_size} is a floating point \code{Tensor} or a
nested collection of such \code{Tensor}s.}

\item{log_accept_prob_getter_fn}{A function with the signature
\code{(kernel_results) -> log_accept_prob} where \code{kernel_results} are the
results of the \code{inner_kernel}, and \code{log_accept_prob} is a floating point
\code{Tensor}. \code{log_accept_prob} can either be a scalar, or have shape
\code{list(num_chains)}. If it's the latter, \code{step_size} should also have the same
leading dimension.}

\item{validate_args}{\code{Logical}. When \code{TRUE}, kernel parameters are checked
for validity. When \code{FALSE}, invalid inputs may silently produce incorrect
outputs.}

\item{name}{String prefixed to Ops created by this class. Default: \code{"simple_step_size_adaptation"}.}
}
\value{
a Monte Carlo sampling kernel
}
\description{
The simple policy multiplicatively increases or decreases the \code{step_size} of
the inner kernel based on the value of \code{log_accept_prob}. It is based on
equation 19 of Andrieu and Thoms (2008). Given enough steps and a small
enough \code{adaptation_rate}, the median of the distribution of the acceptance
probability will converge to the \code{target_accept_prob}. A good target
acceptance probability depends on the inner kernel. If this kernel is
\code{HamiltonianMonteCarlo}, then 0.6-0.9 is a good range to aim for. For
\code{RandomWalkMetropolis}, this should be closer to 0.25. See the individual
kernels' documentation for guidance.
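
As a rough sketch of this policy, the base R code below raises the step size by
a factor of \code{1 + adaptation_rate} when the observed \code{log_accept_prob} exceeds
\code{log(target_accept_prob)} and lowers it by the same factor otherwise. The
helper \code{adapt_step_size()} is hypothetical and exists only for illustration.
The exact form of the update is an assumption about the general shape of the
rule, not a copy of the library's implementation.

\preformatted{
# Hypothetical helper (not part of tfprobability): one multiplicative update.
adapt_step_size <- function(step_size, log_accept_prob,
                            target_accept_prob = 0.75,
                            adaptation_rate = 0.01) {
  # TRUE where the chain accepts more often than the target.
  grow <- log_accept_prob > log(target_accept_prob)
  # Grow the step size where acceptance is above target, shrink it otherwise.
  ifelse(grow,
         step_size * (1 + adaptation_rate),
         step_size / (1 + adaptation_rate))
}

# Two chains: the first accepts too often, the second too rarely.
new_step_size <- adapt_step_size(
  step_size = c(0.1, 0.1),
  log_accept_prob = log(c(0.9, 0.5))
)
}

Because each update changes the step size by only a small factor, a smaller
\code{adaptation_rate} gives slower but less noisy adaptation.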
}
\details{
In general, adaptation prevents the chain from reaching a stationary
distribution, so obtaining consistent samples requires \code{num_adaptation_steps}
be set to a value somewhat smaller than the number of burnin steps.
However, it may sometimes be helpful to set \code{num_adaptation_steps} to a larger
value during development in order to inspect the behavior of the chain during
adaptation.

The step size is assumed to broadcast with the chain state, potentially having
leading dimensions corresponding to multiple chains. When there are fewer of
those leading dimensions than there are chain dimensions, the corresponding
dimensions in the \code{log_accept_prob} are averaged (in the direct space, rather
than the log space) before being used to adjust the step size. This means that
this kernel can do either cross-chain adaptation or per-chain step size
adaptation, depending on the shape of the step size.

For example, if your problem has a state with shape \verb{[S]}, your chain state
has shape \verb{[C0, C1, S]} (meaning that there are \code{C0 * C1} total chains) and
\code{log_accept_prob} has shape \verb{[C0, C1]} (one acceptance probability per chain),
then depending on the shape of the step size, the following will happen:
\itemize{
\item If the step size has shape \verb{[]}, \verb{[S]} or \verb{[1]}, the \code{log_accept_prob} will be averaged
across its \code{C0} and \code{C1} dimensions. This means that you will learn a shared
step size based on the mean acceptance probability across all chains. This
can be useful if you don't have a lot of steps to adapt and want to average
away the noise.
\item If the step size has shape \verb{[C1, 1]} or \verb{[C1, S]}, the \code{log_accept_prob} will be
averaged across its \code{C0} dimension. This means that you will learn a shared
step size based on the mean acceptance probability across chains that share
the coordinate across the \code{C1} dimension. This can be useful when the \code{C1}
dimension indexes different distributions, while \code{C0} indexes replicas of a
single distribution, all sampled in parallel.
\item If the step size has shape \verb{[C0, C1, 1]} or \verb{[C0, C1, S]}, then no averaging will
happen. This means that each chain will learn its own step size. This can be
useful when all chains are sampling from different distributions. Even when
all chains are for the same distribution, this can help during the initial
warmup period.
\item If the step size has shape \verb{[C0, 1, 1]} or \verb{[C0, 1, S]}, the \code{log_accept_prob} will be
averaged across its \code{C1} dimension. This means that you will learn a shared
step size based on the mean acceptance probability across chains that share
the coordinate across the \code{C0} dimension. This can be useful when the \code{C0}
dimension indexes different distributions, while \code{C1} indexes replicas of a
single distribution, all sampled in parallel.
}
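
The reductions in the list above can be imitated in base R. The sketch below
assumes a \code{log_accept_prob} matrix of shape \verb{[C0, C1]} and averages the
acceptance probabilities in the direct space (exponentiate, average, then take
the log again). All variable names and values are illustrative assumptions,
not library output.

\preformatted{
# Illustrative values only: C0 = 2 replicas (rows), C1 = 3 distributions (columns).
accept_prob <- matrix(c(0.9, 0.8,
                        0.5, 0.6,
                        0.7, 0.7), nrow = 2)
log_accept_prob <- log(accept_prob)

# Step size of shape []: average over both C0 and C1 (one shared value).
shared <- log(mean(exp(log_accept_prob)))

# Step size of shape [C1, 1]: average over C0 only (one value per column).
per_distribution <- log(colMeans(exp(log_accept_prob)))

# Step size of shape [C0, C1, 1]: no averaging (one value per chain).
per_chain <- log_accept_prob

# For contrast: averaging in the log space never exceeds the direct-space result.
log_space_mean <- mean(log_accept_prob)
}

Averaging in the direct space keeps the result equal to the log of the mean
acceptance probability, which is the quantity compared against
\code{target_accept_prob}.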
}
\section{References}{

\itemize{
\item \href{https://people.eecs.berkeley.edu/~jordan/sail/readings/andrieu-thoms.pdf}{Andrieu, Christophe, Thoms, Johannes. A tutorial on adaptive MCMC. \emph{Statistics and Computing}, 2008.}
\item \url{http://andrewgelman.com/2017/12/15/burn-vs-warm-iterative-simulation-algorithms/#comment-627745}
\item \href{http://arxiv.org/abs/1411.6669}{Betancourt, M. J., Byrne, S., & Girolami, M. (2014). \emph{Optimizing The Integrator Step Size for Hamiltonian Monte Carlo}.}
}
}

\examples{
\donttest{
  # Target: a standard normal. Each of the 64 chains gets its own step size.
  target_log_prob_fn <- tfd_normal(loc = 0, scale = 1)$log_prob
  num_burnin_steps <- 500
  num_results <- 500
  num_chains <- 64L
  step_size <- tf$fill(list(num_chains), 0.1)

  # Wrap an HMC kernel so that its step size is adapted during the
  # first 80 percent of the burnin steps.
  kernel <- mcmc_hamiltonian_monte_carlo(
    target_log_prob_fn = target_log_prob_fn,
    num_leapfrog_steps = 2,
    step_size = step_size
  ) \%>\%
    mcmc_simple_step_size_adaptation(num_adaptation_steps = round(num_burnin_steps * 0.8))

  # Sample, tracing the step size and the log acceptance ratio at each step.
  res <- kernel \%>\% mcmc_sample_chain(
    num_results = num_results,
    num_burnin_steps = num_burnin_steps,
    current_state = rep(0, num_chains),
    trace_fn = function(x, pkr) {
      list(
        pkr$inner_results$accepted_results$step_size,
        pkr$inner_results$log_accept_ratio
      )
    }
  )

  # Unpack the samples and the traced quantities.
  samples <- res$all_states
  step_size <- res$trace[[1]]
  log_accept_ratio <- res$trace[[2]]
}

}
\seealso{
Other mcmc_kernels: 
\code{\link{mcmc_dual_averaging_step_size_adaptation}()},
\code{\link{mcmc_hamiltonian_monte_carlo}()},
\code{\link{mcmc_metropolis_adjusted_langevin_algorithm}()},
\code{\link{mcmc_metropolis_hastings}()},
\code{\link{mcmc_no_u_turn_sampler}()},
\code{\link{mcmc_random_walk_metropolis}()},
\code{\link{mcmc_replica_exchange_mc}()},
\code{\link{mcmc_slice_sampler}()},
\code{\link{mcmc_transformed_transition_kernel}()},
\code{\link{mcmc_uncalibrated_hamiltonian_monte_carlo}()},
\code{\link{mcmc_uncalibrated_langevin}()},
\code{\link{mcmc_uncalibrated_random_walk}()}
}
\concept{mcmc_kernels}
