% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/gapfill.R
\name{Gapfill}
\alias{Gap-fill}
\alias{Gapfill}
\alias{gap-fill}
\alias{gapfill}
\title{Main Function for Gap-Filling}
\usage{
Gapfill(data, fnSubset = Subset, fnPredict = Predict, iMax = Inf,
  nPredict = 1L, subset = "missings", clipRange = c(-Inf, Inf),
  dopar = FALSE, verbose = TRUE, ...)
}
\arguments{
\item{data}{Numeric array with four dimensions. The input (satellite) data to be gap-filled.
Missing values should be encoded as \code{NA}. When using the default \code{\link{Subset}} and \code{\link{Predict}}
functions, the data should have the dimensions: x coordinate, y coordinate, seasonal index (e.g., day of the year), and year.
See the \code{\link{ndvi}} dataset for an example.}

\item{fnSubset}{Function to subset the \code{data} around a missing value.
See \code{\link{Subset}} and \link{Extend} for more information.}

\item{fnPredict}{Function to predict a missing value based on the return value of \code{fnSubset}.
See \code{\link{Predict}} and \link{Extend} for more information.}

\item{iMax}{Integer vector of length 1.
The maximum number of subset-predict iterations per missing value.
If no prediction is obtained within \code{iMax} iterations, \code{NA} is returned as predicted value.}

\item{nPredict}{Integer vector of length 1. Specifies the length of the vector returned from \code{fnPredict}.
Values larger than 1 may increase memory usage considerably.}

\item{subset}{If \code{"missings"}, all missing values in \code{data} are filled.
If a logical array of the same dimensions as \code{data} or
a vector of positive integers, only the missing elements of \code{data[subset]} are filled.
Note that this does not have the same effect as selecting a subset of the input data, since
all values in \code{data} are used to inform the predictions, independent of the specified subset.}

\item{clipRange}{Numeric vector of length 2.
Specifies the lower and the upper bound of the filled data.
Values outside this range are clipped accordingly.
If \code{nPredict} is larger than 1, only the first return value of \code{fnPredict} will be clipped.}

\item{dopar}{Logical vector of length 1.
If \code{TRUE}, the \code{\%dopar\%} construct from the R package foreach is used.
This allows the function to predict several missing values in parallel,
if a parallel back-end (e.g., from the R package doParallel or doMPI) is available.
See the example below and \code{\link[foreach]{foreach}} for more information.}

\item{verbose}{Logical vector of length 1.
If \code{TRUE}, messages are printed.}

\item{...}{Additional arguments passed to \code{fnSubset} and \code{fnPredict}.}
}
\value{
List of length 4 with the entries:
\itemize{
 \item{\code{fill}}{ contains the gap-filled data.
If \code{nPredict = 1}, \code{fill} is an array of dimension \code{dim(data)},
otherwise the array is of dimension \code{c(dim(data), nPredict)}.}
 \item{\code{mps}}{ integer vector whose length equals the number of predicted values.
Contains the one-dimensional (linear) indices of the predicted values.}
\item{\code{time}}{ list of length 4 containing timing information.
                    \itemize{\item{\code{start}}{ start date and time.}
                             \item{\code{end}}{ end date and time.}
                             \item{\code{elapsedMins}}{ elapsed minutes.}
                             \item{\code{elapsedSecsPerNA}}{ elapsed seconds per predicted value.}
                             }
                   }
 \item{\code{call}}{ call used to produce the object.}
}
}
\description{
The function fills (predicts) missing values in satellite data.
We illustrate it with MODIS NDVI data,
but it can also be applied to other data that are recorded at equally spaced points in time.
Moreover, the function provides infrastructure for the development of new gap-fill algorithms.
The predictions of the missing values are based on a subset-predict procedure, i.e.,
each missing value is predicted separately by
(1) selecting a subset of the data in a neighborhood around the missing value and
(2) predicting the value based on that subset.
}
\details{
This section gives more information on the subset-predict strategy outlined in the description.\cr
Missing values are often unevenly distributed in \code{data}.
Therefore, the size of a reasonable subset may differ depending on the position of the missing value considered.
The search strategy to find that subset is encoded in \code{fnSubset}.
That function returns different subsets depending on its argument \code{i}.
The decision whether a subset is suitable and the prediction itself are
implemented in \code{fnPredict}.
To be more specific, the subset-predict procedure loops over the following two steps to predict one missing value:
\describe{
\item{(1) }{The function \code{fnSubset} is provided with the argument \code{i = i} (where \code{i <- 0} in the first iteration) and
returns a subset around the missing value.}
\item{(2) }{The function \code{fnPredict} decides whether the subset contains enough information to predict the missing value.
If so, the predicted value is returned.
Otherwise, the function returns \code{NA} and the algorithm increases \code{i} by one (\code{i <- i + 1})
before continuing with step (1).}
}
The procedure stops if one of the following criteria is met:
\itemize{
\item \code{fnPredict} returns a non-\code{NA} value,
\item \code{iMax} tries have been completed,
\item \code{fnSubset} returns the same subset two times in a row. 
}
}
\note{
The default \code{\link{Predict}} function implements the prediction of the missing value
and can also return the lower and upper bounds of an approximate 90\% prediction interval.
See the help page of \code{\link{Predict}} for more information on the prediction interval.
The example section below shows how the prediction interval can be calculated and displayed.  

To tailor the procedure to a specific dataset, it might be necessary to
adapt the subset and/or the prediction strategy.
On the one hand, this can be done by changing the default arguments of \code{\link{Subset}} and
\code{\link{Predict}} through the argument \code{...} of \code{Gapfill}.
See the help of the corresponding functions for more information about their arguments.
On the other hand, the user can define new subset and predict functions and pass them to \code{Gapfill}
through the arguments \code{fnSubset} and \code{fnPredict}.
See \link{Extend} for more information. 

The current implementation of \code{\link{Subset}} does not take into account
that values at the boundaries of \code{data} can be neighbors of each other.
For example, if global data (entire sphere) are considered,
\code{data[1,1,,]} is a neighbor of \code{data[dim(data)[1], dim(data)[2],,]}.
Similar considerations apply when data are available for an entire year. 
To take this into account, the \code{Subset} function can be redefined accordingly or
the data can be augmented.

There are two strategies to run the gap-filling in parallel.
The first one is to set the argument \code{dopar} of \code{Gapfill} to \code{TRUE} and
to use an OpenMP or MPI parallel back-end.
The parallel back-end needs to be set up before the call to \code{Gapfill}.
An example using the R package \code{doParallel} is given below.
Other parallel back-ends are implemented in other packages, e.g., the package \code{doMPI}.
Some parallel back-ends are platform dependent.
While this approach shortens the process time by distributing the computational workload,
it does not reduce the memory footprint of the procedure.
The second strategy, which also reduces memory usage, is to split the \code{data} into several independent chunks.
Whether data chunks are independent or not depends on the function provided to \code{fnSubset}.
For example, the default \code{\link{Subset}} function never includes data that
are more than one seasonal index away from the missing value.
Hence, \code{data[,,1:3,]} can be used to gap-fill \code{data[,,2,]}.\cr
}
\examples{
\dontrun{
out <- Gapfill(ndvi, clipRange = c(0, 1))

## look at input and output
str(ndvi)
str(out)
Image(ndvi)
Image(out$fill)

## run on 2 cores in parallel
if(require(doParallel)){
  registerDoParallel(2)
  out <- Gapfill(ndvi, dopar = TRUE)
}
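
## fill only the missing values of the first image (seasonal index 1, year 1);
## all values of ndvi are still used to inform the predictions.
## sel and outSel are illustrative names
sel <- array(FALSE, dim = dim(ndvi))
sel[,,1,1] <- TRUE
outSel <- Gapfill(ndvi, subset = sel)

## one-dimensional indices of the predicted values
str(outSel$mps)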

## return also the prediction interval
out <- Gapfill(ndvi, nPredict = 3, predictionInterval = TRUE)

## dimension has changed according to 'nPredict = 3'
dim(out$fill)

## clip values outside the valid parameter space [0,1].
out$fill[out$fill < 0] <- 0
out$fill[out$fill > 1] <- 1

## images of the output:
## predicted NDVI
Image(out$fill[,,,,1])
## lower bound of the prediction interval
Image(out$fill[,,,,2])
## upper bound of the prediction interval
Image(out$fill[,,,,3])
## prediction interval length
Image(out$fill[,,,,3] - out$fill[,,,,2])
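
## memory-saving sketch: gap-fill seasonal index 2 from the chunk ndvi[,,1:3,] only.
## This assumes the default Subset function, for which this chunk is independent
## (see section Note), and that ndvi has at least 3 seasonal indices.
## chunk, selChunk, and outChunk are illustrative names
chunk <- ndvi[,,1:3,]
selChunk <- array(FALSE, dim = dim(chunk))
selChunk[,,2,] <- TRUE
outChunk <- Gapfill(chunk, subset = selChunk)

## dimension of the gap-filled images for seasonal index 2
dim(outChunk$fill[,,2,])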

}
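
## Schematic sketch of the subset-predict loop described in the details section.
## This is NOT the implementation used by Gapfill(): toySubset(), toyPredict(),
## and toyFill() are hypothetical helpers that operate on a numeric vector.
toySubset <- function(x, pos, i) {
  ## window of half-width i + 1 around position pos
  idx <- max(1L, pos - i - 1L):min(length(x), pos + i + 1L)
  x[idx]
}
toyPredict <- function(s) {
  ## require at least two observed values, otherwise signal failure with NA
  if (sum(!is.na(s)) >= 2L) mean(s, na.rm = TRUE) else NA_real_
}
toyFill <- function(x, pos, iMax = Inf) {
  i <- 0L
  previous <- NULL
  while (i < iMax) {
    s <- toySubset(x, pos, i)
    ## stop if the same subset is returned two times in a row
    if (identical(s, previous)) break
    p <- toyPredict(s)
    ## stop as soon as a non-NA prediction is available
    if (!is.na(p)) return(p)
    previous <- s
    i <- i + 1L
  }
  ## no prediction was possible
  NA_real_
}
x <- c(1, NA, NA, NA, 5)
toyFill(x, pos = 3)            # second try (i = 1) finds two observed values
toyFill(x, pos = 3, iMax = 1)  # gives up after one try and returns NA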
}
\author{
Florian Gerber, \email{florian.gerber@math.uzh.ch}.
}
\references{
F. Gerber, R. Furrer, G. Schaepman-Strub, R. de Jong, M. E. Schaepman, 2016,
Predicting missing values in spatio-temporal satellite data.
\url{http://arxiv.org/abs/1605.01038}.
}
\seealso{
\code{\link{Extend}}, \code{\link{Subset-Predict}}, \code{\link{Image}}.
}

