% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ImputeEOF.R
\name{ImputeEOF}
\alias{ImputeEOF}
\title{Impute missing values}
\usage{
ImputeEOF(
  formula,
  max.eof = NULL,
  data = NULL,
  min.eof = 1,
  tol = 0.01,
  max.iter = 10000,
  validation = NULL,
  verbose = interactive()
)
}
\arguments{
\item{formula}{a formula describing how to build the matrix used in the
singular value decomposition (SVD; see Details)}

\item{max.eof, min.eof}{maximum and minimum number of singular values used for
imputation}

\item{data}{a data.frame}

\item{tol}{tolerance used for determining convergence}

\item{max.iter}{maximum iterations allowed for the algorithm}

\item{validation}{number of points to use in cross-validation (defaults to the
larger of 30 and 10\% of the non-NA points)}

\item{verbose}{logical indicating whether to print progress}
}
\value{
A vector of imputed values (or a matrix, if \code{data} is a matrix; see
Details) with attributes \code{eof}, the number of singular values used in the
final imputation, and \code{rmse}, the root mean square error estimated from
cross-validation.
}
\description{
Imputes missing values via Data Interpolating Empirical Orthogonal Functions
(DINEOF).
}
\details{
Singular values are computed over a matrix, so \code{formula} specifies how
to build one from the data. It is a formula of the form VAR ~ LEFT | RIGHT
(see \link[Formula:Formula]{Formula::Formula}) in which VAR is the variable whose values
populate the matrix, LEFT are the variables that define the rows, and RIGHT
are the variables that define the columns.
Think of it as "VAR \emph{as a function} of LEFT \emph{and} RIGHT".
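
As an illustration (a sketch of the reshaping, not the package's internal
code), a formula such as \code{gh ~ lat + lon | date} arranges the data much like
\code{data.table::dcast(data, lat + lon ~ date, value.var = "gh")}: one row per
\code{lat}/\code{lon} pair and one column per \code{date}. The toy data below
are made up for the example.
\preformatted{
library(data.table)
# Toy long-format data: a 2 x 2 lat/lon grid observed at 3 dates.
dt <- CJ(lat = c(-10, 10), lon = c(0, 90), date = 1:3)
dt[, gh := seq_len(.N)]
# VAR ~ LEFT | RIGHT: rows from lat + lon, columns from date.
wide <- dcast(dt, lat + lon ~ date, value.var = "gh")
X <- as.matrix(wide[, !c("lat", "lon")])
dim(X)  # 4 rows (lat/lon pairs) by 3 columns (dates)
}
Each column of \code{X} then holds one complete field, which is the layout
the SVD works on.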

Alternatively, if \code{value.var} is not \code{NULL}, it's possible to use the
(probably) more familiar \link[data.table:dcast.data.table]{data.table::dcast} formula interface. In that case,
\code{data} must be provided.

If \code{data} is a matrix, the \code{formula} argument is ignored and the function
returns a matrix.
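
The DINEOF idea can be sketched in base R on a matrix with gaps: fill the gaps
with an initial guess, then repeatedly replace them with a truncated SVD
reconstruction until they stop changing, and choose the number of singular
values by hiding \code{validation} observed points and keeping the value with
the smallest RMSE. The helpers \code{dineof_sketch} and \code{dineof_cv_sketch}
are hypothetical and not part of the package; the initial guess (the mean of
the observed values) and the stopping rule (RMS change at the gaps below
\code{tol}) are assumptions and may differ from what \code{ImputeEOF} does.
\preformatted{
# Hypothetical helpers (not part of metR): a simplified DINEOF sketch.
dineof_sketch <- function(X, k, tol = 0.01, max.iter = 10000) {
  gaps <- is.na(X)
  if (!any(gaps)) return(X)
  # Assumed initial guess: the mean of the observed values.
  X[gaps] <- mean(X[!gaps])
  for (i in seq_len(max.iter)) {
    # Truncated SVD keeping the first k singular values/vectors.
    s <- svd(X, nu = k, nv = k)
    X_hat <- tcrossprod(sweep(s$u, 2, s$d[seq_len(k)], "*"), s$v)
    # Assumed stopping rule: RMS change at the gaps below tol.
    change <- sqrt(mean((X_hat[gaps] - X[gaps])^2))
    # Only the gaps are updated; observed values stay fixed.
    X[gaps] <- X_hat[gaps]
    if (change < tol) break
  }
  X
}

# Choose k between min.eof and max.eof by hiding `validation` observed
# points, imputing them, and keeping the k with the smallest RMSE.
dineof_cv_sketch <- function(X, max.eof, min.eof = 1, validation = NULL, ...) {
  observed <- which(!is.na(X))
  if (is.null(validation)) {
    validation <- max(30, round(0.1 * length(observed)))
  }
  held <- sample(observed, validation)
  X_cv <- X
  X_cv[held] <- NA
  ks <- seq(min.eof, max.eof)
  rmse <- vapply(ks, function(k) {
    X_k <- dineof_sketch(X_cv, k, ...)
    sqrt(mean((X_k[held] - X[held])^2))
  }, numeric(1))
  best <- ks[which.min(rmse)]
  # Final imputation with all observed points and the selected k.
  out <- dineof_sketch(X, best, ...)
  attr(out, "eof") <- best
  attr(out, "rmse") <- min(rmse)
  out
}
}
For example, \code{dineof_cv_sketch(X, max.eof = 5)} returns \code{X} with its
gaps filled and \code{eof} and \code{rmse} attributes analogous to those
described in Value.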
}
\examples{
library(data.table)
data(geopotential)
geopotential <- copy(geopotential)
geopotential[, gh.t := Anomaly(gh), by = .(lat, lon, month(date))]

# Add gaps to field
geopotential[, gh.gap := gh.t]
set.seed(42)
geopotential[sample(1:.N, .N*0.3), gh.gap := NA]

max.eof <- 5    # kept low so the example runs quickly; use a higher value in practice
geopotential[, gh.impute := ImputeEOF(gh.gap ~ lat + lon | date, max.eof,
                                      verbose = TRUE, max.iter = 2000)]

library(ggplot2)
ggplot(geopotential[date == date[1]], aes(lon, lat)) +
    geom_contour(aes(z = gh.t), color = "black") +
    geom_contour(aes(z = gh.impute))

# Scatterplot of true vs. imputed values for a random sample of the gaps
na.sample <- geopotential[is.na(gh.gap)][sample(1:.N, .N*0.1)]
ggplot(na.sample, aes(gh.t, gh.impute)) +
    geom_point()

# Estimated RMSE
attr(geopotential$gh.impute, "rmse")
# Real RMSE
geopotential[is.na(gh.gap), sqrt(mean((gh.t - gh.impute)^2))]


}
\references{
Beckers, J.-M., Barth, A., and Alvera-Azcárate, A.: DINEOF reconstruction of clouded images including error maps – application to the Sea-Surface Temperature around Corsican Island, Ocean Sci., 2, 183-199, \doi{10.5194/os-2-183-2006}, 2006.
}
