% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/DataPackage.R
\name{describeWorkflow}
\alias{describeWorkflow}
\alias{describeWorkflow,DataPackage-method}
\title{Add data derivation information to a DataPackage}
\usage{
describeWorkflow(x, ...)

\S4method{describeWorkflow}{DataPackage}(
  x,
  sources = list(),
  program = NA_character_,
  derivations = list(),
  insertDerivations = TRUE,
  ...
)
}
\arguments{
\item{x}{The \code{DataPackage} to add provenance relationships to.}

\item{...}{Additional parameters}

\item{sources}{A list of DataObjects for files that were read by the program. Alternatively, a list 
of DataObject identifiers can be specified as a list of character strings.}

\item{program}{The DataObject created for the program, such as an R script. Alternatively, the DataObject identifier can
be specified as a character string.}

\item{derivations}{A list of DataObjects for files that were generated by the program. Alternatively, a list 
of DataObject identifiers can be specified as a list of character strings.}

\item{insertDerivations}{A \code{logical} value. If \code{TRUE}, the provenance relationship 
\code{prov:wasDerivedFrom} is inserted between every source and every derivation. The default value 
is \code{TRUE}.}
}
\description{
Add information about the relationships among DataObject members 
in a DataPackage, retrospectively describing the way in which derived data were 
created from source data using a processing program such as an R script. These provenance
relationships document the derived data well enough for users to reproduce the
computations that created them and to trace the lineage of the derived data
objects. The method \code{describeWorkflow} adds provenance relationships between
a script that was executed, the files that it used as sources, and the derived
files that it generated.
}
\details{
This method operates on a DataPackage to which DataObjects for the script,
data sources (inputs), and data derivations (outputs) have already been added.
It can also reference identifiers of objects that exist in other DataPackage
instances. This allows a user to create either a standalone package that contains all of
its source, script, and derived data, or a set of data packages that are chained
together via derivation relationships between the members of those packages.

Provenance relationships are described following the ProvONE data model, which
can be viewed at \url{https://purl.dataone.org/provone-v1-dev}.  In particular, 
the following relationships are inserted (among others):
\itemize{
 \item{\code{prov:used}} {indicates which source data were used by a program execution}
 \item{\code{prov:wasGeneratedBy}} {indicates which derived data were created by a program execution}
 \item{\code{prov:wasDerivedFrom}} {indicates the source data from which derived data were created using the program}
}
}
\examples{
library(datapack)
dp <- new("DataPackage")
# Add the script to the DataPackage
progFile <- system.file("./extdata/pkg-example/logit-regression-example.R", package="datapack")
progObj <- new("DataObject", format="application/R", filename=progFile)
dp <- addMember(dp, progObj)

# Add a script input to the DataPackage
inFile <- system.file("./extdata/pkg-example/binary.csv", package="datapack") 
inObj <- new("DataObject", format="text/csv", filename=inFile)
dp <- addMember(dp, inObj)

# Add a script output to the DataPackage
outFile <- system.file("./extdata/pkg-example/gre-predicted.png", package="datapack")
outObj <- new("DataObject", format="image/png", filename=outFile)
dp <- addMember(dp, outObj)

# Add the provenance relationships, linking the input and output to the script execution
# Note: 'sources' and 'derivations' can also be lists of DataObjects or of DataObject identifiers
dp <- describeWorkflow(dp, sources = inObj, program = progObj, derivations = outObj)
# View the results
utils::head(getRelationships(dp))
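
# A minimal sketch of the identifier-based form described under 'sources',
# 'program' and 'derivations': the same workflow expressed with DataObject
# identifiers (character strings) instead of the DataObjects themselves.
# Assumptions, not confirmed by the text above: getIdentifier() is used here to
# obtain each member's identifier, and 'dp2' is an illustrative second package.
dp2 <- new("DataPackage")
dp2 <- addMember(dp2, progObj)
dp2 <- addMember(dp2, inObj)
dp2 <- addMember(dp2, outObj)
# insertDerivations = FALSE is expected to skip the prov:wasDerivedFrom links
# between sources and derivations, leaving the other relationships in place.
dp2 <- describeWorkflow(dp2,
                        sources = list(getIdentifier(inObj)),
                        program = getIdentifier(progObj),
                        derivations = list(getIdentifier(outObj)),
                        insertDerivations = FALSE)
# View the results
utils::head(getRelationships(dp2))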
}
\seealso{
The R 'recordr' package for run-time recording of provenance relationships.
}
