% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/nanoparquet-package.R
\docType{package}
\name{nanoparquet-package}
\alias{nanoparquet}
\alias{nanoparquet-package}
\title{nanoparquet: Read and Write 'Parquet' Files}
\description{
Self-sufficient reader and writer for flat 'Parquet' files. Can read most 'Parquet' data types. Can write many 'R' data types, including factors and temporal types. See docs for limitations.
}
\details{
\code{nanoparquet} is a reader and writer for a common subset of Parquet files.
\subsection{Features:}{
\itemize{
\item Read and write flat (i.e. non-nested) Parquet files.
\item Can read most \href{https://nanoparquet.r-lib.org/reference/nanoparquet-types.html}{Parquet data types}.
\item Can read a subset of columns from a Parquet file.
\item Can write many R data types, including factors and temporal types,
to Parquet.
\item Can append a data frame to a Parquet file without first reading and then
rewriting the whole file.
\item Completely dependency-free.
\item Supports Snappy, Gzip and Zstd compression.
\item \href{https://nanoparquet.r-lib.org/dev/articles/benchmarks.html}{Competitive} with other
tools in terms of speed, memory use and file size.
}
}

\subsection{Limitations:}{
\itemize{
\item Nested Parquet types are not supported.
\item Some Parquet logical types are not supported: \code{INTERVAL},
\code{UNKNOWN}.
\item Only Snappy, Gzip and Zstd compression is supported.
\item Encryption is not supported.
\item Reading files from URLs is not supported.
\item nanoparquet always reads the data (or the selected subset of it) into
memory. It does not work with out-of-memory data in Parquet files, as
Apache Arrow and DuckDB do.
}
}

\subsection{Installation}{

Install the R package from CRAN:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{install.packages("nanoparquet")
}\if{html}{\out{</div>}}
}

\subsection{Usage}{
\subsection{Read}{

Call \code{read_parquet()} to read a Parquet file:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{df <- nanoparquet::read_parquet("example.parquet")
}\if{html}{\out{</div>}}

To see the columns of a Parquet file and how their types are mapped to
R types by \code{read_parquet()}, call \code{read_parquet_schema()} first:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{nanoparquet::read_parquet_schema("example.parquet")
}\if{html}{\out{</div>}}
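
To read only some of the columns, something like the following should work.
This is a minimal sketch, assuming a nanoparquet version in which
\code{read_parquet()} has the \code{col_select} argument. The temporary file
and the \code{mtcars} column names are only there for illustration:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Write a small example file first, so the sketch is self-contained.
path <- tempfile(fileext = ".parquet")
nanoparquet::write_parquet(mtcars, path)

# Request two columns by name; only these are returned.
df <- nanoparquet::read_parquet(path, col_select = c("mpg", "cyl"))
names(df)
}\if{html}{\out{</div>}}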

Folders of similarly structured Parquet files (e.g. produced by Spark)
can be read like this:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{df <- data.table::rbindlist(lapply(
  Sys.glob("some-folder/part-*.parquet"),
  nanoparquet::read_parquet
))
}\if{html}{\out{</div>}}
}

\subsection{Write}{

Call \code{write_parquet()} to write a data frame to a Parquet file:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{nanoparquet::write_parquet(mtcars, "mtcars.parquet")
}\if{html}{\out{</div>}}

To see how the columns of the data frame will be mapped to Parquet types
by \code{write_parquet()}, call \code{infer_parquet_schema()} first:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{nanoparquet::infer_parquet_schema(mtcars)
}\if{html}{\out{</div>}}
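
Choosing a compression method and appending to an existing file can be
sketched as follows. This assumes a nanoparquet version that provides
\code{append_parquet()}, and that the appended data frame has the same columns
as the file. The temporary file and the split of \code{mtcars} are
illustrative only:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{path <- tempfile(fileext = ".parquet")

# Write the first half of the data with Zstd compression.
nanoparquet::write_parquet(mtcars[1:16, ], path, compression = "zstd")

# Append the second half without reading and rewriting the whole file.
nanoparquet::append_parquet(mtcars[17:32, ], path)

# Check the number of rows now stored in the file.
nrow(nanoparquet::read_parquet(path))
}\if{html}{\out{</div>}}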
}

\subsection{Inspect}{

Call \code{read_parquet_info()}, \code{read_parquet_schema()}, or
\code{read_parquet_metadata()} to see various kinds of metadata from a Parquet
file:
\itemize{
\item \code{read_parquet_info()} shows a basic summary of the file.
\item \code{read_parquet_schema()} shows all columns, including non-leaf columns,
and how they are mapped to R types by \code{read_parquet()}.
\item \code{read_parquet_metadata()} shows the most complete metadata:
file metadata, the schema, the row groups and column chunks of the
file.
}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{nanoparquet::read_parquet_info("mtcars.parquet")
nanoparquet::read_parquet_schema("mtcars.parquet")
nanoparquet::read_parquet_metadata("mtcars.parquet")
}\if{html}{\out{</div>}}

If you find a file that should be supported but isn't, please
\href{https://github.com/r-lib/nanoparquet/issues}{open an issue} with a
link to the file.
}

}

\subsection{Options}{

See also \code{?parquet_options()} for further details.
\itemize{
\item \code{nanoparquet.class}: extra class to add to data frames returned by
\code{read_parquet()}. If it is not defined, the default is \code{"tbl"},
which changes how the data frame is printed if the pillar package is
loaded.
\item \code{nanoparquet.compression_level}: See \code{?parquet_options()} for the
defaults and the possible values for each compression method. \code{Inf}
selects maximum compression for each method.
\item \code{nanoparquet.num_rows_per_row_group}: The number of rows to put into a
row group by \code{write_parquet()}, if row groups are not specified
explicitly. It should be an integer scalar. Defaults to 10 million.
\item \code{nanoparquet.use_arrow_metadata}: unless this is set to \code{FALSE},
\code{read_parquet()} will make use of Arrow metadata in the Parquet file.
Currently this is used to detect factor columns.
\item \code{nanoparquet.write_arrow_metadata}: unless this is set to \code{FALSE},
\code{write_parquet()} will add Arrow metadata to the Parquet file.
This helps preserve the classes of columns, e.g. factors will be read
back as factors, both by nanoparquet and Arrow.
\item \code{nanoparquet.write_data_page_version}: Data page version to write by
default. Possible values are 1 and 2. Default is 1.
\item \code{nanoparquet.write_minmax_values}: Whether to write minimum and maximum
values per row group, for data types that support this in
\code{write_parquet()}.
}
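
These options can be set for the whole session with base R's \code{options()},
or for a single call. The sketch below is illustrative: the values are
arbitrary, and the per-call form assumes that \code{write_parquet()} accepts an
\code{options} argument and that \code{parquet_options()} uses the option
names without the \code{nanoparquet.} prefix. Check \code{?parquet_options()}
to confirm.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Session-wide: set two of the options listed above, keeping the old values.
old <- options(
  nanoparquet.num_rows_per_row_group = 1000000L,
  nanoparquet.write_data_page_version = 2L
)

path <- tempfile(fileext = ".parquet")
nanoparquet::write_parquet(mtcars, path)

# Restore the previous settings.
options(old)

# Per call (assumed argument names, see the note above).
nanoparquet::write_parquet(
  mtcars,
  path,
  options = nanoparquet::parquet_options(write_data_page_version = 2L)
)
}\if{html}{\out{</div>}}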
}

\subsection{License}{

MIT
}
}
\seealso{
Useful links:
\itemize{
  \item \url{https://github.com/r-lib/nanoparquet}
  \item \url{https://nanoparquet.r-lib.org/}
  \item Report bugs at \url{https://github.com/r-lib/nanoparquet/issues}
}

}
\author{
\strong{Maintainer}: Gábor Csárdi \email{csardi.gabor@gmail.com}

Authors:
\itemize{
  \item Hannes Mühleisen (\href{https://orcid.org/0000-0001-8552-0029}{ORCID}) [copyright holder]
}

Other contributors:
\itemize{
  \item Google Inc. [copyright holder]
  \item Apache Software Foundation [copyright holder]
  \item Posit Software, PBC [copyright holder]
  \item RAD Game Tools [copyright holder]
  \item Valve Software [copyright holder]
  \item Tenacious Software LLC [copyright holder]
  \item Facebook, Inc. [copyright holder]
}

}
