Package {duckdb}


Title: DBI Package for the DuckDB Database Management System
Version: 1.5.4.2
Description: The DuckDB project is an embedded analytical data management system with support for the Structured Query Language (SQL). This package includes all of DuckDB and an R Database Interface (DBI) connector.
License: MIT + file LICENSE
URL: https://r.duckdb.org/, https://github.com/duckdb/duckdb-r
BugReports: https://github.com/duckdb/duckdb-r/issues
Depends: DBI, R (≥ 4.2.0)
Imports: methods, utils
Suggests: adbcdrivermanager, arrow (≥ 13.0.0), bit64, callr, clock, DBItest, dbplyr, dplyr, nanoarrow, rlang, sf, testthat (≥ 3.0.0), tibble, vctrs, wk, withr
Biarch: true
Config/build/compilation-database: false
Config/build/never-clean: true
Config/comment/compilation-database: Generate manually with pkgload:::generate_db() for faster pkgload::load_all()
Config/testthat/edition: 3
Encoding: UTF-8
SystemRequirements: xz (for building from source)
Config/roxygen2/version: 8.0.0.9000
NeedsCompilation: yes
Packaged: 2026-06-29 02:54:54 UTC; kirill
Author: Hannes Mühleisen ORCID iD [aut], Mark Raasveldt ORCID iD [aut], Kirill Müller ORCID iD [cre], Stichting DuckDB Foundation [cph], Apache Software Foundation [cph], PostgreSQL Global Development Group [cph], The Regents of the University of California [cph], Cameron Desrochers [cph], Victor Zverovich [cph], RAD Game Tools [cph], Valve Software [cph], Rich Geldreich [cph], Tenacious Software LLC [cph], The RE2 Authors [cph], Google Inc. [cph], Facebook Inc. [cph], Steven G. Johnson [cph], Jiahao Chen [cph], Tony Kelman [cph], Jonas Fonseca [cph], Lukas Fittl [cph], Salvatore Sanfilippo [cph], Art.sy, Inc. [cph], Oran Agra [cph], Redis Labs, Inc. [cph], Melissa O'Neill [cph], PCG Project contributors [cph]
Maintainer: Kirill Müller <kirill@cynkra.com>
Repository: CRAN
Date/Publication: 2026-06-29 09:40:02 UTC

duckdb: DBI Package for the DuckDB Database Management System

Description

logo

The DuckDB project is an embedded analytical data management system with support for the Structured Query Language (SQL). This package includes all of DuckDB and an R Database Interface (DBI) connector.

Author(s)

Maintainer: Kirill Müller kirill@cynkra.com (ORCID)

Authors:

Other contributors:

See Also

Useful links:


DuckDB SQL backend for dbplyr

Description

This is a SQL backend for dbplyr tailored to take into account DuckDB's possibilities. This mainly follows the backend for PostgreSQL, but contains more mapped functions.

tbl_file() is an experimental variant of dplyr::tbl() to directly access files on disk. It is safer than dplyr::tbl() because there is no risk of misinterpreting the request, and paths with special characters are supported.

tbl_function() is an experimental variant of dplyr::tbl() to create a lazy table from a table-generating function, useful for reading nonstandard CSV files or other data sources. It is safer than dplyr::tbl() because there is no risk of misinterpreting the query. See https://duckdb.org/docs/data/overview for details on data importing functions.

As an alternative, use dplyr::tbl(src, dplyr::sql("SELECT ... FROM ...")) for custom SQL queries.

tbl_query() is deprecated in favor of tbl_function().

Use simulate_duckdb() with lazy_frame() to see simulated SQL without opening a DuckDB connection.

Usage

tbl_file(src = NULL, path, ..., cache = FALSE)

tbl_function(src, query, ..., cache = FALSE)

tbl_query(src, query, ...)

simulate_duckdb(...)

Arguments

src

A duckdb connection object, default_conn() if omitted.

path

Path to existing Parquet, CSV or JSON file

...

Any parameters to be forwarded

cache

Enable object cache for Parquet files

query

SQL code, omitting the FROM clause

Examples


library(dplyr, warn.conflicts = FALSE)
con <- DBI::dbConnect(duckdb(), path = ":memory:")

db <- copy_to(con, data.frame(a = 1:3, b = letters[2:4]))

db %>%
  filter(a > 1) %>%
  select(b)

path <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[2:4]))

db_csv <- tbl_file(con, path)
db_csv %>%
  summarize(sum_a = sum(a))

db_csv_fun <- tbl_function(con, paste0("read_csv_auto('", path, "')"))
db_csv %>%
  count()

DBI::dbDisconnect(con, shutdown = TRUE)


Get the default connection

Description

[Experimental]

default_conn() returns a default, built-in connection.

Usage

default_conn()

Details

Currently, the connection is established with duckdb(environment_scan = TRUE) and dbConnect(timezone_out = "", array = "matrix") so that data frames are automatically available as tables, timestamps are returned in the local timezone, and DuckDB's array type is returned as an R matrix. The details of how the connection is established are subject to change. In particular, returning the output as a tibble or other object may be supported in the future.

This connection is intended for interactive use. There is no way for this or other packages to comprehensively track the state of this connection, so scripts and packages should manage their own connections.

Value

A DuckDB connection object

Examples


conn <- default_conn()
sql_query("SELECT 42", conn = conn)


Connect to a DuckDB database instance

Description

duckdb() creates or reuses a database instance.

duckdb_shutdown() shuts down a database instance.

Return an adbcdrivermanager::adbc_driver() for use with Arrow Database Connectivity via the adbcdrivermanager package.

dbConnect() connects to a database instance.

dbDisconnect() closes a DuckDB database connection. The associated DuckDB database instance is shut down automatically, it is no longer necessary to set shutdown = TRUE or to call duckdb_shutdown().

Usage

duckdb(
  dbdir = DBDIR_MEMORY,
  read_only = FALSE,
  bigint = "numeric",
  config = list(),
  ...,
  environment_scan = FALSE
)

duckdb_shutdown(drv)

duckdb_adbc()

## S4 method for signature 'duckdb_driver'
dbConnect(
  drv,
  dbdir = DBDIR_MEMORY,
  ...,
  debug = getOption("duckdb.debug", FALSE),
  read_only = FALSE,
  timezone_out = "UTC",
  tz_out_convert = c("with", "force"),
  config = list(),
  bigint = "numeric",
  array = "none",
  geometry = "blob",
  map = "data.frame"
)

## S4 method for signature 'duckdb_connection'
dbDisconnect(conn, ..., shutdown = TRUE)

Arguments

dbdir

Location for database files. Should be a path to an existing directory in the file system. With the default (or ""), all data is kept in RAM.

read_only

Set to TRUE for read-only operation. For file-based databases, this is only applied when the database file is opened for the first time. Subsequent connections (via the same drv object or a drv object pointing to the same path) will silently ignore this flag.

bigint

How 64-bit integers should be returned. There are two options: "numeric" and "integer64". If "numeric" is selected, bigint integers will be treated as double/numeric. If "integer64" is selected, bigint integers will be set to bit64 encoding.

config

Named list with DuckDB configuration flags, see https://duckdb.org/docs/configuration/overview#configuration-reference for the possible options. These flags are only applied when the database object is instantiated. Subsequent connections will silently ignore these flags.

...

These dots are for future extensions and must be empty.

environment_scan

Set to TRUE to treat data frames from the calling environment as tables. If a database table with the same name exists, it takes precedence. The default of this setting may change in a future version.

drv

Object returned by duckdb()

debug

Print additional debug information, such as queries.

timezone_out

The time zone returned to R, defaults to "UTC", which is currently the only timezone supported by duckdb. If you want to display datetime values in the local timezone, set to Sys.timezone() or "".

tz_out_convert

How to convert timestamp columns to the timezone specified in timezone_out. There are two options: "with", and "force". If "with" is chosen, the timestamp will be returned as it would appear in the specified time zone. If "force" is chosen, the timestamp will have the same clock time as the timestamp in the database, but with the new time zone.

array

How arrays should be returned. There are two options: "none" and "matrix". If "none" is selected, arrays are not returned. Instead an error is generated. If "matrix" is selected, arrays are returned as a column matrix. Each array is one row in the matrix.

geometry

How geometry columns should be returned. There are two options: "blob" and "wk". If "blob" is selected, geometry columns are returned as a list of raw vectors containing WKB data. If "wk" is selected, geometry columns are returned as wk wk_wkb vectors. Use wk::wk_handle() or sf::st_as_sfc() to convert to other geometry formats.

map

How MAP columns should be returned. There are two options: "data.frame" and "list_of". If "data.frame" is selected (the default), MAP columns are returned as a list of data frames with key and value columns. If "list_of" is selected, MAP columns are returned as a vctrs::list_of() whose ptype is a ⁠data.frame(key = <K>, value = <V>)⁠ that records the SQL key/value types. This enables MAP columns to round-trip through dbWriteTable() / dbCreateTable() without specifying field.types, and lets scans accept named-list cells as MAP entries.

conn

A duckdb_connection object

shutdown

Unused. The database instance is shut down automatically.

Details

The behavior of with = "force" at DST transitions depends on how R handles translation from the underlying time representation to a human-readable format. If the timestamp is invalid in the target timezone, the resulting value may be NA or an adjusted time.

Value

duckdb() returns an object of class duckdb_driver.

dbDisconnect() and duckdb_shutdown() are called for their side effect.

An object of class "adbc_driver"

dbConnect() returns an object of class duckdb_connection.

Examples


library(adbcdrivermanager)
with_adbc(db <- adbc_database_init(duckdb_adbc()), {
  as.data.frame(read_adbc(db, "SELECT 1 as one;"))
})


drv <- duckdb()
con <- dbConnect(drv)

dbGetQuery(con, "SELECT 'Hello, world!'")

dbDisconnect(con)
duckdb_shutdown(drv)

# Shorter:
con <- dbConnect(duckdb())
dbGetQuery(con, "SELECT 'Hello, world!'")
dbDisconnect(con, shutdown = TRUE)


DuckDB connection class

Description

Implements DBIConnection.

Usage

## S4 method for signature 'duckdb_connection'
dbAppendTable(conn, name, value, ..., row.names = NULL)

## S4 method for signature 'duckdb_connection'
dbBegin(conn, ...)

## S4 method for signature 'duckdb_connection'
dbCommit(conn, ...)

## S4 method for signature 'duckdb_connection'
dbDataType(dbObj, obj, ...)

## S4 method for signature 'duckdb_connection,ANY'
dbExistsTable(conn, name, ...)

## S4 method for signature 'duckdb_connection'
dbGetInfo(dbObj, ...)

## S4 method for signature 'duckdb_connection'
dbIsValid(dbObj, ...)

## S4 method for signature 'duckdb_connection,character'
dbListFields(conn, name, ...)

## S4 method for signature 'duckdb_connection'
dbListTables(conn, ...)

## S4 method for signature 'duckdb_connection,ANY'
dbQuoteIdentifier(conn, x, ...)

## S4 method for signature 'duckdb_connection'
dbQuoteLiteral(conn, x, ...)

## S4 method for signature 'duckdb_connection,character'
dbRemoveTable(conn, name, ..., fail_if_missing = TRUE)

## S4 method for signature 'duckdb_connection'
dbRollback(conn, ...)

## S4 method for signature 'duckdb_connection,character'
dbSendQueryArrow(conn, statement, params = NULL, ...)

## S4 method for signature 'duckdb_connection,character'
dbSendQuery(conn, statement, params = NULL, ..., arrow = FALSE)

## S4 method for signature 'duckdb_connection,character,data.frame'
dbWriteTable(
  conn,
  name,
  value,
  ...,
  row.names = FALSE,
  overwrite = FALSE,
  append = FALSE,
  field.types = NULL,
  temporary = FALSE
)

## S4 method for signature 'duckdb_connection'
show(object)

Arguments

conn

A duckdb_connection object as returned by DBI::dbConnect()

name

The table name, passed on to dbQuoteIdentifier(). Options are:

  • a character string with the unquoted DBMS table name, e.g. "table_name",

  • a call to Id() with components to the fully qualified table name, e.g. Id(schema = "my_schema", table = "table_name")

  • a call to SQL() with the quoted and fully qualified table name given verbatim, e.g. SQL('"my_schema"."table_name"')

value

A data.frame (or coercible to data.frame).

...

Other parameters passed on to methods.

row.names

Whether the row.names of the data.frame should be preserved

dbObj

An object inheriting from class duckdb_connection.

obj

An R object whose SQL type we want to determine.

statement

a character string containing SQL.

params

For dbBind(), a list of values, named or unnamed, or a data frame, with one element/column per query parameter. For dbBindArrow(), values as a nanoarrow stream, with one column per query parameter.

arrow

Whether the query should be returned as an Arrow Table

overwrite

If a table with the given name already exists, should it be overwritten?

append

If a table with the given name already exists, just try to append the passed data to it

field.types

Override the auto-generated SQL types

temporary

Should the created table be temporary?

object

Any R object


DuckDB driver class

Description

Implements DBIDriver.

Usage

## S4 method for signature 'duckdb_driver'
dbDataType(dbObj, obj, ...)

## S4 method for signature 'duckdb_driver'
dbGetInfo(dbObj, ...)

## S4 method for signature 'duckdb_driver'
dbIsValid(dbObj, ...)

## S4 method for signature 'duckdb_driver'
show(object)

Arguments

dbObj

An object inheriting from class duckdb_driver.

...

Other arguments to methods.

object

Any R object


DuckDB EXPLAIN query tree

Description

DuckDB EXPLAIN query tree


Reads a CSV file into DuckDB

Description

Directly reads a CSV file into DuckDB, tries to detect and create the correct schema for it. This usually is much faster than reading the data into R and writing it to DuckDB.

Usage

duckdb_read_csv(
  conn,
  name,
  files,
  ...,
  header = TRUE,
  na.strings = "",
  nrow.check = 500,
  delim = ",",
  quote = "\"",
  col.names = NULL,
  col.types = NULL,
  lower.case.names = FALSE,
  sep = delim,
  transaction = TRUE,
  temporary = FALSE
)

Arguments

conn

A DuckDB connection, created by dbConnect().

name

The name for the virtual table that is registered or unregistered

files

One or more CSV file names, should all have the same structure though

...

These dots are for future extensions and must be empty.

header

Whether or not the CSV files have a separate header in the first line

na.strings

Which strings in the CSV files should be considered to be NULL

nrow.check

How many rows should be read from the CSV file to figure out data types

delim

Which field separator should be used

quote

Which quote character is used for columns in the CSV file

col.names

Override the detected or generated column names

col.types

Character vector of column types in the same order as col.names, or a named character vector where names are column names and types pairs. Valid types are DuckDB data types, e.g. VARCHAR, DOUBLE, DATE, BIGINT, BOOLEAN, etc.

lower.case.names

Transform column names to lower case

sep

Alias for delim for compatibility

transaction

Should a transaction be used for the entire operation

temporary

Set to TRUE to create a temporary table

Details

If the table already exists in the database, the csv is appended to it. Otherwise the table is created.

Value

The number of rows in the resulted table, invisibly.

Examples


con <- dbConnect(duckdb())

data <- data.frame(a = 1:3, b = letters[1:3])
path <- tempfile(fileext = ".csv")

write.csv(data, path, row.names = FALSE)

duckdb_read_csv(con, "data", path)
dbReadTable(con, "data")

dbDisconnect(con)


# Providing data types for columns
path <- tempfile(fileext = ".csv")
write.csv(iris, path, row.names = FALSE)

con <- dbConnect(duckdb())
duckdb_read_csv(con, "iris", path,
  col.types = c(
    Sepal.Length = "DOUBLE",
    Sepal.Width = "DOUBLE",
    Petal.Length = "DOUBLE",
    Petal.Width = "DOUBLE",
    Species = "VARCHAR"
  )
)
dbReadTable(con, "iris")
dbDisconnect(con)


Register a data frame as a virtual table

Description

duckdb_register() registers a data frame as a virtual table (view) in a DuckDB connection. No data is copied.

Usage

duckdb_register(conn, name, df, overwrite = FALSE, experimental = FALSE)

duckdb_unregister(conn, name)

Arguments

conn

A DuckDB connection, created by dbConnect().

name

The name for the virtual table that is registered or unregistered

df

A data.frame with the data for the virtual table

overwrite

Should an existing registration be overwritten?

experimental

Enable experimental optimizations

Details

duckdb_unregister() unregisters a previously registered data frame.

Value

These functions are called for their side effect.

Examples


con <- dbConnect(duckdb())

data <- data.frame(a = 1:3, b = letters[1:3])

duckdb_register(con, "data", data)
dbReadTable(con, "data")

duckdb_unregister(con, "data")

dbDisconnect(con)


Register an Arrow data source as a virtual table

Description

duckdb_register_arrow() registers an Arrow data source as a virtual table (view) in a DuckDB connection. No data is copied.

Usage

duckdb_register_arrow(conn, name, arrow_scannable, use_async = NULL)

duckdb_unregister_arrow(conn, name)

duckdb_list_arrow(conn)

Arguments

conn

A DuckDB connection, created by dbConnect().

name

The name for the virtual table that is registered or unregistered

arrow_scannable

A scannable Arrow-object

use_async

Switched to the asynchronous scanner. (deprecated)

Details

duckdb_unregister_arrow() unregisters a previously registered data frame.

Value

These functions are called for their side effect.


DuckDB Result Set

Description

Methods for accessing result sets for queries on DuckDB connections. Implements DBIResult.

Usage

duckdb_fetch_arrow(res, chunk_size = 1e+06)

duckdb_fetch_record_batch(res, chunk_size = 1e+06)

## S4 method for signature 'duckdb_result'
dbBind(res, params, ...)

## S4 method for signature 'duckdb_result'
dbClearResult(res, ...)

## S4 method for signature 'duckdb_result'
dbColumnInfo(res, ...)

## S4 method for signature 'duckdb_result'
dbFetch(res, n = -1, ...)

## S4 method for signature 'duckdb_result'
dbGetInfo(dbObj, ...)

## S4 method for signature 'duckdb_result'
dbGetRowCount(res, ...)

## S4 method for signature 'duckdb_result'
dbGetRowsAffected(res, ...)

## S4 method for signature 'duckdb_result'
dbGetStatement(res, ...)

## S4 method for signature 'duckdb_result'
dbHasCompleted(res, ...)

## S4 method for signature 'duckdb_result'
dbIsValid(dbObj, ...)

## S4 method for signature 'duckdb_result'
show(object)

Arguments

res

Query result to be converted to a Record Batch Reader

chunk_size

The chunk size

params

For dbBind(), a list of values, named or unnamed, or a data frame, with one element/column per query parameter. For dbBindArrow(), values as a nanoarrow stream, with one column per query parameter.

...

Other arguments passed on to methods.

n

maximum number of records to retrieve per fetch. Use n = -1 or n = Inf to retrieve all pending records. Some implementations may recognize other special values.

dbObj

An object inheriting from class duckdb_result.

object

Any R object


DuckDB Arrow Result Set

Description

Streaming Arrow result for queries on DuckDB connections. Implements DBIResultArrow-class.

Usage

## S4 method for signature 'duckdb_result_arrow'
dbBind(res, params, ...)

## S4 method for signature 'duckdb_result_arrow'
dbBindArrow(res, params, ...)

## S4 method for signature 'duckdb_result_arrow'
dbClearResult(res, ...)

## S4 method for signature 'duckdb_result_arrow'
dbColumnInfo(res, ...)

## S4 method for signature 'duckdb_result_arrow'
dbFetchArrow(res, ..., chunk_size = 1e+06)

## S4 method for signature 'duckdb_result_arrow'
dbFetchArrowChunk(res, ..., chunk_size = 1e+06)

## S4 method for signature 'duckdb_result_arrow'
dbGetRowCount(res, ...)

## S4 method for signature 'duckdb_result_arrow'
dbGetRowsAffected(res, ...)

## S4 method for signature 'duckdb_result_arrow'
dbGetStatement(res, ...)

## S4 method for signature 'duckdb_result_arrow'
dbHasCompleted(res, ...)

## S4 method for signature 'duckdb_result_arrow'
dbIsValid(dbObj, ...)

## S4 method for signature 'duckdb_result_arrow'
show(object)

Arguments

res

An object inheriting from DBI::DBIResult.

params

For dbBind(), a list of values, named or unnamed, or a data frame, with one element/column per query parameter. For dbBindArrow(), values as a nanoarrow stream, with one column per query parameter.

...

Other arguments passed on to methods.

chunk_size

The chunk size in rows used when pulling Arrow batches from DuckDB.

dbObj

An object inheriting from DBIObject, i.e. DBIDriver, DBIConnection, or a DBIResult

object

Any R object


DuckDB file-system usage: storage locations and how they are resolved

Description

[Experimental]

DuckDB writes several distinct kinds of data to the file system. This page catalogs every such location and documents the unified policy the duckdb R package uses to choose them, so that by default nothing is written outside the R session's temporary directory – except the extension cache, which is auto-probed into the writable package library (and falls back to the temporary directory when the library is read-only, as on CRAN).

The functions that configure these locations are documented in duckdb_storage_config().

Kinds of on-disk state

Home directory

The base DuckDB uses to expand a leading ~ and to derive default sub-locations such as the extension cache. DuckDB setting: home_directory. The package does not set this: doing so would also redirect ~ in user SQL (e.g. ⁠COPY ... TO '~/out.csv'⁠). Each location below is pointed at a temporary directory directly instead.

Extension binaries

Downloaded ⁠*.duckdb_extension⁠ files (e.g. spatial, httpfs, h3). DuckDB setting: extension_directory. A re-usable cache: a given binary is valid only for the exact DuckDB version and platform/ABI that downloaded it. By default the cache is the "library" root (alongside the installed package) when it is writable, falling back to a tempdir() sub-directory when it is not. See the marker section for how this is detected.

Stored secrets

Persisted credentials under stored_secrets. DuckDB setting: secret_directory. Set explicitly to a tempdir() location by default. Configured and migrated with duckdb_storage_config().

Temporary / spill files

Out-of-core intermediates for sorts, hash joins, and similar operations. DuckDB settings: temp_directory, max_temp_directory_size. For an in-memory (⁠:memory:⁠) database DuckDB's own default spills to .tmp in the current working directory, so the package overrides it with a tempdir() sub-directory by default.

Logs and profiling output

Written only when a path is explicitly configured (DuckDB settings log_query_path, http_logging_output, profiling output). They default to off, so there is no location to default and nothing is written without the user asking. If the user turns logging on they choose where it goes.

Database file, WAL, and checkpoints

Chosen by the user through the dbdir argument of duckdb(). The package does not manage these: an on-disk database and its sidecar files live where dbdir points, and an in-memory (⁠:memory:⁠) database has none.

Resolution policy

Each managed location is resolved through the same ordered chain. The first source that yields a value wins:

  1. an explicit value passed to duckdb() via config (e.g. config = list(temp_directory = "..."));

  2. the corresponding R option, e.g. getOption("duckdb.temp_directory");

  3. the corresponding environment variable, e.g. Sys.getenv("DUCKDB_TEMP_DIRECTORY");

  4. a persistent location selected by a marker file (see below);

  5. the default: a per-session sub-directory of tempdir().

The extension cache inserts one extra step before the tempdir() fallback: if no marker has selected a root, the "library" root is probed at connect time by writing its marker – the write doubles as the writability test, and the marker is left in place to record the choice. If the write fails (the library is read-only) the cache falls back to tempdir(). So the effective default is "library when writable, else tempdir", with no persistent write ever attempted where it would fail.

Marker files

Persisting data across sessions means writing outside tempdir(); a marker file records the user's consent to do so, once, so it need not be re-granted on every connection and does not require editing .Rprofile or .Renviron.

Two functions write and relocate these markers – one per kind of state, so the two can be configured independently – and a third reports the current state. They are documented in full on duckdb_storage_config():

duckdb_extension_storage(location, ..., migrate = TRUE, conflict = "error")
duckdb_secret_storage(location, ..., migrate = TRUE, conflict = "error")
duckdb_storage_status()

A ⁠*_storage()⁠ call writes the marker at location (creating, relocating, or – with "session" – removing it); duckdb_storage_status() reports where each kind currently resolves and how it was chosen. There is no ask argument: calling a ⁠*_storage()⁠ function is the consent.

The location argument

location names a root, not a full path. (To point a kind at an arbitrary directory, use the option or environment variable instead – a marker is only ever rediscovered in one of the fixed roots below.) The recognized roots are:

"session"

tempdir() – the default, and the opt-out: setting it removes the marker and reverts that kind to a per-session location.

"user"

tools::R_user_dir() – R-specific, private to this package, surviving package upgrades.

"shared"

⁠~/.duckdb⁠ – shared with the DuckDB CLI and Python client.

"library"

(extensions only) alongside the installed duckdb package (base::system.file()). It pairs binaries with the build's ABI but is wiped on every re-install. This is the automatic default for extensions when the library is writable (see the resolution policy above): rather than require an explicit opt-in, the package probes it at connect time and uses it unless the write fails. Not offered for stored secrets, which always default to "session".

The marker file

The marker's name and contents make clear it belongs to the R package, so a user inspecting the directory can tell at a glance what created it. This matters most in the "shared" root (⁠~/.duckdb⁠), which is also used by the DuckDB CLI and Python client:

<root>/extensions/.duckdb-r-keep        # opts in the extension cache
<root>/stored_secrets/.duckdb-r-keep    # opts in stored secrets

It is not empty: the package writes a single line of human-readable text describing what the file is and that it is safe to delete. Only the file's presence is significant – the contents are never read back or validated, so editing it has no effect.

Markers are per-kind and live inside each kind's sub-directory, so one root can persist extensions but not stored secrets, or vice versa. For extensions in a persistent root, DuckDB's ⁠v<version>/<platform>/⁠ sub-paths keep a stale binary from being loaded into a newer, ABI-incompatible build.

Migration

migrate = TRUE moves the already-cached files from the current location to the new root. conflict decides what happens when a file of the same name exists at the destination: "error" (the default) aborts and lists the collisions without moving anything; "ours" lets the files being relocated win (overwriting the destination); "theirs" keeps the destination files and drops the colliding sources. Secret migration is folded into duckdb_secret_storage().

Rules

A marker selects the location. It is deliberately distinct from the presence of a cached binary: an extension found under ⁠v<version>/<platform>/⁠ governs only validity (whether a re-download is needed), never the choice of location. This separation prevents a stale leftover binary from silently resurrecting a store root and reintroducing an ABI mismatch.

Per-location reference

Kind DuckDB setting Option / environment variable Default
Home home_directory -- left untouched (not set)
Extensions extension_directory duckdb.extension_directory / DUCKDB_EXTENSION_DIRECTORY library if writable, else tempdir()
Stored secrets secret_directory duckdb.secret_directory / DUCKDB_SECRET_DIRECTORY tempdir() sub-directory (set)
Temp/spill temp_directory duckdb.temp_directory / DUCKDB_TEMP_DIRECTORY tempdir() sub-directory (set)
Logs log_query_path duckdb.log_directory / DUCKDB_LOG_DIRECTORY disabled (off)

"set" means duckdb() sets the value explicitly in the database config. The home directory is left untouched so that ~ in user SQL keeps its usual meaning.

Messages

Startup message

When a connection is established and the resolved extension cache lies inside tempdir(), the package emits an informational message – at most once every eight hours per session, including in unattended (non-interactive) runs. It explains that downloaded extensions will not persist across sessions and how to opt into a permanent location. It is shown only when the package chose the location itself; if you set the extension directory (via config, the option, or the environment variable) the choice is yours and the message is suppressed.

Library-cache notice

The first time the extension cache is initialized in the package library (when its marker is written), the package says so once. The marker then persists, so this is effectively once per installation.

Silencing the startup message

Pointing the extension cache at a permanent location (an option, an environment variable, or config) both keeps the extensions and silences the message. If you are happy with a temporary cache and only want the reminder gone, set the location explicitly so it counts as your choice – the simplest is a config entry on every connection:

con <- dbConnect(duckdb(config = list(
  extension_directory = file.path(tempdir(), "duckdb", "extensions")
)))

or set it once per session with options(duckdb.extension_directory = file.path(tempdir(), "duckdb", "extensions")) (or the DUCKDB_EXTENSION_DIRECTORY environment variable).

See Also

duckdb_storage_config() for the functions that configure these locations, and duckdb() for the config argument.


Configure where DuckDB stores extensions and secrets

Description

[Experimental]

Choose where the duckdb R package keeps downloaded extensions and persisted secrets, by writing a small marker file that records the choice:

These functions move the cache and secret store to a location that survives across sessions; the same locations can also be set without a marker by overriding with options and environment variables. The full policy is documented in duckdb_storage.

Usage

duckdb_extension_storage(
  location = c("session", "user", "shared", "library"),
  ...,
  migrate = TRUE,
  conflict = "error"
)

duckdb_secret_storage(
  location = c("session", "user", "shared"),
  ...,
  migrate = TRUE,
  conflict = "error"
)

duckdb_storage_status()

Arguments

location

The destination root (not a path), one of:

  • "session" – the per-session temporary directory; the default, and the opt-out (removes the marker, reverting to a per-session location).

  • "user"tools::R_user_dir().

  • "shared"⁠~/.duckdb⁠, shared with the DuckDB CLI and Python client.

  • "library"(duckdb_extension_storage() only) alongside the installed package.

To use an arbitrary directory, set the option or environment variable instead (see duckdb_storage).

...

These dots are for future extensions and must be empty.

migrate

If TRUE (the default), move the already-cached files from the current location into the new one. Ignored when location is "session": opting out never moves files into the per-session directory.

conflict

How to resolve a name collision during migration: "error" (the default) aborts and lists the collisions without moving anything; "ours" lets the files being relocated overwrite the destination; "theirs" keeps the destination files and drops the colliding sources.

Details

duckdb_extension_storage() and duckdb_secret_storage() write (or remove) the marker for that one kind of state, so the two can be configured independently. duckdb_storage_status() reports where each kind currently resolves and which tier of the resolution policy chose it. The new location takes effect for connections opened afterwards; existing connections are unaffected.

There is no ask argument: calling a ⁠*_storage()⁠ function is itself the consent to write outside the temporary directory.

Value

The ⁠*_storage()⁠ functions are called for their side effect (writing or removing a marker, and optionally migrating files) and return the resolved directory invisibly. duckdb_storage_status() returns a data frame (class "duckdb_storage_status") with one row per kind of state and columns kind, source, and directory; its print method renders a readable summary when the result is auto-printed.

See Also

duckdb_storage for the storage policy these functions implement.


Deprecated functions

Description

read_csv_duckdb() has been superseded by duckdb_read_csv(). The order of the arguments has changed.

Usage

read_csv_duckdb(conn, files, tablename, ...)

Run an SQL query or statement

Description

[Experimental]

sql_query() runs an arbitrary SQL query using DBI::dbGetQuery() and returns a data.frame with the query results. sql_exec() runs an arbitrary SQL statement using DBI::dbExecute() and returns the number of affected rows.

These functions are intended as an easy way to interactively run DuckDB without having to manage connections. By default, data frame objects are available as views.

Scripts and packages should manage their own connections and prefer the DBI methods for more control.

Usage

sql_query(sql, conn = default_conn())

sql_exec(sql, conn = default_conn())

Arguments

sql

A SQL string

conn

An optional connection, defaults to default_conn()

Value

A data frame with the query result

Examples


# Queries
sql_query("SELECT 42")

# Statements with side effects
sql_exec("CREATE TABLE test (a INTEGER, b VARCHAR)")
sql_exec("INSERT INTO test VALUES (1, 'one'), (2, 'two')")
sql_query("FROM test")

# Data frames available as views
sql_query("FROM mtcars")