% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/get_dataset.R
\name{get_dataset}
\alias{get_dataset}
\title{Download and Read a Dataset from Kaggle}
\usage{
get_dataset(dataset)
}
\arguments{
\item{dataset}{A character string giving the Kaggle dataset identifier, in the form \code{"username/dataset-name"}.}
}
\value{
A single data frame if only one file could be read; otherwise an unnamed list of data frames, one for each file that \code{get_dataset()} was able to read.
}
\description{
This function retrieves a dataset from Kaggle by downloading its metadata and the associated ZIP file, then reads all supported files contained in the archive. Each supported file is read with the appropriate function (see Details). The function returns a single data frame if only one supported file is found and an unnamed list of data frames otherwise. Only Kaggle Datasets are supported; competition data cannot be retrieved with this function.
}
\details{
The function constructs the metadata URL from the provided dataset identifier and sends a GET request using the \code{httr} package. If the request succeeds, the returned JSON metadata is parsed and searched for a file with an encoding format of "application/zip". That ZIP file is downloaded to a temporary file (managed by the \code{withr} package) and unzipped into a temporary directory. The function then locates all files whose extensions correspond to a supported format (\code{csv}, \code{tsv}, \code{xlsx}, \code{json}, \code{rds}, \code{parquet}, \code{ods}, \code{shp}, \code{geojson} and \code{feather}). Each file is then read using the appropriate function:
\itemize{
  \item \code{readr::read_csv} for CSV files.
  \item \code{readr::read_tsv} for TSV files.
  \item \code{readxl::read_excel} for XLSX files.
  \item \code{jsonlite::fromJSON} for JSON files.
  \item \code{readRDS} for RDS files.
  \item \code{arrow::read_parquet} for Parquet files.
  \item \code{readODS::read_ods} for ODS files.
  \item \code{sf::read_sf} for SHP and GEOJSON files.
  \item \code{arrow::read_feather} for Feather files.
}
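As a rough illustration of this extension-based dispatch, the mapping above could be written as a single \code{switch()}. This is a minimal sketch, not the package's actual implementation; \code{read_by_extension} is a hypothetical helper name, and each reader's package is assumed to be installed.
\preformatted{
read_by_extension <- function(path) {
  # Hypothetical helper: pick a reader from the lower-cased file extension.
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv     = readr::read_csv(path),
    tsv     = readr::read_tsv(path),
    xlsx    = readxl::read_excel(path),
    json    = jsonlite::fromJSON(path),
    rds     = readRDS(path),
    parquet = arrow::read_parquet(path),
    ods     = readODS::read_ods(path),
    shp     = ,  # empty arm falls through to the geojson reader
    geojson = sf::read_sf(path),
    feather = arrow::read_feather(path),
    stop("Unsupported file extension: ", ext)
  )
}
}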
The function stops with an error if any of the following occur:
\itemize{
  \item The HTTP request fails.
  \item No ZIP file URL is found in the metadata.
  \item No supported data files are found in the unzipped contents.
}
}
\examples{
\donttest{
  # Download and read the "canadian-prime-ministers" dataset from Kaggle
  canadian_prime_ministers <- get_dataset("benjaminsmith/canadian-prime-ministers")
  canadian_prime_ministers
  
  # csv
  canadian_prime_ministers <- get_dataset("benjaminsmith/canadian-prime-ministers")
  # tsv
  arabic_twitter <- get_dataset("mksaad/arabic-sentiment-twitter-corpus")
  # xlsx
  hr_data <- get_dataset("kmldas/hr-employee-data-descriptive-analytics")
  # json
  iris_json <- get_dataset("rtatman/iris-dataset-json-version")
  # rds
  br_pop_2019 <- get_dataset("ianfukushima/br-pop-2019")
  # parquet
  iris_datasets <- get_dataset("gpreda/iris-dataset")
  # ods
  new_houses <- get_dataset("nm8883/new-houses-built-each-year-in-england")
  # shp
  india_states <- get_dataset("dhruvanurag20/final-shp")
  # geojson
  montreal <- get_dataset("rinichristy/montreal-geojson")
  # feather
  ncaa <- get_dataset("corochann/ncaa-march-madness-2020-womens")
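
  # Handling both return shapes (a single data frame or an unnamed list).
  # A minimal sketch: `as_df_list` is a hypothetical helper, not part of this
  # package, and the built-in `iris` and `mtcars` data frames stand in for
  # downloaded data so that the snippet runs without network access.
  as_df_list <- function(x) if (is.data.frame(x)) list(x) else x
  length(as_df_list(iris))                # a lone data frame is wrapped in a list
  length(as_df_list(list(iris, mtcars)))  # a list is returned unchanged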
}

}
