% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/read_bed.R
\name{read_bed}
\alias{read_bed}
\title{Read a genotype matrix in Plink BED format}
\usage{
read_bed(
  file,
  names_loci = NULL,
  names_ind = NULL,
  m_loci = NA,
  n_ind = NA,
  verbose = TRUE
)
}
\arguments{
\item{file}{Input file path.
The *.bed extension may be omitted (it is added automatically if missing).}

\item{names_loci}{Vector of loci names, to become the row names of the genotype matrix.
If provided, its length sets \code{m_loci} below.
If \code{NULL}, the returned genotype matrix will not have row names, and \code{m_loci} must be provided.}

\item{names_ind}{Vector of individual names, to become the column names of the genotype matrix.
If provided, its length sets \code{n_ind} below.
If \code{NULL}, the returned genotype matrix will not have column names, and \code{n_ind} must be provided.}

\item{m_loci}{Number of loci in the input genotype table.
Required if \code{names_loci = NULL}, as its value is not deducible from the BED file itself.
Ignored if \code{names_loci} is provided.}

\item{n_ind}{Number of individuals in the input genotype table.
Required if \code{names_ind = NULL}, as its value is not deducible from the BED file itself.
Ignored if \code{names_ind} is provided.}

\item{verbose}{If \code{TRUE} (default), the function reports the path of the file being read (after autocompleting the extension).}
}
\value{
The \code{m}-by-\code{n} genotype matrix.
}
\description{
This function reads genotypes encoded in a Plink-formatted BED (binary) file, returning them in a standard R matrix containing genotypes encoded numerically as dosages (values in \code{c( 0, 1, 2, NA )}).
For each of the \code{m} loci and \code{n} individuals, the genotype counts the number of reference alleles, or is \code{NA} for missing data.
No *.fam or *.bim files are read by this basic function.
Since BED does not encode the data dimensions internally, these values must be provided by the user.
}
\details{
The code enforces several checks to validate data given the requested dimensions.
Errors are thrown if the file terminates too early or does not terminate after the genotype matrix is filled.
In addition, as each locus is encoded in an integer number of bytes, and each byte contains up to four individuals, bytes with fewer than four individuals are padded with zeroes (non-zero pads throw errors).

This function only supports locus-major BED files, which are the standard for modern data.
The format is validated via the BED file's magic numbers (the first three bytes of the file).
Older BED files can be converted using Plink.
}
\examples{
# first obtain data dimensions from BIM and FAM files
# all file paths
file_bed <- system.file("extdata", 'sample.bed', package = "genio", mustWork = TRUE)
file_bim <- system.file("extdata", 'sample.bim', package = "genio", mustWork = TRUE)
file_fam <- system.file("extdata", 'sample.fam', package = "genio", mustWork = TRUE)
# read annotation tables
bim <- read_bim(file_bim)
fam <- read_fam(file_fam)

# read an existing Plink *.bed file
# pass locus and individual IDs as vectors, setting data dimensions too
X <- read_bed(file_bed, bim$id, fam$id)
X
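
# a rough sanity check of the file layout described in Details
# (a sketch assuming the standard Plink BED layout: three magic bytes,
# then ceiling(n/4) bytes per locus; not necessarily how read_bed validates internally)
readBin(file_bed, what = "raw", n = 3) # magic numbers, 6c 1b 01 for locus-major files
bytes_per_locus <- ceiling(nrow(fam) / 4) # four individuals per byte, last byte padded
file.size(file_bed) == 3 + nrow(bim) * bytes_per_locus # compare to expected size in bytes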

# can specify without extension
file_bed <- sub('\\\\.bed$', '', file_bed) # remove extension from this path on purpose
file_bed # verify .bed is missing
X <- read_bed(file_bed, bim$id, fam$id) # loads too!
X
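
# dimensions can also be passed directly instead of names
# (a sketch using the documented m_loci and n_ind arguments;
# the returned matrix then has no row or column names)
X <- read_bed(file_bed, m_loci = nrow(bim), n_ind = nrow(fam))
dim(X)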

}
\seealso{
\code{\link[=read_plink]{read_plink()}} for reading a set of BED/BIM/FAM files.

\code{\link[=geno_to_char]{geno_to_char()}} for translating numerical genotypes into more human-readable character encodings.

Plink BED format reference:
\url{https://www.cog-genomics.org/plink/1.9/formats#bed}
}
