% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/probes.R
\name{cut_probes}
\alias{cut_probes}
\title{Cut probes}
\usage{
cut_probes(
  ref.seq.from.file = FALSE,
  ref.seq.id,
  ref.seq.db,
  fasta.file = NULL,
  delete.fasta = FALSE,
  start = 1,
  stop = NULL,
  start.correction = FALSE,
  size = 24:32,
  delete.incomplete = FALSE,
  delete.identical = FALSE,
  give.probes.id = FALSE,
  mc.cores = 1,
  verbose = TRUE
)
}
\arguments{
\item{ref.seq.from.file}{logical; read reference sequences from a file (\code{TRUE}) or download them from an NCBI database (\code{FALSE}).}

\item{ref.seq.id}{identification numbers of the reference nucleotide sequences. Only used when \code{ref.seq.from.file = FALSE}.
GenBank accession numbers, GenInfo identifiers (GI) or Entrez unique identifiers (UID) may be used.}

\item{ref.seq.db}{character; NCBI database for search. See \link[rentrez]{entrez_dbs} for possible values.
Only used when \code{ref.seq.from.file = FALSE}.}

\item{fasta.file}{character; path and name of the FASTA file. Only used when \code{ref.seq.from.file = TRUE}.}

\item{delete.fasta}{logical; delete the FASTA file afterwards.}

\item{start, stop}{integer; positions of the first and last nucleotides of the reference sequence segment that should be cut into probes.
The whole sequence is used by default.}

\item{start.correction}{logical; count probes' start and stop nucleotides relative to the specified segment (\code{FALSE})
or to the whole sequence (\code{TRUE}). Only used if \code{start > 1}.}

\item{size}{integer; vector of probe sizes}

\item{delete.incomplete}{logical; remove probes that contain undeciphered nucleotides}

\item{delete.identical}{logical; remove identical (duplicated) probes}

\item{give.probes.id}{logical; add probes' identification numbers}

\item{mc.cores}{integer; number of processors for parallel computation (not supported on Windows)}

\item{verbose}{logical; show messages}
}
\value{
A data frame with the probe id (optional), sequence id, probe size, start and stop nucleotides, and probe sequence.
}
\description{
Generate probes from nucleotide reference sequences
}
\details{
This function takes nucleotide sequences and cuts them into segments (probes) of the given sizes.
Sequences can be read from a given FASTA file or downloaded from NCBI databases.
In the latter case, a FASTA file is created.
If desired, the FASTA file can be deleted afterwards.

The whole sequence does not have to be cut into probes; you may select a segment with the \code{start} and \code{stop} parameters.
Note that in this case the probes' start and stop nucleotides are counted relative to the specified segment (\code{start.correction = FALSE})
or relative to the whole sequence (\code{start.correction = TRUE}).

Undeciphered nucleotides are those indicated by the symbols "rywsmkhbvdn".
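
As a rough illustration only (this is not the package's internal code), probes with such symbols could be flagged in base R as sketched below. The vector \code{probes} is hypothetical, and lowercase sequences are assumed.

\preformatted{
# hypothetical probe sequences; the second and third contain "n" and "r"
probes <- c("acgtacgt", "acgtnacg", "acgtrcgt")
# TRUE marks probes with at least one undeciphered nucleotide
incomplete <- grepl("[rywsmkhbvdn]", probes)
# keep only the completely deciphered probes
probes[!incomplete]
}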

Probes' identification numbers are created by adding numeric indexes to the reference sequence's identification number.

See \link{cut_string}, \link{delete_duplicates_DF} and \link{make_ids} for details.
}
\examples{
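# use a dedicated subdirectory (the name is arbitrary) so that the cleanup
# at the end does not remove the session's temporary directory itself
path <- file.path(tempdir(), "cut_probes_example")
dir.create(path)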
# download and save as FASTA "Chlamydia pneumoniae B21 contig00001,
# whole genome shotgun sequence" (GI = 737435910)
reference.string <- rentrez::entrez_fetch(db = "nucleotide", id = 737435910,
                                          rettype = "fasta")
write(x = reference.string, file = paste0(path, "/fasta"))
probes <- cut_probes(ref.seq.from.file = TRUE, fasta.file = paste0(path, "/fasta"),
                     delete.fasta = TRUE, start = 1000, stop = 1500,
                     start.correction = FALSE, size = c(400, 500),
                     delete.incomplete = FALSE,
                     delete.identical = FALSE, give.probes.id = TRUE, mc.cores = 1)
unlink(path, recursive = TRUE)
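
# Illustration only (not the package's internal code): how segment-relative
# and whole-sequence coordinates relate for 1-based positions, assuming the
# segment begins at nucleotide seg.start of the reference sequence.
# The names seg.start, segment.pos and whole.pos are hypothetical.
seg.start <- 1000
segment.pos <- c(1, 400)                  # probe start and stop within the segment
whole.pos <- segment.pos + seg.start - 1  # the same nucleotides in the whole sequence
whole.pos                                 # 1000 1399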

}
\author{
Elena N. Filatova
}
