% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/AminoAcids.R
\name{aminoAcidProperties}
\alias{aminoAcidProperties}
\title{Calculates amino acid chemical properties for sequence data}
\usage{
aminoAcidProperties(
  data,
  property = c("length", "gravy", "bulk", "aliphatic", "polarity", "charge", "basic",
    "acidic", "aromatic"),
  seq = "junction",
  nt = TRUE,
  trim = FALSE,
  label = NULL,
  ...
)
}
\arguments{
\item{data}{\code{data.frame} containing sequence data.}

\item{property}{vector of strings specifying the properties to be calculated. Defaults
to calculating all defined properties.}

\item{seq}{\code{character} name of the column containing input 
sequences.}

\item{nt}{\code{logical}. If \code{TRUE}, the input sequences are DNA and will be 
translated to amino acids before properties are calculated.}

\item{trim}{if \code{TRUE} remove the first and last codon or amino acid from each
sequence before calculating properties. If \code{FALSE} do
not modify input sequences.}

\item{label}{name of sequence region to add as prefix to output column names.}

\item{...}{additional named arguments to pass to the functions 
\link{gravy}, \link{bulk}, \link{aliphatic}, \link{polar} or \link{charge}.}
}
\value{
A modified \code{data} data.frame with the following columns:
         \itemize{
           \item  \code{*_aa_length}:     number of amino acids.
           \item  \code{*_aa_gravy}:      grand average of hydrophobicity (gravy) index.
           \item  \code{*_aa_bulk}:       average bulkiness of amino acids.
           \item  \code{*_aa_aliphatic}:  aliphatic index.
           \item  \code{*_aa_polarity}:   average polarity of amino acids.
           \item  \code{*_aa_charge}:     net charge.
           \item  \code{*_aa_basic}:      fraction of informative positions that are 
                                          Arg, His or Lys.
           \item  \code{*_aa_acidic}:     fraction of informative positions that are 
                                          Asp or Glu.
           \item  \code{*_aa_aromatic}:   fraction of informative positions that are 
                                          His, Phe, Trp or Tyr.
           
         }
         
         Where \code{*} is the value from \code{label} or the name specified for 
         \code{seq} if \code{label=NULL}.
}
\description{
\code{aminoAcidProperties} calculates amino acid sequence physicochemical properties, including
length, hydrophobicity, bulkiness, polarity, aliphatic index, net charge, acidic residue
content, basic residue content, and aromatic residue content.
}
\details{
For all properties except for length, non-informative positions are excluded, 
where non-informative is defined as any character in \code{c("X", "-", ".", "*")}.

The scores for gravy, bulkiness and polarity are calculated as simple averages of the 
scores for each informative position. The basic, acidic and aromatic indices are 
calculated as the fraction of informative positions falling into the given category.

The aliphatic index is calculated using the method of Ikai, 1980.

The net charge is calculated using the method of Moore, 1985, excluding the N-terminus and
C-terminus charges, and normalizing by the number of informative positions.  The default 
pH for the calculation is 7.4.

The following data sources were used for the default property scores:
\itemize{
  \item  hydropathy:  Kyte & Doolittle, 1982.  
  \item  bulkiness:   Zimmerman et al, 1968. 
  \item  polarity:    Grantham, 1974.
  \item  pK:          EMBOSS.
}
}
\examples{
# Subset example data
db <- ExampleDb[c(1,10,100), c("sequence_id", "junction")]

# Calculate default amino acid properties from DNA sequences
aminoAcidProperties(db, seq="junction")

# Calculate default amino acid properties from amino acid sequences
# Use a custom output column prefix
db$junction_aa <- translateDNA(db$junction)
aminoAcidProperties(db, seq="junction_aa", label="junction", nt=FALSE)

# Use the Grantham, 1974 side chain volume scores from the seqinr package
# Set pH=7.0 for the charge calculation
# Calculate only average volume and charge
# Remove the head and tail amino acids from the junction, thus making it the CDR3
library(seqinr)
data(aaindex)
x <- aaindex[["GRAR740103"]]$I
# Rename the score vector to use single-letter codes
names(x) <- translateStrings(names(x), ABBREV_AA)
# Calculate properties
aminoAcidProperties(db, property=c("bulk", "charge"), seq="junction", 
                    trim=TRUE, label="cdr3", bulkiness=x, pH=7.0)
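
# Illustration only: a minimal base R sketch of how a fraction-based property,
# such as the basic residue content, can be computed by hand from a single
# amino acid sequence. This is an assumption-laden sketch and not the internal
# implementation of the package; basicFraction is a hypothetical helper name.
basicFraction <- function(aa) {
    # Split the sequence into single-character residues
    residues <- strsplit(aa, "")[[1]]
    # Drop non-informative positions before counting
    informative <- residues[!is.element(residues, c("X", "-", ".", "*"))]
    # Fraction of informative positions that are Arg (R), His (H) or Lys (K)
    mean(is.element(informative, c("R", "H", "K")))
}
# Two of the five informative positions (R and K) are basic
basicFraction("CARXK-W")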

}
\references{
\enumerate{
  \item  Zimmerman JM, Eliezer N, Simha R. The characterization of amino acid sequences 
           in proteins by statistical methods. J Theor Biol 21, 170-201 (1968).
  \item  Grantham R. Amino acid difference formula to help explain protein evolution. 
           Science 185, 862-864 (1974).
  \item  Ikai AJ. Thermostability and aliphatic index of globular proteins. 
           J Biochem 88, 1895-1898 (1980).
  \item  Kyte J, Doolittle RF. A simple method for displaying the hydropathic character 
           of a protein. J Mol Biol 157, 105-32 (1982).
  \item  Moore DS. Amino acid and peptide net charges: A simple calculational procedure. 
           Biochem Educ 13, 10-11 (1985).
  \item  Wu YC, et al. High-throughput immunoglobulin repertoire analysis distinguishes 
           between human IgM memory and switched memory B-cell populations. 
           Blood 116, 1070-8 (2010).
  \item  Wu YC, et al. The relationship between CD27 negative and positive B cell 
           populations in human peripheral blood. 
           Front Immunol 2, 1-12 (2011).
  \item  \url{https://emboss.sourceforge.net/apps/cvs/emboss/apps/iep.html}
}
}
\seealso{
See \link{countPatterns} for counting the occurrence of specific amino acid subsequences.
See \link{gravy}, \link{bulk}, \link{aliphatic}, \link{polar} and \link{charge} for functions 
that calculate the included properties individually.
}
