Introductions

In this package, we provide e-value for four DMR (differentially methylated region) detection tools (MethylKit, Metilene, BiSeq and DMRfinder) and general purpose.

For DMR (methylKit, biseq, DMRfinder or metilene), the met-evalue calculation is conducted by the metevalue.[DMR] function.

DMR Method methyrate Example Method.output Example
MethylKit metevalue.methylKit data(demo_methylkit_methyrate) data(demo_methylkit_met_all)
BiSeq metevalue.biseq data(demo_biseq_methyrate) data(demo_biseq_DMR)
DMRfinder metevalue.DMRfinder data(demo_DMRfinder_rate_combine) data(demo_DMRfinder_DMRs)
Metilene metevalue.metilene data(demo_metilene_input) data(demo_metilene_out)

Two routines are supported to calculate the combined e-value:

Details are as follows.

Call by files

We design the metevalue.[DMR] function to accept similar parameter patterns:

metevalue.[DMR](
  methyrate,                # Output file name of [DMR]
  [DMR].output,             # Output file name of [DMR] with e-value of each region
  adjust.methods = "BH",    # Adjust methods of e-value
  sep = "\t",               # seperator, default is the TAB key.
  bheader = FALSE           # A logical value indicating whether the [DMR].output file
                            # contains the names of the variables as its first line
)

Here [DMR] coudle be one of methylKit, biseq, DMRfinder or metilene.

Call by R data frames

We provide the evalue_buildin_var_fmt_nm and varevalue.metilene function to handle the general DMR e-value alculation in DNA methylation studiesc:

# Here  `[DMR]` coudle be one of `methylKit`, `biseq`, `DMRfinder` or `metilene`.
method_in_use = "[DMR]"
result = evalue_buildin_var_fmt_nm(
          methyrate,              # Data frame of the methylation rate
          DMR_evalue_output,      # Data frame of output data corresponding to the
                                  # "method" option
          method = method_in_use) # DMR: "metilene", "biseq", "DMRfinder" or "methylKit"
result = list(a = result$a,
              b = result$b,
              a_b = evalue_buildin_sql(result$a, result$b, method = method_in_use))
result = varevalue.metilene(result$a, result$b, result$a_b)

Replace [DMR] to one of methylKit, biseq, DMRfinder or metilene accordingly.

Notice: for different [DMR], the data.frame schemas are different!!! Check the R help document for details. Check the Demo data section for details. # Examples

MethylKit Example

methylKit is a R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods and whole genome bisulfite sequencing.

Currently, metevalue package supports the e-value analytics of the methylKit output file.

library(metevalue)

####Simulation Data ####
set.seed(1234)

# methylKit is a Bioconductor package
library(methylKit)
file.list=list( system.file("extdata", 
                            "test1.myCpG.txt", package = "methylKit"),
                system.file("extdata",
                            "test2.myCpG.txt", package = "methylKit"),
                system.file("extdata", 
                            "control1.myCpG.txt", package = "methylKit"),
                system.file("extdata", 
                            "control2.myCpG.txt", package = "methylKit") )


# read the files to a methylRawList object: myobj
myobj=methRead(file.list,
               sample.id=list("test1","test2","ctrl1","ctrl2"),
               assembly="hg18",
               treatment=c(1,1,0,0),
               context="CpG"
)

meth=unite(myobj, destrand=FALSE)
meth.C <- getData(meth)[,seq(6,ncol(meth),3)]
meth.T <- getData(meth)[,seq(7,ncol(meth),3)]
mr <- meth.C/(meth.C + meth.T)
chr_pos = getData(meth)[,1:2]
methyrate = data.frame(chr_pos,mr)
names(methyrate) = c('chr', 'pos', rep('g1',2), rep('g2',2))
region<-tileMethylCounts(myobj)
meth<-unite(region,destrand=F)
myDiff<-calculateDiffMeth(meth)
met_all<-getMethylDiff(myDiff,type="all")

example_tempfiles = tempfile(c("rate_combine", "methylKit_DMR_raw"))
tempdir()
write.table(methyrate, file=example_tempfiles[1], row.names=F, col.names=T, quote=F, sep='\t')
write.table (met_all, file=example_tempfiles[2], sep ="\t", row.names =F, col.names =T, quote =F)

evalue.methylKit function could be used to tackle the problem.

result = metevalue.methylKit(example_tempfiles[1], example_tempfiles[2], bheader = T)
str(result)

Alternatively, one could use the build-in functions to derive functions which avoid the file operation:

result = evalue_buildin_var_fmt_nm(methyrate, met_all, method="methylKit")
result = list(a = result$a, 
              b = result$b, 
              a_b = evalue_buildin_sql(result$a, result$b, method="methylKit"))
result = varevalue.metilene(result$a, result$b, result$a_b)
str(result)

BiSeq Example

First, we load the methylation data at CpG site levels from ‘BiSeq’ package. Then we clustered CpG sites into DMRs using ‘BiSeq’.

library(BiSeq)
library(dplyr)
data(rrbs)
rrbs.rel <- rawToRel(rrbs)
methyrate <- methLevel(rrbs.rel)
methyrate <- data.frame(methyrate)
methyrateq = cbind(rows = as.numeric(row.names(methyrate)), methyrate)
methypos = data.frame(rows = as.numeric(row.names(methyrate)), rowRanges(rrbs))
methyrate = left_join(methypos, methyrateq)
methyrate = methyrate[,c(2,3,7:16)]
names(methyrate) <- c('chr','pos',rep('g1',5),rep('g2',5))

rrbs.clust.unlim <- clusterSites(object = rrbs,perc.samples = 3/4,min.sites = 20,max.dist = 100)

clusterSitesToGR(rrbs.clust.unlim)
ind.cov <- totalReads(rrbs.clust.unlim) > 0

quant <- quantile(totalReads(rrbs.clust.unlim)[ind.cov])
rrbs.clust.lim <- limitCov(rrbs.clust.unlim, maxCov = quant)
predictedMeth <- predictMeth(object = rrbs.clust.lim)

test<- predictedMeth[, colData(predictedMeth)$group == "test"]
control <- predictedMeth[, colData(predictedMeth)$group == "control"]
mean.test <- rowMeans(methLevel(test))
mean.control <- rowMeans(methLevel(control))

betaResults <- betaRegression(formula = ~group,link = "probit",object = predictedMeth,type = "BR")
vario <- makeVariogram(betaResults)
vario.sm <- smoothVariogram(vario, sill = 0.9)

locCor <- estLocCor(vario.sm)
clusters.rej <- testClusters(locCor)
clusters.trimmed <- trimClusters(clusters.rej)
DMRs <- findDMRs(clusters.trimmed,max.dist = 100,diff.dir = TRUE)


example_tempfiles = tempfile(c('rate_combine', 'BiSeq_DMR'))
write.table(methyrate, example_tempfiles[1], row.names=F, col.names=T, quote=F, sep='\t')
write.table(DMRs, example_tempfiles[2], quote=F, row.names = F,col.names = F, sep = '\t')

Finally, we added E-values and adjusted E-values as additional columns to the output file of ‘BiSeq’.BiSeq_evalue function could be used to tackle the problem.

hh <- data.frame(DMRs)
result = evalue_buildin_var_fmt_nm(rate_combine, hh, method="biseq")
result = list(a = result$a,  b = result$b, 
              a_b = evalue_buildin_sql(result$a, result$b,method="biseq"))
result = varevalue.metilene(result$a, result$b, result$a_b)
str(result)

DMRfinder Example

Given the input file

  • rate_combine_DMRfinder: a file containing methylation rates at each CpG site

  • DMRfinder_DMR: the output file from ‘DMRfinder’

rate_combine <- read.table("rate_combine_DMRfinder", header = T)
head(rate_combine)

DMRs <- read.table("DMRfinder_DMR", header = T)
head(DMRs)

Adding E-values and adjusted E-values as additional columns to file ‘DMRfinder_DMR’

result <- evalue.DMRfinder('rate_combine_DMRfinder', 'DMRfinder_DMR')
head(result)

Alternatively, function varevalue.metilene can also provides e-value and adjusted e-value.

result = evalue_buildin_var_fmt_nm(rate_combine, DMRs, method="DMRfinder")
result = list(a = result$a, 
              b = result$b, 
              a_b = evalue_buildin_sql(result$a, result$b, method="DMRfinder"))
result = varevalue.metilene(result$a, result$b, result$a_b)
head(result)

Metilene Example

Given

  • metilene.input: the input file of Metilene containing methylation rates at each CpG site
  • metilene.out: the output file of Metilene
input <- read.table("metilene.input", header = T)
head(input)

out <- read.table("metilene.out", header = F)
head(out)

Adding E-values and adjusted E-values as additional columns to metilene.out

result <- evalue.metilene('metilene.input', 'metilene.out')
head(result)

Alternatively, function varevalue.metilene can also provides e-value and adjusted e-value.

result = evalue_buildin_var_fmt_nm(input, out, method="metilene")
result = list(a = result$a, 
              b = result$b, 
              a_b = evalue_buildin_sql(result$a, result$b, method="metilene"))
result = varevalue.metilene(result$a, result$b, result$a_b)
head(result)

General case for DNA methylation data

In above examples, we have already provided examples to calculate E-values directly from DMR detection tools including BiSeq, DMRfinder, MethylKit and Metilene. However, users may wonder how to calculate the E-values directly from CpG sites or other DMR detection tools not presented above. We then facilitate the purpose in the following example.

  • metilene.input: the input file of Metilene containing methylation rates at each CpG site

By changing the group name, start site and end site, function varevalue.signle_general can calculate e-value of any site or region using a general methylation rates data. Take metilene.input as example.

input <- read.table("metilene.input", header = T)
e_value <- varevalue.signle_general(methyrate=input, group1_name='g1', group2_name='g2', chr='chr21', start=9439679, end=9439679)
head(e_value)

General case for RNA-seq data

The framework of E-value calculation presented in this project is also able to be extended to other genomic data including RNA-seq. Here is an example to introduce the E-value calculation in RNA-seq.

  • desq_out: the RNA data

function varevalue.signle_general can provides e-values for each row of the normalized expression level of RNA-seq data.

input <- read.table("desq_out", header = T)
data_e <- metevalue.RNA_general(input, group1_name='treated', group2_name='untreated')
head(data_e)

Misc

Demo data

Demo data for different metevalue.[DMR] functions are listed in the section.

Input Data Examples: MethylKit

methyrate Example

chr pos g1 g1 g2 g2
chr21 9853296 0.5882353 0.8048048 0.8888889 0.8632911
chr21 9853326 0.7058824 0.7591463 0.8750000 0.7493404

methylKit.output Example

chr start end strand pvalue qvalue meth.diff
chr21 9927001 9928000 * 0 0 -34.07557
chr21 9944001 9945000 * 0 0 -40.19089

Input Data Examples: BiSeq

methyrate Example

chr pos g1 g1 g1 g1 g1 g2 g2 g2 g2 g2
chr1 870425 0.8205128 1 0.7 NaN NaN 0.3125 0.7419355 0.2461538 0.1794872 0.2413793
chr1 870443 0.8461538 1 0.7 NaN NaN 0.3750 0.3225806 0.2923077 0.0512821 0.2413793

biseq.output Example

seqnames start end width strand median.p median.meth.group1 median.meth.group2 median.meth.diff
chr1 872369 872616 248 * 0.0753559 0.9385462 0.8666990 0.0710524
chr1 875227 875470 244 * 0.0000026 0.5136315 0.1991452 0.2942668

Input Data Examples: DMRfinder

methyrate Example

chr pos g1 g1.1 g2 g2.1
chr1 202833315 0 0.0000000 0 0
chr1 202833323 1 0.8095238 1 1

DMRfinder.output Example

chr start end CpG Control.mu Exptl.mu Control..Exptl.diff Control..Exptl.pval
chr8 25164078 25164102 3 0.9241646 0.7803819 -0.1437827 0.0333849
chr21 9437432 9437538 14 0.7216685 0.1215506 -0.6001179 0.0000000

Input Data Examples: metilene

metilene.input Example

chr pos g1 g1.1 g1.2 g1.3 g1.4 g1.5 g1.6 g1.7 g2 g2.1 g2.2 g2.3 g2.4 g2.5 g2.6 g2.7
chr21 9437433 0.9285714 NA 0.7222222 0.75 1 0.6666667 1 0.8695652 0.0000000 0 0 0 0.0000000 0.0 NA 0.00
chr21 9437445 1.0000000 NA 0.9444444 0.75 1 0.6666667 0 0.8695652 0.6111111 0 0 0 0.7333333 0.6 NA 0.75

metilene.output Example

chr start end q-value methyl.diff CpGs p p2 m1 m2
chr21 9437432 9437540 0 0.610989 26 0 0 0.73705 0.12606
chr21 9708982 9709189 0 0.475630 28 0 0 0.58862 0.11299

Input Data Examples: RNAseq

desq_out Example

treated1fb treated2fb treated3fb untreated1fb untreated2fb untreated3fb untreated4fb
4.449648 4.750104 4.431634 4.392285 4.497514 4.762213 4.533928
6.090031 5.973211 5.913239 6.238684 6.050743 5.932738 6.022005

Other Demos

Please vist the metevalue-emo project for more demos.