The usage of GeoTcgaData


Introduction

GEO and TCGA provide a wealth of data, such as RNA-seq, DNA methylation, and copy number variation data. It’s easy to download data from TCGA using the gdc tool, but processing these data into a format suitable for bioinformatics analysis requires more work. This R package was developed to handle that processing.

library(GeoTcgaData)

Common operations on GeoTcgaData


#' Compute the mean expression value of the genes in each module.
#'
#' @param geneExpress a data.frame
#' @param module a data.frame
#' @param result a string
#'
#' @return a matrix, the mean gene expression value within
#' each module
result <- cal_mean_module(geneExpress, module)
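
The idea of a per-module mean can be sketched in base R. The layout below is an assumption made for illustration (genes as row names of the expression data.frame, and a module table with hypothetical columns `module` and `gene`); it is not necessarily the input format that cal_mean_module expects, nor the way the package computes the result.

```r
# Hypothetical expression data: genes in rows, samples in columns.
expr <- data.frame(
  s1 = c(1, 3, 5, 7),
  s2 = c(2, 4, 6, 8),
  row.names = c("g1", "g2", "g3", "g4")
)

# Hypothetical module table: one row per (module, gene) pair.
modules <- data.frame(
  module = c("m1", "m1", "m2", "m2"),
  gene   = c("g1", "g2", "g3", "g4"),
  stringsAsFactors = FALSE
)

# Group gene names by module, then average each sample over a module's genes.
genes_by_module <- split(modules$gene, modules$module)
module_means <- t(sapply(genes_by_module, function(genes) {
  colMeans(expr[genes, , drop = FALSE])
}))
```

The result has one row per module and one column per sample.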

#' Average the expression values of the different ids that map to the same
#' gene in a chip expression profile from GEO or TCGA
#'
#' @param file1 a data.frame
#' @param k a number
#'
#' @return a data.frame, the averaged values of identical genes in the
#' gene expression profile
#'
#' @examples
aa <- c("Gene Symbol","MARCH1","MARC1","MARCH1","MARCH1","MARCH1")
bb <- c("GSM1629982","2.969058399","4.722410064","8.165514853","8.24243893","8.60815086")
cc <- c("GSM1629982","3.969058399","5.722410064","7.165514853","6.24243893","7.60815086")
file1 <- data.frame(aa=aa,bb=bb,cc=cc)
result <- gene_ave(file1)
#> Warning in gene_ave(file1): NAs introduced by coercion
#> Warning in gene_ave(file1): NAs introduced by coercion
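
Averaging rows that share a gene symbol can also be sketched with base R's aggregate. This is an illustration of the idea only, not the implementation of gene_ave; the column names `symbol`, `sample_a` and `sample_b` are hypothetical, and the values are stored as numbers rather than strings.

```r
# Toy profile: the first column holds gene symbols, the others expression values.
profile <- data.frame(
  symbol   = c("MARCH1", "MARC1", "MARCH1", "MARCH1", "MARCH1"),
  sample_a = c(2.969058399, 4.722410064, 8.165514853, 8.24243893, 8.60815086),
  sample_b = c(3.969058399, 5.722410064, 7.165514853, 6.24243893, 7.60815086),
  stringsAsFactors = FALSE
)

# Average every sample column over the rows that share a gene symbol.
averaged <- aggregate(. ~ symbol, data = profile, FUN = mean)
```

The result contains one row per distinct gene symbol.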

#' Get the differentially expressed genes using the DESeq2 package
#'
#' @param profile a data.frame
#'
#' @return a data.frame, an intermediate result for DESeq2
#'
#' @examples
profile2 <- classify_sample(profile)
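
The text above does not say how classify_sample labels samples. One common convention for TCGA data, assumed here purely for illustration, reads the two-digit sample-type code at characters 14-15 of the barcode: codes 01-09 denote tumor samples and 10-19 denote normal samples. The barcodes below are made up.

```r
# Made-up TCGA-style barcodes, used only for illustration.
barcodes <- c(
  "TCGA-AA-0001-01A-11R-0001-07",
  "TCGA-AA-0001-11A-11R-0001-07",
  "TCGA-AA-0002-06A-11R-0001-07"
)

# Characters 14-15 hold the two-digit sample-type code.
type_code <- as.integer(substr(barcodes, 14, 15))

# 01-09 are tumor samples, 10-19 are normal samples; other codes become NA here.
group <- ifelse(type_code >= 1 & type_code <= 9, "tumor",
         ifelse(type_code >= 10 & type_code <= 19, "normal", NA_character_))
```

A grouping vector of this kind is what a two-group differential expression design typically needs.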


#' Get the differentially expressed genes using the DESeq2 package
#'
#' @param profile2 the result of classify_sample
#'
#' @return a matrix, information on the differentially expressed genes
#'
#' @examples
profile2 <- classify_sample(profile)
jieguo <- diff_gene(profile2)
#> Loading required namespace: DESeq2
#> please install the package DESeq2!

#' Merge methylation data downloaded from TCGA
#'
#' @param dirr a string, the directory of methylation data downloaded from TCGA
#' using the gdc tool
#' @return a matrix, a combined methylation expression spectrum matrix
#'
#' @examples
mearge_result <- Merge_methy_tcga(system.file(file.path("extdata","methy"),package="GeoTcgaData"))
#> Warning in gene_ave(rep1_result): NAs introduced by coercion
#> Warning in gene_ave(rep1_result): NAs introduced by coercion

#' Combine clinical information obtained from TCGA and extract survival data
#'
#' @param Files_dir1 a string, the directory of the clinical data files
#'
#' @return a matrix, survival time and survival status in TCGA
#'
#' @examples
tcga_cli_deal(system.file(file.path("extdata","tcga_cli"),package="GeoTcgaData"))
#> Warning in utils::read.table(file.path(Files_dir1, file), sep = "\t",
#> header = T): incomplete final line found by readTableHeader on 'C:/
#> Users/Administrator.MS-20180914RLKI/AppData/Local/Temp/RtmpWK9Cqj/
#> Rinst3c9c32f11d2a/GeoTcgaData/extdata/tcga_cli/at.orl.TCGA-2V-A95S.xml'
#> Warning in utils::read.table(file.path(Files_dir1, file), sep = "\t",
#> header = T): incomplete final line found by readTableHeader on 'C:/
#> Users/Administrator.MS-20180914RLKI/AppData/Local/Temp/RtmpWK9Cqj/
#> Rinst3c9c32f11d2a/GeoTcgaData/extdata/tcga_cli/ati.org.TCGA-2Y-A9GT.xml'
#> Warning in utils::read.table(file.path(Files_dir1, file), sep = "\t",
#> header = T): incomplete final line found by readTableHeader on 'C:/
#> Users/Administrator.MS-20180914RLKI/AppData/Local/Temp/RtmpWK9Cqj/
#> Rinst3c9c32f11d2a/GeoTcgaData/extdata/tcga_cli/ats.org.TCGA-2Y-A9GS.xml'
#>              V1   V2    V3
#> 1: TCGA-2V-A95S haha Alive
#> 2: TCGA-2Y-A9GT 1624  Dead
#> 3: TCGA-2Y-A9GS  724  Dead

#' Multiple gene symbols may correspond to the same id. Some people think
#' that the expression value of this id should be
#' given to each gene, and some people think that the expression value of
#' this id should be deleted. rep1 assigns the expression
#' of this id to each gene, and rep2 deletes the expression.
#'
#' @param file1 input file, a data.frame or a matrix
#' @param string a string, the separator between gene symbols
#'
#' @return a data.frame; rep1 assigns the expression
#' of this id to each gene, and rep2 deletes the expression.
#'
#' @examples
aa <- c("MARCH1 /// MMA","MARC1","MARCH2 /// MARCH3",
        "MARCH3 /// MARCH4","MARCH1")
bb <- c("2.969058399","4.722410064","8.165514853","8.24243893","8.60815086")
cc <- c("3.969058399","5.722410064","7.165514853","6.24243893","7.60815086")
input_fil <- data.frame(aa=aa,bb=bb,cc=cc)
rep1_result <- rep1(input_fil, " /// ")
# Store the rep2 result under its own name so it does not overwrite rep1_result
rep2_result <- rep2(input_fil, " /// ")
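
The two strategies described above can be sketched in base R. This is a minimal illustration, not the package's implementation; the column names `symbol` and `value` and the rounded values are hypothetical.

```r
# Toy input: some ids map to several gene symbols separated by " /// ".
input <- data.frame(
  symbol = c("MARCH1 /// MMA", "MARC1", "MARCH2 /// MARCH3"),
  value  = c(2.97, 4.72, 8.17),
  stringsAsFactors = FALSE
)

# Split each symbol field on the separator; fixed = TRUE matches it literally.
parts <- strsplit(input$symbol, " /// ", fixed = TRUE)
n_symbols <- lengths(parts)

# Strategy described for rep1: repeat each row once per symbol it contains.
expanded <- input[rep(seq_len(nrow(input)), n_symbols), ]
expanded$symbol <- unlist(parts)
rownames(expanded) <- NULL

# Strategy described for rep2: drop every row that maps to several symbols.
filtered <- input[n_symbols == 1, ]
```

Here `expanded` keeps every expression value and may contain the same gene more than once, while `filtered` keeps only unambiguous rows.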

#' Convert ENSEMBL gene ids to gene symbols in TCGA
#'
#' @param profile a data.frame
#'
#' @return a data.frame, gene symbols and their expression values
#'
#' @examples
result <- id_conversion(profile)
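
The conversion step can be sketched in base R as a lookup after removing the version suffix that versioned Ensembl ids carry (the part after the dot). This is an illustration under stated assumptions, not the implementation of id_conversion: the lookup table is a tiny hand-written example, the version numbers and the column names are hypothetical, and the third id is a made-up placeholder with no mapping.

```r
# Toy profile with versioned Ensembl gene ids; the last id is a made-up placeholder.
profile <- data.frame(
  ensembl = c("ENSG00000141510.15", "ENSG00000012048.19", "ENSG99999999999.1"),
  sample1 = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Tiny hand-written lookup table; a real analysis needs a full annotation.
lookup <- c(ENSG00000141510 = "TP53", ENSG00000012048 = "BRCA1")

# Strip the version suffix, then map ids to symbols (unmapped ids become NA).
ids <- sub("\\.[0-9]+$", "", profile$ensembl)
profile$symbol <- unname(lookup[ids])

# Keep only the rows whose id could be mapped.
converted <- profile[!is.na(profile$symbol), c("symbol", "sample1")]
```

Rows whose id has no entry in the lookup table are dropped rather than kept with a missing symbol.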