While coala
primiary focuses on simulation of data, it also offers to calculate summary statsitcs from real data. This is particular useful when comparing the statistics of real and simulated data.
Rather than offering function to import data directly, coala can convert the internal formats of other packages into its own format. Currently, only PopGenome
is supported, but we plan to also support ape
and pegas
in the future.
PopGenome offers functions for reading different data formats, including vcf
and fasta
. Please refer to its documentation for instructions. As example, we will read in one short fasta file that is included in coala:
suppressPackageStartupMessages(library(PopGenome))
fasta <- system.file("example_fasta_files", package = "coala")
data_pg <- readData(fasta, progress_bar_switch = FALSE)
data_pg <- set.outgroup(data_pg, c("Individual_Out-1", "Individual_Out-2"))
individuals <- list(paste0("Individual_1-", 1:5), paste0("Individual_2-", 1:5))
data_pg <- set.populations(data_pg, individuals)
We can now convert data_pg
using the as.segsites
function:
library(coala)
segsites <- as.segsites(data_pg)
Now, we can calculate summary statistics using calc_sumstats_from_data
:
model <- coal_model(c(5, 5, 2), 1, 25) +
feat_mutation(5) +
feat_outgroup(3) +
sumstat_sfs(population = 1)
stats <- calc_sumstats_from_data(model, segsites)
stats$sfs
## [1] 5 5 2 1
Alternatively, it is also possible to pass the data_pg
object directly to calc_sumstats_from_data
:
stats <- calc_sumstats_from_data(model, data_pg)
stats$sfs
## [1] 5 5 2 1
Please refer to the help pages for as.segsites
and calc_sumstats_from_data
for additional information.