Type: | Package |
Title: | Analysis of Codon Data under Stationarity using a Bayesian Framework |
Version: | 0.1.4.4 |
Date: | 2020-09-11 |
Author: | Authors@R |
Maintainer: | Cedric Landerer <cedric.landerer@gmail.com> |
URL: | https://github.com/clandere/AnaCoDa |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Depends: | R (≥ 3.3.0), Rcpp (≥ 0.11.3), VGAM, methods, mvtnorm |
Suggests: | knitr, Hmisc, coda, testthat, lmodel2, markdown |
RcppModules: | Test_mod, Trace_mod, CovarianceMatrix_mod, MCMCAlgorithm_mod, Model_mod, Parameter_mod, Genome_mod, Gene_mod, SequenceSummary_mod |
Description: | Is a collection of models to analyze genome scale codon data using a Bayesian framework. Provides visualization routines and checkpointing for model fittings. Currently published models to analyze gene data for selection on codon usage based on Ribosome Overhead Cost (ROC) are: ROC (Gilchrist et al. (2015) <doi:10.1093/gbe/evv087>), and ROC with phi (Wallace & Drummond (2013) <doi:10.1093/molbev/mst051>). In addition 'AnaCoDa' contains three currently unpublished models. The FONSE (First order approximation On NonSense Error) model analyzes gene data for selection on codon usage against of nonsense error rates. The PA (PAusing time) and PANSE (PAusing time + NonSense Error) models use ribosome footprinting data to analyze estimate ribosome pausing times with and without nonsense error rate from ribosome footprinting data. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
LinkingTo: | Rcpp |
LazyLoad: | yes |
LazyData: | yes |
RoxygenNote: | 7.1.1 |
Packaged: | 2020-09-15 09:04:26 UTC; landerer |
Repository: | CRAN |
Date/Publication: | 2020-09-15 09:40:19 UTC |
Amino Acid to codon set
Description
Converts one character amino acid code to the set of codon encoding that amino acid
Usage
AAToCodon(aa, focal = FALSE)
Arguments
aa |
Amino acid in single character notation |
focal |
logical, Include the alphabetically last (focal) codon |
Value
Returns the names of the codon encoding the give amino acid
See Also
Plots ACF for codon specific parameter traces
Description
The function calculates and by defaults plots the acf and estimates the autocorrelation in the trace
Usage
acfCSP(
parameter,
csp = "Mutation",
numMixtures = 1,
samples = NULL,
lag.max = 40,
plot = TRUE
)
Arguments
parameter |
object of class Parameter |
csp |
indicates which parameter to calculate the autocorrelation. Must be Mutation (the default, ROC, FONSE), Selection (ROC, FONSE), Alpha (PA, PANSE), LambdaPrime (PA, PANSE), NSERate (PA, PANSE)" |
numMixtures |
indicates the number of CSP mixtures used |
samples |
number of samples at the end of the trace used to calculate the acf |
lag.max |
Maximum amount of lag to calculate acf. Default is 10*log10(N), where N i the number of observations. |
plot |
logical. If TRUE (default) a plot of the acf is created |
See Also
Autocorrelation function for the likelihood or posterior trace
Description
The function calculates and by defaults plots the acf and estimates the autocorrelation in the trace.
Usage
acfMCMC(mcmc, type = "LogPosterior", samples = NULL, lag.max = 40, plot = TRUE)
Arguments
mcmc |
object of class MCMC |
type |
"LogPosterior" or "LogLikelihood", defaults to "LogPosterior" |
samples |
number of samples at the end of the trace used to calculate the acf |
lag.max |
Maximum amount of lag to calculate acf. Default is 10*log10(N), where N i the number of observations. |
plot |
logical. If TRUE (default) a plot of the acf is created |
See Also
Add gene observed synthesis rates
Description
addObservedSynthesisRateSet
returns the observed
synthesis rates of the genes within the genome specified.
Usage
addObservedSynthesisRateSet(
genome,
observed.expression.file,
match.expression.by.id = TRUE
)
Arguments
genome |
A genome object initialized with
|
observed.expression.file |
A string containing the location of a file containing empirical expression rates (optional). |
match.expression.by.id |
If TRUE (default) observed expression values will be assigned by matching sequence identifier. If FALSE observed expression values will be assigned by order |
Value
Returns the genome after adding the new gene expression values
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
expression_file <- system.file("extdata", "expression.csv", package = "AnaCoDa")
## reading genome
genome <- initializeGenomeObject(file = genome_file)
## add expression values after the genome was initiallized,
## or adding an additional set of expression values
genome <- addObservedSynthesisRateSet(genome = genome,
observed.expression.file = expression_file)
Amino acids
Description
Returns a vector of all amino acids
Usage
aminoAcids()
Value
Returns a vector of all amino acids
See Also
Calculates the marginal log-likelihood for a set of parameters
Description
initializes the model object.
Usage
calculateMarginalLogLikelihood(
parameter,
mcmc,
mixture,
n.samples,
divisor,
warnings = TRUE
)
Arguments
parameter |
An object created with |
mcmc |
An object created with |
mixture |
determines for which mixture the marginal log-likelihood should be calculated |
n.samples |
How many samples should be used for the calculation |
divisor |
A value > 1 in order to scale down the tails of the importance distribution |
warnings |
Print warnings such as when the variance of a parameter is 0, which might occur when parameter is fixed |
Details
calculateMarginalLogLikelihood Calculate marginal log-likelihood for calculation of the Bayes factor using a generalized harmonix mean estimator of the marginal likelihood. See Gronau et al. (2017) for details
Value
This function returns the model object created.
Examples
## Not run:
# Calculate the log-marginal likelihood
parameter <- loadParameterObject("parameter.Rda")
mcmc <- loadMCMCObject("mcmc.Rda")
calculate_marginal_likelihood(parameter, mcmc, mixture = 1,
samples = 500, scaling = 1.5)
# Calculate the Bayes factor for two models
parameter1 <- loadParameterObject("parameter1.Rda")
parameter2 <- loadParameterObject("parameter2.Rda")
mcmc1 <- loadMCMCObject("mcmc1.Rda")
mcmc2 <- loadMCMCObject("mcmc2.Rda")
mll1 <- calculate_marginal_likelihood(parameter1, mcmc1, mixture = 1,
samples = 500, scaling = 1.5)
mll2 <- calculate_marginal_likelihood(parameter2, mcmc2, mixture = 1,
samples = 500, scaling = 1.5)
cat("Bayes factor: ", mll1 - mll2, "\n")
## End(Not run)
calculates the synonymous codon usage order (SCUO)
Description
calculateSCUO
calulates the SCUO value for each gene in genome. Note that if a codon is absent, this will be treated as NA and will be skipped in final calculation
Usage
calculateSCUO(genome)
Arguments
genome |
A genome object initialized with |
Value
returns the SCUO value for each gene in genome
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
## reading genome
genome <- initializeGenomeObject(file = genome_file)
scuo <- calculateSCUO(genome)
translates codon to amino acid
Description
Translates a given codon into the amino acid encoded by it.
Usage
codonToAA(codon)
Arguments
codon |
character, codon to translate |
Value
Returns the amino acid encoded by the given codon as character
See Also
Codons
Description
Returns a vector of all codons
Usage
codons()
Value
Returns a vector of all codons
See Also
Convergence Test
Description
Convergence Test
Usage
convergence.test(
object,
samples = 10,
frac1 = 0.1,
frac2 = 0.5,
thin = 1,
plot = FALSE,
what = "Mutation",
mixture = 1
)
Arguments
object |
an object of either class Trace or MCMC |
samples |
number of samples at the end of the trace used to determine convergence (< length of trace). Will use as starting point of convergence test. If the MCMC trace is of length x, then starting point for convergence test will be x - samples. |
frac1 |
fraction to use from beginning of samples |
frac2 |
fraction to use from end of samples |
thin |
the thinning interval between consecutive observations, which is used in creating a coda::mcmc object (according to the Coda documentation, users should specify if a MCMC chain has already been thinned using a the thin parameter). This does not further thin the data. |
plot |
(logical) plot result instead of returning an object |
what |
(for Trace Object only) which parameter to calculate convergence.test – current options are Selection, Mutation, MixtureProbability, Sphi, Mphi, ExpectedPhi, and AcceptanceCSP |
mixture |
(for Trace Object only) mixture for which to calculate convergence.test |
Details
Be aware that convergence.test for Trace objects works primarily for Trace objects from the ROC parameter class. Future updates will adapt this function to work for parameters from other models and expression traces
Value
Geweke score object evaluating whether means of two fractions (frac1 and frac2) differ. Convergence occurs when they don't differ significantly, i.e. pnorm(abs(convergence.test(mcmcObj)$a, ,lower.tail=FALSE)*2 > 0.05
Examples
## check for convergence after a run:
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
genome <- initializeGenomeObject(file = genome_file)
sphi_init <- c(1,1)
numMixtures <- 2
geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2)))
parameter <- initializeParameterObject(genome = genome, sphi = sphi_init,
num.mixtures = numMixtures,
gene.assignment = geneAssignment,
mixture.definition = "allUnique")
samples <- 2500
thinning <- 50
adaptiveWidth <- 25
mcmc <- initializeMCMCObject(samples = samples, thinning = thinning,
adaptive.width=adaptiveWidth, est.expression=TRUE,
est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE)
divergence.iteration <- 10
## Not run:
runMCMC(mcmc = mcmc, genome = genome, model = model,
ncores = 4, divergence.iteration = divergence.iteration)
# check if posterior trace has converged
convergence.test(object = mcmc, samples = 500, plot = TRUE)
trace <- getTrace(parameter)
# check if Mutation trace has converged
convergence.test(object = trace, samples = 500, plot = TRUE, what = "Mutation")
# check if Sphi trace has converged
convergence.test(object = trace, samples = 500, plot = TRUE, what = "Sphi")
# check if ExpectedPhi trace has converged
convergence.test(object = trace, samples = 500, plot = TRUE, what = "ExpectedPhi")
## End(Not run)
Find and return list of optimal codons
Description
findOptimalCodon
extracrs the optimal codon for each amino acid.
Usage
findOptimalCodon(csp)
Arguments
csp |
a |
Value
A named list with with optimal codons for each amino acid.
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
genome <- initializeGenomeObject(file = genome_file)
sphi_init <- 1
numMixtures <- 1
geneAssignment <- rep(1, length(genome))
parameter <- initializeParameterObject(genome = genome, sphi = sphi_init,
num.mixtures = numMixtures,
gene.assignment = geneAssignment,
mixture.definition = "allUnique")
model <- initializeModelObject(parameter = parameter, model = "ROC")
samples <- 2500
thinning <- 50
adaptiveWidth <- 25
mcmc <- initializeMCMCObject(samples = samples, thinning = thinning,
adaptive.width=adaptiveWidth, est.expression=TRUE,
est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE)
divergence.iteration <- 10
## Not run:
runMCMC(mcmc = mcmc, genome = genome, model = model,
ncores = 4, divergence.iteration = divergence.iteration)
csp_mat <- getCSPEstimates(parameter, CSP="Selection")
opt_codons <- findOptimalCodon(csp_mat)
## End(Not run)
fixDEta
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Fix the value of selection its current value
fixDM
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Fix the value of mutation its current value
fixSphi
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Fix the value of s_phi (standard deviation of lognormal for synthesis rates) at its current value
Take the geometric mean of a vector
Description
geomMean
will calculate the geometric mean of a list of numerical values.
Usage
geomMean(x, rm.invalid = TRUE, default = 1e-05)
Arguments
x |
A vector of numerical . |
rm.invalid |
Boolean value for handling 0, negative, or NA values in the vector. Default is TRUE and will not
include these values in the calculation. If FALSE, these values will be replaced by the value give to |
default |
Numerical value that serves as the value to replace 0, negative, or NA values in the calculation when rm.invalid is FALSE. Default is 1e-5. |
Details
This function is a special version of the geometric mean specifically for AnaCoda.
Most models in Anacoda assume a log normal distribution for phi values, thus all values in x
are expectd to be positive.
geomMean returns the geometric mean of a vector and can handle 0, negative, or NA values.
Value
Returns the geometric mean of a vector.
Examples
x <- c(1, 2, 3, 4)
geomMean(x)
y<- c(1, NA, 3, 4, 0, -1)
# Only take the mean of non-Na values greater than 0
geomMean(y)
# Replace values <= 0 or NAs with a default value 0.001 and then take the mean
geomMean(y, rm.invalid = FALSE, default = 0.001)
getAdaptiveWidth
Description
Return sample adaptiveWidth value, which is the number of samples (not iterations) between adapting parameter proposal widths
Value
number of sample steps between adapting proposal widths
Calculate the Codon Adaptation Index
Description
getCAI
returns the Codon Adaptation Index for a
genome based on a provided reference.
Usage
getCAI(referenceGenome, testGenome, default.weight = 0.5)
Arguments
referenceGenome |
A genome object initialized with |
testGenome |
A genome object initialized with |
default.weight |
Default weight to use if codon is missing from referenceGenome |
Value
Returns a named vector with the CAI for each gene
Examples
genome_file1 <- system.file("extdata", "more_genes.fasta", package = "AnaCoDa")
genome_file2 <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
## reading genome
referenceGenome <- initializeGenomeObject(file = genome_file1)
testGenome <- initializeGenomeObject(file = genome_file2)
cai <- getCAI(referenceGenome, testGenome)
Calculate the CAI codon weigths for a reference genome
Description
getCAIweights
returns the weights for the Codon Adaptation Index
based on a reference genome.
Usage
getCAIweights(referenceGenome, default.weight = 0.5)
Arguments
referenceGenome |
A genome object initialized with |
default.weight |
Set default weight for any codon not observed in the reference genome |
Value
Returns a named vector with the CAI weights for each codon
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
## reading genome
referenceGenome <- initializeGenomeObject(file = genome_file)
wi <- getCAIweights(referenceGenome)
Return Codon Specific Paramters (or write to csv) estimates as data.frame
Description
getCSPEstimates
returns the codon specific
parameter estimates for a given parameter and mixture or write it to a csv file.
Usage
getCSPEstimates(
parameter,
filename = NULL,
mixture = 1,
samples = 10,
relative.to.optimal.codon = T,
report.original.ref = T,
log.scale = F
)
Arguments
parameter |
parameter an object created by |
filename |
Posterior estimates will be written to file (format: csv). Filename will be in the format <parameter_name>_<filename>.csv. |
mixture |
estimates for which mixture should be returned |
samples |
The number of samples used for the posterior estimates. |
relative.to.optimal.codon |
Boolean determining if parameters should be relative to the preferred codon or the alphabetically last codon (Default=TRUE). Only applies to ROC and FONSE models |
report.original.ref |
Include the original reference codon (Default = TRUE). Note this is only included for the purposes of simulations, which expect the input parameter file to be in a specific format. Later version of AnaCoDa will remove this. |
log.scale |
Calculate posterior means, standard deviation, and posterior probability intervals on the natural log scale. Should be used for PA and PANSE models only. |
Value
returns a list data.frame with the posterior estimates of the models
codon specific parameters or writes it directly to a csv file if filename
is specified
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
genome <- initializeGenomeObject(file = genome_file)
sphi_init <- c(1,1)
numMixtures <- 2
geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2)))
parameter <- initializeParameterObject(genome = genome, sphi = sphi_init,
num.mixtures = numMixtures,
gene.assignment = geneAssignment,
mixture.definition = "allUnique")
model <- initializeModelObject(parameter = parameter, model = "ROC")
samples <- 2500
thinning <- 50
adaptiveWidth <- 25
mcmc <- initializeMCMCObject(samples = samples, thinning = thinning,
adaptive.width=adaptiveWidth, est.expression=TRUE,
est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE)
divergence.iteration <- 10
## Not run:
runMCMC(mcmc = mcmc, genome = genome, model = model,
ncores = 4, divergence.iteration = divergence.iteration)
## return estimates for codon specific parameters
csp_mat <- getCSPEstimates(parameter)
# write the result directly to the filesystem as a csv file. No values are returned
getCSPEstimates(parameter, filename=file.path(tempdir(), "test.csv"))
## End(Not run)
Get Codon Counts For all Amino Acids
Description
provides the codon counts for a fiven amino acid across all genes
Usage
getCodonCounts(genome)
Arguments
genome |
A genome object from which the counts of each codon can be obtained. |
Details
The returned matrix containes a row for each gene and a column
for each synonymous codon of aa
.
Value
Returns a data.frame storing the codon counts for each amino acid.
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
## reading genome
genome <- initializeGenomeObject(file = genome_file)
counts <- getCodonCounts(genome)
Get Codon Counts For a specific Amino Acid
Description
provides the codon counts for a fiven amino acid across all genes
Usage
getCodonCountsForAA(aa, genome)
Arguments
aa |
One letter code of the amino acid for which the codon counts should be returned |
genome |
A genome object from which the counts of each codon can be obtained. |
Details
The returned matrix containes a row for each gene and a coloumn
for each synonymous codon of aa
.
Value
Returns a data.frame storing the codon counts for the specified amino acid.
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
## reading genome
genome <- initializeGenomeObject(file = genome_file)
counts <- getCodonCountsForAA("A", genome)
getCodonSpecificPosteriorMeanForCodon
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Calculate codon-specific parameter (CSP) posterior mean
Arguments
mixtureElement |
mixture to calculate CSP posterior mean. Should be between 1 and n, where n is number of mixtures. |
samples |
number of samples to use for calculating posterior mean |
codon |
codon to calculate CSP |
paramType |
CSP to calculate posterior mean for. 0: Mutation (ROC,FONSE) or Alpha (PA, PANSE). 1: Selection (ROC,FONSE), Lambda (PANSE), Lambda^prime (PA). 2: NSERate (PANSE) |
withoutReference |
If model uses reference codon, then ignore this codon (fixed at 0). Should be TRUE for ROC and FONSE. Should be FALSE for PA and PANSE. |
log_scale |
If true, calculate posterior mean on log scale. Should only be used for PA and PANSE. |
Value
posterior mean value for CSP
getCodonSpecificPosteriorVarianceForCodon
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Calculate codon-specific parameter (CSP) variance
Arguments
mixtureElement |
mixture to calculate CSP variance. Should be between 1 and n, where n is number of mixtures. |
samples |
number of samples to use for calculating variance |
codon |
codon to calculate CSP |
paramType |
CSP to calculate variance for. 0: Mutation (ROC,FONSE) or Alpha (PA, PANSE). 1: Selection (ROC,FONSE), Lambda (PANSE), Lambda^prime (PA). 2: NSERate (PANSE) |
unbiased |
If TRUE, should calculate variance using unbiased (N-1). Otherwise, used biased (N) correction |
withoutReference |
If model uses reference codon, then ignore this codon (fixed at 0). Should be TRUE for ROC and FONSE. Should be FALSE for PA and PANSE. |
log_scale |
If true, calculate posterior mean on log scale. Should only be used for PA and PANSE. |
Value
variance over trace for CSP
getCodonSpecificQuantilesForCodon
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Calculate quantiles of CSP traces
Arguments
mixtureElement |
mixture to calculate CSP variance. Should be between 1 and n, where n is number of mixtures. |
samples |
number of samples to use for calculating variance |
codon |
codon to calculate CSP |
paramType |
CSP to calculate variance for. 0: Mutation (ROC,FONSE) or Alpha (PA, PANSE). 1: Selection (ROC,FONSE), Lambda (PANSE), Lambda^prime (PA). 2: NSERate (PANSE) |
probs |
vector of two doubles between 0 and 1 indicating range over which to calculate quantiles. <0.0275, 0.975> would give 95% quantiles. |
withoutReference |
If model uses reference codon, then ignore this codon (fixed at 0). Should be TRUE for ROC and FONSE. Should be FALSE for PA and PANSE. |
log_scale |
If true, calculate posterior mean on log scale. Should only be used for PA and PANSE. |
Value
vector representing lower and upper bound of quantile
getEstimatedMixtureAssignmentForGene
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Get estimated mixture assignment for gene
Arguments
samples |
number of samples over which to calculate mixture assignment |
geneIndex |
corresponding index of gene in genome. Should be a number between 1 and length(genome). |
Value
returns value between 1 and n, where n is number of mixtures
getEstimatedMixtureAssignmentProbabilitiesForGene
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Get estimated mixture assignment probabilities for gene
Arguments
samples |
number of samples over which to calculate mixture assignment probabilities |
geneIndex |
corresponding index of gene in genome. Should be a number between 1 and length(genome). |
Value
returns vector of probabilities representing mixture probabilities for gene
Returns the estimated phi posterior for a gene
Description
Posterior estimates for the phi value of specified genes
Usage
getExpressionEstimates(
parameter,
gene.index,
samples,
quantiles = c(0.025, 0.975),
genome = NULL
)
Arguments
parameter |
on object created by |
gene.index |
a integer or vector of integers representing the gene(s) of interesst. |
samples |
number of samples for the posterior estimate |
quantiles |
vector of quantiles, (default: c(0.025, 0.975)) |
genome |
if genome is given, then will include gene ids in output (default is NULL) |
Details
The returned vector is unnamed as gene ids are only stored in the genome
object,
but the gene.index
vector can be used to match the assignment to the genome.
Value
returns a vector with the mixture assignment of each gene corresbonding to gene.index
in the same order as the genome.
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
genome <- initializeGenomeObject(file = genome_file)
sphi_init <- c(1,1)
numMixtures <- 2
geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2)))
parameter <- initializeParameterObject(genome = genome, sphi = sphi_init,
num.mixtures = numMixtures,
gene.assignment = geneAssignment,
mixture.definition = "allUnique")
model <- initializeModelObject(parameter = parameter, model = "ROC")
samples <- 2500
thinning <- 50
adaptiveWidth <- 25
mcmc <- initializeMCMCObject(samples = samples, thinning = thinning,
adaptive.width=adaptiveWidth, est.expression=TRUE,
est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE)
divergence.iteration <- 10
## Not run:
runMCMC(mcmc = mcmc, genome = genome, model = model,
ncores = 4, divergence.iteration = divergence.iteration)
# get the estimated expression values for all genes based on the mixture
# they are assigned to at each step
estimatedExpression <- getExpressionEstimates(parameter, 1:length(genome), 1000)
## End(Not run)
getGroupList
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Get amino acids (ROC, FONSE) or codons (PA, PANSE) for which parameters will be estimated
Value
returns list of amino acids or codons
getLogLikelihoodTrace
Description
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Returns the logLikelihood trace
Value
vector representing logLikelihood trace
getLogPosteriorMean
Description
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Calculate the mean log posterior probability over the last n samples
Arguments
samples |
postive value less than total length of the MCMC trace |
Value
mean logPosterior
getLogPosteriorTrace
Description
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Returns the logPosterior trace
Value
vector representing logPosterior trace
Returns mixture assignment estimates for each gene
Description
Posterior estimates for the mixture assignment of specified genes
Usage
getMixtureAssignmentEstimate(parameter, gene.index, samples)
Arguments
parameter |
on object created by |
gene.index |
a integer or vector of integers representing the gene(s) of interesst. |
samples |
number of samples for the posterior estimate |
Details
The returned vector is unnamed as gene ids are only stored in the genome
object,
but the gene.index
vector can be used to match the assignment to the genome.
Value
returns a vector with the mixture assignment of each gene corresbonding to gene.index
in the same order as the genome.
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
genome <- initializeGenomeObject(file = genome_file)
sphi_init <- c(1,1)
numMixtures <- 2
geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2)))
parameter <- initializeParameterObject(genome = genome, sphi = sphi_init,
num.mixtures = numMixtures,
gene.assignment = geneAssignment,
mixture.definition = "allUnique")
model <- initializeModelObject(parameter = parameter, model = "ROC")
samples <- 2500
thinning <- 50
adaptiveWidth <- 25
mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth,
est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE)
divergence.iteration <- 10
## Not run:
runMCMC(mcmc = mcmc, genome = genome, model = model,
ncores = 4, divergence.iteration = divergence.iteration)
# get the mixture assignment for all genes
mixAssign <- getMixtureAssignmentEstimate(parameter = parameter,
gene.index = 1:length(genome), samples = 1000)
# get the mixture assignment for a subsample
mixAssign <- getMixtureAssignmentEstimate(parameter = parameter,
gene.index = 5:100, samples = 1000)
# or
mixAssign <- getMixtureAssignmentEstimate(parameter = parameter,
gene.index = c(10, 30:50, 3, 90), samples = 1000)
## End(Not run)
Gene Names of Genome
Description
returns the identifiers of the genes within the genome specified.
Usage
getNames(genome, simulated = FALSE)
Arguments
genome |
A genome object initialized with |
simulated |
A logical value denoting if the gene names to be listed are simulated or not. The default value is FALSE. |
Value
gene.names Returns the names of the genes as a vector of strings.
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
## reading genome
genome <- initializeGenomeObject(file = genome_file)
## return all gene ids for the genome
geneIDs <- getNames(genome, FALSE)
Calculate the Effective Number of Codons
Description
getNc
returns the Effective Number of Codons for a genome.
Usage
getNc(genome)
Arguments
genome |
A genome object initialized with |
Value
Returns a named vector with the Effective Number of Codons for each gene
Examples
genome_file <- system.file("extdata", "more_genes.fasta", package = "AnaCoDa")
## reading genome
genome <- initializeGenomeObject(file = genome_file)
nc <- getNc(genome)
Calculate the Effective Number of Codons for each Amino Acid
Description
getNcAA
returns the Effective Number of Codons for each Amino Acid.
Usage
getNcAA(genome)
Arguments
genome |
A genome object initialized with |
Value
Returns an object of type data.frame
with the Effective Number of Codons
for each amino acid in each gene.
Examples
genome_file <- system.file("extdata", "more_genes.fasta", package = "AnaCoDa")
## reading genome
genome <- initializeGenomeObject(file = genome_file)
nc <- getNcAA(genome)
getNoiseOffsetPosteriorMean
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Calculate posterior mean of standard deviation parameter of lognormal describing distribution of synthesis rates
Arguments
index |
mixture index to use. Should be number between 0 and n-1, where n is number of mixtures |
samples |
number of samples over which to calculate posterior mean |
Value
returns posterior mean for standard deviation of lognormal distribution of synthesis rates
getNoiseOffsetVariance
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Calculate variance of noise offset parameter used when fitting model with empirical estimates of synthesis rates (ie. withPhi fits)
Arguments
index |
mixture index to use. Should be number between 0 and n-1, where n is number of mixtures |
samples |
number of samples over which to calculate variance |
unbiased |
If TRUE, should calculate variance using unbiased (N-1). Otherwise, used biased (N) correction |
Value
returns variance for noise offset
Get gene observed synthesis rates
Description
getObservedSynthesisRateSet
returns the observed
synthesis rates of the genes within the genome specified.
Usage
getObservedSynthesisRateSet(genome, simulated = FALSE)
Arguments
genome |
A genome object initialized with |
simulated |
A logical value denoting if the synthesis rates to be listed are simulated or not. The default value is FALSE. |
Value
Returns a data.frame with the observed expression values in genome
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
expression_file <- system.file("extdata", "expression.csv", package = "AnaCoDa")
## reading genome
genome <- initializeGenomeObject(file = genome_file)
## return expression values as a data.frame with gene ids in the first column.
expressionValues <- getObservedSynthesisRateSet(genome = genome)
getSamples
Description
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Return number of samples set for MCMCAlgorithm object
Value
number of samples used during MCMC
Calculate Selection coefficients
Description
getSelectionCoefficients
calculates the selection coefficient of each codon in each gene.
Usage
getSelectionCoefficients(genome, parameter, samples = 100)
Arguments
genome |
A genome object initialized with
|
parameter |
an object created by |
samples |
The number of samples used for the posterior estimates. |
Value
A matrix with selection coefficients.
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
genome <- initializeGenomeObject(file = genome_file)
sphi_init <- 1
numMixtures <- 1
geneAssignment <- rep(1, length(genome))
parameter <- initializeParameterObject(genome = genome, sphi = sphi_init,
num.mixtures = numMixtures,
gene.assignment = geneAssignment,
mixture.definition = "allUnique")
model <- initializeModelObject(parameter = parameter, model = "ROC")
samples <- 2500
thinning <- 50
adaptiveWidth <- 25
mcmc <- initializeMCMCObject(samples = samples, thinning = thinning,
adaptive.width=adaptiveWidth, est.expression=TRUE,
est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE)
divergence.iteration <- 10
## Not run:
runMCMC(mcmc = mcmc, genome = genome, model = model,
ncores = 4, divergence.iteration = divergence.iteration)
## return estimates for selection coefficients s for each codon in each gene
selection.coefficients <- getSelectionCoefficients(genome = genome,
parameter = parameter, samples = 1000)
## End(Not run)
getStdDevSynthesisRatePosteriorMean
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Calculate posterior mean of standard deviation parameter of lognormal describing distribution of synthesis rates
Arguments
samples |
number of samples over which to calculate posterior mean |
mixture |
mixture index to use. Should be number between 0 and n-1, where n is number of mixtures |
Value
returns posterior mean for standard deviation of lognormal distribution of synthesis rates
getStdDevSynthesisRateVariance
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Calculate variance of standard deviation parameter of lognormal describing distribution of synthesis rates
Arguments
samples |
number of samples over which to calculate variance |
mixture |
mixture index to use. Should be number between 0 and n-1, where n is number of mixtures |
unbiased |
If TRUE, should calculate variance using unbiased (N-1). Otherwise, used biased (N) correction |
Value
returns variance for standard deviation of lognormal distribution of synthesis rates
getStepsToAdapt
Description
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Return number of iterations (total iterations = samples * thinning) to allow proposal widths to adapt
Value
number of sample steps to adapt
getSynthesisRate
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Get current synthesis rates for all genes and all mixtures
Value
2 by 2 vector of numeric values
getSynthesisRatePosteriorMeanForGene
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Get posterior mean synthesis rate value for a gene
Arguments
samples |
number of samples over which to calculate mean |
geneIndex |
corresponding index of gene in genome for which posterior mean synthesis rate will be calculated. Should be a number between 1 and length(genome) |
log_scale |
Calculate posterior mean on log scale |
Value
posterior mean synthesis rate for gene
getSynthesisRatePosteriorVarianceForGene
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Get synthesis rate variance for a gene
Arguments
samples |
number of samples over which to calculate variance |
geneIndex |
corresponding index of gene in genome for which synthesis rate variance will be calculated. Should be a number between 1 and length(genome) |
unbiased |
Should calculate variance using unbiased (N-1) or biased (N) correction |
log_scale |
Calculate variance on log scale |
Value
posterior mean synthesis rate for gene
getThinning
Description
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Return thinning value, which is the number of iterations (total iterations = samples * thinning) not being kept
Value
thinning value used during MCMC
extracts an object of traces from a parameter object.
Description
extracts an object of traces from a parameter object.
Usage
getTrace(parameter)
Arguments
parameter |
A Parameter object that corresponds to one of the model types. |
Value
trace Returns an object of type Trace extracted from the given parameter object
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
genome <- initializeGenomeObject(file = genome_file)
sphi_init <- c(1,1)
numMixtures <- 2
geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2)))
parameter <- initializeParameterObject(genome = genome, sphi = sphi_init,
num.mixtures = numMixtures,
gene.assignment = geneAssignment,
mixture.definition = "allUnique")
trace <- getTrace(parameter) # empty trace object since no MCMC was perfomed
getTraceObject
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Get Trace object stored by a Parameter object. Useful for plotting certain parameter traces.
Value
Trace object
initMutationCategories
Description
Initialize values for mutation CSP. File should be of comma-separated with header. Three columns should be of order Amino_acid,Codon,Value
Arguments
files |
list of files containing starting values. Number of files should equal the number of categories. |
numCategories |
number of mutation categories (should be less than or equal to number of mixtures) |
fix |
Can use this parameter to fix mutation at current values (won't change over course of MCMC run) |
initSelectionCategories
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Initialize values for selection CSP. File should be of comma-separated with header. Three columns should be of order Amino_acid,Codon,Value
Arguments
files |
list of files containing starting values. Number of files should equal the number of categories. |
numCategories |
number of mutation categories (should be less than or equal to number of mixtures) |
fix |
Can use this parameter to fix selection at current values (won't change over course of MCMC run) |
Initialize Covariance Matrices
Description
Initialize Covariance Matrices
Usage
initializeCovarianceMatrices(
parameter,
genome,
numMixtures,
geneAssignment,
init.csp.variance = 0.0025
)
Arguments
parameter |
A Parameter object that corresponds to one of the model types. Valid values are "ROC", "PA", and "FONSE". |
genome |
An object of type Genome necessary for the initialization of the Parameter object. |
numMixtures |
The number of mixture elements for the underlying mixture distribution (numMixtures > 0). |
geneAssignment |
A vector holding the initial mixture assignment for each gene. The vector length has to equal the number of genes in the genome. Valid values for the vector range from 1 to numMixtures. It is possible but not advised to leave a mixture element empty. |
init.csp.variance |
initial proposal variance for codon specific parameter, default is 0.0025. |
Value
parameter Returns the Parameter argument, now modified with initialized mutation, selection, and covariance matrices.
Genome Initialization
Description
initializeGenomeObject
initializes the Rcpp Genome object
Usage
initializeGenomeObject(
file,
genome = NULL,
observed.expression.file = NULL,
fasta = TRUE,
positional = FALSE,
match.expression.by.id = TRUE,
append = FALSE
)
Arguments
file |
A file of coding sequences in fasta or RFPData format |
genome |
A genome object can be passed in to concatenate the input file to it (optional). |
observed.expression.file |
String containing the location of a file containing empirical expression rates (optional). Default value is NULL. |
fasta |
Boolean value indicating whether |
positional |
Boolean indicating if the positional information in the RFPData file is necessary. Default value is FALSE |
match.expression.by.id |
If TRUE, observed expression values will be assigned by matching sequence identifier. If FALSE, observed expression values will be assigned by order. Default value is TRUE. |
append |
If TRUE, function will read in additional genome data to append to an existing genome. If FALSE, genome data is cleared before reading in data (no preexisting data). Default value is FALSE. |
Value
This function returns the initialized Genome object.
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
genes_file <- system.file("extdata", "more_genes.fasta", package = "AnaCoDa")
expression_file <- system.file("extdata", "expression.csv", package = "AnaCoDa")
## reading genome
genome <- initializeGenomeObject(file = genome_file)
## reading genome and observed expression data
genome <- initializeGenomeObject(file = genome_file, observed.expression.file = expression_file)
## add aditional genes to existing genome
genome <- initializeGenomeObject(file = genome_file)
genome <- initializeGenomeObject(file = genes_file, genome = genome, append = TRUE)
Initialize MCMC
Description
initializeMCMCObject
initializes a MCMC object to
perform a model fitting for a parameter and model object.
Usage
initializeMCMCObject(
samples,
thinning = 1,
adaptive.width = 100,
est.expression = TRUE,
est.csp = TRUE,
est.hyper = TRUE,
est.mix = TRUE
)
Arguments
samples |
Number of samples to be produced when running the MCMC algorithm. No default value. |
thinning |
The thinning interval between consecutive observations. If set to 1, every step will be saved as a sample. Default value is 1. |
adaptive.width |
Number that determines how often the acceptance/rejection
window should be altered. Default value is 100 samples.
Proportion of MCMC steps where the proposal distribution is adaptive can be set using |
est.expression |
Boolean that tells whether or not synthesis rate values should be estimated in the MCMC algorithm run. Default value is TRUE. |
est.csp |
Boolean that tells whether or not codon specific values should be estimated in the MCMC algorithm run. Default value is TRUE. |
est.hyper |
Boolean that tells whether or not hyper parameters
should be estimated in the MCMC algorithm run. Default value is TRUE.
Setting for expression noise parameter sepsilon can be overridden by setting |
est.mix |
Boolean that tells whether or not the genes' mixture element should be estimated in the MCMC algorithm run. Default value is TRUE. |
Details
initializeMCMCObject
sets up the MCMC object
(monte carlo markov chain) and returns the object so a model fitting can be done.
It is important to note that est.expression and est.hyper will affect one another
negatively if their values differ.
Value
mcmc Returns an intialized MCMC object.
Examples
## initializing an object of type mcmc
samples <- 2500
thinning <- 50
adaptiveWidth <- 25
## estimate all parameter types
mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth,
est.expression=TRUE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE)
## do not estimate expression values, initial conditions will remain constant
mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth,
est.expression=FALSE, est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE)
## do not estimate hyper parameters, initial conditions will remain constant
mcmc <- initializeMCMCObject(samples = samples, thinning = thinning, adaptive.width=adaptiveWidth,
est.expression=TRUE, est.csp=TRUE, est.hyper=FALSE, est.mix = TRUE)
Model Initialization
Description
initializes the model object.
Usage
initializeModelObject(
parameter,
model = "ROC",
with.phi = FALSE,
fix.observation.noise = FALSE,
rfp.count.column = 1
)
Arguments
parameter |
An object created with |
model |
A string containing the model to run (ROC, FONSE, or PA), has to match parameter object. |
with.phi |
(ROC only) A boolean that determines whether or not to include empirical phi values (expression rates) for the calculations. Default value is FALSE |
fix.observation.noise |
(ROC only) Allows fixing the noise term sepsilon in the observed expression dataset to its initial condition. This value should override the est.hyper=TRUE setting in |
rfp.count.column |
(PA and PANSE only) A number representing the RFP count column to use. Default value is 1. |
Details
initializeModelObject initializes a model. The type of model is determined based on the string passed to the model
argument.
The Parameter object has to match the model that is initialized. E.g. to initialize a ROC model,
it is required that a ROC parameter object is passed to the function.
Value
This function returns the model object created.
Examples
#initializing a model object
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
expression_file <- system.file("extdata", "expression.csv", package = "AnaCoDa")
genome <- initializeGenomeObject(file = genome_file,
observed.expression.file = expression_file)
sphi_init <- c(1,1)
numMixtures <- 2
geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2)))
parameter <- initializeParameterObject(genome = genome, sphi = sphi_init,
num.mixtures = numMixtures,
gene.assignment = geneAssignment,
mixture.definition = "allUnique")
# initializing a model object assuming we have observed expression (phi)
# values stored in the genome object.
initializeModelObject(parameter = parameter, model = "ROC", with.phi = TRUE)
# initializing a model object ignoring observed expression (phi)
# values stored in the genome object.
initializeModelObject(parameter = parameter, model = "ROC", with.phi = FALSE)
Initialize Parameter
Description
initializeParameterObject
initializes a new parameter object or reconstructs one from a restart file
Usage
initializeParameterObject(
genome = NULL,
sphi = NULL,
num.mixtures = 1,
gene.assignment = NULL,
initial.expression.values = NULL,
model = "ROC",
split.serine = TRUE,
mixture.definition = "allUnique",
mixture.definition.matrix = NULL,
init.with.restart.file = NULL,
mutation.prior.mean = 0,
mutation.prior.sd = 0.35,
propose.by.prior = FALSE,
init.csp.variance = 0.0025,
init.sepsilon = 0.1,
init.w.obs.phi = FALSE,
init.initiation.cost = 4,
init.partition.function = 1
)
Arguments
genome |
An object of type Genome necessary for the initialization of the Parameter object. The default value is NULL. |
sphi |
Initial values for sphi. Expected is a vector of length numMixtures. The default value is NULL. |
num.mixtures |
The number of mixtures elements for the underlying mixture distribution (numMixtures > 0). The default value is 1. |
gene.assignment |
A vector holding the initial mixture assignment for each gene. The vector length has to equal the number of genes in the genome. Valid values for the vector range from 1 to numMixtures. It is possible but not advised to leave a mixture element empty. The default Value is NULL. |
initial.expression.values |
(Optional) A vector with intial phi values. The length of the vector has to equal the number of genes in the Genome object and the order of the genes should match the order of the genes in the Genome. The default value is NULL. |
model |
Specifies the model used. Valid options are "ROC", "PA", "PANSE", or "FONSE". The default model is "ROC". ROC is described in Gilchrist et al. 2015. PA, PANSE and FONSE are currently unpublished. |
split.serine |
Whether serine should be considered as one or two amino acids when running the model. TRUE and FALSE are the only valid values. The default value for split.serine is TRUE. |
mixture.definition |
A string describing how each mixture should be treated with respect to mutation and selection. Valid values consist of "allUnique", "mutationShared", and "selectionShared". The default value for mixture.definition is "allUnique". See details for more information. |
mixture.definition.matrix |
A matrix representation of how the mutation and selection categories correspond to the mixtures. The default value for mixture.definition.matrix is NULL. If provided, the model will use the matrix to initialize the mutation and selection categories instead of the definition listed directly above. See details for more information. |
init.with.restart.file |
File name containing information to reinitialize a previous Parameter object. If given, all other arguments will be ignored. The default value for init.with.restart.file is NULL. |
mutation.prior.mean |
Controlling the mean of the normal prior on mutation paramters. If passed in as single number (default is 0), this will be the mean value for all categories, for all codons. User may also supply a vector with n * 40 values, where n is the number of mutation categories. Future versions will check the number of rows matches the number of mutation categories definded by user. |
mutation.prior.sd |
Controlling the standard deviation of the normal prior on the mutation parameters. If passed in as single number (default is 0.35), this will be the standard deviation value for all categories, for all codons. User may also supply a vector with n * 40 values, where n is the number of mutation categories. Future versions will check the number of rows matches the number of mutation categories definded by user. |
propose.by.prior |
Mutation bias parameters will be proposed based on the means and standard deviations set in mutation.prior.mean and mutation.prior.sd |
init.csp.variance |
specifies the initial proposal width for codon specific parameter (default is 0.0025). The proposal width adapts during the runtime to reach a taget acceptance rate of ~0.25 |
init.sepsilon |
specifies the initial value for sepsilon. default is 0.1 |
init.w.obs.phi |
If TRUE, initialize phi values with observed phi values (data from RNAseq, mass spectrometry, ribosome footprinting) Default is FALSE. If multiple observed phi values exist for a gene, the geometric mean of these values is used as initial phi. When using this function, one should remove any genes with missing phi values, as these genes will not have an initial phi value. |
init.initiation.cost |
FOR FONSE ONLY. Initializes the initiation cost a_1 at this value. |
init.partition.function |
FOR PANSE ONLY. initializes the partition function Z. |
Details
initializeParameterObject
checks the values of the arguments
given to insure the values are valid.
The mixture definition and mixture definition matrix describes how the mutation
and selection categories are set up with respect to the number of mixtures. For
example, if mixture.definition = "allUnique" and numMixtures = 3, a matrix
representation would be matrix(c(1,2,3,1,2,3), ncol=2)
where each row represents a mixture, the first column represents the mutation
category, and the second column represents the selection category.
Another example would be mixture.definition = "selectionShared" and numMixtures = 4 (
matrix(c(1,2,3,4,1,1,1,1), ncol=2)
).
In this case, the selection category is the same for every mixture. If a matrix
is given, and it is valid, then the mutation/selection relationship will be
defined by the given matrix and the keyword will be ignored. A matrix should only
be given in cases where the keywords would not create the desired matrix.
Value
parameter Returns an initialized Parameter object.
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
restart_file <- system.file("extdata", "restart_file.rst", package = "AnaCoDa")
genome <- initializeGenomeObject(file = genome_file)
## initialize a new parameter object
sphi_init <- 1
numMixtures <- 1
geneAssignment <- rep(1, length(genome))
parameter <- initializeParameterObject(genome = genome, sphi = sphi_init,
num.mixtures = numMixtures,
gene.assignment = geneAssignment,
mixture.definition = "allUnique")
## re-initialize a parameter object from a restart file. Useful for checkpointing
parameter <- initializeParameterObject(init.with.restart.file = restart_file)
## initialize a parameter object with a custon mixture definition matrix
def.matrix <- matrix(c(1,1,1,2), ncol=2)
geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2)))
parameter <- initializeParameterObject(genome = genome, sphi = c(0.5, 2), num.mixtures = 2,
gene.assignment = geneAssignment,
mixture.definition.matrix = def.matrix)
initializeSynthesisRateByGenome
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Initialize synthesis rates using SCUO values calcuated from the genome
Arguments
genome |
a Genome object |
initializeSynthesisRateByList
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Initialize synthesis rates with values passed in as a list
Arguments
expression |
a list of values to use as initial synthesis rate values. Should be same size as number of genes in genome. |
initializeSynthesisRateByRandom
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Initialize synthesis rates by drawing a from a lognormal distribution with mean = -(sd_phi)^2/2 and sd = sd_phi
Arguments
sd_phi |
a positive value which will be the standard deviation of the lognormal distribution |
Length of Genome
Description
length
gives the length of a genome
Usage
## S3 method for class 'Rcpp_Genome'
length(x)
Arguments
x |
A genome object initialized with |
Value
returns the number of genes in a genome
Examples
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
## reading genome
genome <- initializeGenomeObject(file = genome_file)
length(genome) # 10
Load MCMC Object
Description
loadMCMCObject
creates a new MCMC object and fills it with
the information in the file given.
Usage
loadMCMCObject(files)
Arguments
files |
The filenames where the data will be stored. |
Details
This MCMC object is not intended to be used to do another model fitting, only to graph the stored results.
Value
This function has no return value.
Examples
## loading mcmc objects from the filesystem
## Not run:
# load one mcmc object
mcmc <- loadMCMCObject(files = "mcmc.Rda")
# load and combine multiple mcmc objects. Useful when using checkpointing
mcmc <- loadMCMCObject(files = c("mcmc1.Rda", "mcmc2.Rda"))
## End(Not run)
Load Parameter Object
Description
loadParameterObject
will load a parameter object from the filesystem
Usage
loadParameterObject(files)
Arguments
files |
A list of parameter filenames to be loaded. If multiple files are given, the parameter objects will be concatenated in the order provided |
Details
The function loads one or multiple files. In the case of multiple file, e.g. due to the use of check pointing, the files will be concatenated to one parameter object. See writeParameterObject for the writing of parameter objects
Value
Returns an initialized Parameter object.
Examples
## Not run:
# load a single parameter object
parameter <- loadParameterObject("parameter.Rda")
# load and concatenate multiple parameter object
parameter <- loadParameterObject(c("parameter1.Rda", "parameter2.Rda"))
## End(Not run)
Plot Model Object
Description
Plots traces from the model object such as synthesis rates for each gene. Will work regardless of whether or not expression/synthesis rate levels are being estimated. If you wish to plot observed/empirical values, these values MUST be set using the initial.expression.values parameter found in initializeParameterObject. Otherwise, the expression values plotted will just be SCUO values estimated upon initialization of the Parameter object.
Usage
## S3 method for class 'Rcpp_FONSEModel'
plot(
x,
genome,
samples = 100,
mixture = 1,
simulated = FALSE,
codon.window = NULL,
...
)
Arguments
x |
An Rcpp model object initialized with |
genome |
An Rcpp genome object initialized with |
samples |
The number of samples in the trace |
mixture |
The mixture for which to graph values. |
simulated |
A boolean value that determines whether to use the simulated genome. |
codon.window |
A boolean value that determines the codon window to use for calculating codon frequencies. If NULL (the default), use complete sequences. |
... |
Optional, additional arguments. For this function, a possible title for the plot in the form of a list if set with "main". |
Value
This function has no return value.
Plot Parameter
Description
plot
graphs the mutation or selection parameter for a ROC or FONSE
parameter object for each mixture element.
Usage
## S3 method for class 'Rcpp_FONSEParameter'
plot(
x,
what = "Mutation",
samples = 100,
mixture.name = NULL,
with.ci = TRUE,
...
)
Arguments
x |
A parameter object |
what |
Which aspect of the parameter to plot. Default value is "Mutation". |
samples |
Number of samples to plot using the posterior mean. Default value is 100. |
mixture.name |
a vector with names/descriptions of the mixture distributions in the parameter object |
with.ci |
Plot with or without confidence intervals. Default value is TRUE |
... |
Arguments to be passed to methods, such as graphical parameters. |
Details
Graphs are based off the last # samples for the posterior mean.
Value
This function has no return value.
Plot MCMC algorithm
Description
This function will plot the logLikelihood trace, and if the Hmisc package is installed, it will plot a subplot of the logLikelihood trace with the first few samples removed.
Usage
## S3 method for class 'Rcpp_MCMCAlgorithm'
plot(x, what = "LogPosterior", zoom.window = NULL, ...)
Arguments
x |
An Rcpp_MCMC object initialized with |
what |
character defining if log(Posterior) (Default) or log(Likelihood) options are: LogPosterior or logLikelihood |
zoom.window |
A vector describing the start and end of the zoom window. |
... |
Arguments to be passed to methods, such as graphical parameters. |
Value
This function has no return value.
Plot Model Object
Description
Plots traces from the model object such as synthesis rates for each gene. Will work regardless of whether or not expression/synthesis rate levels are being estimated. If you wish to plot observed/empirical values, these values MUST be set using the initial.expression.values parameter found in initializeParameterObject. Otherwise, the expression values plotted will just be SCUO values estimated upon initialization of the Parameter object.
Usage
## S3 method for class 'Rcpp_ROCModel'
plot(x, genome = NULL, samples = 100, mixture = 1, simulated = FALSE, ...)
Arguments
x |
An Rcpp model object initialized with |
genome |
An Rcpp genome object initialized with |
samples |
The number of samples in the trace |
mixture |
The mixture for which to graph values. |
simulated |
A boolean value that determines whether to use the simulated genome. |
... |
Optional, additional arguments. For this function, a possible title for the plot in the form of a list if set with "main". |
Value
This function has no return value.
Plot Parameter
Description
plot
graphs the mutation or selection parameter for a ROC or FONSE
parameter object for each mixture element.
Usage
## S3 method for class 'Rcpp_ROCParameter'
plot(
x,
what = "Mutation",
samples = 100,
mixture.name = NULL,
with.ci = TRUE,
...
)
Arguments
x |
A parameter object |
what |
Which aspect of the parameter to plot. Default value is "Mutation". |
samples |
Number of samples to plot using the posterior mean. Default value is 100. |
mixture.name |
a vector with names/descriptions of the mixture distributions in the parameter object |
with.ci |
Plot with or without confidence intervals. Default value is TRUE |
... |
Arguments to be passed to methods, such as graphical parameters. |
Details
Graphs are based off the last # samples for the posterior mean.
Value
This function has no return value.
Plot Trace Object
Description
Plots different traces, specified with the what
parameter.
Usage
## S3 method for class 'Rcpp_Trace'
plot(
x,
what = c("Mutation", "Selection", "MixtureProbability", "Sphi", "Mphi", "Aphi",
"Sepsilon", "ExpectedPhi", "Expression", "NSEProb", "NSERate", "InitiationCost",
"PartitionFunction"),
geneIndex = 1,
mixture = 1,
log.10.scale = F,
...
)
Arguments
x |
An Rcpp trace object initialized with |
what |
A string containing one of the following to graph: |
geneIndex |
When plotting expression, the index of the gene to be plotted. |
mixture |
The mixture for which to plot values. |
log.10.scale |
A logical value determining if figures should be plotted on the log.10.scale (default=F). Should not be applied to mutation and selection parameters estimated by ROC/FONSE. |
... |
Optional, additional arguments. For this function, may be a logical value determining if the trace is ROC-based or not. |
Value
This function has no return value.
Plot Acceptance ratios
Description
Plots acceptance ratios for codon-specific parameters. Will be by amino acid for ROC and FONSE models, but will be by codon for PA and PANSE models. Note assumes estimating parameters for all codons.
Usage
plotAcceptanceRatios(trace, main = "CSP Acceptance Ratio Traces")
Arguments
trace |
An Rcpp trace object initialized with |
main |
The title of the plot. |
Value
This function has no return value.
Plot Codon Specific Parameter
Description
Plots a codon-specific set of traces, specified with the type
parameter.
Usage
plotCodonSpecificParameters(
trace,
mixture,
type = "Mutation",
main = "Mutation Parameter Traces",
ROC.or.FONSE = TRUE,
log.10.scale = F
)
Arguments
trace |
An Rcpp trace object initialized with |
mixture |
The mixture for which to plot values. |
type |
A string containing one of the following to graph: |
main |
The title of the plot. |
ROC.or.FONSE |
A logical value determining if the Parameter was ROC/FONSE or not. |
log.10.scale |
A logical value determining if figures should be plotted on the log.10.scale (default=F). Should not be applied to mutation and selection parameters estimated by ROC/FONSE. |
Value
This function has no return value.
readPhiValue
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Read synthesis rate values from file. File should be two column file <gene_id,phi> and is expected to have a header row
Arguments
filename |
name of file to be read |
Run MCMC
Description
runMCMC
will run a monte carlo markov chain algorithm
for the given mcmc, genome, and model objects to perform a model fitting.
Usage
runMCMC(mcmc, genome, model, ncores = 1, divergence.iteration = 0)
Arguments
mcmc |
MCMC object that will run the model fitting algorithm. |
genome |
Genome that the model fitting will run on. Should be the same genome associated with the parameter and model objects. |
model |
Model to run the fitting on. Should be associated with the given genome. |
ncores |
Number of cores to perform the model fitting with. Default value is 1. |
divergence.iteration |
Number of steps that the initial conditions can diverge from the original conditions given. Default value is 0. |
Details
runMCMC
will run for the number of samples times the number
thinning given when the mcmc object is initialized. Updates are provided every 100
steps, and the state of the chain is saved every thinning steps.
Value
This function has no return value.
Examples
#fitting a model to a genome using the runMCMC function
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
genome <- initializeGenomeObject(file = genome_file)
sphi_init <- c(1,1)
numMixtures <- 2
geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2)))
parameter <- initializeParameterObject(genome = genome, sphi = sphi_init,
num.mixtures = numMixtures,
gene.assignment = geneAssignment,
mixture.definition = "allUnique")
model <- initializeModelObject(parameter = parameter, model = "ROC")
samples <- 2500
thinning <- 50
adaptiveWidth <- 25
mcmc <- initializeMCMCObject(samples = samples, thinning = thinning,
adaptive.width=adaptiveWidth, est.expression=TRUE,
est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE)
divergence.iteration <- 10
## Not run:
runMCMC(mcmc = mcmc, genome = genome, model = model,
ncores = 4, divergence.iteration = divergence.iteration)
## End(Not run)
setAdaptiveWidth
Description
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Set sample adaptiveWidth value, which is the number of samples (not iterations) between adapting parameter proposal widths
Arguments
_adaptiveWidth |
postive value |
setGroupList
Description
Method of Parameter class (access via parameter$<function name>, where parameter is an object initialized by initializeParameterObject). Set amino acids (ROC, FONSE) or codons (PA, PANSE) for which parameters will be estimated. Note that non-default groupLists are still in beta testing and should be used with caution.
Arguments
List |
of strings epresenting groups for parameters to be estimated. Should be one letter amino acid (ROC, FONSE) or list of sense codons (PA, PANSE). |
setRestartFileSettings
Description
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Set restart file output name and frequency prior to running MCMC
Arguments
filename |
name of restart file |
interval |
number of samples (ie. iterations * thinning) between writing new restart file |
multiple |
if true, will output a new restart file at each interval (file name will include sample it was written at) |
Set Restart Settings
Description
setRestartSettings
sets the needed information (what the file
is called, how often the file should be written) to write
information to restart the MCMC algorithm from a given point.
Usage
setRestartSettings(mcmc, filename, samples, write.multiple = TRUE)
Arguments
mcmc |
MCMC object that will run the model fitting algorithm. |
filename |
Filename for the restart files to be written. |
samples |
Number of samples that should occur before a file is written. |
write.multiple |
Boolean that determines if multiple restart files are written. Default value is TRUE. |
Details
setRestartSettings
writes a restart file every set amount of samples
that occur. Also, if write.multiple is true, instead of overwriting the previous restart
file, the sample number is prepended onto the file name and multiple rerstart files
are generated for a run.
Value
This function has no return value.
Examples
## set restart settings for checkpointing
samples <- 2500
thinning <- 50
adaptiveWidth <- 25
## estimate all parameter types
mcmc <- initializeMCMCObject(samples = samples, thinning = thinning,
adaptive.width=adaptiveWidth, est.expression=TRUE,
est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE)
# prompts the mcmc to write a restart file every 100 samples during the run.
setRestartSettings(mcmc = mcmc, filename = "test_restart", samples = 100)
# prompts the mcmc to write a restart file every 100 samples during the run,
# but will overwrite it each time.
setRestartSettings(mcmc = mcmc, filename = "test_restart", samples = 100,
write.multiple = FALSE)
setSamples
Description
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Set number of samples set for MCMCAlgorithm object
Arguments
_samples |
postive value |
setStepsToAdapt
Description
Method of MCMC class (access via mcmc$<function name>, where mcmc is an object initialized by initializeMCMCObject). Set number of iterations (total iterations = samples * thinning) to allow proposal widths to adapt
Arguments
steps |
a postive value |
setThinning
Description
Set thinning value, which is the number of iterations (total iterations = samples * thinning) not being kept
Arguments
_thinning |
postive value |
simulateGenome
Description
Method of Model class (access via model$<function name>, where model is an object initialized by initializeModelObject). Will simulate a version of the given genome using the current set of parameters stored in the Parameter object. This can be written to a FASTA file using genome$writeFasta(<filename>,simulated = TRUE).
Arguments
genome |
a Genome object initialized by initializeGenomeObject |
Summary of Genome
Description
summary
summarizes the description of a genome, such as number of genes and average gene length.
Usage
## S3 method for class 'Rcpp_Genome'
summary(object, ...)
Arguments
object |
A genome object initialized with |
... |
Optional, additional arguments to be passed to the main summary function that affect the summary produced. |
Value
This function returns by default an object of class c("summaryDefault", table").
Write MCMC Object
Description
writeMCMCObject
stores the MCMC information from the
model fitting run in a file.
Usage
writeMCMCObject(mcmc, file)
Arguments
mcmc |
MCMC object that has run the model fitting algorithm. |
file |
A filename where the data will be stored. |
Value
This function has no return value.
Examples
## saving the MCMC object after model fitting
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
genome <- initializeGenomeObject(file = genome_file)
sphi_init <- c(1,1)
numMixtures <- 2
geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2)))
parameter <- initializeParameterObject(genome = genome, sphi = sphi_init,
num.mixtures = numMixtures,
gene.assignment = geneAssignment,
mixture.definition = "allUnique")
samples <- 2500
thinning <- 50
adaptiveWidth <- 25
mcmc <- initializeMCMCObject(samples = samples, thinning = thinning,
adaptive.width=adaptiveWidth, est.expression=TRUE,
est.csp=TRUE, est.hyper=TRUE, est.mix = TRUE)
divergence.iteration <- 10
## Not run:
runMCMC(mcmc = mcmc, genome = genome, model = model,
ncores = 4, divergence.iteration = divergence.iteration)
writeMCMCObject(mcmc = mcmc, file = file.path(tempdir(), "file.Rda"))
## End(Not run)
Write Parameter Object to a File
Description
writeParameterObject
will write the parameter object as binary to the filesystem
Usage
writeParameterObject(parameter, file)
Arguments
parameter |
parameter on object created by |
file |
A filename that where the data will be stored. |
Details
As Rcpp object are not serializable with the default R save
function,
therefore this custom save function is provided (see loadParameterObject).
Value
This function has no return value.
Examples
## Not run:
genome_file <- system.file("extdata", "genome.fasta", package = "AnaCoDa")
genome <- initializeGenomeObject(file = genome_file)
sphi_init <- c(1,1)
numMixtures <- 2
geneAssignment <- c(rep(1,floor(length(genome)/2)),rep(2,ceiling(length(genome)/2)))
parameter <- initializeParameterObject(genome = genome, sphi = sphi_init,
num.mixtures = numMixtures,
gene.assignment = geneAssignment,
mixture.definition = "allUnique")
## writing an empty parameter object as the runMCMC routine was not called yet
writeParameterObject(parameter = parameter, file = file.path(tempdir(), "file.Rda"))
## End(Not run)