Title: | BRACoD: Bayesian Regression Analysis of Compositional Data |
Version: | 0.0.2.0 |
Description: | The goal of this method is to identify associations between bacteria and an environmental variable in 16S or other compositional data. The environmental variable is any variable which is measure for each microbiome sample, for example, a butyrate measurement paired with every sample in the data. Microbiome data is compositional, meaning that the total abundance of each sample sums to 1, and this introduces severe statistical distortions. This method takes a Bayesian approach to correcting for these statistical distortions, in which the total abundance is treated as an unknown variable. This package runs the python implementation using reticulate. |
Imports: | reticulate |
Config/reticulate: | list( packages = list( list(package = "BRACoD") ) ) |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.1.1.9001 |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2022-03-17 14:28:55 UTC; averster |
Author: | Adrian Verster [aut, cre] |
Maintainer: | Adrian Verster <adrian.verster@hc-sc.gc.ca> |
Depends: | R (≥ 3.5.0) |
Repository: | CRAN |
Date/Publication: | 2022-03-24 15:10:07 UTC |
Perform convergence tests on the p and beta variables
Description
You may get errors are divergence of some variables after pymc3 samples the posterior. We are not overly concerned about some of the variables, such as the variance, rather we are really interested in the inclusion probabilities (p) and contribution coefficients (beta). The convergence tests that are included here focus on evaluating those two variables.
Usage
convergence_tests(trace, df_relab)
Arguments
trace |
the output of run_bracod() |
df_relab |
the microbiome relative abundance |
Value
no return value
Install BRACoD in python
Description
Uses pip to install the latest BRACoD release in python. You might need to specify a python environment with either reticulate::use_virtualenv or reticulate::use_condaenv.
Usage
install_bracod(method = "auto", conda = "auto")
Arguments
method |
passed to reticulate::py_install |
conda |
passed to reticulate::py_install |
Value
no return value
Example microbiome data
Description
This data is mouse stool microbiome data from a study of obesity.
Usage
data(obesity)
df_scfa
Format
a DataFrame of 16S microbiome counts, and a dataframe with corresponding butyrate measurements
An object of class data.frame
with 119 rows and 1 columns.
Remove NULL values in your OTU and environmental variable
Description
This will remove samples that are NULL in the environmental variable, as well as the corresponding samples in your relative abundance data.
Usage
remove_null(df_relab, Y)
Arguments
df_relab |
microbiome relative abundance data in a dataframe |
Y |
values of the environmental variable |
Value
a list containing 1) the relative abundance data and 2) the Y values
Run the main BRACoD algorithm
Description
Uses pymc3 to sample the posterior of the model to determine bacteria that are associated with your environmental variable.
Usage
run_bracod(df_relab, env_var, n_sample = 1000, n_burn = 1000, njobs = 4)
Arguments
df_relab |
A dataframe of relative microbiome abundances. Samples are rows and bacteria are columns. |
env_var |
the environmental variable you are evaluating. You need 1 measurement associated with each sample. |
n_sample |
number of posterior samples. |
n_burn |
number of burn-in steps before actual sampling stops. |
njobs |
number of parallel MCMC chains to run. |
Value
the pymc trace object which holds the samples of the posterior distribution
Examples
## Not run:
data(obesity)
r <- simulate_microbiome_counts(obesity)
sim_counts <- r[[1]]
sim_y <- r[[2]]
contributions <- r[[3]]
sim_relab <- scale_counts(sim_counts)
trace <- run_bracod(sim_relab, sim_y, n_sample = 1000, n_burn=1000, njobs=4)
## End(Not run)
Normalize OTU counts and add a pseudo count
Description
BRACoD requires relative abundance and cannot handle zeros, so this function adds a small pseudo count (1/10th the smallest non-zero value).
Usage
scale_counts(df_counts)
Arguments
df_counts |
A dataframe of OTU counts. Samples are rows and bacteria are columns. |
Value
a dataframe of relative abundance data
Score the results of BRACoD
Description
This calculate the precision, recall and F1 of your BRACoD results if you know the ground truth, ie. if this is simulated data.
Usage
score(taxon_identified, taxon_actual)
Arguments
taxon_identified |
a list of integers corresponding to the indicies of the taxon you identified with BRACoD |
taxon_actual |
a list of integers corresponding to the indicies of the taxon that truely contribute to butyrate levels |
Value
a list containing 1) the precision 2) the recall 3) the f1 metric
Examples
## Not run:
df_summary <- summarize_trace(trace, colnames(sim_relab))
taxon_identified <- df_summary$taxon
taxon_actual <- which(contributions != 0)
r <- score(taxon_identified, taxon_actual)
precision <- r[[1]]
recall <- r[[2]]
f1 <- r[[3]]
print(sprintf("Precision: %.2f, Recall: %.2f, F1: %.2f",precision, recall, f1))
## End(Not run)
Simulate microbiome counts
Description
Each bacteria's absolute abundance is simulated from a lognormal distribution. Then, convert each sample to relative abundance, and simulate sequencing counts using a multinomial distribution, based on the desired number of reads and the simulated relative abundances. This also simulates an environmental variable that is produced by some of the bacteria.
Usage
simulate_microbiome_counts(
df,
n_contributors = 20,
coeff_contributor = 0,
min_ab_contributor = -9,
sd_Y = 1,
n_reads = 1e+05,
var_contributor = 5,
use_uniform = TRUE,
n_samples_use = NULL,
corr_value = NULL,
return_absolute = FALSE,
seed = NULL
)
Arguments
df |
A dataframe of OTU counts that is a model for data simulation. Samples are rows and bacteria are columns. |
n_contributors |
the number of bacteria that are to contribute to your environmental variable. |
coeff_contributor |
the average of the distribution used to simulate the contribution coefficient. |
min_ab_contributor |
The minimum log relative abundance, averaged across samples, to include a bacteria |
sd_Y |
the standard deviation of the simulated environmental variable |
n_reads |
the number of reads to be simulated per sample |
var_contributor |
If you use a uniform distribution, this is the range of the distribution, with a normal distribution it is the variance used to simulate the contribution coefficient. |
use_uniform |
use a uniform distribution to simulate the contribution coefficient. Alternative is the normal distribution. |
n_samples_use |
number of microbiome samples to simulate. If NULL, uses the same number of samples as in your dataframe |
corr_value |
the bacteria-bacteria correlation value you want to include in the simulation |
return_absolute |
returns the abosulte abundance values instead of the simulated microbiome counts |
seed |
random seed for reproducibility |
Value
a list containing 1) the simulated count data 2) the simulated environmental variable and 3) the simulated contribution coefficients
Summarize the results of BRACoD
Description
This summarizes the trace object that run_bracod() returns. It returns a dataframe that contains two parameters of interest, the average inclusion (p) and the average coefficient (beta), telling you the association between that bacteria and the environmental variable
Usage
summarize_trace(trace, taxon_names = NULL, cutoff = 0.3)
Arguments
trace |
the pymc3 object that is the output of run_bracod() |
taxon_names |
optional, a list of names of the bacteria to include in the results |
cutoff |
this is the cutoff on the average inclusion for inclusion. We reccomend a value of 0.3, but you can lower the value to include less confident taxon or raise the cutoff to exclude them. |
Value
a dataframe with information about the bacteria that BRACoD identified
Examples
## Not run:
trace <- run_bracod(sim_relab, sim_y, n_sample = 1000, n_burn=1000, njobs=4)
df_summary <- summarize_trace(trace, colnames(sim_relab))
## End(Not run)
Threshold your microbiome counts data
Description
This function removes samples below a minimum counts and bacteria below a minimum log abundance. Run this before running BRACoD because the algorithm does not perform well when there are many low abundance bacteria that are only present in a few samples.
Usage
threshold_count_data(df_counts, min_counts = 1000, min_ab = 1e-04)
Arguments
df_counts |
A dataframe of OTU counts. Samples are rows and bacteria are columns. |
min_counts |
threshold samples with fewer than this many counts |
min_ab |
threshold bacteria whose average log abundance is below this |
Value
a dataframe of microbiome counts