% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/prepare_diversification_data.R
\name{prepare_diversification_data}
\alias{prepare_diversification_data}
\title{Run a full BAMM (Bayesian Analysis of Macroevolutionary Mixtures) workflow}
\usage{
prepare_diversification_data(
  BAMM_install_directory_path,
  phylo,
  prefix_for_files = NULL,
  seed = NULL,
  numberOfGenerations = 10^7,
  globalSamplingFraction = 1,
  sampleProbsFilename = NULL,
  expectedNumberOfShifts = NULL,
  eventDataWriteFreq = NULL,
  burn_in = 0.25,
  nb_posterior_samples = 1000,
  additional_BAMM_settings = list(),
  BAMM_output_directory_path = NULL,
  keep_BAMM_outputs = TRUE,
  MAP_odd_ratio_threshold = 5,
  skip_evaluations = FALSE,
  plot_evaluations = TRUE,
  save_evaluations = TRUE
)
}
\arguments{
\item{BAMM_install_directory_path}{Character string. The path to the directory where BAMM is installed.
Use '/' to separate directory and sub-directories. The path must end with '/'.}

\item{phylo}{Time-calibrated phylogeny. Object of class \code{"phylo"} as defined in R package \code{{ape}}. The phylogeny must be rooted and fully resolved.
BAMM does not currently work with fossils, so the tree must also be ultrametric.}

\item{prefix_for_files}{Character string. Prefix to add to all BAMM files stored in the \code{BAMM_output_directory_path} if \code{keep_BAMM_outputs = TRUE}.
Files are exported as 'prefix_*', with an underscore separating the prefix from the file name. Default is \code{NULL} (no prefix is added).}

\item{seed}{Integer. Set the seed to ensure reproducibility. Default is \code{NULL} (a random seed is used).}

\item{numberOfGenerations}{Integer. Number of steps in the MCMC run. It should be set high enough to reach the equilibrium distribution
and allow posterior samples to be uncorrelated. Check the Effective Sample Size of parameters with \code{coda::effectiveSize()} in the Evaluation step.
Default value is \code{10^7}.}

\item{globalSamplingFraction}{Numerical. Global sampling fraction representing the overall proportion of terminals in the phylogeny compared to
the estimated overall richness in the clade. It acts as a multiplier on the rates needed to achieve such extant diversity.
Default is \code{1.0} (assuming all taxa are in the phylogeny).}

\item{sampleProbsFilename}{Character string. The path to the \code{.txt} file used to provide clade-specific sampling fractions.
See \code{\link[BAMMtools:samplingProbs]{BAMMtools::samplingProbs()}} to generate such a file. If provided, \code{globalSamplingFraction} is ignored.}

\item{expectedNumberOfShifts}{Integer. Set the expected number of regime shifts. It acts as a hyperparameter controlling the exponential prior distribution
used to modulate reversible jumps across model configurations in the rjMCMC run.
If set to \code{NULL} (default), an empirical rule is used to define this value: 1 regime shift expected for every 100 tips in the phylogeny, with a minimum of 1.
Best practice is to try several values and inspect the similarity of the prior and posterior distributions of the regime shift parameter.
See \code{\link[BAMMtools:plotPrior]{BAMMtools::plotPrior()}} and the Evaluation step to produce such evaluation plot.}

\item{eventDataWriteFreq}{Integer. Set the frequency at which event data are written to the output file, i.e., the sampling frequency of posterior samples.
If set to \code{NULL} (default), the frequency is set so that 2000 posterior samples are recorded: \code{eventDataWriteFreq = numberOfGenerations / 2000}.}

\item{burn_in}{Numerical. Proportion of posterior samples removed from the BAMM output to ensure that the remaining samples were drawn once the equilibrium distribution was reached.
This can be evaluated looking at the MCMC trace (see Evaluation step). Default is \code{0.25}.}

\item{nb_posterior_samples}{Numerical. Number of posterior samples to extract, after removing the burn-in, in the final \code{BAMM_object} to use for downstream analyses.
Default = \code{1000}.}

\item{additional_BAMM_settings}{List of named elements. Additional settings options for BAMM provided as a list of named arguments.
Ex: \code{list(lambdaInit0 = 0.5, muInit0 = 0)}. See available settings in the template file provided within the deepSTRAPP package files as 'BAMM_template_diversification.txt'.
The template can also be loaded directly in R with \code{utils::data(BAMM_template_diversification)} and displayed with \code{print(BAMM_template_diversification)}.}

\item{BAMM_output_directory_path}{Character string. The path to the directory used to store input/output files generated.
Use '/' to separate directory and subdirectories. It must end with '/'.}

\item{keep_BAMM_outputs}{Logical. Whether the \code{BAMM_output_directory} should be kept after the run. Default = \code{TRUE}.}

\item{MAP_odd_ratio_threshold}{Numerical. Controls the definition of 'core-shifts' used to distinguish across configurations when fetching the MAP samples.
Shifts that have an odds ratio of marginal posterior probability / prior lower than \code{MAP_odd_ratio_threshold} are ignored. See \code{\link[BAMMtools:getBestShiftConfiguration]{BAMMtools::getBestShiftConfiguration()}}. Default = \code{5}.}

\item{skip_evaluations}{Logical. Whether to skip the Evaluation step including MCMC trace, ESS, and prior/posterior comparisons for expected number of shifts. Default = \code{FALSE}.}

\item{plot_evaluations}{Logical. Whether to display the plots generated during the Evaluation step: MCMC trace, and prior/posterior comparisons for expected number of shifts. Default = \code{TRUE}.}

\item{save_evaluations}{Logical. Whether to save the outputs of evaluations in a table (ESS), and PDFs (MCMC trace, and prior/posterior comparisons for expected number of shifts)
in the \code{BAMM_output_directory}. Default = \code{TRUE}.}
}
\value{
The function returns a \code{BAMM_object} of class \code{"bammdata"} which is a list with at least 22 elements.

Phylogeny-related elements used to plot a phylogeny with \code{\link[ape:plot.phylo]{ape::plot.phylo()}}:
\itemize{
\item \verb{$edge} Matrix of integers. Defines the tree topology by providing rootward and tipward node ID of each edge.
\item \verb{$Nnode} Integer. Number of internal nodes.
\item \verb{$tip.label} Vector of character strings. Labels of all tips.
\item \verb{$edge.length} Vector of numerical. Length of edges/branches.
\item \verb{$node.label} Vector of character strings. Labels of all internal nodes. (Present only if present in the initial \code{BAMM_object})
}

BAMM internal elements used for tree exploration:
\itemize{
\item \verb{$begin} Vector of numerical. Absolute time since root of edge/branch start (rootward).
\item \verb{$end} Vector of numerical.  Absolute time since root of edge/branch end (tipward).
\item \verb{$downseq} Vector of integers. Order of node visits when using a pre-order tree traversal.
\item \verb{$lastvisit} ID of the last node visited when starting from the node in the corresponding position in \verb{$downseq}.
}

BAMM elements summarizing diversification data:
\itemize{
\item \verb{$numberEvents} Vector of integer. Number of events/macroevolutionary regimes (k+1) recorded in each posterior configuration. k = number of shifts.
\item \verb{$eventData} List of data.frames. One per posterior sample. Records shift events and macroevolutionary regimes parameters. 1st line = Background root regime.
\item \verb{$eventVectors} List of integer vectors. One per posterior sample. Record regime ID per branches.
\item \verb{$tipStates} List of named integer vectors. One per posterior sample. Record regime ID per tips.
\item \verb{$tipLambda} List of named numerical vectors. One per posterior sample. Record speciation rates per tips.
\item \verb{$tipMu} List of named numerical vectors. One per posterior sample. Record extinction rates per tips.
\item \verb{$eventBranchSegs} List of matrix of numerical. One per posterior sample. Record regime ID per segments of branches.
\item \verb{$meanTipLambda} Vector of named numerical. Mean tip speciation rates across all posterior configurations of tips.
\item \verb{$meanTipMu} Vector of named numerical. Mean tip extinction rates across all posterior configurations of tips.
\item \verb{$type} Character string. Set the type of data modeled with BAMM. Should be "diversification".
}

Additional elements providing key information for downstream analyses:
\itemize{
\item \verb{$expectedNumberOfShifts} Integer. The expected number of regime shifts used to set the prior in BAMM.
\item \verb{$MSP_tree} Object of class \code{phylo}. List of 4 elements duplicating information from the Phylogeny-related elements above,
except that \verb{$MSP_tree$edge.length} records the Marginal Shift Probability of each branch (i.e., the probability that a regime shift occurs along each branch).
\item \verb{$MAP_indices} Vector of integers. The indices of the Maximum A Posteriori probability (MAP) configurations among the posterior samples.
\item \verb{$MAP_BAMM_object} List of 18 elements of class \code{"bammdata"} recording the mean rates and regime shift locations found across
the Maximum A Posteriori probability (MAP) configurations. All BAMM elements summarizing diversification data hold a single entry describing
this mean diversification history.
\item \verb{$MSC_indices} Vector of integers. The indices of the Maximum Shift Credibility (MSC) configurations among the posterior samples.
\item \verb{$MSC_BAMM_object} List of 18 elements of class \code{"bammdata"} recording the mean rates and regime shift locations found across
the Maximum Shift Credibility (MSC) configurations. All BAMM elements summarizing diversification data hold a single entry describing
this mean diversification history.
}

The function also produces files listed in the Details section and stored in the \code{BAMM_output_directory}.
}
\description{
Run a full BAMM (Bayesian Analysis of Macroevolutionary Mixtures) workflow
to produce a \code{BAMM_object} that contains a phylogenetic tree and associated diversification rates
mapped along branches, across selected posterior samples:
\itemize{
\item Step 1: Set BAMM - Record BAMM settings and generate all input files needed for BAMM.
\item Step 2: Run BAMM - Run BAMM and move output files to a dedicated directory.
\item Step 3: Evaluate BAMM - Produce evaluation plots and ESS data.
\item Step 4: Import BAMM outputs - Load \code{BAMM_object} in R and subset posterior samples.
\item Step 5: Clean BAMM files - Remove files generated during the BAMM run.
}

The \code{BAMM_object} output is typically used as input to run deepSTRAPP with \code{\link[=run_deepSTRAPP_for_focal_time]{run_deepSTRAPP_for_focal_time()}}
or \code{\link[=run_deepSTRAPP_over_time]{run_deepSTRAPP_over_time()}}. Diversification rates and regime shifts can be visualized with \code{\link[=plot_BAMM_rates]{plot_BAMM_rates()}}.

BAMM is a model of diversification for time-calibrated phylogenies that explores complex diversification dynamics
by allowing multiple regime shifts across clades without a priori hypotheses on the location of such shifts.
It uses reversible jump Markov chain Monte Carlo (rjMCMC) to automatically explore a vast range of models with different
speciation and extinction rates, and different numbers and locations of regime shifts.

This function works only if the BAMM C++ program is installed on your machine.
See the BAMM website: \url{http://bamm-project.org/} and the companion R package \code{{BAMMtools}}.
}
\details{
This function runs a full BAMM (Bayesian Analysis of Macroevolutionary Mixtures) workflow
to produce a \code{BAMM_object} that contains a phylogenetic tree and associated diversification rates
mapped along branches, across selected posterior samples.

Step 1: Set BAMM
\itemize{
\item Produces a tree file for the phylogeny. Default file: 'phylogeny.tree'.
\item Save configuration settings used for the BAMM run. Default file: 'config_file.txt'.
\item Save default priors generated by \link[BAMMtools:setBAMMpriors]{BAMMtools::setBAMMpriors} based on the phylogeny. Default file: 'priors.txt'.
}
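
As a rough illustration of the default rules described for \code{expectedNumberOfShifts} and \code{eventDataWriteFreq},
the sketch below reproduces them in base R. The tip count \code{n_tips} is a hypothetical value, and the use of \code{round()} is an assumption,
since the exact rounding applied internally is not stated here.
\preformatted{
# Hypothetical inputs for illustration only
n_tips <- 87
numberOfGenerations <- 10^7

# One expected shift per 100 tips, with a minimum of 1 (rounding is an assumption)
expected_shifts <- max(1, round(n_tips / 100))

# Write frequency chosen so that 2000 posterior samples are recorded
write_freq <- numberOfGenerations / 2000

print(c(expected_shifts = expected_shifts, write_freq = write_freq))
}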

Step 2: Run BAMM
\itemize{
\item Run BAMM using the system console
\item Move output files to the dedicated \code{BAMM_output_directory}. Default directory is \verb{./BAMM_outputs/}.
\itemize{
\item 'run_info.txt' containing a summary of your parameters/settings.
\item 'mcmc_log.txt' containing raw MCMC information useful in diagnosing convergence.
\item 'event_data.txt' containing all evolutionary rate parameters and their topological mappings.
\item 'chain_swap.txt' containing data about each chain swap proposal (when a proposal occurred, which chains might be swapped, and whether the swap was accepted).
\item 'acceptance_info.txt' containing the history of acceptance/proposal of MCMC steps (If additional setting \code{outputAcceptanceInfo} is set to 1).
}
}

Step 3: Evaluate BAMM
\itemize{
\item Plot the MCMC trace = evolution of logLik across MCMC generations. Output file = 'MCMC_trace_logLik.pdf'.
\item Compute the Effective Sample Size (ESS) across posterior samples (after removing burn-in) using \code{\link[coda:effectiveSize]{coda::effectiveSize()}}.
This is a way to evaluate whether your MCMC run has enough generations to produce robust estimates. Ideally, ESS should be higher than 200.
Output file = 'ESS_df.csv'.
\item Plot the comparison of prior and posterior distributions of the number of regime shifts with \link[BAMMtools:plotPrior]{BAMMtools::plotPrior}.
Output file = 'PP_nb_shifts_plot.pdf'.
A good value for \code{expectedNumberOfShifts} is one that yields highly similar prior and posterior distributions,
hinting that the information in the data coincides with your expectations for the number of regime shifts.
Best practice is to try several values and check whether the choice affects the final output.
}
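
The ESS check can be sketched as follows on a toy MCMC log. The table is simulated here so that the snippet runs without BAMM; the column names
\code{generation}, \code{N_shifts} and \code{logLik} are assumed to match those of the BAMM MCMC log file, and the \code{{coda}} package is assumed to be installed.
\preformatted{
set.seed(1)
# Toy stand-in for the MCMC log (simulated values, not real BAMM output)
mcmc_log <- data.frame(
  generation = seq(0, by = 1000, length.out = 2000),
  N_shifts = rpois(2000, lambda = 2),
  logLik = rnorm(2000, mean = -270, sd = 3)
)

# Drop the first 25 percent of samples as burn-in
burn_in <- 0.25
first_kept <- floor(burn_in * nrow(mcmc_log)) + 1
post_burn <- mcmc_log[first_kept:nrow(mcmc_log), ]

# Effective Sample Size of the number of shifts and of the log-likelihood
ess <- c(
  N_shifts = unname(coda::effectiveSize(post_burn$N_shifts)),
  logLik = unname(coda::effectiveSize(post_burn$logLik))
)

# Flag parameters below the ESS target of 200 mentioned above
print(ess)
print(ess < 200)
}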

Step 4: Import BAMM outputs
\itemize{
\item Load BAMM outputs with \link[BAMMtools:getEventData]{BAMMtools::getEventData}.
\item Subset posterior samples to the requested \code{nb_posterior_samples} with \link[BAMMtools:subsetEventData]{BAMMtools::subsetEventData}.
\item Record the \verb{$expectedNumberOfShifts} used to set the prior. This is useful for downstream analyses involving comparison of prior vs. posterior probabilities
(See \code{\link[BAMMtools:distinctShiftConfigurations]{BAMMtools::distinctShiftConfigurations()}}).
\item Record the marginal posterior probability of regime shift along branches based on the proportion of samples harboring a regime shift along each branch.
(See \code{\link[BAMMtools:ShiftProbsTree]{BAMMtools::marginalShiftProbsTree()}}). Result is stored in \verb{$MSP_tree} as phylogenetic tree with \verb{$edge.length} scaled to the marginal posterior probability.
\item Extract the Maximum A Posteriori probability (MAP) configuration = the configuration of regime shift locations found most frequently among the posterior samples.
(See \code{\link[BAMMtools:getBestShiftConfiguration]{BAMMtools::getBestShiftConfiguration()}}). This ignores shifts that have an odds ratio of marginal posterior probability / prior lower than \code{MAP_odd_ratio_threshold}
to avoid noise from non-core shifts. MAP sample indices are stored in \verb{$MAP_indices}. Diversification rates and shift locations on branches are then averaged across all MAP samples and
recorded as an object of class \code{"bammdata"} in \verb{$MAP_BAMM_object} with a single \verb{$eventData} table used to plot regime shifts on the phylogeny with \code{\link[=plot_BAMM_rates]{plot_BAMM_rates()}}.
\item Extract the Maximum Shift Credibility (MSC) configuration = the configuration of regime shift location with the highest product of marginal probabilities across branches.
(See \code{\link[BAMMtools:maximumShiftCredibility]{BAMMtools::maximumShiftCredibility()}}). MSC sample indices are stored in \verb{$MSC_indices}. Diversification rates and shift locations on branches are then averaged across all MSC samples and
recorded as an object of class \code{"bammdata"} in \verb{$MSC_BAMM_object} with a single \verb{$eventData} table used to plot regime shifts on the phylogeny with \code{\link[=plot_BAMM_rates]{plot_BAMM_rates()}}.
}
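
The burn-in removal and subsetting of posterior samples can be pictured with plain index arithmetic. This is only a sketch under stated assumptions:
the number of written samples is taken as 2000, and an evenly spaced selection is assumed, whereas the function may select samples differently.
\preformatted{
n_written <- 2000            # assumed number of samples written by BAMM
burn_in <- 0.25
nb_posterior_samples <- 1000

# Indices kept after discarding the burn-in fraction
kept <- seq(floor(burn_in * n_written) + 1, n_written)

# Evenly spaced subset of the kept samples (assumption, see above)
positions <- round(seq(1, length(kept), length.out = nb_posterior_samples))
selected_indices <- kept[positions]

length(kept)
length(selected_indices)
range(selected_indices)
}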

Step 5: Clean BAMM files
\itemize{
\item Remove files generated in Steps 1 & 2 if \code{keep_BAMM_outputs = FALSE}.
\item Delete the \code{BAMM_output_directory} if empty after cleaning files.
}

The \code{BAMM_object} output:
\itemize{
\item is typically used as input to run deepSTRAPP with \code{\link[=run_deepSTRAPP_for_focal_time]{run_deepSTRAPP_for_focal_time()}} or \code{\link[=run_deepSTRAPP_over_time]{run_deepSTRAPP_over_time()}}.
\item can be used to extract rates and regimes for any \code{focal_time} in the past with \code{\link[=update_rates_and_regimes_for_focal_time]{update_rates_and_regimes_for_focal_time()}}.
\item can be used to map diversification rates and regime shifts on the phylogeny with \code{\link[=plot_BAMM_rates]{plot_BAMM_rates()}}.
}
}
\section{Note on diversification models for time-calibrated phylogenies}{
This function relies on BAMM to provide a reliable solution to map diversification rates and regime shifts on a time-calibrated phylogeny
and obtain the \code{BAMM_object} needed to run the deepSTRAPP workflow (\link{run_deepSTRAPP_for_focal_time}, \link{run_deepSTRAPP_over_time}).
However, it is one option among others for modeling diversification on phylogenies.
You may wish to explore alternative models such as the LSBDS model in RevBayes (Höhna et al., 2016), the MTBD model (Barido-Sottani et al., 2020),
or the ClaDS2 model (Maliet et al., 2019) for your own data.
Note that you will need Bayesian models that infer regime shifts to be able to perform STRAPP tests (Rabosky & Huang, 2016).
Additionally, you need to format the model output like a \code{BAMM_object}, so it can be used in a deepSTRAPP workflow.

This function performs a single BAMM run to infer diversification rates and regime shifts.
Due to the stochastic nature of the exploration of the parameter space by the MCMC process,
best practice is to perform multiple runs and check for convergence of the MCMC traces,
ensuring that the region of high probability has been reached by your MCMC runs.
}

\examples{
# ----- Example 1: Whale phylogeny ----- #

library(phytools)
data(whale.tree)

\dontrun{
## You need to install the BAMM C++ software locally before running this function
# Visit the official BAMM website (http://bamm-project.org/) for information.

# Run BAMM workflow with deepSTRAPP
whale_BAMM_object <- prepare_diversification_data(
   BAMM_install_directory_path = "./software/bamm-2.5.0/",
   phylo = whale.tree,
   prefix_for_files = "whale",
   numberOfGenerations = 100000, # Set low for the example
   BAMM_output_directory_path = tempdir(), # Can be adjusted such as "./BAMM_outputs/"
   keep_BAMM_outputs = FALSE # Adjust if needed
)}

# Load directly the result
data(whale_BAMM_object)

# Explore output
str(whale_BAMM_object, 1)

# Plot mean net diversification rates and regime shifts on the phylogeny
plot_BAMM_rates(whale_BAMM_object, cex = 0.5,
                labels = TRUE, legend = TRUE)

# ----- Example 2: Ponerinae phylogeny ----- #

#   Load phylogeny
data("Ponerinae_tree", package = "deepSTRAPP")
plot(Ponerinae_tree, show.tip.label = FALSE)

\dontrun{
## You need to install the BAMM C++ software locally before running this function
# Visit the official BAMM website (http://bamm-project.org/) for information.

# Run BAMM workflow with deepSTRAPP
Ponerinae_BAMM_object <- prepare_diversification_data(
   BAMM_install_directory_path = "./software/bamm-2.5.0/",
   phylo = Ponerinae_tree,
   prefix_for_files = "Ponerinae",
   numberOfGenerations = 10^7, # Set high for optimal run, but will take ages
   BAMM_output_directory_path = tempdir(), # Can be adjusted such as "./BAMM_outputs/"
   keep_BAMM_outputs = FALSE # Adjust if needed
)}

if (deepSTRAPP::is_dev_version())
{
 # Load directly the result
 data(Ponerinae_BAMM_object)
 ## This dataset is only available in development versions installed from GitHub.
 # It is not available in CRAN versions.
 # Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.

 # Explore output
 str(Ponerinae_BAMM_object, 1)

 # Plot mean net diversification rates and regime shifts on the phylogeny
 plot_BAMM_rates(Ponerinae_BAMM_object,
                 labels = FALSE, legend = TRUE)
}

}
\references{
For BAMM: Rabosky, D. L. (2014). Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees.
PLoS ONE, 9(2), e89543. \doi{10.1371/journal.pone.0089543}. Website: \url{http://bamm-project.org/}.

For \code{{BAMMtools}}: Rabosky, D. L., Grundler, M., Anderson, C., Title, P., Shi, J. J., Brown, J. W., ... & Larson, J. G. (2014).
BAMMtools: an R package for the analysis of evolutionary dynamics on phylogenetic trees. Methods in Ecology and Evolution, 5(7), 701-707.
\doi{10.1111/2041-210X.12199}
}
\seealso{
\code{\link[=run_deepSTRAPP_for_focal_time]{run_deepSTRAPP_for_focal_time()}} \code{\link[=run_deepSTRAPP_over_time]{run_deepSTRAPP_over_time()}} \code{\link[=update_rates_and_regimes_for_focal_time]{update_rates_and_regimes_for_focal_time()}} \code{\link[=prepare_trait_data]{prepare_trait_data()}} \code{\link[=plot_BAMM_rates]{plot_BAMM_rates()}}

For a guided tutorial, see this vignette: \code{vignette("model_diversification_dynamics", package = "deepSTRAPP")}
}
\author{
Maël Doré
}
