Type: | Package |
Title: | Dependent Gaussian Processes for Longitudinal Correlated Factors |
Version: | 1.0.0.1 |
Maintainer: | Jiachen Cai <jiachen.cai@mrc-bsu.cam.ac.uk> |
Description: | Functionalities for analyzing high-dimensional and longitudinal biomarker data to facilitate precision medicine, using a joint model of Bayesian sparse factor analysis and dependent Gaussian processes. This paper illustrates the method in detail: J Cai, RJB Goudie, C Starr, BDM Tom (2023) <doi:10.48550/arXiv.2307.02781>. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.2.3 |
Imports: | GPFDA, Rcpp, factor.switching, mvtnorm, combinat, coda, corrplot, pheatmap, stats |
LinkingTo: | Rcpp, RcppArmadillo |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
Depends: | R (≥ 2.10) |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Packaged: | 2025-03-08 07:11:11 UTC; hornik |
Author: | Jiachen Cai [aut, cre] |
Repository: | CRAN |
Date/Publication: | 2025-03-08 08:58:39 UTC |
DGP4LCF: Dependent Gaussian Processes for Longitudinal Correlated Factors
Description
This package implements a novel methodology to model high-dimensional gene expression trajectories.
Author(s)
Maintainer: Jiachen Cai <jiachen.cai@mrc-bsu.cam.ac.uk>
Displaying significant factor loadings in the heatmap.
Description
This function visualizes the estimated factor loadings as a heatmap.
Usage
factor_loading_heatmap(factor_loading_matrix, heatmap_title)
Arguments
factor_loading_matrix |
A matrix of dimension (p, k), which stores results for factor loadings. |
heatmap_title |
A character. Title for the heatmap. |
Value
A heatmap presenting posterior median estimates of factor loadings.
Examples
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
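The Value field refers to posterior median estimates, and the input is a (p, k) matrix. A minimal sketch of how such a matrix might be assembled, assuming the posterior draws sit in a hypothetical (p, k, S) array named loading_draws (this layout is an illustration only, not the package's storage format):

```r
# Hypothetical posterior draws of a (p, k) loading matrix, stacked as (p, k, S).
set.seed(1)
p <- 6; k <- 2; n_draws <- 200
loading_draws <- array(rnorm(p * k * n_draws), dim = c(p, k, n_draws))

# Entry-wise posterior median across the draws gives a (p, k) matrix.
factor_loading_matrix <- apply(loading_draws, c(1, 2), median)
dim(factor_loading_matrix)
```

A matrix of this shape matches the documented factor_loading_matrix argument of factor_loading_heatmap(factor_loading_matrix, heatmap_title).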
Plotting figures for factor score trajectory.
Description
This function is used to visualize results of factor score trajectories.
Usage
factor_score_trajectory(
factor_score_matrix,
factor_index,
person_index,
trajectory_title,
cex_main = 1
)
Arguments
factor_score_matrix |
An array of dimension (q, k, n), used to store results for factor scores. |
factor_index |
A numeric scalar. Index of the factor of interest. |
person_index |
A numeric scalar. Index of the person of interest. |
trajectory_title |
A character. Title for the factor trajectory plot. |
cex_main |
A numeric scalar. Text size of the title. |
Value
Trajectory of the designated person-factor.
Examples
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
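To make the (q, k, n) layout concrete, the sketch below pulls out the trajectory of one person-factor pair in base R. The array name factor_score_array and its contents are hypothetical; only the dimension order (time, factor, person) follows the argument description above.

```r
# Hypothetical factor scores: q time points, k factors, n persons.
q <- 8; k <- 3; n <- 4
factor_score_array <- array(seq_len(q * k * n), dim = c(q, k, n))

factor_index <- 2   # factor of interest
person_index <- 3   # person of interest

# One value per time point for the chosen person-factor pair.
trajectory <- factor_score_array[, factor_index, person_index]
length(trajectory)
```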
Generating posterior samples for parameters (other than DGP parameters) in the model and predicted gene expression for one chain.
Description
Generating posterior samples for parameters (other than DGP parameters) in the model and predicted gene expression for one chain.
Usage
gibbs_after_mcem_algorithm(
chain_index,
mc_num,
burnin,
thin_step,
pathname,
pred_indicator = FALSE,
pred_time_index = NULL,
x,
mcem_parameter_setup_result,
mcem_algorithm_result,
gibbs_after_mcem_diff_initials_result
)
Arguments
chain_index |
A numeric scalar. Index of the chain. |
mc_num |
A numeric scalar. Number of iterations in the Gibbs sampler. |
burnin |
A numeric scalar. Number of iterations to be discarded as 'burn-in'. |
thin_step |
A numeric scalar. Only every 'thin_step'-th iteration is saved in the specified directory, to reduce the storage space needed. Note that this number can differ from the one used in the function 'mcem_algorithm'. |
pathname |
A character. The directory where the saved Gibbs samples are stored. |
pred_indicator |
A logical value. pred_indicator = TRUE denotes the need to predict gene expression at new time points. The default value is FALSE. |
pred_time_index |
Only needed if pred_indicator = TRUE. Index of the new time points in the full time vector. |
x |
A list of n elements. Each element is a matrix of dimension (p, q_i), storing the gene expression observed at q_i time points for the ith subject. |
mcem_parameter_setup_result |
A list of objects returned from the function 'mcem_parameter_setup'. |
mcem_algorithm_result |
A list of objects returned from the function 'mcem_algorithm'. |
gibbs_after_mcem_diff_initials_result |
A list of objects returned from the function 'gibbs_after_mcem_diff_initials'. |
Details
This function corresponds to Algorithm 2: Step 1 in the main manuscript; readers can consult the paper for further explanation.
Value
Posterior samples for parameters (other than DGP parameters) in the model and predicted gene expression for one chain.
Examples
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
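The argument pred_time_index is described as the index of the new time points in the full time vector. A minimal sketch, assuming the time points to predict are exactly those not used for training (the values of a_full and train_index below are hypothetical):

```r
a_full <- c(0, 1, 2, 4, 6, 8, 10, 12)   # hypothetical full time vector
train_index <- c(1, 2, 3, 5, 6, 8)      # positions of the training time points

# Assumption: prediction targets are the positions left out of training.
pred_time_index <- setdiff(seq_along(a_full), train_index)
pred_time_index
a_full[pred_time_index]
```

With pred_indicator = TRUE, an index vector of this kind is what the pred_time_index argument expects according to the description above.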
Combining from all chains the posterior samples for parameters in the model and predicted gene expressions.
Description
Combining from all chains the posterior samples for parameters in the model and predicted gene expressions.
Usage
gibbs_after_mcem_combine_chains(tot_chain, gibbs_after_mcem_algorithm_result)
Arguments
tot_chain |
A numeric scalar. Total number of chains. |
gibbs_after_mcem_algorithm_result |
A list of objects storing model constants. Should be the same as the input to the function 'gibbs_after_mcem_load_chains'. |
Value
All saved posterior samples for parameters in the model and predicted gene expressions.
Examples
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
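As a rough illustration of what combining chains means, the base R sketch below stacks per-chain draws into one object and keeps track of the originating chain. The per-chain layout (one row per saved draw) is an assumption made for this example and is not taken from the package.

```r
set.seed(2)
tot_chain <- 3; n_saved <- 50; p <- 4

# Hypothetical per-chain draws of a p-vector parameter, one row per saved draw.
chain_draws <- lapply(seq_len(tot_chain), function(chain_index) {
  matrix(rnorm(n_saved * p), nrow = n_saved, ncol = p)
})

# Stack all chains and record which chain each row came from.
combined <- do.call(rbind, chain_draws)
chain_id <- rep(seq_len(tot_chain), each = n_saved)
dim(combined)
```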
Generating different initials for multiple chains.
Description
Generating different initials for multiple chains.
Usage
gibbs_after_mcem_diff_initials(
ind_x = TRUE,
tot_chain = 5,
mcem_parameter_setup_result,
mcem_algorithm_result
)
Arguments
ind_x |
A logical value. ind_x = TRUE uses the model including the intercept term for subject-gene mean in after-MCEM-Gibbs sampler; otherwise uses the model without the intercept term. |
tot_chain |
A numeric scalar. Number of parallel chains. |
mcem_parameter_setup_result |
A list of objects returned from the function 'mcem_parameter_setup'. |
mcem_algorithm_result |
A list of objects returned from the function 'mcem_algorithm'. |
Value
Different initials for multiple chains.
Examples
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
Loading the saved posterior samples for parameters in the model and predicted gene expressions.
Description
Loading the saved posterior samples for parameters in the model and predicted gene expressions.
Usage
gibbs_after_mcem_load_chains(chain_index, gibbs_after_mcem_algorithm_result)
Arguments
chain_index |
A numeric scalar. Index of the chain. |
gibbs_after_mcem_algorithm_result |
A list of objects storing model constants. |
Value
All saved posterior samples for parameters in the model and predicted gene expressions.
Examples
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
Monte Carlo Expectation Maximization (MCEM) algorithm to return the Maximum Likelihood Estimate (MLE) of DGP Parameters.
Description
This function is used to return the MLE of DGP parameters.
Usage
mcem_algorithm(
ind_x,
ig_parameter = 10^-2,
increasing_rate = 0.5,
prob_conf_interval = 0.9,
iter_count_num = 5,
x,
mcem_parameter_setup_result,
ipt_x = FALSE,
missing_list = NULL,
missing_num = NULL
)
Arguments
ind_x |
A logical value. ind_x = TRUE uses the model including the intercept term for subject-gene mean in within-MCEM-Gibbs sampler; otherwise uses the model without the intercept term. |
ig_parameter |
A numeric scalar. Hyper-parameter for the prior Inverse-Gamma distribution. |
increasing_rate |
A numeric scalar. Rate of increasing the sample size. |
prob_conf_interval |
A numeric scalar. The probability that the true change in the Q-function is larger than the lower bound. |
iter_count_num |
A numeric scalar. Maximum number of times the sample size may be increased; the algorithm ends once this number is exceeded. |
x |
A list of n elements. Each element is a matrix of dimension (p, q_i), storing the gene expression observed at q_i time points for the ith subject. |
mcem_parameter_setup_result |
A list of objects returned from the function 'mcem_parameter_setup'. |
ipt_x |
A logical value. ipt_x = TRUE denotes the need to impute NAs in the gene expression. The default value is ipt_x = FALSE. |
missing_list |
A list of n elements. Each element is a matrix of dimension (missing_num, 2): each row corresponds to the position of one NA that needs imputation; first and second columns denote the row and column indexes, respectively, of the NA in the corresponding person's matrix of gene expression. |
missing_num |
A vector of n elements. Each element is the number of NAs that need imputation for a single person. |
Value
The MLE of DGP parameters.
Examples
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
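The arguments missing_list and missing_num can be derived directly from x. A minimal sketch in base R, using a small hypothetical data set with n = 2 subjects and p = 3 genes:

```r
# Hypothetical data: subject 1 has 4 time points, subject 2 has 3.
x <- list(
  matrix(c(1, NA, 3, 4, 5, 6, 7, 8, NA, 10, 11, 12), nrow = 3),
  matrix(c(NA, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3)
)

# One (missing_num, 2) matrix per subject: column 1 = row index, column 2 = column index.
missing_list <- lapply(x, function(xi) {
  unname(which(is.na(xi), arr.ind = TRUE))
})

# Number of NAs to impute for each subject.
missing_num <- vapply(missing_list, nrow, integer(1))
missing_num
```

These two objects would then be supplied together with ipt_x = TRUE.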
Visualizing cross-correlations among factors.
Description
Visualizing cross-correlations among factors.
Usage
mcem_cov_plot(k, q, cov_input, title)
Arguments
k |
A numeric scalar. Number of latent factors. |
q |
A numeric scalar. Number of time points in the covariance matrix of factors. |
cov_input |
A matrix of dimension (kq, kq). The covariance matrix of the vector obtained from vectorizing the matrix of latent factor scores. |
title |
A character. Title for the plot. |
Value
Visualization of cross-correlations among factors.
Examples
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
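To show how a (kq, kq) covariance matrix relates to cross-correlations among factors, the sketch below converts a covariance matrix to correlations and extracts the block for one pair of factors. The separable covariance and the factor-major ordering (all time points of factor 1, then factor 2, and so on) are assumptions of this example; the ordering used by the package is not stated here.

```r
k <- 2; q <- 3

# Hypothetical ingredients: a factor covariance and a squared-exponential time kernel.
factor_cov <- matrix(c(1.0, 0.6, 0.6, 2.0), nrow = k)
time_grid <- seq_len(q)
time_kernel <- exp(-0.5 * outer(time_grid, time_grid, "-")^2)

cov_input <- kronecker(factor_cov, time_kernel)  # (kq, kq), factor-major order
cor_input <- cov2cor(cov_input)                  # rescale to correlations

# (q, q) block of cross-correlations between factor i and factor j.
block <- function(i, j) {
  cor_input[(i - 1) * q + seq_len(q), (j - 1) * q + seq_len(q)]
}
cross_12 <- block(1, 2)
dim(cross_12)
```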
Parameters' setup and initial value assignment for the Monte Carlo Expectation Maximization (MCEM) algorithm.
Description
This function is used to create R objects storing parameters in the desired format, and assign initial values so that they are ready to use in the MCEM algorithm.
Usage
mcem_parameter_setup(
p,
k,
n,
q,
ind_num = 10,
burn_in_prop = 0.2,
thin_step = 5,
prior_sparsity = 0.1,
em_num = 50,
obs_time_num,
obs_time_index,
a_person,
col_person_index,
y_init,
a_init,
z_init,
phi_init,
a_full,
train_index,
x,
model_dgp = TRUE
)
Arguments
p |
A numeric scalar. Number of genes. |
k |
A numeric scalar. Number of latent factors. |
n |
A numeric scalar. Number of subjects. |
q |
A numeric scalar. Complete number of time points in the training data. |
ind_num |
A numeric scalar. Starting size of approximately independent samples for MCEM. |
burn_in_prop |
A numeric scalar. Proportion of burn-in, which is used to calculate the size of the Monte Carlo samples needed in the Gibbs sampler. Must be the same as that in the function 'mcem_algorithm_irregular_time'. |
thin_step |
A numeric scalar. Thinning step, which is used to calculate the size of the Monte Carlo samples needed in the Gibbs sampler. Must be the same as that in the function 'mcem_algorithm_irregular_time'. |
prior_sparsity |
A numeric scalar. Prior expected proportion of genes involved within each pathway. |
em_num |
A numeric scalar. Maximum iterations of the expectation maximization (EM) algorithm allowed. |
obs_time_num |
An n-dimensional vector. Each element is one person's number of observed time points in the training data. |
obs_time_index |
A list of n elements. One element is a vector of observed time indexes for one person in the training data, sorted from early to late. |
a_person |
A list of n elements. One element is a vector of observed time for one subject in the training data, sorted from early to late. |
col_person_index |
A list of n elements. One element is a vector of column indexes for one subject in y_init. |
y_init |
A matrix of dimension (k, sum(obs_time_num)). Initial values of the latent factor score. Can be obtained using BFRM software. |
a_init |
A matrix of dimension (p, k). Initial values of the regression coefficients of factor loadings. Can be obtained using BFRM software. |
z_init |
A matrix of dimension (p, k). Initial values of the binary variables of factor loadings. Can be obtained using BFRM software. |
phi_init |
A p-dimensional column vector. Initial values of the variance for residuals when modeling gene expressions, corresponding to
a_full |
A numeric vector. Complete time observed, sorted from early to late. |
train_index |
A q-dimensional column vector. Index of time points used in the training data. |
x |
A list of n elements. Each element is a matrix of dimension (p, q_i), storing the gene expressions for the ith subject. |
model_dgp |
A logical value. model_dgp = TRUE (default setting) uses the Dependent Gaussian Process to model latent factor trajectories, otherwise the Independent Gaussian Process is used. |
Details
The following parameters are worth particular attention, and users should tune these parameters according to the specific data.
'burn_in_prop' and 'thin_step' jointly control the number of Gibbs samples needed to generate approximately 'ind_num' independent samples. The ultimate purpose of tuning these two parameters is to generate high-quality posterior samples for the latent factor scores. Therefore: if the initial values of the Gibbs sampler are poor, users may need to increase 'burn_in_prop' to discard more burn-in samples; if high correlation between successive samples is a concern, 'thin_step' may need to be larger.
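The interplay of 'ind_num', 'burn_in_prop' and 'thin_step' can be illustrated with a simple accounting. The formula below is an assumption made for illustration (one plausible way to reach roughly 'ind_num' retained draws); the exact rule used inside the package is not stated in this manual.

```r
ind_num <- 10; burn_in_prop <- 0.2; thin_step <- 5   # defaults from the Usage section

# Assumed accounting: enough iterations so that, after discarding the burn-in
# share and keeping every thin_step-th draw, about ind_num draws remain.
mc_total <- ceiling(ind_num * thin_step / (1 - burn_in_prop))
n_burn <- floor(mc_total * burn_in_prop)
kept <- seq(from = n_burn + 1, to = mc_total, by = thin_step)

c(total = mc_total, burn_in = n_burn, kept = length(kept))
```

Under this accounting, raising 'burn_in_prop' or 'thin_step' raises the total number of Gibbs iterations for the same 'ind_num'.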
Value
A list of R objects required in the MCEM algorithm.
Examples
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
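The time-related arguments can be built from the complete time vector and each subject's observed times. A minimal sketch with two hypothetical subjects, one observed at all 8 time points and one at 6 of them; the column layout assumed for y_init (subjects stacked one after another) is an assumption of this example:

```r
a_full <- c(0, 2, 4, 6, 8, 10, 12, 14)   # hypothetical complete time grid
a_person <- list(
  a_full,                                # subject 1: all 8 time points
  c(0, 2, 6, 8, 12, 14)                  # subject 2: 6 of the 8 time points
)

# Position of each observed time within the complete grid.
obs_time_index <- lapply(a_person, function(a) match(a, a_full))
obs_time_num <- lengths(a_person)

# Assumption: y_init has sum(obs_time_num) columns, stacked subject by subject.
ends <- cumsum(obs_time_num)
starts <- ends - obs_time_num + 1
col_person_index <- Map(seq, starts, ends)
col_person_index[[2]]
```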
Numerical summary for important continuous variables that do not need alignment.
Description
Numerical summary for important continuous variables that do not need alignment.
Usage
numerics_summary_do_not_need_alignment(
burnin = 0,
thin_step = 1,
pred_x_truth_indicator = FALSE,
pred_x_truth = NULL,
gibbs_after_mcem_combine_chains_result
)
Arguments
burnin |
A numeric scalar. The saved samples are already after burnin; therefore the default value for this parameter here is 0. Can discard further samples if needed. |
thin_step |
A numeric scalar. The saved samples are already after thinning; therefore the default value for this parameter here is 1. Can be further thinned if needed. |
pred_x_truth_indicator |
A logical value. pred_x_truth_indicator = TRUE means that the true values of the predicted gene expressions are available. The default value is FALSE. |
pred_x_truth |
Only needed if pred_x_truth_indicator = TRUE. An array of dimension (n, p, num_time_test), storing true gene expressions in the testing data. |
gibbs_after_mcem_combine_chains_result |
A list of objects returned from the function 'gibbs_after_mcem_combine_chains'. |
Details
This function corresponds to Algorithm 2: Steps 3 and 4 in the main manuscript; readers can consult the paper for further explanation.
Value
Convergence assessment for important continuous variables that do not need alignment, and posterior summary for predicted gene expressions.
Examples
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
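The effect of the additional burnin and thin_step arguments on already saved samples can be sketched in base R. The vector saved_draws is a hypothetical stand-in for the saved draws of one scalar quantity, and the indexing rule is an assumption about how further burn-in and thinning would typically be applied:

```r
set.seed(4)
n_saved <- 100
saved_draws <- rnorm(n_saved)   # hypothetical saved draws of one scalar quantity

burnin <- 20; thin_step <- 4    # further burn-in and thinning

# Drop the first burnin draws, then keep every thin_step-th of the rest.
keep <- seq(from = burnin + 1, to = n_saved, by = thin_step)
kept_draws <- saved_draws[keep]

# Posterior median and a 95% interval from the retained draws.
quantile(kept_draws, probs = c(0.025, 0.5, 0.975))
```

With the defaults burnin = 0 and thin_step = 1, every saved draw is retained.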
Numerical summary for factor loadings and factor scores, which need alignment.
Description
Numerical summary for factor loadings and factor scores, which need alignment.
Usage
numerics_summary_need_alignment(
burnin = 0,
thin_step = 1,
gibbs_after_mcem_combine_chains_result
)
Arguments
burnin |
A numeric scalar. The saved samples are already after burnin; therefore the default value for this parameter here is 0. Can discard further samples if needed. |
thin_step |
A numeric scalar. The saved samples are already after thinning; therefore the default value for this parameter here is 1. Can be further thinned if needed. |
gibbs_after_mcem_combine_chains_result |
A list of objects returned from the function 'gibbs_after_mcem_combine_chains'. |
Details
This function corresponds to Algorithm 2: Steps 2, 3 and 4 in the main manuscript; readers can consult the paper for further explanation.
Value
Reordered posterior samples, convergence assessment, and summarized posterior results for factor loadings and factor scores.
Examples
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
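Factor loadings need alignment because a factor's sign (and label) can change between posterior draws. The sketch below is a deliberately simplified illustration that handles sign flips only, by comparing each draw with a reference matrix. It is not the package's procedure, which also has to deal with factor permutations; all object names are hypothetical.

```r
set.seed(3)
p <- 5; k <- 2; n_draws <- 20
reference <- matrix(rnorm(p * k), nrow = p)

# Hypothetical draws: the reference plus noise, with random per-column sign flips.
loading_draws <- array(NA_real_, dim = c(p, k, n_draws))
for (s in seq_len(n_draws)) {
  signs <- sample(c(-1, 1), k, replace = TRUE)
  noisy <- reference + matrix(rnorm(p * k, sd = 0.05), nrow = p)
  loading_draws[, , s] <- sweep(noisy, 2, signs, "*")
}

# Flip each column whose inner product with the reference column is negative.
aligned <- loading_draws
for (s in seq_len(n_draws)) {
  flip <- sign(colSums(loading_draws[, , s] * reference))
  flip[flip == 0] <- 1
  aligned[, , s] <- sweep(loading_draws[, , s], 2, flip, "*")
}
```

After this step every column of every draw points in the same direction as the reference, so entry-wise summaries such as medians are no longer cancelled out by sign changes.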
Initial values.
Description
Initial values provided by the two-step approach.
Usage
sim_fcs_init
Format
An object of class list of length 14.
Results when people have irregularly observed time points (some 6 while others 8).
Description
Results when people have irregularly observed time points (some 6 while others 8).
Usage
sim_fcs_results_irregular_6_8
Format
An object of class list of length 3.
Results when people are observed at common 8 time points.
Description
Results when people are observed at common 8 time points.
Usage
sim_fcs_results_regular_8
Format
An object of class list of length 3.
Truth of simulated data.
Description
Simulated data under the scenario where factors are correlated and have small variability (CS).
Usage
sim_fcs_truth
Format
An object of class list of length 19.
Constructing subject-specific objects required for Gibbs sampler (for subjects with incomplete observations only).
Description
Constructing subject-specific objects required for Gibbs sampler (for subjects with incomplete observations only).
Usage
subject_specific_objects(k, q, a_full, a_avail, cor_all)
Arguments
k |
A numeric scalar. Number of latent factors. |
q |
A numeric scalar. Number of time points in the complete factor covariance matrix. |
a_full |
A q-dimensional numeric vector. Complete time sorted from early to late. |
a_avail |
A vector of time when gene expressions are available, sorted from early to late. |
cor_all |
A matrix of dimension (kq, kq). Correlation matrix of latent factor scores. |
Details
This function extracts the subject-specific factor covariance matrix from the complete factor covariance matrix by constructing a subject-specific indicator matrix, which indicates the time indexes at which gene expression is available.
Value
Subject-specific objects needed for Gibbs sampler.
Examples
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
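The extraction described in Details can be sketched with an indicator (selection) matrix in base R. The example assumes a factor-major ordering of the (kq, kq) matrix and uses a hypothetical separable correlation matrix; it illustrates the idea only and does not reproduce the objects returned by subject_specific_objects.

```r
k <- 2; q <- 4
a_full <- c(0, 1, 2, 3)    # complete time vector
a_avail <- c(0, 2, 3)      # times at which this subject was observed

# Hypothetical (kq, kq) correlation matrix, factor-major order.
factor_cor <- matrix(c(1, 0.5, 0.5, 1), nrow = k)
time_cor <- exp(-0.5 * outer(a_full, a_full, "-")^2)
cor_all <- kronecker(factor_cor, time_cor)

# Indicator matrix: one row per available time, a single 1 in the matching column.
time_idx <- match(a_avail, a_full)
indicator <- diag(q)[time_idx, , drop = FALSE]

# Apply the same selection to every factor, then extract the sub-matrix.
select_all <- kronecker(diag(k), indicator)
cor_subject <- select_all %*% cor_all %*% t(select_all)
dim(cor_subject)
```

The same result can be obtained by direct indexing, which is a useful check on the indicator construction.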
Generating a table listing all possible combinations of the binary variables for one gene.
Description
Generating a table listing all possible combinations of the binary variables for one gene.
Usage
table_generator(k)
Arguments
k |
A numeric scalar. Number of latent factors. |
Value
A table listing all possible combinations of the binary variables for one gene.
Examples
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")
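For k binary variables there are 2^k possible combinations. A base R sketch of such a table, built with expand.grid; the row ordering and the matrix format are assumptions of this example and may differ from what table_generator returns:

```r
k <- 3

# Every combination of k binary (0/1) variables, one combination per row.
binary_table <- as.matrix(expand.grid(rep(list(0:1), k)))
dimnames(binary_table) <- NULL

nrow(binary_table)
```

Because the number of rows doubles with each additional factor, such a table grows quickly as k increases.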