| Title: | Mixed, Low-Rank, and Sparse Multivariate Regression on High-Dimensional Data | 
| Version: | 0.1.0 | 
| Description: | Mixed, low-rank, and sparse multivariate regression ('mixedLSR') provides tools for performing mixture regression when the coefficient matrix is low-rank and sparse. 'mixedLSR' identifies subgroups by alternating optimization with simulated annealing, which encourages convergence to a global optimum. The method is data-adaptive: it automatically performs parameter selection to identify low-rank substructures in the coefficient matrix. | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.1 | 
| Depends: | R (≥ 4.1.0) | 
| Imports: | grpreg, purrr, MASS, stats, ggplot2 | 
| Suggests: | knitr, rmarkdown, mclust | 
| VignetteBuilder: | knitr | 
| BugReports: | https://github.com/alexanderjwhite/mixedLSR | 
| URL: | https://alexanderjwhite.github.io/mixedLSR/ | 
| NeedsCompilation: | no | 
| Packaged: | 2022-11-04 10:33:31 UTC; whitealj | 
| Author: | Alexander White | 
| Maintainer: | Alexander White <whitealj@iu.edu> | 
| Repository: | CRAN | 
| Date/Publication: | 2022-11-04 20:00:02 UTC | 
Compute the Bayesian Information Criterion for a mixedLSR Model
Description
Compute the Bayesian information criterion (BIC) for a fitted mixedLSR model.
Usage
bic_lsr(a, n, llik)
Arguments
| a | A list of coefficient matrices. | 
| n | The sample size. | 
| llik | The log-likelihood of the model. | 
Value
The BIC.
Examples
n <- 50
simulate <- simulate_lsr(n)
model <- mixed_lsr(simulate$x, simulate$y, k = 2, init_lambda = c(1,1), alt_iter = 0)
bic_lsr(model$A, n = n, model$llik)
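The usual BIC is -2 times the log-likelihood plus log(n) times the number of free parameters. The sketch below assumes, as a hypothetical simplification, that the free parameters are the nonzero entries of the coefficient matrices; how bic_lsr itself counts degrees of freedom is not stated here, and the name bic_sketch is illustrative only.

```r
# Hypothetical BIC sketch, not the bic_lsr implementation.
# Assumption: degrees of freedom = number of nonzero coefficients over all groups.
bic_sketch <- function(a, n, llik) {
  df <- sum(vapply(a, function(mat) sum(mat != 0), numeric(1)))
  -2 * llik + log(n) * df
}

# Two 2 x 2 coefficient matrices with 2 nonzero entries in total
a <- list(matrix(c(1, 0, 0, 2), nrow = 2), matrix(0, nrow = 2, ncol = 2))
bic_sketch(a, n = 10, llik = -5)  # 10 + 2 * log(10), about 14.61
```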
Internal Alternating Optimization Function
Description
Internal Alternating Optimization Function
Usage
fct_alt_optimize(
  x,
  y,
  k,
  clust_assign,
  lambda,
  alt_iter,
  anneal_iter,
  em_iter,
  temp,
  mu,
  eps,
  accept_prob,
  sim_N,
  verbose
)
Arguments
| x | A matrix of predictors. | 
| y | A matrix of responses. | 
| k | The number of groups. | 
| clust_assign | The current clustering assignment. | 
| lambda | A vector of penalization parameters. | 
| alt_iter | The maximum number of times to alternate between the classification expectation maximization algorithm and the simulated annealing algorithm. | 
| anneal_iter | The maximum number of simulated annealing iterations. | 
| em_iter | The maximum number of EM iterations. | 
| temp | The initial simulated annealing temperature, temp > 0. | 
| mu | The simulated annealing cooling factor, 0 < mu < 1. Once the best configuration can no longer be improved, the temperature T is reduced to mu * T. | 
| eps | The final simulated annealing temperature, eps > 0. | 
| accept_prob | The simulated annealing probability of accepting a new assignment, 0 < accept_prob < 1. When it is close to 1, trial assignments are only small perturbations of the current assignment; when it is close to 0, trial assignments are closer to random. | 
| sim_N | The number of simulated annealing iterations used to reach equilibrium. | 
| verbose | A boolean indicating whether to print to screen. | 
Value
The final mixedLSR fit.
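The alternation described by alt_iter can be pictured as a loop that refines the partition locally, then searches globally, and stops once a full cycle leaves the assignment unchanged. A minimal sketch, assuming hypothetical placeholder functions em_step and anneal_step that each map an assignment vector to an updated one; this is not the body of fct_alt_optimize.

```r
# Hypothetical alternating-optimization sketch, not fct_alt_optimize itself.
# em_step and anneal_step are placeholders: assignment vector in, assignment out.
alt_optimize_sketch <- function(assign, em_step, anneal_step, alt_iter) {
  for (i in seq_len(alt_iter)) {        # alt_iter = 0 runs no cycles
    refined <- em_step(assign)          # local refinement (e.g., classification EM)
    perturbed <- anneal_step(refined)   # global search (e.g., simulated annealing)
    if (identical(perturbed, assign)) { # a full cycle changed nothing: stop early
      break
    }
    assign <- perturbed
  }
  assign
}

# Toy usage: the "EM" step caps labels at 2, the "annealing" step does nothing
result <- alt_optimize_sketch(c(1L, 3L, 2L, 3L), function(a) pmin(a, 2L), identity, alt_iter = 5)
result
```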
Internal Double Penalized Projection Function
Description
Internal Double Penalized Projection Function
Usage
fct_dpp(
  y,
  x,
  rank,
  lambda = NULL,
  alpha = 2 * sqrt(3),
  beta = 1,
  sigma,
  ptype = "grLasso",
  y_sparse = TRUE
)
Arguments
| y | A matrix of responses. | 
| x | A matrix of predictors. | 
| rank | The rank, if known. | 
| lambda | A vector of penalization parameters. | 
| alpha | A positive constant DPP parameter. | 
| beta | A positive constant DPP parameter. | 
| sigma | An estimated standard deviation. | 
| ptype | The group-penalized regression penalty type, e.g., "grLasso" (the default). See grpreg. | 
| y_sparse | Should Y coefficients be treated as sparse? | 
Value
A list containing estimated coefficients, covariance, and penalty parameters.
Internal EM Algorithm
Description
Internal EM Algorithm
Usage
fct_em(x, y, k, lambda, clust_assign, lik_track, em_iter, verbose)
Arguments
| x | A matrix of predictors. | 
| y | A matrix of responses. | 
| k | The number of groups. | 
| lambda | A vector of penalization parameters. | 
| clust_assign | The current clustering assignment. | 
| lik_track | A vector storing the log-likelihood by iteration. | 
| em_iter | The maximum number of EM iterations. | 
| verbose | A boolean indicating whether to print to screen. | 
Value
A mixedLSR model.
Internal Posterior Calculation
Description
Internal Posterior Calculation
Usage
fct_gamma(
  x,
  y,
  k,
  N,
  clust_assign,
  pi_vec,
  lambda,
  alpha,
  beta,
  y_sparse,
  rank,
  max_rank
)
Arguments
| x | A matrix of predictors. | 
| y | A matrix of responses. | 
| k | The number of groups. | 
| N | The sample size. | 
| clust_assign | The current clustering assignment. | 
| pi_vec | A vector of mixing probabilities for each cluster label. | 
| lambda | A vector of penalization parameters. | 
| alpha | A positive constant DPP parameter. | 
| beta | A positive constant DPP parameter. | 
| y_sparse | Should Y coefficients be treated as sparse? | 
| rank | The rank, if known. | 
| max_rank | The maximum allowed rank. | 
Value
A list with the posterior, coefficients, and estimated covariance.
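In a mixture model the posterior (responsibility) of group j for observation i is proportional to pi_j times the group-j density, normalized over groups; a classification EM then assigns each observation to its most probable group. A minimal sketch, assuming the per-group log-densities are already available as an N x k matrix; posterior_sketch is a hypothetical helper, not fct_gamma.

```r
# Hypothetical posterior sketch, not fct_gamma.
# log_dens: N x k matrix of log f_j(y_i | x_i); pi_vec: length-k mixing probabilities.
posterior_sketch <- function(log_dens, pi_vec) {
  log_num <- sweep(log_dens, 2, log(pi_vec), "+")  # log(pi_j) + log f_j
  row_max <- apply(log_num, 1, max)
  # Row-wise log-sum-exp keeps the normalization numerically stable
  log_den <- row_max + log(rowSums(exp(log_num - row_max)))
  exp(log_num - log_den)
}

log_dens <- matrix(c(0, -10, -10, 0), nrow = 2)
gamma <- posterior_sketch(log_dens, pi_vec = c(0.5, 0.5))
# Classification (C) step: hard-assign each observation to its most probable group
max.col(gamma, ties.method = "first")
```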
Internal Partition Initialization Function
Description
Internal Partition Initialization Function
Usage
fct_initialize(k, N)
Arguments
| k | The number of groups. | 
| N | The sample size. | 
Value
A vector of assignments.
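One simple way to draw a random initial partition in which no group is empty is to recycle the labels 1, ..., k up to length N and permute them. A minimal sketch, assuming N >= k; initialize_sketch is hypothetical and may differ from fct_initialize.

```r
# Hypothetical initialization sketch, not fct_initialize.
# Recycling 1..k to length N and shuffling guarantees every group is non-empty.
initialize_sketch <- function(k, N) {
  stopifnot(N >= k)
  sample(rep_len(seq_len(k), N))
}

set.seed(1)
init <- initialize_sketch(k = 3, N = 10)
tabulate(init, nbins = 3)  # group sizes 4, 3, 3 in some order of observations
```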
Internal Likelihood Function
Description
Internal Likelihood Function
Usage
fct_j_lik(
  x,
  y,
  k,
  clust_assign,
  lambda,
  alpha = 2 * sqrt(3),
  beta = 1,
  y_sparse = TRUE,
  max_rank = 3,
  rank = NULL
)
Arguments
| x | A matrix of predictors. | 
| y | A matrix of responses. | 
| k | The number of groups. | 
| clust_assign | A vector of cluster labels. | 
| lambda | A vector of penalization parameters. | 
| alpha | A positive constant DPP parameter. | 
| beta | A positive constant DPP parameter. | 
| y_sparse | Should Y coefficients be treated as sparse? | 
| max_rank | The maximum allowed rank. | 
| rank | The rank, if known. | 
Value
The weighted log-likelihood.
Internal Log-Likelihood Function
Description
Internal Log-Likelihood Function
Usage
fct_log_lik(mu_mat, sig_vec, y, N, m)
Arguments
| mu_mat | The mean matrix. | 
| sig_vec | A vector of standard deviations (sigma). | 
| y | A matrix of responses. | 
| N | The sample size. | 
| m | The number of response features. | 
Value
A posterior matrix.
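For Gaussian errors that are independent across responses, the log-likelihood of observation i is the sum over responses of the normal log-density of y given its mean. A minimal sketch, assuming sig_vec holds one standard deviation per response (length m); gauss_loglik_sketch is hypothetical and returns a per-observation vector rather than whatever fct_log_lik returns internally.

```r
# Hypothetical Gaussian log-likelihood sketch, not fct_log_lik.
# Assumes independent normal errors with one standard deviation per response.
gauss_loglik_sketch <- function(mu_mat, sig_vec, y, N, m) {
  sd_mat <- matrix(sig_vec, nrow = N, ncol = m, byrow = TRUE)  # sd for each cell
  dens <- matrix(dnorm(y, mean = mu_mat, sd = sd_mat, log = TRUE), nrow = N, ncol = m)
  rowSums(dens)  # one log-likelihood per observation
}

N <- 3; m <- 2
ll <- gauss_loglik_sketch(matrix(0, N, m), c(1, 1), matrix(0, N, m), N, m)
ll  # each entry equals -log(2 * pi)
```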
Internal Perturb Function
Description
Internal Perturb Function
Usage
fct_new_assign(assign, k, p)
Arguments
| assign | The current clustering assignments. | 
| k | The number of groups. | 
| p | The acceptance probability. | 
Value
A perturbed assignment.
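A perturbation consistent with the description of accept_prob (values near 1 give small changes, values near 0 give near-random assignments) keeps each label with probability p and otherwise redraws it uniformly from 1, ..., k. A minimal sketch under that assumption; new_assign_sketch is hypothetical, not fct_new_assign.

```r
# Hypothetical perturbation sketch, not fct_new_assign.
# Each label is kept with probability p, otherwise redrawn uniformly from 1..k.
new_assign_sketch <- function(assign, k, p) {
  resample <- runif(length(assign)) > p
  assign[resample] <- sample.int(k, sum(resample), replace = TRUE)
  assign
}

set.seed(42)
current <- rep(1:3, 5)
new_assign_sketch(current, k = 3, p = 0.9)  # mostly unchanged labels
```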
Internal Pi Function
Description
Internal Pi Function
Usage
fct_pi_vec(clust_assign, k, N)
Arguments
| clust_assign | The current clustering assignment. | 
| k | The number of groups. | 
| N | The sample size. | 
Value
A vector of mixing probabilities.
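Given a hard partition, the natural mixing-probability estimate is the fraction of observations in each group, with empty groups receiving zero. A minimal sketch; pi_vec_sketch is hypothetical and may differ from fct_pi_vec, for example in how empty groups are handled.

```r
# Hypothetical mixing-probability sketch, not fct_pi_vec.
# Proportion of the N observations assigned to each of the k groups.
pi_vec_sketch <- function(clust_assign, k, N) {
  tabulate(clust_assign, nbins = k) / N
}

pi_vec_sketch(c(1, 1, 2, 2, 2), k = 3, N = 5)
# [1] 0.4 0.6 0.0
```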
Internal Rank Estimation Function
Description
Internal Rank Estimation Function
Usage
fct_rank(x, y, sigma, eta)
Arguments
| x | A matrix of predictors. | 
| y | A matrix of responses. | 
| sigma | An estimated noise level. | 
| eta | A rank selection parameter. | 
Value
The estimated rank.
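Rank selection in reduced-rank regression is often done by counting the singular values of the fitted values that exceed a noise-dependent threshold. A minimal sketch of that idea, assuming a hypothetical threshold eta * sigma * sqrt(m + rank(x)); the exact rule used by fct_rank is not documented here, and rank_sketch is illustrative only.

```r
# Hypothetical rank-selection sketch, not fct_rank.
# Counts singular values of the projected responses above an assumed threshold.
rank_sketch <- function(x, y, sigma, eta) {
  qx <- qr(x)
  fitted <- qr.fitted(qx, y)          # projection of y onto the column space of x
  d <- svd(fitted, nu = 0, nv = 0)$d  # singular values, in decreasing order
  threshold <- eta * sigma * sqrt(ncol(y) + qx$rank)  # assumed form
  sum(d > threshold)
}

set.seed(1)
x <- matrix(rnorm(50 * 5), nrow = 50, ncol = 5)
b <- outer(1:5, 1:3)  # rank-1 coefficient matrix
y <- x %*% b          # noiseless responses
rank_sketch(x, y, sigma = 1, eta = 1)  # expected rank 1
```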
Internal Penalty Parameter Selection Function
Description
Internal Penalty Parameter Selection Function.
Usage
fct_select_lambda(
  x,
  y,
  k,
  clust_assign = NULL,
  initial = FALSE,
  type = "all",
  verbose
)
Arguments
| x | A matrix of predictors. | 
| y | A matrix of responses. | 
| k | The number of groups. | 
| clust_assign | The current clustering assignment. | 
| initial | An initial penalty parameter. | 
| type | A type. | 
| verbose | A boolean indicating whether to print to screen. | 
Value
A selected penalty parameter.
Internal Sigma Estimation Function
Description
Internal Sigma Estimation Function
Usage
fct_sigma(y, N, m)
Arguments
| y | A matrix of responses. | 
| N | The sample size. | 
| m | The number of outcome variables. | 
Value
The estimated sigma.
Internal Simulated Annealing Function
Description
Internal Simulated Annealing Function
Usage
fct_sim_anneal(
  x,
  y,
  k,
  init_assign,
  lambda,
  temp,
  mu,
  eps,
  accept_prob,
  sim_N,
  track,
  anneal_iter = 1000,
  verbose
)
Arguments
| x | A matrix of predictors. | 
| y | A matrix of responses. | 
| k | The number of groups. | 
| init_assign | An initial clustering assignment. | 
| lambda | A vector of penalization parameters. | 
| temp | The initial simulated annealing temperature, temp > 0. | 
| mu | The simulated annealing cooling factor, 0 < mu < 1. Once the best configuration can no longer be improved, the temperature T is reduced to mu * T. | 
| eps | The final simulated annealing temperature, eps > 0. | 
| accept_prob | The simulated annealing probability of accepting a new assignment, 0 < accept_prob < 1. When it is close to 1, trial assignments are only small perturbations of the current assignment; when it is close to 0, trial assignments are closer to random. | 
| sim_N | The number of simulated annealing iterations used to reach equilibrium. | 
| track | A vector tracking the log-likelihood by iteration. | 
| anneal_iter | The maximum number of simulated annealing iterations. | 
| verbose | A boolean indicating whether to print to screen. | 
Value
An updated clustering vector.
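The temperature parameters above fit the standard simulated annealing scheme: propose a perturbed assignment, always accept improvements, accept worse moves with probability exp(change / T), and cool geometrically until T falls below eps. A minimal sketch with a hypothetical objective (higher is better) and proposal function; for simplicity it cools after every sim_N proposals rather than only when the best configuration stalls, so it is not fct_sim_anneal.

```r
# Hypothetical simulated annealing sketch over assignments, not fct_sim_anneal.
anneal_sketch <- function(init, objective, perturb, temp, mu, eps, sim_N) {
  current <- init
  current_val <- objective(current)
  best <- current
  best_val <- current_val
  while (temp > eps) {
    for (i in seq_len(sim_N)) {
      proposal <- perturb(current)
      prop_val <- objective(proposal)
      # Metropolis rule: accept improvements, sometimes accept worse moves
      if (prop_val >= current_val || runif(1) < exp((prop_val - current_val) / temp)) {
        current <- proposal
        current_val <- prop_val
        if (current_val > best_val) {  # remember the best configuration seen
          best <- current
          best_val <- current_val
        }
      }
    }
    temp <- mu * temp  # geometric cooling
  }
  best
}

# Toy problem: recover a target labelling of 6 observations into 2 groups
target <- c(1L, 2L, 1L, 2L, 1L, 2L)
objective <- function(a) sum(a == target)
perturb <- function(a) { i <- sample.int(length(a), 1); a[i] <- sample.int(2, 1); a }
set.seed(7)
result <- anneal_sketch(rep(1L, 6), objective, perturb, temp = 1, mu = 0.8, eps = 0.01, sim_N = 50)
```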
Internal Weighted Log Likelihood Function
Description
Internal Weighted Log Likelihood Function
Usage
fct_weighted_ll(gamma)
Arguments
| gamma | A posterior matrix. | 
Value
A weighted log-likelihood vector.
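If each row of the input holds log(pi_j) + log f_j for one observation across the k groups (an assumption; the documentation only calls gamma a posterior matrix), the mixture log-likelihood of each observation is a row-wise log-sum-exp. A minimal sketch; weighted_ll_sketch is hypothetical, not fct_weighted_ll.

```r
# Hypothetical weighted log-likelihood sketch, not fct_weighted_ll.
# Input: N x k matrix of log(pi_j) + log f_j; output: length-N log-likelihoods.
weighted_ll_sketch <- function(log_weighted) {
  row_max <- apply(log_weighted, 1, max)
  row_max + log(rowSums(exp(log_weighted - row_max)))  # stable log-sum-exp
}

lw <- log(matrix(c(0.2, 0.3, 0.3, 0.1), nrow = 2))
weighted_ll_sketch(lw)  # equals log(c(0.5, 0.4))
```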
Mixed Low-Rank and Sparse Multivariate Regression for High-Dimensional Data
Description
Mixed Low-Rank and Sparse Multivariate Regression for High-Dimensional Data
Usage
mixed_lsr(
  x,
  y,
  k,
  nstart = 1,
  init_assign = NULL,
  init_lambda = NULL,
  alt_iter = 5,
  anneal_iter = 1000,
  em_iter = 1000,
  temp = 1000,
  mu = 0.95,
  eps = 1e-06,
  accept_prob = 0.95,
  sim_N = 200,
  verbose = TRUE
)
Arguments
| x | A matrix of predictors. | 
| y | A matrix of responses. | 
| k | The number of groups. | 
| nstart | The number of random initializations; the result with the maximum likelihood is returned. | 
| init_assign | A vector of initial assignments, NULL by default. | 
| init_lambda | A vector with the values to initialize the penalization parameter for each group, e.g., c(1,1,1). Set to NULL by default. | 
| alt_iter | The maximum number of times to alternate between the classification expectation maximization algorithm and the simulated annealing algorithm. | 
| anneal_iter | The maximum number of simulated annealing iterations. | 
| em_iter | The maximum number of EM iterations. | 
| temp | The initial simulated annealing temperature, temp > 0. | 
| mu | The simulated annealing cooling factor, 0 < mu < 1. Once the best configuration can no longer be improved, the temperature T is reduced to mu * T. | 
| eps | The final simulated annealing temperature, eps > 0. | 
| accept_prob | The simulated annealing probability of accepting a new assignment, 0 < accept_prob < 1. When it is close to 1, trial assignments are only small perturbations of the current assignment; when it is close to 0, trial assignments are closer to random. | 
| sim_N | The number of simulated annealing iterations used to reach equilibrium. | 
| verbose | A boolean indicating whether to print to screen. | 
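Assuming the temperature starts at temp and is multiplied by mu at each cooling step until it falls below eps, the number of cooling steps with the default settings can be worked out directly. How this interacts with anneal_iter depends on how the implementation counts iterations, which is not stated here.

```r
# Number of geometric cooling steps from temp down to below eps (defaults of mixed_lsr)
temp <- 1000
mu <- 0.95
eps <- 1e-06
n_cool <- ceiling(log(eps / temp) / log(mu))
n_cool
# [1] 405
```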
Value
A list containing the likelihood, the partition, the coefficient matrices, and the BIC.
Examples
simulate <- simulate_lsr(50)
mixed_lsr(simulate$x, simulate$y, k = 2, init_lambda = c(1,1), alt_iter = 0)
Heatmap Plot of the mixedLSR Coefficient Matrices
Description
Heatmap Plot of the mixedLSR Coefficient Matrices
Usage
plot_lsr(a, abs = TRUE)
Arguments
| a | A coefficient matrix from a mixed_lsr model. | 
| abs | A logical indicating whether to take the absolute value of the coefficient matrix. | 
Value
A ggplot2 heatmap of the coefficient matrix, separated by subgroup.
Examples
simulate <- simulate_lsr()
plot_lsr(simulate$a)
Simulate Heterogeneous, Low-Rank, and Sparse Data
Description
Simulate Heterogeneous, Low-Rank, and Sparse Data
Usage
simulate_lsr(
  N = 100,
  k = 2,
  p = 30,
  m = 35,
  b = 1,
  d = 20,
  h = 0.2,
  case = "independent"
)
Arguments
| N | The sample size, default = 100. | 
| k | The number of groups, default = 2. | 
| p | The number of predictor features, default = 30. | 
| m | The number of response features, default = 35. | 
| b | The signal-to-noise ratio, default = 1. | 
| d | The singular value, default = 20. | 
| h | The lower bound for the singular matrix simulation, default = 0.2. | 
| case | The covariance case, "independent" or "dependent", default = "independent". | 
Value
A list of simulation values, including x matrix, y matrix, coefficients and true clustering assignments.
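The structure of such data (k subgroups, each with its own sparse, low-rank coefficient matrix) can be illustrated with a small generator. A minimal sketch under loudly stated assumptions: each group gets a rank-1 coefficient matrix with singular value d and 5 nonzero entries in each singular vector, predictors are independent standard normal, and noise is scaled so that signal sd / noise sd = b. simulate_sketch is hypothetical and does not reproduce simulate_lsr (it ignores h and case).

```r
# Hypothetical data-generation sketch, not simulate_lsr.
simulate_sketch <- function(N = 100, k = 2, p = 30, m = 35, d = 20, b = 1) {
  clust <- sample(rep_len(seq_len(k), N))         # true group labels
  x <- matrix(rnorm(N * p), nrow = N, ncol = p)   # independent predictors
  a <- lapply(seq_len(k), function(j) {
    u <- numeric(p); u[sample.int(p, 5)] <- 1     # sparse left singular vector
    v <- numeric(m); v[sample.int(m, 5)] <- 1     # sparse right singular vector
    d * tcrossprod(u / sqrt(5), v / sqrt(5))      # rank 1, singular value d
  })
  # Each observation uses the coefficient matrix of its own group
  signal <- t(vapply(seq_len(N), function(i) drop(x[i, ] %*% a[[clust[i]]]), numeric(m)))
  noise <- matrix(rnorm(N * m, sd = sd(as.vector(signal)) / b), nrow = N, ncol = m)
  list(x = x, y = signal + noise, a = a, clust = clust)
}

set.seed(1)
sim <- simulate_sketch()
```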
Examples
simulate_lsr()