Type: | Package |
Date: | 2023-12-12 |
Title: | Infinite Mixtures of Infinite Factor Analysers and Related Models |
Version: | 2.2.0 |
Description: | Provides flexible Bayesian estimation of Infinite Mixtures of Infinite Factor Analysers and related models, for nonparametrically clustering high-dimensional data, introduced by Murphy et al. (2020) <doi:10.1214/19-BA1179>. The IMIFA model conducts Bayesian nonparametric model-based clustering with factor analytic covariance structures without recourse to model selection criteria to choose the number of clusters or cluster-specific latent factors, mostly via efficient Gibbs updates. Model-specific diagnostic tools are also provided, as well as many options for plotting results, conducting posterior inference on parameters of interest, posterior predictive checking, and quantifying uncertainty. |
Depends: | R (≥ 4.0.0) |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
URL: | https://cran.r-project.org/package=IMIFA |
BugReports: | https://github.com/Keefe-Murphy/IMIFA |
LazyData: | true |
Imports: | matrixStats (≥ 1.0.0), mclust (≥ 5.4), mvnfast, Rfast (≥ 1.9.8), slam, viridisLite |
Suggests: | gmp (≥ 0.5-4), knitr, mcclust, rmarkdown, Rmpfr |
RoxygenNote: | 7.2.3 |
VignetteBuilder: | knitr |
Collate: | 'MainFunction.R' 'Diagnostics.R' 'FullConditionals.R' 'Gibbs_FA.R' 'Gibbs_IFA.R' 'Gibbs_IMFA.R' 'Gibbs_IMIFA.R' 'Gibbs_MFA.R' 'Gibbs_MIFA.R' 'Gibbs_OMFA.R' 'Gibbs_OMIFA.R' 'IMIFA.R' 'PlottingFunctions.R' 'SimulateData.R' 'data.R' |
NeedsCompilation: | no |
Packaged: | 2023-12-12 13:31:20 UTC; Keefe |
Author: | Keefe Murphy |
Maintainer: | Keefe Murphy <keefe.murphy@mu.ie> |
Repository: | CRAN |
Date/Publication: | 2023-12-12 13:50:02 UTC |
IMIFA: Infinite Mixtures of Infinite Factor Analysers and Related Models
Description
A package for Bayesian nonparametric clustering of high-dimensional data sets, providing functions for fitting, diagnostic tools and plotting for Infinite Mixtures of Infinite Factor Analysers and the full suite of related models introduced by Murphy et al. (2020) <doi:10.1214/19-BA1179>. Allows model based clustering with factor analytic covariance structures without recourse to model selection criteria to choose the number of clusters or cluster-specific latent factors. Model-specific diagnostic tools are also provided, as well as many options for plotting results, conducting posterior inference on parameters of interest, posterior predictive checking, and quantifying uncertainty.
Details
- Type:
Package
- Package:
IMIFA
- Version:
2.2.0
- Date:
2023-12-12 (this version), 2017-02-02 (original release)
- Licence:
GPL (>= 3)
Usage
The three most important functions in the IMIFA package are: mcmc_IMIFA
, for fitting the model, get_IMIFA_results
, for extracting results from objects of the "IMIFA"
class generated by mcmc_IMIFA
, and the dedicated plotting function plot.Results_IMIFA
, for plotting results pertaining to parameters of inferential interest from objects of class "Results_IMIFA"
generated by get_IMIFA_results
.
Other functions also exist, e.g. for simulating data from a multivariate mixture of factor analysers, many functions for soliciting good priors, and many functions related to plotting.
mcmc_IMIFA
:
This function estimates models in the IMIFA family under the Bayesian paradigm. Most importantly, one must specify the method
in the form of an acronym (e.g. "MIFA"
for Mixtures of Infinite Factor Analysers) and ranges of values for range.G, the number of clusters, and range.Q, the number(s) of (cluster-specific) latent factors as required by said method.
get_IMIFA_results
:
Raw simulation objects generated by mcmc_IMIFA() are passed to this function in order to extract results of interest and conduct further post-processing if necessary.
plot.Results_IMIFA
:
Results obtained from get_IMIFA_Results are passed to this function with the type of plot desired specified by plot.meth
(e.g. "trace"
) and the parameter of interest specified by param
(e.g. "loadings"
).
References
Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.
See Also
Further details and examples are given in the associated vignette document:
vignette("IMIFA", package = "IMIFA")
Author(s)
Keefe Murphy [aut, cre], Cinzia Viroli [ctb], Isobel Claire Gormley [ctb]
Maintainer: Keefe Murphy - <keefe.murphy@mu.ie>
See Also
Useful links:
1st & 2nd Moments of the Pitman-Yor / Dirichlet Processes
Description
Calculate the a priori expected number of clusters (G_expected
) or the variance of the number of clusters (G_variance
) under a PYP or DP prior for a sample of size N
at given values of the concentration parameter alpha
and optionally also the Pitman-Yor discount
parameter. Useful for soliciting sensible priors (or fixed values) for alpha
or discount
under the "IMFA"
and "IMIFA"
methods for mcmc_IMIFA
. Additionally, for a given sample size N
and given expected number of clusters EG
, G_calibrate
elicits a value for the concentration parameter alpha
or the discount
parameter.
Usage
G_expected(N,
alpha,
discount = 0,
MPFR = TRUE)
G_variance(N,
alpha,
discount = 0,
MPFR = TRUE)
G_calibrate(N,
EG,
alpha = NULL,
discount = 0,
MPFR = TRUE,
...)
Arguments
N |
The sample size. |
alpha |
The concentration parameter. Must be specified (though not for |
discount |
The discount parameter for the Pitman-Yor process. Must be less than 1, but typically lies in the interval [0, 1). Defaults to 0 (i.e. the Dirichlet process). When |
MPFR |
Logical indicating whether the high-precision libraries |
EG |
The prior expected number of clusters. Must exceed |
... |
Additional arguments passed to |
Details
All arguments are vectorised. Users can also consult G_priorDensity
in order to solicit sensible priors.
For G_calibrate
, only one of alpha
or discount
can be supplied, and the function elicits a value for the opposing parameter which achieves the desired expected number of clusters EG
for the given sample size N
. By default, a value for alpha
subject to discount=0
(i.e. the Dirichlet process) is elicited. Note that alpha
may not be a positive integer multiple of discount
as it should be if discount
is negative. See Examples below.
Value
The expected number of clusters under the specified prior conditions (G_expected
), or the variance of the number of clusters (G_variance
), or the concentration parameter alpha
or discount
parameter achieving a particular expected number of clusters (G_calibrate
).
Note
G_variance
requires use of the Rmpfr
and gmp
libraries for non-zero discount
values. G_expected
requires these libraries only for the alpha=0
case. These libraries are strongly recommended (but they are not required) for G_calbirate
when discount
is non-zero, but they are required when alpha=0
is supplied. Despite the high precision arithmetic used, the functions can still be unstable for large N
and/or extreme values of alpha
and/or discount
. See the argument MPFR
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
De Blasi, P., Favaro, S., Lijoi, A., Mena, R. H., Prunster, I., and Ruggiero, M. (2015) Are Gibbs-type priors the most natural generalization of the Dirichlet process?, IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2): 212-229.
Yamato, H. and Shibuya, M. (2000) Moments of some statistics of Pitman sampling formula, Bulletin of Informatics and Cybernetics, 32(1): 1-10.
See Also
G_priorDensity
, Rmpfr
, uniroot
Examples
# Certain examples require the use of the Rmpfr library
suppressMessages(require("Rmpfr"))
G_expected(N=50, alpha=19.23356, MPFR=FALSE)
G_variance(N=50, alpha=19.23356, MPFR=FALSE)
G_expected(N=50, alpha=c(19.23356, 12.21619, 1),
discount=c(0, 0.25, 0.7300045), MPFR=FALSE)
G_variance(N=50, alpha=c(19.23356, 12.21619, 1),
discount=c(0, 0.25, 0.7300045), MPFR=c(FALSE, TRUE, TRUE))
# Examine the growth rate of the DP
DP <- sapply(c(1, 5, 10), function(i) G_expected(1:200, alpha=i, MPFR=FALSE))
matplot(DP, type="l", xlab="N", ylab="G")
# Examine the growth rate of the PYP
PY <- sapply(c(0.25, 0.5, 0.75), function(i) G_expected(1:200, alpha=1, discount=i))
matplot(PY, type="l", xlab="N", ylab="G")
# Other special cases of the PYP are also facilitated
G_expected(N=50, alpha=c(27.1401, 0), discount=c(-27.1401/100, 0.8054448))
G_variance(N=50, alpha=c(27.1401, 0), discount=c(-27.1401/100, 0.8054448))
# Elicit values for alpha under a DP prior
G_calibrate(N=50, EG=25)
# Elicit values for alpha under a PYP prior
# G_calibrate(N=50, EG=25, discount=c(-27.1401/100, 0.25, 0.7300045))
# Elicit values for discount under a PYP prior
# G_calibrate(N=50, EG=25, alpha=c(12.21619, 1, 0), maxiter=2000)
Plot Pitman-Yor / Dirichlet Process Priors
Description
Plots the prior distribution of the number of clusters under a Pitman-Yor / Dirichlet process prior, for a sample of size N
at given values of the concentration parameter alpha
and optionally also the discount
parameter. Useful for soliciting sensible priors (or fixed values) for alpha
or discount
under the "IMFA"
and "IMIFA"
methods for mcmc_IMIFA
.
Usage
G_priorDensity(N,
alpha,
discount = 0,
show.plot = TRUE,
type = "h")
Arguments
N |
The sample size. |
alpha |
The concentration parameter. Must be specified and must be strictly greater than |
discount |
The discount parameter for the Pitman-Yor process. Must be less than 1, but typically lies in the interval [0, 1). Defaults to 0 (i.e. the Dirichlet process). When |
show.plot |
Logical indicating whether the plot should be displayed (default = |
type |
The type of plot to be drawn, as per |
Details
All arguments are vectorised. Users can also consult G_expected
, G_variance
, and G_calibrate
in order to solicit sensible priors.
Value
A plot of the prior distribution if show.plot
is TRUE
. Density values are returned invisibly. Note that the density values may not strictly sum to one in certain cases, as values small enough to be represented as zero may well be returned.
Note
The actual density values are returned invisibly. Therefore, they can be visualised as desired by the user even if show.plot
is FALSE
.
Requires use of the Rmpfr
and gmp
libraries; may encounter difficulty and slowness for large N
, especially with non-zero discount
values. Despite the high precision arithmetic used, the functions can be unstable for small values of discount
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
De Blasi, P., Favaro, S., Lijoi, A., Mena, R. H., Prunster, I., and Ruggiero, M. (2015) Are Gibbs-type priors the most natural generalization of the Dirichlet process?, IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2): 212-229.
See Also
Examples
# Plot Dirichlet process priors for different values of alpha
(DP <- G_priorDensity(N=50, alpha=c(3, 10, 25)))
# Verify that these alpha/discount values produce Pitman-Yor process priors with the same mean
alpha <- c(19.23356, 6.47006, 1)
discount <- c(0, 0.47002, 0.7300045)
G_expected(N=50, alpha=alpha, discount=discount)
# Now plot them to examine tail behaviour as discount increases
# Non-zero discount requires loading the "Rmpfr" library
suppressMessages(require("Rmpfr"))
(PY <- G_priorDensity(N=50, alpha=alpha, discount=discount, type="l"))
# Other special cases of the PYP are also facilitated
G_priorDensity(N=50, alpha=c(alpha, 27.1401, 0),
discount=c(discount, -27.1401/100, 0.8054448), type="b")
Show the NEWS file
Description
Show the NEWS
file of the IMIFA
package.
Usage
IMIFA_news()
Value
The IMIFA
NEWS
file, provided the session is interactive.
Examples
IMIFA_news()
Ledermann Bound
Description
Returns the maximum number of latent factors in a factor analysis model for data of dimension P
which actually achieves dimension reduction in terms of the number of covariance parameters. This Ledermann bound is given by the largest integer smaller than or equal to the solution k
of (M - k)^2 \geq M + k
.
Usage
Ledermann(P,
isotropic = FALSE,
int = TRUE)
Arguments
P |
Integer number of variables in data set. This argument is vectorised. |
isotropic |
Logical indicating whether uniquenesses are constrained to be isotropic, in which case the bound is simply |
int |
Logical indicating if the result should be returned as an integer by applying the |
Details
The Ledermann bound when istropic
is FALSE
is given by P + 0.5\left(1 - \sqrt{8P + 1}\right)
.
Value
The Ledermann bound, a non-negative integer obtained using floor
, or a vector of length(P)
such bounds.
Note
It has also been argued that the number of factors should not exceed floor((P - 1)/2)
, which is a necessarily stricter condition.
References
Anderson, T. W. and Rubin, H. (1956) Statistical inference in factor analysis. In Neyman, J. (Ed.), Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 3.5: Contributions to Econometrics, Industrial Research, and Psychometry, University of California Press, Berkeley, CA, U.S.A., pp. 111-150.
Examples
Ledermann(c(25, 50, 100))
floor((c(25, 50, 100) - 1) / 2) # stricter bounds
data(olive)
P <- ncol(olive[,-(1:2)])
Ledermann(P)
Ledermann(P, int=FALSE)
floor((P - 1)/2) # stricter bound
Check the validity of Multiplicative Gamma Process (MGP) hyperparameters
Description
Checks the hyperparameters for the multiplicative gamma process (MGP) shrinkage prior in order to ensure that the property of cumulative shrinkage (in expectation) holds, i.e. checks whether growing mass is assigned to small neighbourhoods of zero as the column index increases.
Usage
MGP_check(ad1,
ad2,
Q = 3L,
phi.shape = NULL,
phi.rate = NULL,
sigma.shape = NULL,
sigma.rate = NULL,
bd1 = 1,
bd2 = 1,
truncated = FALSE,
inverse = TRUE)
Arguments
ad1 , ad2 |
Shape hyperparameters for |
Q |
Number of latent factors. Defaults to |
phi.shape , phi.rate |
The shape and rate hyperparameters for the gamma prior on the local shrinkage parameters. Not necessary for checking if the cumulative shrinkage property holds, but worth supplying both if the actual a priori expected shrinkage factors are of interest. The default value(s) depends on the value of |
sigma.shape , sigma.rate |
The shape and rate hyperparameters for the gamma prior on the cluster shrinkage parameters. Not necessary for checking if the cumulative shrinkage property holds, but worth supplying both if the actual a priori expected shrinkage factors are of interest. The default value(s) depends on the value of |
bd1 , bd2 |
Rate hyperparameters for |
truncated |
A logical value indicating whether the version of the MGP prior based on left-truncated gamma distributions is invoked (see |
inverse |
Logical indicator for whether the cumulative shrinkage property is assessed against the induced Inverse Gamma prior, the default, or in terms of the Gamma prior (which is incorrect). This is always |
Details
This is called inside mcmc_IMIFA
for the "IFA"
, "MIFA"
, "OMIFA"
and "IMIFA"
methods. This function is vectorised with respect to the arguments ad1
, ad2
, phi.shape
, phi.rate
, sigma.shape
, sigma.rate
, bd1
and bd2
.
Value
A list of length 2 containing the following objects:
expectation |
The vector (or list of vectors) of actual expected a priori shrinkage factors. |
valid |
A logical (or vector of logicals) indicating whether the cumulative shrinkage property holds (in expectation). |
Note
It is recommended that ad2
be moderately large relative to ad1
, even if valid
can sometimes be TRUE
when this is not the case (e.g. when truncated=TRUE
). Similarly, satisfying this condition is no guarantee that valid
will be TRUE
, unless truncated=TRUE
. Therefore, a warning is returned if ad1 <= ad2
, regardless of the value taken by valid
, when truncated=FALSE
(the default).
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.
Durante, D. (2017). A note on the multiplicative gamma process, Statistics & Probability Letters, 122: 198-204.
Bhattacharya, A. and Dunson, D. B. (2011). Sparse Bayesian infinite factor models, Biometrika, 98(2): 291-306.
Zhang, X., Dunson, D. B., and Carin, L. (2011) Tree-structured infinite sparse factor model. In Getoor, L. and Scheffer, T. (Eds.), Proceedings of the 28th International Conference on Machine Learning (ICML 2011), June 28-July 2, 2011, Bellevue, WA, USA, pp. 785-792. Madison, WI, USA: Omnipress.
See Also
Examples
# Check if expected shrinkage under the MGP increases with the column index (WRONG approach!).
MGP_check(ad1=1.5, ad2=1.8, Q=10, phi.shape=3, inverse=FALSE)$valid #TRUE
# Check if the induced IG prior on the MGP column shrinkage parameters
# is stochastically increasing, thereby inducing cumulative shrinkage (CORRECT approach!).
MGP_check(ad1=1.5, ad2=1.8, Q=10, phi.shape=3, inverse=TRUE)$valid #FALSE
# Check again with a parameterisation that IS valid and examine the expected shrinkage values
(shrink <- MGP_check(ad1=1.5, ad2=2.8, Q=10, phi.shape=2, phi.rate=0.5, inverse=TRUE))
# Check previously invalid parameterisation again using truncated version of the MGP prior
MGP_check(ad1=1.5, ad2=1.8, Q=10, phi.shape=3, truncated=TRUE)$valid #TRUE
Estimate the Number of Free Parameters in Finite Factor Analytic Mixture Models (PGMM)
Description
Estimates the dimension of the 'free' parameters in fully finite factor analytic mixture models, otherwise known as Parsimonious Gaussian Mixture Models (PGMM), typically necessary for the penalty term of various model selection criteria.
Usage
PGMM_dfree(Q,
P,
G = 1L,
method = c("UUU", "UUC", "UCU", "UCC", "CUU", "CUC",
"CCU", "CCC", "CCUU", "UCUU", "CUCU", "UUCU"),
equal.pro = FALSE)
Arguments
Q |
The number of latent factors (which can be 0, corresponding to a model with diagonal covariance). This argument is vectorised. |
P |
The number of variables. Must be a single strictly positive integer. |
G |
The number of clusters. This defaults to 1. Must be a single strictly positive integer. |
method |
By default, calculation assumes the |
equal.pro |
Logical variable indicating whether or not the mixing mixing proportions are equal across clusters in the model (default = |
Value
A vector of length length(Q)
giving the total number of parameters, including means and mixing proportions, and not only covariance parameters. Set equal.pro
to FALSE
and subtract G * P
from the result to determine the number of covariance parameters only.
Note
This function is used to calculate the penalty terms for the aic.mcmc
and bic.mcmc
model selection criteria implemented in get_IMIFA_results
for finite factor models (though mcmc_IMIFA
currently only implements the UUU
, UUC
, UCU
, and UCC
covariance structures). The function is vectorised with respect to the argument Q
.
Though the function is available for standalone use, note that no checks take place, in order to speed up repeated calls to the function inside mcmc_IMIFA
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
McNicholas, P. D. and Murphy, T. B. (2008) Parsimonious Gaussian mixture models, Statistics and Computing, 18(3): 285-296.
McNicholas, P. D. and Murphy, T. B. (2010) Model-Based clustering of microarray expression data via latent Gaussian mixture models, Bioinformatics, 26(21): 2705-2712.
See Also
Examples
(UUU <- PGMM_dfree(Q=0:5, P=50, G=3, method="UUU"))
(CCC <- PGMM_dfree(Q=0:5, P=50, G=3, method="CCC", equal.pro=TRUE))
Procrustes Transformation
Description
This function performs a Procrustes transformation on a matrix X
to minimize the squared distance between X
and another comparable matrix Xstar
.
Usage
Procrustes(X,
Xstar,
translate = FALSE,
dilate = FALSE,
sumsq = FALSE)
Arguments
X |
The matrix to be transformed. |
Xstar |
The target matrix. |
translate |
Logical value indicating whether |
dilate |
Logical value indicating whether |
sumsq |
Logical value indicating whether the sum of squared differences between |
Details
R
, tt
, and d
are chosen so that:
d \times \mathbf{X} \mathbf{R} + 1\hspace*{-3.5pt}1 \underline{t}^\top \approx X^\star
X.new
is given by:
X_{\textrm{new}} = d \times \mathbf{X} \mathbf{R} + 1\hspace*{-3.5pt}1 \underline{t}^\top
Value
A list containing:
X.new |
The matrix that is the Procrustes transformed version of |
R |
The rotation matrix. |
t |
The translation vector (if |
d |
The scaling factor (is |
ss |
The sum of squared differences (if |
Note
X
is padded out with columns containing 0
if it has fewer columns than Xstar
.
References
Borg, I. and Groenen, P. J. F. (1997) Modern Multidimensional Scaling: Theory and Applications. Springer Series in Statistics. New York, NY, USA: Springer-Verlag, pp. 340-342.
Examples
# Match two matrices, allowing translation and dilation
mat1 <- diag(rnorm(10))
mat2 <- 0.05 * matrix(rnorm(100), 10, 10) + mat1
proc <- Procrustes(X=mat1, Xstar=mat2, translate=TRUE, dilate=TRUE, sumsq=TRUE)
# Extract the transformed matrix, rotation matrix, translation vector and scaling factor
mat_new <- proc$X.new
mat_rot <- proc$R
mat_t <- proc$t
mat_d <- proc$d
# Compare the sum of squared differences to a Procrustean transformation with rotation only
mat_ss <- proc$ss
mat_ss2 <- Procrustes(X=mat1, Xstar=mat2, sumsq=TRUE)$ss
USPS handwritten digits
Description
Training and test sets for the United States Postal Service (USPS) handwritten digits data, with 8-bit 16x16 grayscale grid representations of image scans of the digits "0" through "9".
Usage
data(USPSdigits)
Format
A list of length 2 with the following elements, each one a data.frame
:
train
The training set of 7,291 digits.
test
The test set of 2,007 digits.
Each data.frame
contains the known digit labels in its first column.
The remaining 256 columns give the concatenation of the 16x16 grid.
Pixels are scaled such that [-1,1] corresponds to [white,black].
References
Hastie, T., Tibshirani, R., and Friedman, J. (2001). The Elements of Statistical Learning (2nd edition). Springer Series in Statistics. New York, NY, USA: Springer.
See Also
Examples
# Load the data and record the labels
data(USPSdigits, package="IMIFA")
ylab <- USPSdigits$train[,1]
train <- USPSdigits$train[,-1]
# Examine the effect of discarding peripheral pixels
SDs <- apply(train, 2, sd)
ind <- SDs > 0.7
dat <- train[,ind]
hist(SDs, breaks=200, xlim=c(0, 1))
rect(0.7, 0, 1, 12, col=2, density=25)
show_digit(ind) # retained pixels are shown in black
Summarise MCMC samples of clustering labels with a similarity matrix and find the 'average' clustering
Description
This function takes a Monte Carlo sample of cluster labels, computes an average similarity matrix and returns the clustering with minimum mean squared error to this average. The mcclust
package must be loaded.
Usage
Zsimilarity(zs)
Arguments
zs |
A matrix containing samples of clustering labels where the columns correspond to the number of observations (N) and the rows correspond to the number of iterations (M). |
Details
This function takes a Monte Carlo sample of cluster labels, converts them to adjacency matrices, and computes a similarity matrix as an average of the adjacency matrices. The dimension of the similarity matrix is invariant to label switching and the number of clusters in each sample, desirable features when summarising partitions of Bayesian nonparametric models such as IMIFA. As a summary of the posterior clustering, the clustering with minimum mean squared error to this 'average' clustering is reported.
A heatmap of z.sim
may provide a useful visualisation, if appropriately ordered. The user is also invited to perform hierarchical clustering using hclust
after first converting this similarity matrix to a distance matrix - "complete" linkage is recommended. Alternatively, hc
could be used.
Value
A list containing three elements:
z.avg |
The 'average' clustering, with minimum squared distance to |
z.sim |
The N x N similarity matrix, in a sparse format (see |
MSE.z |
A vector of length M recording the MSEs between each clustering and the 'average' clustering. |
Note
The mcclust
package must be loaded.
This is liable to take quite some time to run, especially if the number of observations &/or number of iterations is large. Depending on how distinct the clusters are, z.sim
may be stored better in a non-sparse format. This function can optionally be called inside get_IMIFA_results
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Carmona, C., Nieto-barajas, L. and Canale, A. (2018) Model based approach for household clustering with mixed scale variables. Advances in Data Analysis and Classification, 13(2): 559-583.
See Also
get_IMIFA_results
, simple_triplet_matrix
, hclust
, hc
, comp.psm
, cltoSim
Examples
# Run a IMIFA model and extract the sampled cluster labels
# data(olive)
# sim <- mcmc_IMIFA(olive, method="IMIFA", n.iters=5000)
# zs <- sim[[1]][[1]]$z.store
# Get the similarity matrix and visualise it
# zsimil <- Zsimilarity(zs)
# z.sim <- as.matrix(zsimil$z.sim)
# z.col <- mat2cols(z.sim, cols=heat.colors(30, rev=TRUE))
# z.col[z.sim == 0] <- NA
# plot_cols(z.col, na.col=par()$bg); box(lwd=2)
# Extract the clustering with minimum squared distance to this
# 'average' and evaluate its performance against the true labels
# table(zsimil$z.avg, olive$area)
# Perform hierarchical clustering on the distance matrix
# Hcl <- hclust(as.dist(1 - z.sim), method="complete")
# plot(Hcl)
# table(cutree(Hcl, k=3), olive$area)
Control settings for the Bayesian Nonparametric priors for infinite mixture models (or shrinkage priors for overfitted mixtures)
Description
Supplies a list of arguments for use in mcmc_IMIFA
pertaining to the use of the Bayesian Nonparametric Pitman-Yor / Dirichlet process priors with the infinite mixture models "IMFA"
and "IMIFA"
. Certain arguments related to the Dirichlet concentration parameter for the overfitted mixtures "OMFA"
and "OMIFA"
can be supplied in this manner also.
Usage
bnpControl(learn.alpha = TRUE,
alpha.hyper = c(2L, 4L),
discount = 0,
learn.d = TRUE,
d.hyper = c(1L, 1L),
ind.slice = TRUE,
rho = 0.75,
trunc.G = NULL,
kappa = 0.5,
IM.lab.sw = TRUE,
thresh = FALSE,
exchange = FALSE,
zeta = NULL,
tune.zeta = list(...),
...)
Arguments
learn.alpha |
|
alpha.hyper |
|
discount |
The discount parameter used when generalising the Dirichlet process to the Pitman-Yor process. Defaults to 0, but typically must lie in the interval [0, 1). If greater than zero, The special case of |
learn.d |
Logical indicating whether the |
d.hyper |
Hyperparameters for the Beta(a,b) prior on the |
ind.slice |
Logical indicating whether the independent slice-efficient sampler is to be employed (defaults, typically, to |
rho |
Parameter controlling the rate of geometric decay for the independent slice-efficient sampler, s.t. |
trunc.G |
The maximum number of allowable and storable clusters under the |
kappa |
The spike-and-slab prior distribution on the |
IM.lab.sw |
Logical indicating whether the two forced label switching moves are to be implemented (defaults to |
thresh |
Logical indicating whether the threshold of Fall and Barat (2014) should be incorporated into the slice sampler. See the reference for details. This is an experimental feature (defaults to |
exchange |
Logical indicating whether the exchangeable slice sampler of Fall and Barat (2014) should be used instead. See the reference for details. This argument can work with or without |
zeta |
|
tune.zeta |
A list with the following named arguments, used for tuning
At least one If diminishing adaptation is invoked, the posterior mean |
... |
Catches unused arguments. |
Details
The crucial concentration parameter alpha
is documented within the main mcmc_IMIFA
function, and is relevant to all of the "IMIFA"
, "IMFA"
, "OMIFA"
, and "OMFA"
methods.
All arguments here are relevant to the "IMFA"
and "IMIFA"
methods, but the following are also related to the "OMFA"
and "OMIFA"
methods, and may behave differently in those instances: learn.alpha
, alpha.hyper
, zeta
, and tune.zeta
.
Value
A named list in which the names are the names of the arguments related to the BNP prior(s) and the values are the values supplied to the arguments.
Note
Certain supplied arguments will be subject to further checks within mcmc_IMIFA
. G_priorDensity
and G_moments
can help with soliciting sensible DP/PYP priors.
Under the "IMFA"
and "IMIFA"
methods, a Pitman-Yor process prior is specified by default. A Dirichlet process prior can be easily invoked when the discount
is fixed at 0
and learn.d=FALSE
. The normalized stable process can also be specified as a prior distribution, as a special case of the Pitman-Yor process, when alpha
remains fixed at 0
and learn.alpha=FALSE
(provided the discount
is fixed at a strictly positive value or learn.d=TRUE
). The special case of the Pitman-Yor process with negative discount
is also allowed as an experimental feature for which caution is advised, though learn.d
and learn.alpha
are forced to FALSE
and TRUE
, respectively, in this instance.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.
Kalli, M., Griffin, J. E. and Walker, S. G. (2011) Slice sampling mixture models, Statistics and Computing, 21(1): 93-105.
Fall, M. D. and Barat, E. (2014) Gibbs sampling methods for Pitman-Yor mixture models, hal-00740770v2.
See Also
mcmc_IMIFA
, G_priorDensity
, G_moments
, mixfaControl
, mgpControl
, storeControl
Examples
bnpctrl <- bnpControl(learn.d=FALSE, ind.slice=FALSE, alpha.hyper=c(3, 3))
# data(olive)
# sim <- mcmc_IMIFA(olive, "IMIFA", n.iters=5000, BNP=bnpctrl)
# Alternatively specify these arguments directly
# sim <- mcmc_IMIFA(olive, "IMIFA", n.iters=5000, learn.d=FALSE,
# ind.slice=FALSE, alpha.hyper=c(3, 3))
Chemical composition of Arabica and Robusta coffee samples
Description
Data on the chemical composition of coffee samples collected from around the world, comprising 43 samples from 29 countries. Each sample is either of the Arabica or Robusta variety. Twelve of the thirteen chemical constituents reported in the study are given. The omitted variable is total chlorogenic acid; it is generally the sum of the chlorogenic, neochlorogenic and isochlorogenic acid values.
Usage
data(coffee)
Format
A data frame with 43 observations and 14 columns. The first two columns contain Variety (either Arabica or Robusta) and Country, respectively, while the remaining 12 columns contain the chemical properties.
References
Streuli, H. (1973). Der heutige Stand der Kaffee-Chemie, In Proceedings of the 6th International Colloquium on Coffee Chemistry, June 4-June 9, 1973, Bogata, Colombia, pp. 61-72. Paris, France: Association Scientifique International du Café (ASIC).
Examples
data(coffee, package="IMIFA")
pairs(coffee[,-(1:2)], col=coffee$Variety)
Extract results, conduct posterior inference and compute performance metrics for MCMC samples of models from the IMIFA family
Description
This function post-processes simulations generated by mcmc_IMIFA
for any of the IMIFA family of models. This includes accounting for label switching, and accounting for rotational invariance via Procrustean methods. It can be re-ran at little computational cost in order to extract different models explored by the sampler used for sims
, without having to re-run the model itself. New results objects using different numbers of clusters and different numbers of factors (if visited by the model in question), or using different model selection criteria (if necessary) can be generated with ease. Posterior predictive checking of the appropriateness of the fitted model is also facilitated.
Usage
get_IMIFA_results(sims = NULL,
burnin = 0L,
thinning = 1L,
G = NULL,
Q = NULL,
criterion = c("bicm", "aicm", "dic", "bic.mcmc", "aic.mcmc"),
adapt = FALSE,
G.meth = c("mode", "median"),
Q.meth = c("mode", "median"),
conf.level = 0.95,
error.metrics = TRUE,
vari.rot = FALSE,
z.avgsim = FALSE,
zlabels = NULL,
nonempty = TRUE,
...)
## S3 method for class 'Results_IMIFA'
print(x,
...)
## S3 method for class 'Results_IMIFA'
summary(object,
MAP = TRUE,
...)
Arguments
sims |
An object of class |
burnin |
Optional additional number of iterations to discard. Defaults to 0, corresponding to no additional burnin. See |
thinning |
Optional interval for extra thinning to be applied. Defaults to 1, corresponding to no additional thinning. See |
G |
If this argument is not specified, results will be returned with the optimal number of clusters. If different numbers of clusters were explored in Similarly, this allows retrieval of samples corresponding to a solution, if visited, with |
Q |
If this argument is not specified, results will be returned with the optimal number of factors. If different numbers of factors were explored in Similarly, this allows retrieval of samples corresponding to a solution, if visited, with If adaptation didn't take place during model-fitting, |
criterion |
The criterion to use for model selection, where model selection is only required if more than one model was run under the Note that the first three options here might exhibit bias in favour of zero-factor models for the finite factor |
adapt |
A logical indicating if adaptation should be applied to the stored loadings and scores matrices to truncate the cluster-specific number(s) of non-redundant factors. This argument is only relevant if |
G.meth |
If the object in |
Q.meth |
If the object in |
conf.level |
The confidence level to be used throughout for credible intervals for all parameters of inferential interest, and error metrics if |
error.metrics |
A logical activating or deactivating posterior predictive checking: i.e. controlling whether metrics quantifying a) the posterior predictive reconstruction error (PPRE) between bin counts of the data and bin counts of replicate draws from the posterior distribution & and b) the error between the empirical and estimated covariance matrices should be computed. These are computed for every valid retained iteration (see The Frobenius norm is used in the computation of the PPRE, by default, but the |
vari.rot |
Logical indicating whether the loadings matrix/matrices template(s) should be |
z.avgsim |
Logical (defaults to Note that the MAP clustering is computed conditional on the estimate of the number of clusters (whether that be the modal estimate or the estimate according to Please be warned that this feature requires loading the |
zlabels |
For any method that performs clustering, the true labels can be supplied if they are known in order to compute clustering performance metrics. This also has the effect of ordering the MAP labels (and thus the ordering of cluster-specific parameters) to most closely correspond to the true labels if supplied. |
nonempty |
For |
x , object , MAP , ... |
Arguments required for the Users can also pass the When Finally, the |
Details
The function also performs post-hoc corrections for label switching, as well as post-hoc Procrustes rotation of loadings matrices and scores, in order to ensure sensible posterior parameter estimates, computes error metrics, constructs credible intervals, and generally transforms the raw sims
object into an object of class "Results_IMIFA"
in order to prepare the results for plotting via plot.Results_IMIFA
.
For the infinite factor methods, iterations where the maximum number of factors was greater than or equal to the maximum of the estimated cluster-specific factors are retained for posterior summaries of the scores, in order to preserve the estimated dimension of the scores matrices. Similarly, these are also the valid iterations used for the computation of the averages and credible intervals for the error metrics. For the finite factor models, all retained iterations are used in both instances (i.e. both for the scores and the error metrics).
In all cases, only iterations with G
non-empty components are retained.
Value
An object of class "Results_IMIFA"
to be passed to plot.Results_IMIFA
for visualising results. Dedicated print
and summary
functions also exist for objects of this class. The object, say x
, is a list of lists, the most important components of which are:
Clust |
Everything pertaining to clustering performance can be found here for all but the More detail is given if known |
Error |
Everything pertaining the model fit assessment can be found here, incl. the distribution of the PPRE values and associated bin counts for the replicate draws, as well as average error metrics (e.g. MSE, RMSE), and credible intervals quantifying the associated uncertainty, between the empirical and estimated covariance matrix/matrices, both of which are also included. |
GQ.results |
Everything pertaining to model choice can be found here, incl. posterior summaries for the estimated number of clusters and estimated number of factors, if applicable to the method employed. Model selection criterion values are also accessible here. |
Means |
Posterior summaries for the means, after conditioning on |
Loadings |
Posterior summaries for the factor loadings matrix/matrices, after conditioning on The number of iterations retained for posterior summaries of the loadings may vary for different clusters for the infinite factor methods, corresponding to iterations where the cluster-specific number of factors was greater than or equal to the modal estimate of the cluster-specific number of factors. |
Scores |
Posterior summaries for the latent factor scores, after conditioning on the maximum of the estimated number of cluster-specific factors. Summaries are given for the single matrix of factor scores. See For the infinite factor methods, iterations where the maximum number of factors was greater than or equal to the maximum of the estimated cluster-specific factors are retained for posterior summaries of the scores, in order to preserve the estimated dimension of the scores matrices. |
Uniquenesses |
Posterior summaries for the uniquenesses, after conditioning on |
The objects Means
, Loadings
, Scores
and Uniquenesses
(if stored when calling mcmc_IMIFA
!) also contain, as well as the posterior summaries, the entire chain of valid samples of each, as well as, for convenience, the last valid samples of each (after conditioning on the modal G
and Q
values, and accounting for label switching, and rotational invariance via Procrustes rotation).
Note
For the "IMIFA"
, "IMFA"
, "OMIFA"
, and "OMFA"
methods, the retained mixing proportions are renormalised after conditioning on the modal G
. This is especially necessary for the computation of the error.metrics
, just note that the values on which posterior inference are conducted will ever so slightly differ from the actually sampled values.
Due to the way the offline label-switching correction is performed, different runs of this function may give very slightly different results in terms of the cluster labellings (and by extension the parameters, which are permuted in the same way), but only if the chain was run for an extremely small number of iterations, well below the number required for convergence, and samples of the cluster labels match poorly across iterations (particularly if the number of clusters suggested by those sampled labels is high).
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.
See Also
plot.Results_IMIFA
, mcmc_IMIFA
, Zsimilarity
, scores_MAP
, sim_IMIFA_model
, Procrustes
, varimax
, norm
, mgpControl
, storeControl
Examples
# data(coffee)
# data(olive)
# Run a MFA model on the coffee data over a range of clusters and factors.
# simMFAcoffee <- mcmc_IMIFA(coffee, method="MFA", range.G=2:3, range.Q=0:3, n.iters=1000)
# Accept all defaults to extract the optimal model.
# resMFAcoffee <- get_IMIFA_results(simMFAcoffee)
# Instead let's get results for a 3-cluster model, allowing Q be chosen by aic.mcmc.
# resMFAcoffee2 <- get_IMIFA_results(simMFAcoffee, G=3, criterion="aic.mcmc")
# Run an IMIFA model on the olive data, accepting all defaults.
# simIMIFAolive <- mcmc_IMIFA(olive, method="IMIFA", n.iters=10000)
# Extract optimum results
# Estimate G & Q by the median of their posterior distributions
# Construct 90% credible intervals and try to return the similarity matrix.
# resIMIFAolive <- get_IMIFA_results(simIMIFAolive, G.meth="median", Q.meth="median",
# conf.level=0.9, z.avgsim=TRUE)
# summary(resIMIFAolive)
# Simulate new data from the above model
# newdata <- sim_IMIFA_model(resIMIFAolive)
# Fit an IFA model without adaptation and examine results with and without post-hoc adaptation
# Use the "SC" criterion for determining the number of active factors
# simIFAnoadapt <- mcmc_IMIFA(olive, method="IFA", n.iters=10000, adapt=FALSE)
# resIFAnoadapt <- get_IMIFA_results(simIFAnoadapt)
# resIFApostadapt <- get_IMIFA_results(simIFAnoadapt, adapt=TRUE, active.crit="SC")
# Compare to an IFA model with adaptive Gibbs sampling
# simIFAadapt <- mcmc_IMIFA(coffee, method="IFA", n.iters=10000, active.crit="BD")
# resIFAadapt <- get_IMIFA_results(simIFAadapt)
# plot(resIFAnoadapt, "GQ")
# plot(resIFApostadapt, "GQ")
# plot(resIFAadapt, "GQ")
Simulate Cluster Labels from Unnormalised Log-Probabilities using the Gumbel-Max Trick
Description
Samples cluster labels for N observations from G clusters efficiently using log-probabilities and the so-called Gumbel-Max trick, without requiring that the log-probabilities be normalised; thus redundant computation can be avoided.
Usage
gumbel_max(probs,
slice = FALSE)
Arguments
probs |
An N x G matrix of unnormalised probabilities on the log scale, where N is he number of observations that require labels to be sampled and G is the number of active clusters s.t. sampled labels can take values in |
slice |
A logical indicating whether or not the indicator correction for slice sampling has been applied to |
Details
Computation takes place on the log scale for stability/underflow reasons (to ensure negligible probabilities won't cause computational difficulties); in any case, many functions for calculating multivariate normal densities already output on the log scale.
Value
A vector of N sampled cluster labels, with the largest label no greater than G.
Note
Though the function is available for standalone use, note that no checks take place, in order to speed up repeated calls to the function inside mcmc_IMIFA
.
If the normalising constant is required for another reason, e.g. to compute the log-likelihood, it can be calculated by summing the output obtained by calling rowLogSumExps
on probs
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.
Yellott, J. I. Jr. (1977) The relationship between Luce's choice axiom, Thurstone's theory of comparative judgment, and the double exponential distribution, Journal of Mathematical Psychology, 15(2): 109-144.
See Also
Examples
# Create weights for 3 components
G <- 3
weights <- seq_len(G)
# Call gumbel_max() repeatedly to obtain samples of the labels, zs
iters <- 10000
zs <- replicate(iters, gumbel_max(probs=log(weights)))
# Compare answer to the normalised weights
tabulate(zs, nbins=G)/iters
(normalised <- as.numeric(weights/sum(weights)))
# Simulate a matrix of Dirichlet weights & the associated vector of N labels
N <- 400
G <- 8
sizes <- seq(from=85, to=15, by=-10)
weights <- matrix(rDirichlet(N * G, alpha=1, nn=sizes), byrow=TRUE, nrow=N, ncol=G)
(zs <- gumbel_max(probs=log(weights)))
Add a colour key legend to heatmap plots
Description
Using only base graphics, this function appends a colour key legend for heatmaps produced by, for instance, plot_cols
or image
.
Usage
heat_legend(data,
cols = NULL,
breaks = NULL,
cex.lab = 1,
...)
Arguments
data |
Either the data with which the heatmap was created or a vector containing its minimum and maximum values. Missing values are ignored. |
cols |
The colour palette used when the heatmap was created. By default, the same |
breaks |
Optional argument giving the break-points for the axis labels. |
cex.lab |
Magnification of axis annotation, indicating the amount by which plotting text and symbols should be scaled relative to the default of 1. |
... |
Catches unused arguments. |
Value
Modifies an existing plot by adding a colour key legend.
See Also
image
, plot_cols
, mat2cols
, is.cols
Examples
# Generate a matrix and plot it with a legend
data <- matrix(rnorm(50), nrow=10, ncol=5)
cols <- heat.colors(12)[12:1]
par(mar=c(5.1, 4.1, 4.1, 3.1))
plot_cols(mat2cols(data, col=cols))
heat_legend(data, cols); box(lwd=2)
Check for Valid Colours
Description
Checks if the supplied vector contains valid colours.
Usage
is.cols(cols)
Arguments
cols |
A vector of colours, usually as a character string. |
Value
A logical vector of length length(cols)
which is TRUE
for entries which are valid colours and FALSE
otherwise.
Examples
all(is.cols(1:5))
all(is.cols(heat.colors(30)))
any(!is.cols(c("red", "green", "aquamarine")))
Check Positive-(Semi)definiteness of a matrix
Description
Tests whether all eigenvalues of a symmetric matrix are positive (or strictly non-negative) to check for positive-definiteness and positive-semidefiniteness, respectively. If the supplied matrix doesn't satisfy the test, the nearest matrix which does can optionally be returned.
Usage
is.posi_def(x,
tol = NULL,
semi = FALSE,
make = FALSE)
Arguments
x |
A matrix, assumed to be real and symmetric. |
tol |
Tolerance for singular values and for absolute eigenvalues - only those with values larger than tol are considered non-zero. (default: tol = |
semi |
Logical switch to test for positive-semidefiniteness when |
make |
Logical switch to return the nearest matrix which satisfies the test - if the test has been passed, this is of course just |
Value
If isTRUE(make)
, a list with two components:
check |
A logical value indicating whether the matrix satisfies the test. |
X.new |
The nearest matrix which satisfies the test (which may just be the input matrix itself.) |
Otherwise, only the logical value indicating whether the matrix satisfies the test is returned.
Examples
x <- cov(matrix(rnorm(100), nrow=10, ncol=10))
is.posi_def(x) #FALSE
is.posi_def(x, semi=TRUE) #TRUE
Xnew <- is.posi_def(x, semi=FALSE, make=TRUE)$X.new
identical(x, Xnew) #FALSE
identical(x, is.posi_def(x, semi=TRUE, make=TRUE)$X.new) #TRUE
Left Truncated Gamma Distributions
Description
Functions to draw pseudo-random numbers from, or calculate the expectation of, left-truncated gamma distributions (see Details below).
Usage
rltrgamma(n,
shape,
rate = 1,
trunc = 1)
exp_ltrgamma(shape,
rate = 1,
trunc = 1,
inverse = FALSE)
Arguments
n |
Number of observations to generate. |
shape |
Shape parameter for the desired gamma distribution. Must be strictly positive |
rate |
Rate parameter for the desired gamma distribution. Must be strictly positive. |
trunc |
The point of left truncation (corresponding to |
inverse |
A logical indicating whether to calculate the expectation for a right-truncated inverse gamma distribution instead of a left-truncated gamma distribution. Defaults to |
Details
The left-truncated gamma distribution has PDF:
f(x|\alpha, \beta) = \frac{\beta^\alpha}{(\Gamma(\alpha)-\Gamma(\alpha, \tau\beta))}x^{\alpha-1}e^{-x\beta}
for 0\le\tau\le x
, and \min(\tau,\beta) > 0
, where \alpha
and \beta
are the shape
and rate
parameters, respectively, \tau
is the cutoff point at which trunc
ation occurs, and \Gamma(\alpha, \tau\beta)
is the upper incomplete gamma function.
Value
For rltrgamma
, a vector of length n
giving draws from the left-truncated gamma distribution with the specified shape
and rate
parameters, and truncation point trunc
.
For exp_ltrgamma
, the expected value of a left-truncated (inverse) gamma distribution.
Note
rltrgamma
is invoked internally for the "IFA"
, "MIFA"
, "OMIFA"
, and "IMIFA"
models to draw column shrinkage parameters for all but the first loadings column under the MGP prior when truncated=TRUE
(which is not the default) is supplied to mgpControl
, at the expense of slightly longer run times. exp_ltrgamma
is used within MGP_check
to check the validity of the MGP hyperparameters when truncated=TRUE
(which is again, not the default). Both functions always assume trunc=1
for these internal usages.
Note also that no arguments are recycled, i.e. all arguments must be of length 1
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Dagpunar, J. S. (1978) Sampling of variates from a truncated gamma distribution, Statistical Computation and Simulation, 8(1): 59-64.
See Also
Examples
# Generate left-truncated Ga(3.1, 2.1, 1) variates
rltrgamma(n=10, shape=3.1, rate=2.1)
# Calculate the expectation of a Ga(3.1, 2.1, 1) distribution
exp_ltrgamma(shape=3.1, rate=2.1)
# Calculate the expectation of an inverse gamma distribution right-truncated at 2
exp_ltrgamma(shape=3.1, rate=2.1, trunc=2, inverse=TRUE)
Convert a numeric matrix to colours
Description
Converts a matrix to a hex colour code representation for plotting using plot_cols
. Used internally by plot.Results_IMIFA
for plotting posterior mean loadings heatmaps.
Usage
mat2cols(mat,
cols = NULL,
compare = FALSE,
byrank = FALSE,
breaks = NULL,
na.col = "#808080FF",
transparency = 1,
...)
Arguments
mat |
Either a matrix or, when |
cols |
The colour palette to be used. The default palette uses |
compare |
Logical switch used when desiring comparable colour representations (usually for comparable heat maps) across multiple matrices. Ensures plots will be calibrated to a common colour scale so that, for instance, the colour on the heat map of an entry valued at 0.7 in Matrix A corresponds exactly to the colour of a similar value in Matrix B. When |
byrank |
Logical indicating whether to convert the matrix itself or the sample ranks of the values therein. Defaults to |
breaks |
Number of gradations in colour to use. Defaults to |
na.col |
Colour to be used to represent missing data. Will be checked for validity by |
transparency |
A factor in [0, 1] modifying the opacity for overplotted lines. Defaults to 1 (i.e. no transparency). Only relevant when |
... |
Catches unused arguments. |
Value
A matrix of hex colour code representations, or a list of such matrices when compare
is TRUE
.
See Also
plot_cols
, heat_legend
, is.cols
, cut
Examples
# Generate a colour matrix using mat2cols()
mat <- matrix(rnorm(100), nrow=10, ncol=10)
mat[2,3] <- NA
cols <- heat.colors(12)[12:1]
(matcol <- mat2cols(mat, cols=cols))
# Use plot_cols() to visualise the colours matrix
par(mar=c(5.1, 4.1, 4.1, 3.1))
plot_cols(matcol)
# Add a legend using heat_legend()
heat_legend(mat, cols=cols); box(lwd=2)
# Try comparing heat maps of multiple matrices
mat1 <- cbind(matrix(rnorm(100, sd=c(4,2)), nr=50, nc=2, byrow=TRUE), 0.1)
mat2 <- cbind(matrix(rnorm(150, sd=c(7,5,3)), nr=50, nc=3, byrow=TRUE), 0.1)
mat3 <- cbind(matrix(rnorm(50, sd=1), nr=50, nc=1, byrow=TRUE), 0.1)
mats <- list(mat1, mat2, mat3)
colmats <- mat2cols(mats, cols=cols, compare=TRUE)
par(mfrow=c(2, 3), mar=c(1, 2, 1, 2))
# Use common palettes (top row)
plot_cols(colmats[[1]]); heat_legend(range(mats), cols=cols); box(lwd=2)
plot_cols(colmats[[2]]); heat_legend(range(mats), cols=cols); box(lwd=2)
plot_cols(colmats[[3]]); heat_legend(range(mats), cols=cols); box(lwd=2)
# Use uncommon palettes (bottom row)
plot_cols(mat2cols(mat1, cols=cols)); heat_legend(range(mat1), cols=cols); box(lwd=2)
plot_cols(mat2cols(mat2, cols=cols)); heat_legend(range(mat2), cols=cols); box(lwd=2)
plot_cols(mat2cols(mat3, cols=cols)); heat_legend(range(mat3), cols=cols); box(lwd=2)
Adaptive Gibbs Sampler for Nonparametric Model-based Clustering using models from the IMIFA family
Description
Carries out Gibbs sampling for all models from the IMIFA family, facilitating model-based clustering with dimensionally reduced factor-analytic covariance structures, with automatic estimation of the number of clusters and cluster-specific factors as appropriate to the method employed. Factor analysis with one group (FA/IFA), finite mixtures (MFA/MIFA), overfitted mixtures (OMFA/OMIFA), infinite factor models which employ the multiplicative gamma process (MGP) shrinkage prior (IFA/MIFA/OMIFA/IMIFA), and infinite mixtures which employ Pitman-Yor / Dirichlet Process Mixture Models (IMFA/IMIFA) are all provided.
Usage
mcmc_IMIFA(dat,
method = c("IMIFA", "IMFA",
"OMIFA", "OMFA",
"MIFA", "MFA",
"IFA", "FA",
"classify"),
range.G = NULL,
range.Q = NULL,
MGP = mgpControl(...),
BNP = bnpControl(...),
mixFA = mixfaControl(...),
alpha = NULL,
storage = storeControl(...),
...)
## S3 method for class 'IMIFA'
print(x,
...)
## S3 method for class 'IMIFA'
summary(object,
...)
Arguments
dat |
A matrix or data frame such that rows correspond to observations ( |
method |
An acronym for the type of model to fit where:
In principle, of course, one could overfit the |
range.G |
Depending on the method employed, either the range of values for the number of clusters, or the conservatively high starting value for the number of clusters. Defaults to (and must be!) For the If |
range.Q |
Depending on the method employed, either the range of values for the number of latent factors or, for methods ending in IFA, the conservatively high starting value for the number of cluster-specific factors, in which case the default starting value is For methods ending in IFA, different clusters can be modelled using different numbers of latent factors (incl. zero); for methods not ending in IFA it is possible to fit zero-factor models, corresponding to simple diagonal covariance structures. For instance, fitting the If See |
MGP |
A list of arguments pertaining to the multiplicative gamma process (MGP) shrinkage prior and adaptive Gibbs sampler (AGS). For use with the infinite factor models |
BNP |
A list of arguments pertaining to the Bayesian Nonparametric Pitman-Yor / Dirichlet process priors, for use with the infinite mixture models |
mixFA |
A list of arguments pertaining to all other aspects of model fitting, e.g. MCMC settings, cluster initialisation, and hyperparameters common to every |
alpha |
Depending on the method employed, either the hyperparameter of the Dirichlet prior for the cluster mixing proportions, or the Pitman-Yor / Dirichlet process concentration parameter. Defaults to
See |
storage |
A vector of named logical indicators governing storage of parameters of interest for all models in the IMIFA family. Defaults are set by a call to |
... |
An alternative means of passing control parameters directly via the named arguments of |
x , object |
Object of class |
Details
Creates a raw object of class "IMIFA"
from which the optimal/modal model can be extracted by get_IMIFA_results
. Dedicated print
and summary
functions exist for objects of class "IMIFA"
.
Value
A list of lists of lists of class "IMIFA"
to be passed to get_IMIFA_results
. If the returned object is x
, candidate models are accessible via subsetting, where x
is of the following form:
x[[1:length(range.G)]][[1:length(range.Q)]]
.
However, these objects of class "IMIFA" should rarely if ever be manipulated by hand - use of the get_IMIFA_results
function is strongly advised.
Note
Further control over the specification of advanced function arguments can be obtained with recourse to the following functions:
mgpControl
Supply arguments (with defaults) pertaining to the multiplicative gamma process (MGP) shrinkage prior and adaptive Gibbs sampler (AGS). For use with the infinite factor models
"IFA"
,"MIFA"
,"OMIFA"
, and"IMIFA"
only.bnpControl
Supply arguments (with defaults) pertaining to the Bayesian Nonparametric Pitman-Yor / Dirichlet process priors, for use with the infinite mixture models
"IMFA"
and"IMIFA"
. Certain arguments related to the Dirichlet concentration parameter for the overfitted mixtures"OMFA"
and"OMIFA"
can be supplied in this manner also.mixfaControl
Supply arguments (with defaults) pertaining to all other aspects of model fitting (e.g. MCMC settings, cluster initialisation, and hyperparameters common to every
method
in theIMIFA
family.storeControl
Supply logical indicators governing storage of parameters of interest for all models in the IMIFA family. It may be useful not to store certain parameters if memory is an issue (e.g. for large data sets or for a large number of MCMC iterations after burnin and thinning).
Note however that the named arguments of these functions can also be supplied directly. Parameter starting values are obtained by simulation from the relevant prior distribution specified in these control functions, though initial means and mixing proportions are computed empirically.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.
Bhattacharya, A. and Dunson, D. B. (2011) Sparse Bayesian infinite factor models, Biometrika, 98(2): 291-306.
Kalli, M., Griffin, J. E. and Walker, S. G. (2011) Slice sampling mixture models, Statistics and Computing, 21(1): 93-105.
Rousseau, J. and Mengersen, K. (2011) Asymptotic Behaviour of the posterior distribution in overfitted mixture models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(5): 689-710.
McNicholas, P. D. and Murphy, T. B. (2008) Parsimonious Gaussian mixture models, Statistics and Computing, 18(3): 285-296.
See Also
get_IMIFA_results
, mixfaControl
, mgpControl
, bnpControl
, storeControl
, Ledermann
Examples
# data(olive)
# data(coffee)
# Fit an IMIFA model to the olive data. Accept all defaults.
# simIMIFA <- mcmc_IMIFA(olive, method="IMIFA")
# summary(simIMIFA)
# Fit an IMIFA model assuming a Pitman-Yor prior.
# Control the balance between the DP and PY priors using the kappa parameter.
# simPY <- mcmc_IMIFA(olive, method="IMIFA", kappa=0.75)
# summary(simPY)
# Fit a MFA model to the scaled olive data, with isotropic uniquenesses (i.e. MPPCA).
# Allow diagonal covariance as a special case where range.Q = 0.
# Don't store the scores. Accept all other defaults.
# simMFA <- mcmc_IMIFA(olive, method="MFA", n.iters=10000, range.G=3:6, range.Q=0:3,
# score.switch=FALSE, centering=FALSE, uni.type="isotropic")
# Fit a MIFA model to the centered & scaled coffee data, w/ cluster labels initialised by K-Means.
# Note that range.Q doesn't need to be specified. Allow IFA as a special case where range.G=1.
# simMIFA <- mcmc_IMIFA(coffee, method="MIFA", n.iters=10000, range.G=1:3, z.init="kmeans")
# Fit an IFA model to the centered and pareto scaled olive data.
# Note that range.G doesn't need to be specified. We can optionally supply a range.Q starting value.
# Enforce additional shrinkage using alpha.d1, alpha.d2, prop, and eps (via mgpControl()).
# simIFA <- mcmc_IMIFA(olive, method="IFA", n.iters=10000, range.Q=4, scaling="pareto",
# alpha.d1=2.5, alpha.d2=4, prop=0.6, eps=0.12)
# Fit an OMIFA model to the centered & scaled coffee data.
# Supply a sufficiently small alpha value. Try varying other hyperparameters.
# Accept the default value for the starting number of factors,
# but supply a value for the starting number of clusters.
# Try constraining uniquenesses to be common across both variables and clusters.
# simOMIFA <- mcmc_IMIFA(coffee, method="OMIFA", range.G=10, psi.alpha=3,
# phi.hyper=c(2, 1), alpha=0.8, uni.type="single")
Control settings for the MGP prior and AGS for infinite factor models
Description
Supplies a list of arguments for use in mcmc_IMIFA
pertaining to the use of the multiplicative gamma process (MGP) shrinkage prior and adaptive Gibbs sampler (AGS) for use with the infinite factor models "IFA"
, "MIFA"
, "OMIFA"
, and "IMIFA"
.
Usage
mgpControl(alpha.d1 = 2.1,
alpha.d2 = 3.1,
phi.hyper = c(3, 2),
sigma.hyper = c(3, 2),
active.crit = c("BD", "SC"),
prop = switch(active.crit, BD=0.7, SC=0.99),
eps = 0.1,
adapt = TRUE,
forceQg = FALSE,
cluster.shrink = TRUE,
truncated = FALSE,
b0 = 0.1,
b1 = 5e-05,
beta.d1 = 1,
beta.d2 = 1,
start.AGS = 2L,
stop.AGS = Inf,
delta0g = FALSE,
...)
Arguments
alpha.d1 |
Shape hyperparameter of the column shrinkage on the first column of the loadings according to the MGP shrinkage prior. Passed to |
alpha.d2 |
Shape hyperparameter of the column shrinkage on the subsequent columns of the loadings according to the MGP shrinkage prior. Passed to |
phi.hyper |
A vector of length 2 giving the shape and rate hyperparameters for the gamma prior on the local shrinkage parameters. Passed to |
sigma.hyper |
A vector of length 2 giving the shape and rate hyperparameters for the gamma prior on the cluster shrinkage parameters. Passed to |
active.crit |
A character string indicating which criterion to use to determine the number of active factors during adaptive Gibbs sampling (i.e. only relevant when |
prop |
Only relevant when
|
eps |
Only relevant when |
adapt |
A logical value indicating whether adaptation of the number of cluster-specific factors is to take place when the MGP prior is employed. Defaults to |
forceQg |
A logical indicating whether the upper limit on the number of cluster-specific factors |
cluster.shrink |
A logical value indicating whether to place the prior specified by |
truncated |
A logical value indicating whether the version of the MGP prior based on left-truncated gamma distributions is invoked (see Zhang et al. reference below and additional relevant documentation in |
b0 , b1 |
Intercept & slope parameters for the exponentially decaying adaptation probability:
Defaults to |
beta.d1 |
Rate hyperparameter of the column shrinkage on the first column of the loadings according to the MGP shrinkage prior. Passed to |
beta.d2 |
Rate hyperparameter of the column shrinkage on the subsequent columns of the loadings according to the MGP shrinkage prior. Passed to |
start.AGS |
The iteration at which adaptation under the AGS is to begin. Defaults to |
stop.AGS |
The iteration at which adaptation under the AGS is to stop completely. Defaults to |
delta0g |
Logical indicating whether the |
... |
Catches unused arguments. |
Value
A named list in which the names are the names of the arguments related to the MGP and AGS and the values are the values supplied to the arguments.
Note
Certain supplied arguments will be subject to further checks by MGP_check
to ensure the cumulative shrinkage property of the MGP prior holds according to the given parameterisation.
The adaptive Gibbs sampler (AGS) monitors the prop
of loadings elements within the neighbourhood eps
of 0 and discards columns or simulates new columns on this basis. However, if at any stage the number of group-specific latent factors reaches zero, the decision to add columns is instead based on a simple binary trial with probability 1-prop
, as there are no loadings entries to monitor.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.
Durante, D. (2017). A note on the multiplicative gamma process, Statistics & Probability Letters, 122: 198-204.
Bhattacharya, A. and Dunson, D. B. (2011) Sparse Bayesian infinite factor models, Biometrika, 98(2): 291-306.
Schiavon, L. and Canale, A. (2020) On the truncation criteria in infinite factor models, Stat, 9:e298.
Zhang, X., Dunson, D. B., and Carin, L. (2011) Tree-structured infinite sparse factor model. In Getoor, L. and Scheffer, T. (Eds.), Proceedings of the 28th International Conference on Machine Learning (ICML 2011), June 28-July 2, 2011, Bellevue, WA, USA, pp. 785-792. Madison, WI, USA: Omnipress.
See Also
mcmc_IMIFA
, Ledermann
, MGP_check
, ltrgamma
, mixfaControl
, bnpControl
, storeControl
, get_IMIFA_results
Examples
mgpctrl <- mgpControl(phi.hyper=c(2.5, 1), eps=1e-02, truncated=TRUE)
# data(olive)
# sim <- mcmc_IMIFA(olive, "IMIFA", n.iters=5000, MGP=mgpctrl)
# Alternatively specify these arguments directly
# sim <- mcmc_IMIFA(olive, "IMIFA", n.iters=5000,
# phi.hyper=c(2.5, 1), eps=1e-02, truncated=TRUE)
# Use delta0g for a MIFA model with supplied cluster labels
# sim2 <- mcmc_IMIFA(olive, n.iters=5000, method="MIFA", range.G=3,
# z.list=olive$area, delta0g=TRUE, alpha.d1=4:2, alpha.d2=5:3
# sigma.hyper=matrix(c(4:6, rep(2, 3)), nrow=2, byrow=TRUE))
Control settings for the IMIFA family of factor analytic mixtures
Description
Supplies a list of arguments for use in mcmc_IMIFA
pertaining to ALL methods in the IMIFA
family: e.g. MCMC settings, cluster initialisation, generic hyperparameters for factor-analytic mixtures, etc.
Usage
mixfaControl(n.iters = 25000L,
burnin = n.iters/5L,
thinning = 2L,
centering = TRUE,
scaling = c("unit", "pareto", "none"),
uni.type = c("unconstrained", "isotropic",
"constrained", "single"),
psi.alpha = 2.5,
psi.beta = NULL,
mu.zero = NULL,
sigma.mu = 1L,
prec.mu = 0.01,
sigma.l = 1L,
z.init = c("hc", "kmeans", "list", "mclust", "priors"),
z.list = NULL,
equal.pro = FALSE,
uni.prior = c("unconstrained", "isotropic"),
mu0g = FALSE,
psi0g = FALSE,
drop0sd = TRUE,
verbose = interactive(),
...)
Arguments
n.iters |
The number of iterations to run the sampler for. Defaults to 25000. |
burnin |
The number of burn-in iterations for the sampler. Defaults to |
thinning |
The thinning interval used in the simulation. Defaults to 2. No thinning corresponds to 1. Note that chains can also be thinned later, using |
centering |
A logical value indicating whether mean centering should be applied to the data, defaulting to |
scaling |
The scaling to be applied - one of |
uni.type |
This argument specifies the type of constraint, if any, to be placed on the uniquenesses/idiosyncratic variances, i.e. whether a general diagonal matrix or isotropic diagonal matrix is to be assumed, and in turn whether these matrices are constrained to be equal across clusters. The default Constraints may be particularly useful when
The first letter U here corresponds to constraints on loadings (not yet implemented), the second letter corresponds to uniquenesses constrained/unconstrained across clusters, and the third letter corresponds to the isotropic constraint on the uniquenesses. Of course, only the third letter is of relevance for the single-cluster |
psi.alpha |
The shape of the inverse gamma prior on the uniquenesses. Defaults to 2.5. Must be greater than 1 if |
psi.beta |
The scale of the inverse gamma prior on the uniquenesses. Can be either a single scalar parameter, a vector of variable specific scales, or (if Note that optional arguments to |
mu.zero |
The mean of the prior distribution for the mean parameter. Either a scalar, a vector of appropriate dimension, or (if |
sigma.mu |
The covariance of the prior distribution for the cluster mean parameters. Always assumed to be a diagonal matrix, and set to the identity matrix by default. Can also be a scalar by which the identity is multiplied, a vector of appropriate dimension; if supplied as a matrix, only the diagonal elements will be extracted. Specifying |
prec.mu |
A scalar controlling the degree of flatness of the prior for the cluster means by scaling |
sigma.l |
A scalar controlling the diagonal covariance of the prior distribution for the loadings. Defaults to |
z.init |
The method used to initialise the cluster labels. Defaults to model-based agglomerative hierarchical clustering via In any case, unless |
z.list |
A user supplied list of cluster labels. Only relevant if |
equal.pro |
Logical variable indicating whether or not the mixing mixing proportions are to be equal across clusters in the model (default = |
uni.prior |
A switch indicating whether uniquenesses scale hyperparameters are to be |
mu0g |
Logical indicating whether the |
psi0g |
Logical indicating whether the |
drop0sd |
Logical indicating whether to drop variables with no standard deviation (defaults to |
verbose |
Logical indicating whether to print output (e.g. run times) and a progress bar to the screen while the sampler runs. By default is |
... |
Also catches unused arguments. A number of optional arguments can be also supplied here:
|
Value
A named list in which the names are the names of the arguments and the values are the values of the arguments.
Note
Users should be careful to note that data are mean-centered (centering=TRUE
) and unit-scaled (scaling="unit"
) by default when supplying other parameters among the list above, especially those related in any way to psi.hyper
, or to the other control functions mgpControl
and bnpControl
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.
McNicholas, P. D. and Murphy, T. B. (2008) Parsimonious Gaussian mixture models, Statistics and Computing, 18(3): 285-296.
See Also
mcmc_IMIFA
, psi_hyper
, hc
, kmeans
, Mclust
, mgpControl
, bnpControl
, storeControl
Examples
mfctrl <- mixfaControl(n.iters=200, prec.mu=1E-03, sigma.mu=NULL,
beta0=1, uni.type="constrained")
# data(olive)
# sim <- mcmc_IMIFA(olive, "IMIFA", mixFA=mfctrl)
# Alternatively specify these arguments directly
# sim <- mcmc_IMIFA(olive, "IMIFA", n.iters=200, prec.mu=1E-03,
# sigma.mu=NULL, beta0=1, uni.type="constrained")
# Use mu0g and psi0g for MIFA models with supplied cluster labels
# oliveScaled <- as.data.frame(scale(olive[,-(1:2)]))
# sim2 <- mcmc_IMIFA(olive, "MIFA", n.iters=200, range.G=c(3, 9),
# z.list=list(olive$area, olive$region), mu0g=TRUE, psi0g=TRUE,
# mu.zero=list(do.call(cbind, tapply(oliveScaled, olive$area, colMeans)),
# do.call(cbind, tapply(oliveScaled, olive$region, colMeans))))
Fatty acid composition of Italian olive oils
Description
Data on the percentage composition of eight fatty acids found by lipid fraction of 572 Italian olive oils. The data come from three areas; within each area there are a number of constituent regions, of which there are 9 in total.
Usage
data(olive)
Format
A data frame with 572 observations and 10 columns. The first columns gives the area (one of Southern Italy, Sardinia, and Northern Italy), the second gives the region, and the remaining 8 columns give the variables. Southern Italy comprises the North Apulia, Calabria, South Apulia, and Sicily regions, Sardinia is divided into Inland Sardinia and Coastal Sardinia and Northern Italy comprises the Umbria, East Liguria, and West Liguria regions.
References
Forina, M., Armanino, C., Lanteri, S. and Tiscornia, E. (1983). Classification of olive oils from their fatty acid composition, In Martens, H. and H. Russrum Jr. (Eds.), Food Research and Data Analysis: Proceedings from the IUFoST Symposium, September 20-23, 1982, Oslo, Norway, pp. 189-214. London, UK: Applied Science Publishers.
Forina, M. and Tiscornia, E. (1982). Pattern recognition methods in the prediction of Italian olive oil origin by their fatty acid content, Annali di Chimica, 72: 143-155.
Examples
data(olive, package="IMIFA")
pairs(olive[,-(1:2)], col=olive$area)
region <- as.numeric(olive$region)
pairs(olive[,-(1:2)],
col=ifelse(region < 5, 1, ifelse(region < 7, 2, ifelse(region == 9, 4, 3))))
Pareto Scaling
Description
Pareto scaling of a numeric matrix, with or without centering. Observations are scaled by the square-root of their column-wise standard deviations.
Usage
pareto_scale(x,
centering = TRUE)
Arguments
x |
A numeric matrix-like object. |
centering |
A logical vector indicating whether centering is to be applied (default= |
Value
The Pareto scaled version of the matrix x
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
van den Berg, R. A., Hoefsloot, H. C. J, Westerhuis, J. A., Smilde, A. K., and van der Werf, M.J. (2006) Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics, 7(142).
Examples
dat <- pareto_scale(olive[,-(1:2)])
Plotting output and parameters of inferential interest for IMIFA and related models
Description
Plotting output and parameters of inferential interest for IMIFA and related models
Usage
## S3 method for class 'Results_IMIFA'
plot(x,
plot.meth = c("all", "correlation", "density", "errors", "GQ",
"means", "parallel.coords", "trace", "zlabels"),
param = c("means", "scores", "loadings", "uniquenesses",
"pis", "alpha", "discount"),
g = NULL,
mat = TRUE,
zlabels = NULL,
heat.map = TRUE,
show.last = FALSE,
palette = NULL,
ind = NULL,
fac = NULL,
by.fac = FALSE,
type = c("h", "n", "p", "l"),
intervals = TRUE,
common = TRUE,
partial = FALSE,
titles = TRUE,
transparency = 0.75,
...)
Arguments
x |
An object of class |
plot.meth |
The type of plot to be produced for the Special types of plots which don't require a
The argument |
param |
The parameter of interest for any of the following |
g |
Optional argument that allows specification of exactly which cluster the plot of interest is to be produced for. If not supplied, the user will be prompted to cycle through plots for all clusters. Also functions as an index for which plot to return when |
mat |
Logical indicating whether a |
zlabels |
The true labels can be supplied if they are known. If this is not supplied, the function uses the labels that were supplied, if any, to |
heat.map |
A logical which controls plotting posterior mean loadings or posterior mean scores as a heatmap, or else as something akin to |
show.last |
A logical indicator which defaults to |
palette |
An optional colour palette to be supplied if overwriting the default palette set inside the function by |
ind |
Either a single number indicating which variable to plot when |
fac |
Optional argument that provides an alternative way to specify |
by.fac |
Optionally allows (mat)plotting of scores and loadings by factor - i.e. observation(s) (scores) or variable(s) (loadings) for a given factor, respectively, controlled by |
type |
The manner in which the plot is to be drawn, as per the |
intervals |
Logical indicating whether credible intervals around the posterior mean(s) are to be plotted when |
common |
Logical indicating whether plots with Note that this affects the |
partial |
Logical indicating whether plots of type |
titles |
Logical indicating whether default plot titles are to be used ( |
transparency |
A factor in [0, 1] modifying the opacity for overplotted lines. Defaults to 0.75, unless semi-transparency is not supported. Only relevant when |
... |
Other arguments typically passed to |
Value
The desired plot with appropriate output and summary statistics printed to the console screen.
Note
Supplying the argument zlabels
does not have the same effect of reordering the sampled parameters as it does if supplied directly to get_IMIFA_results
.
When mat
is TRUE
and by.fac
is FALSE
(both defaults), the convention for dealing with overplotting for trace
and density
plots when param
is either scores
or loadings
is to plot the last factor first, such that the first factor appears 'on top'.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.
See Also
mcmc_IMIFA
, get_IMIFA_results
, mat2cols
, plot_cols
Examples
# See the vignette associated with the package for more graphical examples:
# vignette("IMIFA", package = "IMIFA")
# data(olive)
# simIMIFA <- mcmc_IMIFA(olive, method="IMIFA")
# resIMIFA <- get_IMIFA_results(simIMIFA, z.avgsim=TRUE)
# Examine the posterior distribution(s) of the number(s) of clusters (G) &/or latent factors (Q)
# For the IM(I)FA and OM(I)FA methods, this also plots the trace of the active/non-empty clusters
# plot(resIMIFA, plot.meth="GQ")
# plot(resIMIFA, plot.meth="GQ", g=2)
# Plot clustering uncertainty (and, if available, the similarity matrix)
# plot(resIMIFA, plot.meth="zlabels", zlabels=olive$area)
# Visualise the posterior predictive reconstruction error
# plot(resIMIFA, plot.meth="errors", g=1)
# Compare histograms of the data vs. replicate draw from the posterior for the 1st variable
# plot(resIMIFA, plot.meth="errors", g=2, ind=1)
# Visualise empirical vs. estimated covariance error metrics
# plot(resIMIFA, plot.meth="errors", g=3)
# Look at the trace, density, posterior mean, and correlation of various parameters of interest
# plot(resIMIFA, plot.meth="all", param="means", g=1)
# plot(resIMIFA, plot.meth="all", param="means", g=1, ind=2)
# plot(resIMIFA, plot.meth="trace", param="scores")
# plot(resIMIFA, plot.meth="trace", param="scores", by.fac=TRUE)
# plot(resIMIFA, plot.meth="mean", param="loadings", g=1)
# plot(resIMIFA, plot.meth="mean", param="loadings", g=1, heat.map=FALSE)
# plot(resIMIFA, plot.meth="parallel.coords", param="uniquenesses")
# plot(resIMIFA, plot.meth="density", param="pis", intervals=FALSE, partial=TRUE)
# plot(resIMIFA, plot.meth="all", param="alpha")
# plot(resIMIFA, plot.meth="all", param="discount")
Plots a matrix of colours
Description
Plots a matrix of colours as a heat map type image or as points. Intended for joint use with mat2cols
.
Usage
plot_cols(cmat,
na.col = "#808080FF",
ptype = c("image", "points"),
border.col = "#808080FF",
dlabels = NULL,
rlabels = FALSE,
clabels = FALSE,
pch = 15,
cex = 3,
label.cex = 0.6,
...)
Arguments
cmat |
A matrix of valid colours, with missing values coded as |
na.col |
Colour used for missing |
ptype |
Switch controlling output as either a heat map |
border.col |
Colour of border drawn around the plot. |
dlabels , rlabels , clabels |
Vector of labels for the diagonals, rows, and columns, respectively. |
pch |
Point type used when |
cex |
Point cex used when |
label.cex |
Govens cex parameter used for labels. |
... |
Further graphical parameters. |
Value
Either an "image"
or "points"
type plot of the supplied colours.
See Also
mat2cols
, image
, heat_legend
, is.cols
Examples
# Generate a colour matrix using mat2cols()
mat <- matrix(rnorm(100), nrow=10, ncol=10)
mat[2,3] <- NA
cols <- heat.colors(12)[12:1]
(matcol <- mat2cols(mat, cols=cols))
# Use plot_cols() to visualise the colours matrix
par(mar=c(5.1, 4.1, 4.1, 3.1))
plot_cols(matcol)
# Add a legend using heat_legend()
heat_legend(mat, cols=cols); box(lwd=2)
# Replace colour of exact zero entries:
# Often important to call mat2cols() first (to include 0 in the cuts),
# then replace relevant entries with NA for plot_cols(), i.e.
mat[2,3] <- 0
matcol2 <- mat2cols(mat, cols=cols)
plot_cols(replace(matcol2, mat == 0, NA), na.col="blue")
heat_legend(mat, cols=cols); box(lwd=2)
Posterior Confusion Matrix
Description
For a (N * G
) matrix of posterior cluster membership probabilities, this function creates a (G * G
) posterior confusion matrix, whose hk-th entry gives the average probability that observations with maximum posterior allocation h will be assigned to cluster k.
Usage
post_conf_mat(z, scale = TRUE)
Arguments
z |
A ( Otherwise, a list of such matrices can be supplied, where each matrix in the list has the same dimensions. |
scale |
A logical indicator whether the PCM should be rescaled by its row sums. When |
Value
A (G * G
) posterior confusion matrix, whose hk-th entry gives the average probability that observations with maximum posterior allocation h will be assigned to cluster k. When scale=TRUE
, the benchmark matrix for comparison is the identity matrix of order G
, corresponding to a situation with no uncertainty in the clustering.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Ranciati, S., Vinciotti, V. and Wit, E., (2017) Identifying overlapping terrorist cells from the Noordin Top actor-event network, Annals of Applied Statistics, 14(3): 1516-1534.
See Also
get_IMIFA_results
Examples
# data(olive)
# sim <- mcmc_IMIFA(olive, n.iters=1000)
# res <- get_IMIFA_results(sim)
# (PCM <- post_conf_mat(res$Clust$post.prob))
# par(mar=c(5.1, 4.1, 4.1, 3.1))
# PCM <- replace(PCM, PCM == 0, NA)
# plot_cols(mat2cols(PCM, col=heat.colors(30, rev=TRUE), na.col=par()$bg)); box(lwd=2)
# heat_legend(PCM, cols=heat.colors(30, rev=TRUE))
# par(mar=c(5.1, 4.1, 4.1, 2.1))
Find sensible inverse gamma hyperparameters for variance/uniqueness parameters
Description
Takes an inverse-Gamma shape hyperparameter, and an inverse covariance matrix (or estimate thereof), and finds data-driven scale hyperparameters in such a way that Heywood problems are avoided for factor analysis or probabilistic principal components analysis (and mixtures thereof).
Usage
psi_hyper(shape,
dat,
type = c("unconstrained", "isotropic"),
beta0 = 3,
...)
Arguments
shape |
A positive shape hyperparameter. |
dat |
The data matrix for which the inverse covariance matrix is to be estimated. If data are to be centered &/or scaled within |
type |
A switch indicating whether a single scale ( |
beta0 |
See Note below. Must be a strictly positive numeric scalar. Defaults to |
... |
Catches unused arguments. Advanced users can also supply the sample covariance matrix of |
Details
Constraining uniquenesses to be isotropic provides the link between factor analysis and the probabilistic PCA model. When used in conjunction with mcmc_IMIFA
with "isotropic" or "single" uniquenesses, type
must be isotropic
, but for "unconstrained" or "constrained" uniquenesses, it's possible to specify either a single scale (type="isotropic"
) or variable-specific scales (type="unconstrained"
).
Used internally by mcmc_IMIFA
when its argument psi_beta
is not supplied.
Value
Either a single scale hyperparameter or ncol(dat)
variable-specific scale hyperparameters.
Note
When N > P
, where N
is the number of observations and P
is the number of variables, the inverse of the sample covariance matrix is used by default.
When N <= P
, the inverse either does not exist or the estimate thereof is highly unstable. Thus, an estimate of the form \left(\beta_0 + \frac{N}{2}\right)\left(\beta_0\mathcal{I}_p + 0.5\sum_{i=1}^N x_i x_i^\top\right)^{-1}
is used instead.
For unstandardised data, the estimate is instead constructed using a standardised version of the data, and the resulting inverse correlation matrix estimate is scaled appropriately by the diagonal entries of the sample covariance matrix of the original data.
This estimate can also be used in N > P
cases by explicitly supplying beta0
. It will also be used if inverting the sample covariance matrix fails in N > P
cases.
The optional argument beta0
can be supplied to mcmc_IMIFA
via the control function mixfaControl
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.
Fruwirth-Schnatter, S. and Lopes, H. F. (2010). Parsimonious Bayesian factor analysis when the number of factors is unknown, Technical Report. The University of Chicago Booth School of Business.
Fruwirth-Schnatter, S. and Lopes, H. F. (2018). Sparse Bayesian factor analysis when the number of factors is unknown, to appear. <arXiv:1804.04231>.
Tipping, M. E. and Bishop, C. M. (1999). Probabilistic principal component analysis, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3): 611-622.
See Also
Examples
data(olive)
olive2 <- olive[,-(1:2)]
shape <- 2.5
(scale1 <- psi_hyper(shape=shape, dat=olive2))
# Try again with scaled data
olive_S <- scale(olive2, center=TRUE, scale=TRUE)
# Use the inverse of the sample covariance matrix
(scale2 <- psi_hyper(shape=shape, dat=olive_S))
# Use the estimated inverse covariance matrix
(scale3 <- psi_hyper(shape=shape, dat=olive_S, beta0=3))
# In the normalised example, the mean uniquenesses (given by scale/(shape - 1)),
# can be interpreted as the prior proportion of the variance that is idiosyncratic
(prop1 <- scale1/(shape - 1))
(prop2 <- scale2/(shape - 1))
Simulate Mixing Proportions from a Dirichlet Distribution
Description
Generates samples from the Dirichlet distribution with parameter alpha
efficiently by simulating Gamma(alpha
, 1) random variables and normalising them.
Usage
rDirichlet(G,
alpha,
nn = 0L)
Arguments
G |
The number of clusters for which weights need to be sampled. |
alpha |
The Dirichlet hyperparameter, either of length 1 or |
nn |
A vector giving the number of observations in each of G clusters so that Dirichlet posteriors rather than priors can be sampled from. This defaults to 0, i.e. simulation from the prior. Must be non-negative. Be warned that this will be recycled if necessary. |
Value
A Dirichlet vector of G
weights which sum to 1.
Note
Though the function is available for standalone use, note that few checks take place, in order to speed up repeated calls to the function inside mcmc_IMIFA
. In particular, alpha
and nn
may be invisibly recycled.
While small values of alpha
have the effect of increasingly concentrating the mass onto fewer components, note that this function may return NaN
for excessively small values of alpha
, when nn=0
; see the details of rgamma
for small shape
values.
References
Devroye, L. (1986) Non-Uniform Random Variate Generation, New York, NY, USA: Springer-Verlag, p. 594.
Examples
(prior <- rDirichlet(G=5, alpha=1))
(posterior <- rDirichlet(G=5, alpha=1, nn=c(20, 41, 32, 8, 12)))
(asymmetric <- rDirichlet(G=5, alpha=c(3,4,5,1,2), nn=c(20, 41, 32, 8, 12)))
Decompose factor scores by cluster
Description
Takes posterior summaries of the overall factor scores matrix and returns lists of sub-matrices corresponding to the G
-cluster MAP partition.
Usage
scores_MAP(res,
dropQ = FALSE)
Arguments
res |
An object of class |
dropQ |
A logical indicating whether columns of the factor scores matrix should be dropped such that the number of columns in each sub-matrix corresponds to the cluster-specific number of factors (if the number of factors is indeed cluster-specific). When Note that this argument is irrelevant (i.e. always |
Details
Under the models in the IMIFA family, there exists only one factor scores matrix. For the finite factor methods, this has dimensions N * Q
.
For the infinite factor methods ("IFA"
, "MIFA"
, "OMIFA"
, and "IMIFA"
), the factor scores matrix has dimensions N * Qmax
, where Qmax
is the largest of the cluster-specific numbers of latent factors q_1,\ldots,q_g
. Entries of this matrix thus may have been padded out with zero entries, as appropriate, prior to the Procrustes rotation-based correction applied within get_IMIFA_results
(thus now these entries will be near-zero).
In partitioning rows of the factor scores matrix into the same clusters the corresponding observations themselves belong to according to the MAP clustering, the number of columns may vary according to the cluster-specific numbers of latent factors (depending on the value of dropQ
and the method employed).
Value
For models which achieve clustering, a list of lists (say x
) decomposing the posterior mean scores (x$post.eta
), the associated variance estimates (x$var.eta
) and credible intervals (x$ci.eta
), and the last valid sample of the scores (x$last.eta
) into lists of length G
, corresponding to the MAP clustering, with varying or common numbers of cluster-specific factors (depending on the value of dropQ
and the method employed).
For models with only one component, or the "FA"
and "IFA"
methods, scores cannot be decomposed, and posterior summaries of the scores will be returned unchanged.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
Examples
data(coffee)
sim <- mcmc_IMIFA(coffee, n.iters=1000)
res <- get_IMIFA_results(sim)
# Examine the single posterior mean scores matrix
res$Scores$post.eta
# Decompose into G matrices, common numbers of columns
eta <- scores_MAP(res)
eta$post.eta
# Allow the number of columns be cluster-specific
scores_MAP(res, dropQ=TRUE)$post.eta
Moment Matching Parameters of Shifted Gamma Distributions
Description
This function takes shape and rate parameters of a Gamma distribution and modifies them to achieve the same expected value and variance when the left extent of the support of the distribution is shifted up or down.
Usage
shift_GA(shape,
rate,
shift = 0,
param = c("rate", "scale"))
Arguments
shape , rate |
Shape and rate parameters a and b, respectively, of a Gamma(a, b) distribution. Both must be strictly positive. |
shift |
Modifier, such that the Gamma distribution has support on ( |
param |
Switch controlling whether the supplied |
Value
A named vector of length 2, containing the modified shape and rate parameters, respectively.
Note
This function is invoked within mcmc_IMIFA
when discount
is fixed at a non-zero value and learn.alpha=TRUE
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
Examples
# Shift a Ga(shape=4, rate=2) distribution to the left by 1;
# achieving the same expected value of 2 and variance of 1.
shift_GA(4, 2, -1)
Plot the posterior mean image
Description
Plots the posterior mean of a given cluster from an "IMIFA"
-related model fit to a digit data set in the form of a square grayscale grid.
Usage
show_IMIFA_digit(res,
G = 1,
what = c("mean", "last"),
dat = NULL,
ind = NULL,
...)
Arguments
res |
An object of class |
G |
The index of the cluster for which the posterior mean digit is to be represented. |
what |
A switch controlling whether the |
dat |
The full grayscale grid data set (prior to centering and scaling). Necessary when |
ind |
The index of columns of |
... |
Additional arguments to be passed, via |
Details
This function is a wrapper to show_digit
which supplies the posterior mean digit of a given cluster from a "IMIFA"
model.
Value
The desired image representation of the posterior mean digit (or the last valid sample) from the desired cluster.
Note
Note that both centering and scaling of the original data prior to modelling is accounted for in reconstructing the means, but dat
, if necessary, must be the raw data prior to pre-processing.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
USPSdigits
, show_digit
, get_IMIFA_results
, mcmc_IMIFA
, mat2cols
, plot_cols
Examples
# Load the USPS data and discard peripheral digits
data(USPSdigits)
ylab <- USPSdigits$train[,1]
train <- USPSdigits$train[,-1]
ind <- apply(train, 2, sd) > 0.7
dat <- train[,ind]
# Fit an IMIFA model (warning: quite slow!)
# sim <- mcmc_IMIFA(dat, n.iters=1000, prec.mu=1e-03, z.init="kmeans",
# centering=FALSE, scaling="none")
# res <- get_IMIFA_results(sim, zlabels=ylab)
# Examine the posterior mean image of the first two clusters
# show_IMIFA_digit(res, dat=train, ind=ind)
# show_IMIFA_digit(res, dat=train, ind=ind, G=2)
Show image of grayscale grid
Description
Plots an image of a grayscale grid representation of a digit.
Usage
show_digit(dat,
col = NULL,
...)
Arguments
dat |
A |
col |
The colour scale to be used. Defaults to |
... |
Additional arguments to be passed to |
Value
The desired image representation of the digit.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
USPSdigits
, show_IMIFA_digit
, mat2cols
, plot_cols
Examples
data(USPSdigits)
# Plot the first digit
show_digit(USPSdigits$train[1,-1])
# Visualise the overall mean
show_digit(colMeans(USPSdigits$train[,-1]))
Simulate Data from a Mixture of Factor Analysers Structure
Description
Functions to simulate data of any size and dimension from a (infinite) mixture of (infinite) factor analysers parameterisation or fitted object.
Usage
sim_IMIFA_data(N = 300L,
G = 3L,
P = 50L,
Q = rep(floor(log(P)), G),
pis = rep(1/G, G),
mu = NULL,
psi = NULL,
loadings = NULL,
scores = NULL,
nn = NULL,
loc.diff = 2,
non.zero = P,
forceQg = TRUE,
method = c("conditional", "marginal"))
sim_IMIFA_model(res,
method = c("conditional", "marginal"))
Arguments
N , G , P |
Desired overall number of observations, number of clusters, and number of variables in the simulated data set. All must be a single integer. |
Q |
Desired number of cluster-specific latent factors in the simulated data set. Can be specified either as a single integer if all clusters are to have the same number of factors, or a vector of length |
pis |
Mixing proportions of the clusters in the data set if |
mu |
True values of the mean parameters, either as a single value, a vector of length |
psi |
True values of uniqueness parameters, either as a single value, a vector of length |
loadings |
True values of the loadings matrix/matrices. Must be supplied in the form of a list of numeric matrices when |
scores |
True values of the latent factor scores, as a |
nn |
An alternative way to specify the size of each cluster, by giving the exact number of observations in each cluster explicitly. Must sum to |
loc.diff |
A parameter to control the closeness of the clusters in terms of the difference in their location vectors. Only relevant if More specifically,
|
non.zero |
Controls the number of non-zero entries in each loadings column (per cluster) only when Must be given as a list of length |
forceQg |
A logical indicating whether the upper limit on the number of cluster-specific factors |
method |
A switch indicating whether the mixture to be simulated from is the conditional distribution of the data given the latent variables (default), or simply the marginal distribution of the data. |
res |
An object of class |
Details
sim_IMIFA_model
is a simple wrapper to sim_IMIFA_data
which uses the estimated parameters of a fitted IMIFA related model, as generated by get_IMIFA_results
. The necessary parameters must have been originally stored via storeControl
in the creation of res
.
Value
Invisibly returns a data.frame
with N
observations (rows) of P
variables (columns). The true values of the parameters which generated these data are also stored as attributes.
Note
N
, G
, P
& Q
will NOT be inferred from the supplied parameters pis
, mu
, psi
, loadings
, scores
& nn
- rather, the parameters' length/dimensions must adhere to the supplied values of N
, G
, P
& Q
.
Missing values are not allowed in any of pis
, mu
, psi
, loadings
, scores
& nn
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
References
Murphy, K., Viroli, C., and Gormley, I. C. (2020) Infinite mixtures of infinite factor analysers, Bayesian Analysis, 15(3): 937-963. <doi:10.1214/19-BA1179>.
See Also
mcmc_IMIFA
for fitting an IMIFA related model to the simulated data set.
get_IMIFA_results
for generating input for sim_IMIFA_model
.
Ledermann
for details on the upper-bound for Q
. Note that this function accounts for isotropic uniquenesses, if psi
is supplied in that manner, in computing this bound.
Examples
# Simulate 100 observations from 3 balanced clusters with cluster-specific numbers of latent factors
# Specify isotropic uniquenesses within each cluster
# Supply cluster means directly
sim_data <- sim_IMIFA_data(N=100, G=3, P=20, Q=c(2, 2, 5), psi=1:3,
mu=matrix(rnorm(60, -2 + 1:3, 1), nrow=20, ncol=3, byrow=TRUE))
names(attributes(sim_data))
labels <- attr(sim_data, "Labels")
# Visualise the data in two-dimensions
plot(cmdscale(dist(sim_data), k=2), col=labels)
# Examine the overlap with a pairs plot of 5 randomly chosen variables
pairs(sim_data[,sample(1:20, 5)], col=labels)
# Fit a MIFA model to this data
# tmp <- mcmc_IMIFA(sim_data, method="MIFA", range.G=3, n.iters=5000)
# Simulate from this model
# res <- get_IMIFA_results(tmp, zlabels=labels)
# sim_mod <- sim_IMIFA_model(res)
Set storage values for use with the IMIFA family of models
Description
Supplies a list of values for logical switches indicating whether parameters of interest (means, scores, loadings, uniquenesses, and mixing proportions) should be stored when running models from the IMIFA family via mcmc_IMIFA
. It may be useful not to store certain parameters if memory is an issue.
Usage
storeControl(mu.switch = TRUE,
score.switch = TRUE,
load.switch = TRUE,
psi.switch = TRUE,
pi.switch = TRUE,
update.mu = mu.switch,
...)
Arguments
mu.switch |
Logical indicating whether the means are to be stored (defaults to |
score.switch |
Logical indicating whether the factor scores are to be stored. As the array containing each sampled scores matrix tends to be amongst the largest objects to be stored, this defaults to Unlike other parameters, the scores need not be stored for posterior predictive checking (see Note below). |
load.switch |
Logical indicating whether the factor loadings are to be stored (defaults to |
psi.switch |
Logical indicating whether the uniquenesses are to be stored (defaults to |
pi.switch |
Logical indicating whether the mixing proportions are to be stored (defaults to |
update.mu |
Logical indicating whether the means should be updated. Only relevant for the |
... |
Catches unused arguments. |
Details
storeControl
is provided for assigning values for IMIFA models within mcmc_IMIFA
. It may be useful not to store certain parameters if memory is an issue (e.g. for large data sets or for a large number of MCMC iterations after burnin and thinning).
Value
A named vector in which the names are the names of the storage switches and the values are logicals indicating whether that parameter is to be stored. The list also contains as an attribute a logical for each switch indicating whether it was actually supplied (TRUE
) or the default was accepted (FALSE
).
Note
Posterior inference and plotting won't be possible for parameters not stored.
Non-storage of parameters will almost surely prohibit the computation of posterior predictive checking error metrics within get_IMIFA_results
also. In particular, if such error metrics are desired, psi.switch
must be TRUE
for all but the "FA"
and "IFA"
models, mu.switch
must be TRUE
except in situations where update.mu=FALSE
is allowed, load.switch
must be TRUE
for all but the entirely zero-factor models, and pi.switch
must be TRUE
for models with clustering structure and unequal mixing proportions for all but the PPRE metric. score.switch=TRUE
is not required for any posterior predictive checking.
If post-hoc adaptation is invoked in get_IMIFA_results
on a model fitted without adaptation, store.switch=TRUE
is not required unless active.crit="SC"
(for "IFA"
models only); load.switch=TRUE
is required for both active.crit="BD"
and active.crit="SC"
.
Finally, if loadings are not stored but scores are, caution is advised when examining posterior scores as Procrustes rotation will not occur within get_IMIFA_results
.
Author(s)
Keefe Murphy - <keefe.murphy@mu.ie>
See Also
mcmc_IMIFA
, get_IMIFA_results
, mixfaControl
, mgpControl
, bnpControl
Examples
stctrl <- storeControl(score.switch=FALSE)
# data(olive)
# sim <- mcmc_IMIFA(olive, "IMIFA", n.iters=5000, storage=stctrl)
# Alternatively specify these arguments directly
# sim <- mcmc_IMIFA(olive, "IMIFA", n.iters=5000, score.switch=FALSE)