Type: | Package |
Title: | Bayesian Nonparametric Mixture Models |
Version: | 1.0.2 |
Date: | 2022-07-15 |
Author: | Riccardo Corradin [aut, cre], Antonio Canale [ctb], Bernardo Nipoti [ctb] |
Maintainer: | Riccardo Corradin <riccardo.corradin@gmail.com> |
Description: | Functions to perform Bayesian nonparametric univariate and multivariate density estimation and clustering, by means of Pitman-Yor mixtures, and dependent Dirichlet process mixtures for partially exchangeable data. See Corradin et al. (2021) <doi:10.18637/jss.v100.i15> for more details. |
License: | LGPL-3 | file LICENSE |
NeedsCompilation: | yes |
Imports: | methods, stats, ggplot2, coda, Rcpp, ggpubr |
Depends: | R (≥ 3.5.0) |
LinkingTo: | RcppArmadillo, Rcpp(≥ 0.12.13), RcppDist |
Suggests: | R.rsp |
VignetteBuilder: | R.rsp |
RoxygenNote: | 7.1.2 |
Encoding: | UTF-8 |
Packaged: | 2022-07-15 14:44:08 UTC; pmzrc1 |
Repository: | CRAN |
Date/Publication: | 2022-07-15 22:50:02 UTC |
BNPmix: Bayesian Nonparametric Mixture Models
Description
Functions to perform Bayesian nonparametric univariate and multivariate density estimation and clustering, by means of Pitman-Yor mixtures, and dependent Dirichlet process mixtures for partially exchangeable data. See Corradin et al. (2021) <doi:10.18637/jss.v100.i15> for more details.
BNPdens class constructor
Description
A constructor for the BNPdens class. The class BNPdens is a named list containing the output generated by a specified Bayesian nonparametric mixture model implemented by means of a specified MCMC strategy, as in PYdensity, DDPdensity, and PYregression.
Usage
BNPdens(
density = NULL,
data = NULL,
grideval = NULL,
grid_x = NULL,
grid_y = NULL,
clust = NULL,
mean = NULL,
beta = NULL,
sigma2 = NULL,
probs = NULL,
niter = NULL,
nburn = NULL,
tot_time = NULL,
univariate = TRUE,
regression = FALSE,
dep = FALSE,
group_log = NULL,
group = NULL,
wvals = NULL
)
Arguments
density |
a matrix containing the values taken by the density at the grid points; |
data |
a dataset; |
grideval |
a set of values where to evaluate the density; |
grid_x |
regression grid, independent variable; |
grid_y |
regression grid, dependent variable; |
clust |
a ( |
mean |
values for the location parameters; |
beta |
coefficients for regression model (only for |
sigma2 |
values of the scale parameters; |
probs |
values for the mixture weights; |
niter |
number of MCMC iterations; |
nburn |
number of MCMC iterations to discard as burn-in; |
tot_time |
total execution time; |
univariate |
logical, |
regression |
logical, |
dep |
logical, |
group_log |
group allocation for each iteration (only for |
group |
vector, allocation of observations to strata (only for |
wvals |
values of the processes weights (only for |
Examples
data_toy <- c(rnorm(100, -3, 1), rnorm(100, 3, 1))
grid <- seq(-7, 7, length.out = 50)
est_model <- PYdensity(y = data_toy, mcmc = list(niter = 100,
nburn = 10, nupd = 100), output = list(grid = grid))
str(est_model)
class(est_model)
set generic
Description
set generic
Usage
BNPdens2coda(object, dens)
Export to coda interface
Description
The method BNPdens2coda converts a BNPdens object into a coda mcmc object.
Usage
## S3 method for class 'BNPdens'
BNPdens2coda(object, dens = FALSE)
Arguments
object |
a BNPdens object; |
dens |
logical, it can be TRUE only for models estimated with |
Value
an mcmc object
Examples
data_toy <- cbind(c(rnorm(100, -3, 1), rnorm(100, 3, 1)),
c(rnorm(100, -3, 1), rnorm(100, 3, 1)))
grid <- expand.grid(seq(-7, 7, length.out = 50),
seq(-7, 7, length.out = 50))
est_model <- PYdensity(y = data_toy, mcmc = list(niter = 200, nburn = 100),
output = list(grid = grid))
coda_mcmc <- BNPdens2coda(est_model)
class(coda_mcmc)
C++ function - compute the Binder distances
Description
C++ function - compute the Binder distances
Arguments
M |
a matrix (r x n), r number of replications, n number of observations |
psm_mat |
a posterior similarity matrix |
Examples
{
M <- matrix(c(1,1,1,2,1,1,2,2,1,1,2,1,1,1,1,2), ncol = 4)
psmM <- BNPmix_psm(M)
BNPmix_BIN(M, psmM)
}
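The Binder loss compares each candidate partition with the posterior similarity matrix. The following base-R sketch computes, for each row of M, the Binder loss with equal misclassification costs, that is the sum over pairs i < j of |1{c_i = c_j} - psm_ij|. The helper binder_sketch is hypothetical, and BNPmix_BIN may apply a different normalization, so this only illustrates the quantity being computed.

```r
# Binder loss (equal costs) of each partition (row) in M with respect to a
# posterior similarity matrix psm; binder_sketch is a hypothetical helper.
binder_sketch <- function(M, psm) {
  M <- as.matrix(M)
  apply(M, 1, function(cl) {
    co <- outer(cl, cl, "==")              # TRUE if i and j share a cluster
    sum(abs(co - psm)[upper.tri(psm)])     # sum over pairs i < j
  })
}

M <- matrix(c(1,1,1,2,1,1,2,2,1,1,2,1,1,1,1,2), ncol = 4)
# posterior similarity: share of rows in which i and j are co-clustered
psm <- Reduce(`+`, lapply(seq_len(nrow(M)),
                          function(r) outer(M[r, ], M[r, ], "=="))) / nrow(M)
binder_sketch(M, psm)
```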
C++ function - compute the VI lower bound
Description
C++ function - compute the VI lower bound
Arguments
M |
a matrix (r x n), r number of replications, n number of observations |
psm_mat |
a posterior similarity matrix |
Examples
{
M <- matrix(c(1,1,1,2,1,1,2,2,1,1,2,1,1,1,1,2), ncol = 4)
psmM <- BNPmix_psm(M)
BNPmix_VI_LB(M, psmM)
}
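A base-R sketch of the lower bound follows, assuming the usual form in which the expected variation of information is bounded by replacing co-clustering expectations with posterior similarities: VI_LB(c) = (1/n) sum_i [ log2 sum_j 1{c_j = c_i} + log2 sum_j psm_ij - 2 log2 sum_j 1{c_j = c_i} psm_ij ]. The helper vi_lb_sketch is hypothetical and is not the package's C++ code.

```r
# Lower bound of the posterior expected variation of information for each
# partition (row) of M; vi_lb_sketch is a hypothetical helper.
vi_lb_sketch <- function(M, psm) {
  M <- as.matrix(M)
  n <- ncol(M)
  apply(M, 1, function(cl) {
    vi <- 0
    for (i in seq_len(n)) {
      same <- cl == cl[i]                  # members of i's cluster
      vi <- vi + (log2(sum(same)) + log2(sum(psm[i, ])) -
                    2 * log2(sum(same * psm[i, ]))) / n
    }
    vi
  })
}

M <- matrix(c(1,1,1,2,1,1,2,2,1,1,2,1,1,1,1,2), ncol = 4)
psm <- Reduce(`+`, lapply(seq_len(nrow(M)),
                          function(r) outer(M[r, ], M[r, ], "=="))) / nrow(M)
vi_lb_sketch(M, psm)
```

When every row of M is the same partition, the posterior similarity matrix coincides with its co-clustering matrix and the bound is zero for that partition.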
C++ function - compute the posterior similarity matrix
Description
C++ function - compute the posterior similarity matrix
Arguments
M |
a matrix (r x n), r number of replications, n number of observations |
Examples
{
M <- matrix(c(1,1,1,2,1,1,2,2,1,1,2,1,1,2,1,1), ncol = 4)
BNPmix_psm(M)
}
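The posterior similarity matrix has entries psm_ij equal to the fraction of rows of M (MCMC iterations) in which observations i and j carry the same label. A minimal base-R sketch, with the hypothetical name psm_sketch, that accumulates these co-clustering indicators row by row:

```r
# Posterior similarity matrix from an (r x n) matrix of cluster labels.
psm_sketch <- function(M) {
  M <- as.matrix(M)
  n <- ncol(M)
  psm <- matrix(0, n, n)
  for (r in seq_len(nrow(M))) {
    # add 1 to every pair (i, j) that shares a label in iteration r
    psm <- psm + outer(M[r, ], M[r, ], "==")
  }
  psm / nrow(M)
}

M <- matrix(c(1,1,1,2,1,1,2,2,1,1,2,1,1,2,1,1), ncol = 4)
psm_sketch(M)
```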
BNPpart class constructor
Description
A constructor for the BNPpart class. The class BNPpart is a named list containing the output of partition estimation methods.
Usage
BNPpart(partitions = NULL, scores = NULL, psm = NULL)
Arguments
partitions |
a matrix, each row is a visited partition; |
scores |
a vector, each value is the score of a visited partition; |
psm |
a matrix, posterior similarity matrix. |
Examples
data_toy <- c(rnorm(100, -3, 1), rnorm(100, 3, 1))
grid <- seq(-7, 7, length.out = 50)
est_model <- PYdensity(y = data_toy, mcmc = list(niter = 100,
nburn = 10, nupd = 100), output = list(grid = grid))
part <- partition(est_model)
class(part)
Collaborative Perinatal Project data
Description
A subset of the Collaborative Perinatal Project data set (Klebanoff, 2009) focused on studying the effect of DDE exposure on pregnancies (Longnecker et al., 2001). The dataset contains the following variables for each pregnant woman enrolled in the study:
hosp, factor denoting the hospital where the woman was hospitalized;
smoke, factor. It takes value 2 if the woman is a smoker, 1 otherwise;
gest, gestational age (in weeks);
dde, Dichlorodiphenyldichloroethylene (DDE) concentration in maternal serum;
weight, body weight of the baby at birth (in grams);
Usage
data(CPP)
Format
A data.frame
References
Klebanoff, M. A. (2009) The Collaborative Perinatal Project: a 50-year retrospective. Paediatric and Perinatal Epidemiology, 23, 2.
Longnecker, M. P., Klebanoff, M. A., Zhou, H., Brock, J. (2001) Association between maternal serum concentration of the DDT metabolite DDE and preterm and small-for-gestational-age babies at birth. The Lancet, 358, 110-114.
Examples
data(CPP)
str(CPP)
MCMC for GM-dependent Dirichlet process mixtures of Gaussians
Description
The DDPdensity function generates posterior density samples for a univariate Griffiths-Milne dependent Dirichlet process mixture model with Gaussian kernel, for partially exchangeable data. The function implements the importance conditional sampler method.
Usage
DDPdensity(y, group, mcmc = list(), prior = list(), output = list())
Arguments
y |
a vector or matrix giving the data based on which densities are to be estimated; |
group |
vector of length |
mcmc |
list of MCMC arguments:
|
prior |
a list giving the prior information, which contains:
|
output |
a list of arguments for generating posterior output. It contains:
|
Details
This function fits a Griffiths-Milne dependent Dirichlet process (GM-DDP) mixture for density estimation for partially exchangeable data (Lijoi et al., 2014).
For each observation, the group variable allows the observations to be gathered into L = length(unique(group)) distinct groups.
The model assumes exchangeability within each group, with observations in the l-th group marginally modelled by a location-scale Dirichlet process mixture, i.e.
\tilde f_l(y) = \int \phi(y; \mu, \sigma^2) \tilde p_l (d \mu, d \sigma^2)
where each \tilde p_l is a Dirichlet process with total mass strength and base measure P_0.
The vector \tilde p = (\tilde p_1,\ldots,\tilde p_L) is assumed to be jointly distributed as a GM-DDP(strength, wei; P_0), where strength and P_0 are the total mass parameter and the base measure of each \tilde p_l, and wei controls the dependence across the components of \tilde p. Admissible values for wei are in (0,1), with the two extremes of the range corresponding to full exchangeability (wei \rightarrow 0) and independence across groups (wei \rightarrow 1).
P_0 is a normal-inverse gamma base measure, i.e.
P_0(d\mu,d\sigma^2) = N(d \mu; m_0, \sigma^2 / k_0) \times IGa(d \sigma^2; a_0, b_0).
Posterior sampling is obtained by implementing the importance conditional sampler (Canale et al., 2019). See Corradin et al. (2021) for more details.
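For intuition on the marginal model of a single group, the following base-R sketch draws one random density \tilde f_l from the prior, using a truncated stick-breaking approximation of a Dirichlet process with the normal-inverse gamma base measure P_0 described above. It reproduces neither the dependence induced by wei across groups nor the importance conditional sampler; the function name rdpmix_density, the truncation level ntrunc, and the default hyperparameter values are assumptions made for illustration.

```r
# Prior draw of a DP mixture of Gaussians (truncated stick-breaking);
# rdpmix_density and its defaults are hypothetical.
rdpmix_density <- function(grid, strength = 1, m0 = 0, k0 = 1, a0 = 2, b0 = 1,
                           ntrunc = 100) {
  # stick-breaking weights: v_h ~ Beta(1, strength)
  v <- rbeta(ntrunc, 1, strength)
  w <- v * cumprod(c(1, 1 - v[-ntrunc]))
  # atoms from P_0: sigma2 ~ IGa(a0, b0), mu | sigma2 ~ N(m0, sigma2 / k0)
  sigma2 <- 1 / rgamma(ntrunc, shape = a0, rate = b0)
  mu <- rnorm(ntrunc, m0, sqrt(sigma2 / k0))
  # f(y) = sum_h w_h phi(y; mu_h, sigma2_h), evaluated on the grid
  vapply(grid, function(y) sum(w * dnorm(y, mu, sqrt(sigma2))), numeric(1))
}

set.seed(1)
grid <- seq(-7, 7, length.out = 50)
dens <- rdpmix_density(grid, strength = 1)
```

The truncated weights sum to slightly less than one; the neglected mass shrinks as ntrunc grows.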
Value
A BNPdens class object containing the estimated densities for each iteration, the allocations for each iteration, the grid used to evaluate the densities (for each group), the densities sampled from the posterior distribution (for each group), the groups, and the weights of the processes.
The function also returns information regarding the estimation: the number of iterations, the number of burn-in iterations and the execution time.
References
Lijoi, A., Nipoti, B., and Pruenster, I. (2014). Bayesian inference with dependent normalized completely random measures. Bernoulli 20, 1260–1291, doi:10.3150/13-BEJ521
Canale, A., Corradin, R., Nipoti, B. (2019), Importance conditional sampling for Bayesian nonparametric mixtures, arXiv preprint, arXiv:1906.08147
Corradin, R., Canale, A., Nipoti, B. (2021), BNPmix: An R Package for Bayesian Nonparametric Modeling via Pitman-Yor Mixtures, Journal of Statistical Software, doi:10.18637/jss.v100.i15
Examples
data_toy <- c(rnorm(50, -4, 1), rnorm(100, 0, 1), rnorm(50, 4, 1))
group_toy <- c(rep(1,100), rep(2,100))
grid <- seq(-7, 7, length.out = 50)
est_model <- DDPdensity(y = data_toy, group = group_toy,
mcmc = list(niter = 200, nburn = 100, var_MH_step = 0.25),
output = list(grid = grid))
summary(est_model)
plot(est_model)
C++ function to estimate Pitman-Yor univariate mixtures via marginal sampler - LOCATION SCALE
Description
C++ function to estimate Pitman-Yor univariate mixtures via marginal sampler - LOCATION SCALE
Arguments
data |
a vector of observations |
grid |
vector to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
tuning parameter of variance of location component |
a0 |
parameter of scale component |
b0 |
parameter of scale component |
m1 |
mean of hyperdistribution of m0 |
s21 |
variance of hyperdistribution of m0 |
tau1 |
shape parameter of hyperdistribution of k0 |
tau2 |
rate parameter of hyperdistribution of k0 |
a1 |
shape parameter of hyperdistribution of b0 |
b1 |
rate parameter of hyperdistribution of b0 |
mass |
mass parameter |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, return also the estimated density (default TRUE) |
process |
if 0 DP, if 1 PY |
sigma_PY |
discount parameter |
print_message |
print the status |
hyper |
if TRUE use hyperpriors, default TRUE |
C++ function to estimate Pitman-Yor univariate mixtures via marginal sampler - LOCATION
Description
C++ function to estimate Pitman-Yor univariate mixtures via marginal sampler - LOCATION
Arguments
data |
a vector of observations |
grid |
vector to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
s20 |
variance of location component |
a0 |
parameter of scale component |
b0 |
parameter of scale component |
m1 |
hyperparameter, mean of distribution of m0 |
k1 |
hyperparameter, scale factor of distribution of m0 |
a1 |
hyperparameter, shape of distribution of s20 |
b1 |
hyperparameter, scale of distribution of s20 |
mass |
mass parameter |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, return also the estimated density (default TRUE) |
process |
if 0 DP, if 1 PY |
sigma_PY |
discount parameter |
print_message |
print the status |
hyper |
if TRUE use hyperpriors, default TRUE |
C++ function to estimate Pitman-Yor multivariate mixtures via marginal sampler - LOCATION SCALE
Description
C++ function to estimate Pitman-Yor multivariate mixtures via marginal sampler - LOCATION SCALE
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
tuning parameter of variance of location component |
S0 |
parameter of scale component |
n0 |
parameter of scale component |
m1 |
mean of hyperprior distribution of m0 |
S1 |
covariance of hyperprior distribution of m0 |
tau1 |
shape parameter of hyperprior distribution of k0 |
tau2 |
rate parameter of hyperprior distribution of k0 |
theta1 |
df of hyperprior distribution of S0 |
Theta1 |
matrix of hyperprior distribution of S0 |
mass |
mass parameter |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, return also the estimated density (default TRUE) |
sigma_PY |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
C++ function to estimate Pitman-Yor multivariate mixtures via marginal sampler - LOCATION
Description
C++ function to estimate Pitman-Yor multivariate mixtures via marginal sampler - LOCATION
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
S20 |
variance of location component |
S0 |
parameter of scale component |
n0 |
parameter of scale component |
m1 |
mean of hyperdistribution of m0 |
k1 |
scale factor of hyperdistribution of m0 |
theta1 |
df of hyperdistribution of S20 |
Theta1 |
matrix of hyperdistribution of S20 |
mass |
mass parameter |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, return also the estimated density (default TRUE) |
sigma_PY |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
C++ function to estimate Pitman-Yor multivariate mixtures via marginal sampler - PRODUCT KERNEL
Description
C++ function to estimate Pitman-Yor multivariate mixtures via marginal sampler - PRODUCT KERNEL
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
vector, scale parameters for the location component |
a0 |
vector, parameters of scale component |
b0 |
vector, parameters of scale component |
m1 |
means of hyperdistribution of m0 |
s21 |
variances of hyperdistribution of m0 |
tau1 |
shape parameters of hyperdistribution of k0 |
tau2 |
rate parameters of hyperdistribution of k0 |
a1 |
shape parameters of hyperdistribution of b0 |
b1 |
rate parameters of hyperdistribution of b0 |
strength |
strength parameter |
napprox |
number of approximating values |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, return also the estimated density (default TRUE) |
discount |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
C++ function to estimate Pitman-Yor multivariate mixtures via marginal sampler - PRODUCT KERNEL
Description
C++ function to estimate Pitman-Yor multivariate mixtures via marginal sampler - PRODUCT KERNEL
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
vector, scale parameters for the location component |
a0 |
vector, parameters of scale component |
b0 |
vector, parameters of scale component |
m1 |
means of hyperdistribution of m0 |
s21 |
variances of hyperdistribution of m0 |
a1 |
shape parameters of hyperdistribution of b0 |
b1 |
rate parameters of hyperdistribution of b0 |
strength |
strength parameter |
napprox |
number of approximating values |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, return also the estimated density (default TRUE) |
discount |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
C++ function to estimate Pitman-Yor multivariate mixtures via marginal sampler - LOCATION SCALE
Description
C++ function to estimate Pitman-Yor multivariate mixtures via marginal sampler - LOCATION SCALE
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
vector, scale parameters for the location component |
a0 |
vector, parameters of scale component |
b0 |
vector, parameters of scale component |
m1 |
means of hyperdistribution of m0 |
s21 |
variances of hyperdistribution of m0 |
tau1 |
shape parameters of hyperdistribution of k0 |
tau2 |
rate parameters of hyperdistribution of k0 |
a1 |
shape parameters of hyperdistribution of b0 |
b1 |
rate parameters of hyperdistribution of b0 |
mass |
mass parameter |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, return also the estimated density (default TRUE) |
sigma_PY |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
Pitman-Yor prior elicitation
Description
The function PYcalibrate elicits the strength parameter of the Pitman-Yor process, given the discount parameter and the prior expected number of clusters.
Usage
PYcalibrate(Ek, n, discount = 0)
Arguments
Ek |
prior expected number of clusters; |
n |
sample size; |
discount |
discount parameter; default is set equal to 0, corresponding to a Dirichlet process prior. |
Value
A named list containing the values of the strength and discount parameters.
Examples
PYcalibrate(5, 100)
PYcalibrate(5, 100, 0.5)
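One way to perform this elicitation is sketched below in base R; it is not the package's implementation. Under the Pitman-Yor predictive rule, observation i + 1 opens a new cluster with probability (strength + discount k)/(strength + i) given k clusters among the first i observations, so E[K_{i+1}] = E[K_i] + (strength + discount E[K_i])/(strength + i) with E[K_1] = 1. Since E[K_n] increases with strength, the equation E[K_n] = Ek can be solved with uniroot. The names expected_clusters and calibrate_sketch, and the search interval, are assumptions.

```r
# Prior expected number of clusters E[K_n] under a Pitman-Yor process,
# via the recursion implied by the predictive rule (hypothetical helper).
expected_clusters <- function(strength, n, discount = 0) {
  ek <- 1                                  # the first observation opens a cluster
  for (i in seq_len(n - 1)) {
    ek <- ek + (strength + discount * ek) / (strength + i)
  }
  ek
}

# Solve E[K_n] = Ek for the strength parameter (hypothetical helper).
calibrate_sketch <- function(Ek, n, discount = 0) {
  stopifnot(discount >= 0, discount < 1, Ek > 1, Ek < n)
  f <- function(s) expected_clusters(s, n, discount) - Ek
  lower <- -discount + 1e-8                # strength must exceed -discount
  root <- uniroot(f, c(lower, 1e6), tol = 1e-10)$root
  list(strength = root, discount = discount)
}

calibrate_sketch(5, 100)
calibrate_sketch(5, 100, 0.5)
```

With discount = 0 the recursion reduces to the Dirichlet process formula E[K_n] = sum_{i=0}^{n-1} strength/(strength + i).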
MCMC for Pitman-Yor mixtures of Gaussians
Description
The PYdensity function generates a posterior density sample for a selection of univariate and multivariate Pitman-Yor process mixture models with Gaussian kernels. See details below for the description of the different specifications of the implemented models.
Usage
PYdensity(y, mcmc = list(), prior = list(), output = list())
Arguments
y |
a vector or matrix giving the data based on which the density is to be estimated; |
mcmc |
a list of MCMC arguments:
|
prior |
a list giving the prior information. The list includes
|
output |
a list of arguments for generating posterior output. It contains:
|
Details
This generic function fits a Pitman-Yor process mixture model for density estimation and clustering. The general model is
\tilde f(y) = \int K(y; \theta) \tilde p (d \theta),
where K(y; \theta) is a kernel density with parameter \theta\in\Theta. Univariate and multivariate Gaussian kernels are implemented with different specifications for the parametric space \Theta, as described below.
The mixing measure \tilde p has a Pitman-Yor process prior with strength parameter \vartheta, discount parameter \alpha, and base measure P_0 admitting the specifications presented below. For posterior sampling, three MCMC approaches are implemented. See details below.
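The Pitman-Yor prior on \tilde p admits the standard stick-breaking representation with v_h \sim Beta(1 - \alpha, \vartheta + h\alpha) and weights w_h = v_h \prod_{l<h}(1 - v_l). The base-R sketch below draws a truncated set of such weights; the function name rpy_weights and the truncation level are illustrative assumptions, and this is not the package's sampler.

```r
# Truncated stick-breaking draw of Pitman-Yor weights (hypothetical helper):
# v_h ~ Beta(1 - discount, strength + h * discount), w_h = v_h * prod_{l<h} (1 - v_l)
rpy_weights <- function(ntrunc, strength, discount = 0) {
  stopifnot(discount >= 0, discount < 1, strength > -discount)
  h <- seq_len(ntrunc)
  v <- rbeta(ntrunc, 1 - discount, strength + h * discount)
  v * cumprod(c(1, 1 - v[-ntrunc]))
}

set.seed(123)
w <- rpy_weights(ntrunc = 200, strength = 1, discount = 0.25)
1 - sum(w)  # mass left out by the truncation
```

Larger values of the discount parameter make the weights decay more slowly, so a larger truncation level is needed to capture the same total mass.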
Univariate data
For univariate y the function implements both a location and a location-scale mixture model. The former assumes
\tilde f(y) = \int \phi(y; \mu, \sigma^2) \tilde p (d \mu) \pi(\sigma^2),
where \phi(y; \mu, \sigma^2) is a univariate Gaussian kernel function with mean \mu and variance \sigma^2, and \pi(\sigma^2) is an inverse gamma prior. The base measure is specified as
P_0(d \mu) = N(d \mu; m_0, \sigma^2_0),
and \sigma^2 \sim IGa(a_0, b_0).
Optional hyperpriors for the base measure's parameters are
(m_0,\sigma^2_0) \sim N(m_1, \sigma^2_0 / k_1) \times IGa(a_1, b_1).
The location-scale mixture model, instead, assumes
\tilde f(y) = \int \phi(y; \mu, \sigma^2) \tilde p (d \mu, d \sigma^2)
with normal-inverse gamma base measure
P_0 (d \mu, d \sigma^2) = N(d \mu; m_0, \sigma^2 / k_0) \times IGa(d \sigma^2; a_0, b_0),
and (optional) hyperpriors
m_0 \sim N(m_1, \sigma_1^2 ),\quad k_0 \sim Ga(\tau_1, \zeta_1),\quad b_0 \sim Ga(a_1, b_1).
Multivariate data
For multivariate y (p-variate) the function implements a location mixture model (with full covariance matrix) and two different location-scale mixture models, with either full or diagonal covariance matrix. The location mixture model assumes
\tilde f(y) = \int \phi_p(y; \mu, \Sigma) \tilde p (d \mu) \pi(\Sigma)
where \phi_p(y; \mu, \Sigma) is a p-dimensional Gaussian kernel function with mean vector \mu and covariance matrix \Sigma. The prior on \Sigma is inverse Wishart with parameters \Sigma_0 and \nu_0, while the base measure is
P_0(d \mu) = N(d \mu; m_0, S_0),
with optional hyperpriors
m_0 \sim N(m_1, S_0 / k_1),\quad S_0 \sim IW(\lambda_1, \Lambda_1).
The location-scale mixture model assumes
\tilde f(y) = \int \phi_p(y; \mu, \Sigma) \tilde p (d \mu, d \Sigma).
Two possible structures for \Sigma are implemented, namely full and diagonal covariance. For the full covariance mixture model, the base measure is the normal-inverse Wishart
P_0 (d \mu, d \Sigma) = N(d \mu; m_0, \Sigma / k_0) \times IW(d \Sigma; \nu_0, \Sigma_0),
with optional hyperpriors
m_0 \sim N(m_1, S_1),\quad k_0 \sim Ga(\tau_1, \zeta_1),\quad \Sigma_0 \sim W(\nu_1, \Sigma_1).
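A single draw from the normal-inverse Wishart base measure can be sketched in base R by inverting a Wishart draw and then sampling the mean conditionally on \Sigma. This assumes the common convention in which \Sigma \sim IW(\nu_0, \Sigma_0) means \Sigma^{-1} \sim W(\nu_0, \Sigma_0^{-1}); the package may parameterize the inverse Wishart differently, and rniw_sketch is a hypothetical name.

```r
# One draw (mu, Sigma) from N(mu; m0, Sigma / k0) x IW(Sigma; nu0, Sigma0),
# under the convention Sigma^{-1} ~ Wishart(nu0, Sigma0^{-1}) (an assumption).
rniw_sketch <- function(m0, k0, nu0, Sigma0) {
  W <- rWishart(1, df = nu0, Sigma = solve(Sigma0))[, , 1]
  Sigma <- solve(W)
  # mu | Sigma ~ N_p(m0, Sigma / k0), using a lower-triangular Cholesky factor
  L <- t(chol(Sigma / k0))
  mu <- as.vector(m0 + L %*% rnorm(length(m0)))
  list(mu = mu, Sigma = Sigma)
}

set.seed(42)
draw <- rniw_sketch(m0 = c(0, 0), k0 = 1, nu0 = 5, Sigma0 = diag(2))
draw$mu
draw$Sigma
```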
The second location-scale mixture model assumes a diagonal covariance structure. This is equivalent to writing the mixture model as a mixture of products of univariate normal kernels, i.e.
\tilde f(y) = \int \prod_{r=1}^p \phi(y_r; \mu_r, \sigma^2_r) \tilde p (d \mu_1,\ldots,d \mu_p, d \sigma_1^2,\ldots,d \sigma_p^2).
For this specification, the base measure is defined as the product of p independent normal-inverse gamma distributions, that is
P_0 = \prod_{r=1}^p P_{0r}
where
P_{0r}(d \mu_r,d \sigma_r^2) = N(d \mu_r; m_{0r}, \sigma^2_r / k_{0r}) \times IGa(d \sigma^2_r; a_{0r}, b_{0r}).
Optional hyperpriors can be added, and, for each component, correspond to the set of hyperpriors considered for the univariate location-scale mixture model.
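The equivalence between a Gaussian kernel with diagonal covariance and a product of univariate Gaussian kernels can be checked numerically. In this sketch dmvn_sketch is a hypothetical base-R multivariate normal density, and the evaluation point and parameters are arbitrary.

```r
# Multivariate normal density with a general covariance matrix (base R).
dmvn_sketch <- function(y, mu, Sigma) {
  p <- length(y)
  d <- y - mu
  q <- sum(d * solve(Sigma, d))            # quadratic form d' Sigma^{-1} d
  exp(-0.5 * q) / sqrt((2 * pi)^p * det(Sigma))
}

y  <- c(0.3, -1.2, 2.0)
mu <- c(0, -1, 1.5)
s2 <- c(1, 0.5, 2)                         # diagonal of Sigma
joint       <- dmvn_sketch(y, mu, diag(s2))
prod_kernel <- prod(dnorm(y, mu, sqrt(s2)))
all.equal(joint, prod_kernel)              # TRUE up to floating-point tolerance
```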
Posterior simulation methods
This generic function implements three types of MCMC algorithms for posterior simulation.
The default method is the importance conditional sampler 'ICS' (Canale et al. 2019). Other options are the marginal sampler 'MAR' (Neal, 2000) and the slice sampler 'SLI' (Kalli et al. 2011).
The importance conditional sampler performs an importance sampling step when updating the values of individual parameters \theta, which requires sampling m_imp values from a suitable proposal. Large values of m_imp are known to improve the mixing of the chain at the cost of increased running time (Canale et al. 2019). Two options are available for the slice sampler, namely the dependent slice-efficient sampler (slice_type = 'DEP'), which is set as default, and the independent slice-efficient sampler (slice_type = 'INDEP') (Kalli et al. 2011). See Corradin et al. (2021) for more details.
Value
A BNPdens class object containing the estimated density and the cluster allocations for each iteration. If out_param = TRUE the output also contains the kernel-specific parameters for each iteration. If mcmc_dens = TRUE the output also contains a realization from the posterior density for each iteration. If mean_dens = TRUE the output contains just the mean of the realizations from the posterior density. The output also contains information such as the number of iterations, the number of burn-in iterations, the computational time used and the type of estimated model (univariate = TRUE or FALSE).
References
Canale, A., Corradin, R., Nipoti, B. (2019), Importance conditional sampling for Bayesian nonparametric mixtures, arXiv preprint, arXiv:1906.08147
Corradin, R., Canale, A., Nipoti, B. (2021), BNPmix: An R Package for Bayesian Nonparametric Modeling via Pitman-Yor Mixtures, Journal of Statistical Software, 100, doi:10.18637/jss.v100.i15
Kalli, M., Griffin, J. E., and Walker, S. G. (2011), Slice sampling mixture models. Statistics and Computing 21, 93-105, doi:10.1007/s11222-009-9150-y
Neal, R. M. (2000), Markov Chain Sampling Methods for Dirichlet Process Mixture Models, Journal of Computational and Graphical Statistics 9, 249-265, doi:10.2307/1390653
Examples
data_toy <- cbind(c(rnorm(100, -3, 1), rnorm(100, 3, 1)),
c(rnorm(100, -3, 1), rnorm(100, 3, 1)))
grid <- expand.grid(seq(-7, 7, length.out = 50),
seq(-7, 7, length.out = 50))
est_model <- PYdensity(y = data_toy, mcmc = list(niter = 200, nburn = 100),
output = list(grid = grid))
summary(est_model)
plot(est_model)
MCMC for Pitman-Yor mixture of Gaussian regressions
Description
The PYregression function generates a posterior sample for mixtures of linear regression models inspired by the ANOVA-DDP model introduced in De Iorio et al. (2004). See details below for model specification.
Usage
PYregression(y, x, mcmc = list(), prior = list(), output = list())
Arguments
y |
a vector of observations, univariate dependent variable; |
x |
a matrix of observations, multivariate independent variable; |
mcmc |
a list of MCMC arguments:
|
prior |
a list giving the prior information. The list includes
|
output |
list of posterior summaries:
|
Details
This function fits a Pitman-Yor process mixture of Gaussian linear regression models, i.e.
\tilde f(y) = \int \phi(y; x^T \beta, \sigma^2) \tilde p (d \beta, d \sigma^2)
where x is a bivariate vector containing the independent variable in x and a value of 1 for the intercept term.
The mixing measure \tilde p has a Pitman-Yor process prior with strength \vartheta and discount parameter \alpha. The location model assumes a base measure P_0 specified as
P_0(d \beta) = N(d \beta; m_0, S_0),
while the location-scale model assumes a base measure P_0 specified as
P_0(d \beta, d \sigma^2) = N(d \beta; m_0, S_0) \times IGa(d \sigma^2; a_0, b_0).
Optional hyperpriors complete the model specification:
m_0 \sim N(m_1, S_0 / k_1 ),\quad S_0 \sim IW(\nu_1, S_1),\quad b_0 \sim Ga(\tau_1, \zeta_1).
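When \tilde p has finitely many atoms, the model reduces, for a fixed covariate value, to a finite mixture of regressions f(y | x) = \sum_h w_h \phi(y; x^T \beta_h, \sigma^2_h) with x = (1, x). The base-R sketch below evaluates such a conditional density on a grid of y values. The helper dmixreg_sketch is hypothetical; the two components reuse the data-generating lines of the toy example in the Examples section (intercept 1, slopes 2 and 6, unit variance), while the equal weights are an assumption.

```r
# Conditional density of a finite mixture of Gaussian linear regressions at a
# single covariate value x (hypothetical helper).
dmixreg_sketch <- function(y_grid, x, weights, beta, sigma2) {
  xx <- c(1, x)                            # intercept term plus covariate(s)
  means <- as.vector(beta %*% xx)          # one linear predictor per component
  vapply(y_grid, function(y) sum(weights * dnorm(y, means, sqrt(sigma2))),
         numeric(1))
}

beta   <- rbind(c(1, 2), c(1, 6))          # rows: (intercept, slope) per component
y_grid <- seq(-10, 40, length.out = 2001)
dens <- dmixreg_sketch(y_grid, x = 3, weights = c(0.5, 0.5),
                       beta = beta, sigma2 = c(1, 1))
sum(dens) * diff(y_grid[1:2])              # Riemann sum, close to 1
```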
Posterior simulation methods
This generic function implements three types of MCMC algorithms for posterior simulation.
The default method is the importance conditional sampler 'ICS' (Canale et al. 2019). Other options are the marginal sampler 'MAR' (algorithm 8 of Neal, 2000) and the slice sampler 'SLI' (Kalli et al. 2011).
The importance conditional sampler performs an importance sampling step when updating the values of individual parameters \theta, which requires sampling m_imp values from a suitable proposal. Large values of m_imp are known to improve the mixing of the chain at the cost of increased running time (Canale et al. 2019). When updating the individual parameter \theta, Algorithm 8 of Neal (2000) requires sampling m_marginal values from the base measure; m_marginal can be chosen arbitrarily. Two options are available for the slice sampler, namely the dependent slice-efficient sampler (slice_type = 'DEP'), which is set as default, and the independent slice-efficient sampler (slice_type = 'INDEP') (Kalli et al. 2011). See Corradin et al. (2021) for more details.
Value
A BNPdens class object containing the estimated density and the cluster allocations for each iteration. The output also contains the data and the grids. If out_param = TRUE the output also contains the kernel-specific parameters for each iteration. If mcmc_dens = TRUE, the function also returns a realization from the posterior density for each iteration. If mean_dens = TRUE, the output contains just the mean of the densities sampled at each iteration. The output also includes the number of iterations, the number of burn-in iterations, the computational time and the type of model.
References
Canale, A., Corradin, R., Nipoti, B. (2019), Importance conditional sampling for Bayesian nonparametric mixtures, arXiv preprint, arXiv:1906.08147
Corradin, R., Canale, A., Nipoti, B. (2021), BNPmix: An R Package for Bayesian Nonparametric Modeling via Pitman-Yor Mixtures, Journal of Statistical Software, doi:10.18637/jss.v100.i15
De Iorio, M., Mueller, P., Rosner, G.L., and MacEachern, S. (2004), An ANOVA Model for Dependent Random Measures, Journal of the American Statistical Association 99, 205-215, doi:10.1198/016214504000000205
Kalli, M., Griffin, J. E., and Walker, S. G. (2011), Slice sampling mixture models. Statistics and Computing 21, 93-105, doi:10.1007/s11222-009-9150-y
Neal, R. M. (2000), Markov Chain Sampling Methods for Dirichlet Process Mixture Models, Journal of Computational and Graphical Statistics 9, 249-265, doi:10.2307/1390653
Examples
x_toy <- c(rnorm(100, 3, 1), rnorm(100, 3, 1))
y_toy <- c(x_toy[1:100] * 2 + 1, x_toy[101:200] * 6 + 1) + rnorm(200, 0, 1)
grid_x <- c(0, 1, 2, 3, 4, 5)
grid_y <- seq(0, 35, length.out = 50)
est_model <- PYregression(y = y_toy, x = x_toy,
mcmc = list(niter = 200, nburn = 100),
output = list(grid_x = grid_x, grid_y = grid_y))
summary(est_model)
plot(est_model)
C++ function to estimate DDP models with one grouping variable
Description
C++ function to estimate DDP models with one grouping variable
Arguments
data |
a vector of observations. |
group |
group allocation of the data. |
ngr |
number of groups. |
grid |
vector to evaluate the density. |
niter |
number of iterations. |
nburn |
number of burn-in iterations. |
m0 |
expectation of location component. |
k0 |
tuning parameter of variance of location component. |
a0 |
parameter of scale component. |
b0 |
parameter of scale component. |
mass |
mass of Dirichlet process. |
wei |
prior weight of the specific processes. |
b |
tuning parameter of weights distribution |
napprox |
number of approximating values. |
n_approx_unif |
number of approximating values of the importance step for the weights updating. |
nupd |
number of iterations to show current updating. |
out_dens |
if TRUE, return also the estimated density (default TRUE). |
print_message |
print the status. |
light_dens |
if TRUE return only the posterior mean of the density |
C++ function to estimate Pitman-Yor univariate mixtures via importance conditional sampler - LOCATION SCALE
Description
C++ function to estimate Pitman-Yor univariate mixtures via importance conditional sampler - LOCATION SCALE
Arguments
data |
a vector of observations |
grid |
vector to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
tuning parameter of variance of location component |
a0 |
parameter of scale component |
b0 |
parameter of scale component |
m1 |
mean of hyperdistribution of m0 |
s21 |
variance of hyperdistribution of m0 |
tau1 |
shape parameter of hyperdistribution of k0 |
tau2 |
rate parameter of hyperdistribution of k0 |
a1 |
shape parameter of hyperdistribution of b0 |
b1 |
rate parameter of hyperdistribution of b0 |
strength |
strength parameter |
napprox |
number of approximating values |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, return also the estimated density (default TRUE) |
discount |
discount parameter |
print_message |
print the status |
hyper |
if TRUE use hyperpriors, default TRUE |
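The hyperparameters m0, k0, a0 and b0 above define the base measure for the location-scale pairs of the mixture components. A minimal base-R sketch of drawing atoms from such a base measure, assuming the normal-inverse-gamma parametrisation sigma2 ~ IG(a0, b0) and mu | sigma2 ~ N(m0, sigma2 / k0); the helper r_nig is hypothetical and not part of BNPmix, so check the package source before relying on this parametrisation.

```r
# Hypothetical helper (not a BNPmix function): draws n (mu, sigma2) atoms from a
# normal-inverse-gamma base measure, ASSUMING
#   sigma2 ~ IG(a0, b0)  and  mu | sigma2 ~ N(m0, sigma2 / k0)
r_nig <- function(n, m0 = 0, k0 = 1, a0 = 2, b0 = 1) {
  # The reciprocal of a Gamma(shape = a0, rate = b0) draw is IG(a0, b0)
  sigma2 <- 1 / rgamma(n, shape = a0, rate = b0)
  # Larger k0 concentrates the locations around m0
  mu <- rnorm(n, mean = m0, sd = sqrt(sigma2 / k0))
  data.frame(mu = mu, sigma2 = sigma2)
}

set.seed(1)
atoms <- r_nig(5000, m0 = 0, k0 = 0.5, a0 = 3, b0 = 2)
# The IG(3, 2) mean is b0 / (a0 - 1) = 1, so this should be close to 1
mean(atoms$sigma2)
```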
C++ function to estimate Pitman-Yor univariate mixtures via importance conditional sampler - LOCATION
Description
C++ function to estimate Pitman-Yor univariate mixtures via importance conditional sampler - LOCATION
Arguments
data |
a vector of observations |
grid |
vector to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
s20 |
variance of location component |
a0 |
parameter of scale component |
b0 |
parameter of scale component |
m1 |
hyperparameter, mean of distribution of m0 |
k1 |
hyperparameter, scale factor of distribution of m0 |
a1 |
hyperparameter, shape of distribution of s20 |
b1 |
hyperparameter, scale of distribution of s20 |
strength |
strength parameter |
napprox |
number of approximating values |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, also return the estimated density (default TRUE) |
discount |
discount parameter |
print_message |
print the status |
hyper |
if TRUE use hyperpriors, default TRUE |
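In a location mixture the components differ only in their location and share a common scale. To illustrate what evaluating such a mixture on the grid amounts to, here is a base-R sketch for a fixed set of weights, locations and one shared variance; dens_location_mix is a hypothetical helper, and the weights and atoms are illustrative values, not sampler output.

```r
# Hypothetical helper (not a BNPmix function): evaluates a location mixture of
# normals, sum_j weights[j] * N(x; mu[j], sigma2), at every point of `grid`.
# All components share the single variance sigma2.
dens_location_mix <- function(grid, weights, mu, sigma2) {
  stopifnot(length(weights) == length(mu), abs(sum(weights) - 1) < 1e-8)
  # One column per component, one row per grid point
  comp <- vapply(seq_along(mu),
                 function(j) weights[j] * dnorm(grid, mean = mu[j], sd = sqrt(sigma2)),
                 numeric(length(grid)))
  # matrix() keeps the shape when grid has a single point
  rowSums(matrix(comp, nrow = length(grid)))
}

grid <- seq(-7, 7, length.out = 50)
d <- dens_location_mix(grid, weights = c(0.5, 0.5), mu = c(-3, 3), sigma2 = 1)
# Riemann sum over the grid: approximately 1
sum(d) * diff(grid)[1]
```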
C++ function to estimate Pitman-Yor multivariate mixtures via importance conditional sampler - LOCATION SCALE
Description
C++ function to estimate Pitman-Yor multivariate mixtures via importance conditional sampler - LOCATION SCALE
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
tuning parameter of variance of location component |
S0 |
parameter of scale component |
n0 |
parameter of scale component |
m1 |
mean of hyperprior distribution of m0 |
S1 |
covariance of hyperprior distribution of m0 |
tau1 |
shape parameter of hyperprior distribution of k0 |
tau2 |
rate parameter of hyperprior distribution of k0 |
theta1 |
df of hyperprior distribution of S0 |
Theta1 |
matrix of hyperprior distribution of S0 |
strength |
strength parameter |
napprox |
number of approximating values |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, also return the estimated density (default TRUE) |
discount |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
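In the multivariate location-scale case, S0 and n0 play the role of the scale matrix and degrees of freedom for the component covariance matrices. A base-R sketch of one draw from a normal-inverse-Wishart base measure, assuming Sigma ~ IW(n0, S0) and mu | Sigma ~ N(m0, Sigma / k0); the helper r_niw is hypothetical and this parametrisation is an assumption, not taken from the package source.

```r
# Hypothetical helper (not a BNPmix function): one draw from a
# normal-inverse-Wishart base measure, ASSUMING
#   Sigma ~ IW(n0, S0)  and  mu | Sigma ~ N(m0, Sigma / k0)
r_niw <- function(m0, k0, S0, n0) {
  p <- length(m0)
  stopifnot(all(dim(S0) == c(p, p)), n0 > p - 1)
  # If W ~ Wishart(n0, solve(S0)), then solve(W) ~ inverse-Wishart(n0, S0)
  W <- stats::rWishart(1, df = n0, Sigma = solve(S0))[, , 1]
  Sigma <- solve(W)
  Sigma <- (Sigma + t(Sigma)) / 2  # remove round-off asymmetry
  # chol() returns upper-triangular R with t(R) %*% R == Sigma / k0
  R <- chol(Sigma / k0)
  mu <- as.vector(m0 + t(R) %*% rnorm(p))
  list(mu = mu, Sigma = Sigma)
}

set.seed(2)
draw <- r_niw(m0 = c(0, 0), k0 = 1, S0 = diag(2), n0 = 4)
draw$Sigma
```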
C++ function to estimate Pitman-Yor multivariate mixtures via importance conditional sampler
Description
C++ function to estimate Pitman-Yor multivariate mixtures via importance conditional sampler
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
S20 |
variance of location component |
S0 |
parameter of scale component |
n0 |
parameter of scale component |
m1 |
mean of hyperdistribution of m0 |
k1 |
scale factor of hyperdistribution of m0 |
theta1 |
df of hyperdistribution of S20 |
Theta1 |
matrix of hyperdistribution of S20 |
strength |
strength parameter |
napprox |
number of approximating values |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, also return the estimated density (default TRUE) |
discount |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
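The strength and discount arguments are the two parameters of the Pitman-Yor process. Their effect on clustering can be seen through the Pitman-Yor urn scheme: after i observations spread over k clusters, the next observation joins existing cluster j with probability proportional to n_j - discount and opens a new cluster with probability proportional to strength + k * discount. The sketch below simulates labels from this prior urn only; it is not the importance conditional sampler, and r_py_partition is a hypothetical name.

```r
# Hypothetical helper (not a BNPmix function): simulates cluster labels for n
# observations from the Pitman-Yor urn with the given strength and discount.
r_py_partition <- function(n, strength = 1, discount = 0) {
  stopifnot(discount >= 0, discount < 1, strength > -discount)
  labels <- integer(n)
  labels[1] <- 1L
  counts <- 1L  # current size of each existing cluster
  for (i in seq_len(n - 1)) {  # i observations already allocated
    k <- length(counts)
    # Existing cluster j: n_j - discount; new cluster: strength + k * discount
    probs <- c(counts - discount, strength + k * discount) / (strength + i)
    j <- sample.int(k + 1, 1, prob = probs)
    if (j > k) counts <- c(counts, 1L) else counts[j] <- counts[j] + 1L
    labels[i + 1] <- j
  }
  labels
}

set.seed(4)
z <- r_py_partition(200, strength = 1, discount = 0.5)
length(unique(z))  # number of clusters; tends to grow with the discount
```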
C++ function to estimate Pitman-Yor multivariate mixtures via importance conditional sampler - PRODUCT KERNEL
Description
C++ function to estimate Pitman-Yor multivariate mixtures via importance conditional sampler - PRODUCT KERNEL
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
vector, scale parameters for the location component |
a0 |
vector, parameters of scale component |
b0 |
vector, parameters of scale component |
m1 |
means of hyperdistribution of m0 |
s21 |
variances of hyperdistribution of m0 |
tau1 |
shape parameters of hyperdistribution of k0 |
tau2 |
rate parameters of hyperdistribution of k0 |
a1 |
shape parameters of hyperdistribution of b0 |
b1 |
rate parameters of hyperdistribution of b0 |
strength |
strength parameter |
napprox |
number of approximating values |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, also return the estimated density (default TRUE) |
discount |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
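A product kernel treats the coordinates as independent within a component, so the kernel is a product of univariate normal densities, one per dimension (equivalently, a multivariate normal with diagonal covariance). A base-R sketch evaluating one such kernel at every row of a grid matrix; d_product_kernel and the parameter values are hypothetical and only illustrate the idea behind the vector-valued k0, a0 and b0 above.

```r
# Hypothetical helper (not a BNPmix function): Gaussian product kernel
# prod_d N(x_d; mu[d], sigma2[d]) evaluated at each row of `grid`.
d_product_kernel <- function(grid, mu, sigma2) {
  grid <- as.matrix(grid)
  stopifnot(ncol(grid) == length(mu), length(mu) == length(sigma2))
  # Per-coordinate log-densities; summing logs equals taking the product
  logd <- vapply(seq_along(mu),
                 function(d) dnorm(grid[, d], mu[d], sqrt(sigma2[d]), log = TRUE),
                 numeric(nrow(grid)))
  exp(rowSums(matrix(logd, nrow = nrow(grid))))
}

grid <- expand.grid(x1 = seq(-3, 3, length.out = 20),
                    x2 = seq(-3, 3, length.out = 20))
dk <- d_product_kernel(grid, mu = c(0, 1), sigma2 = c(1, 4))
```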
C++ function to estimate Pitman-Yor multivariate mixtures via importance conditional sampler - PRODUCT KERNEL
Description
C++ function to estimate Pitman-Yor multivariate mixtures via importance conditional sampler - PRODUCT KERNEL
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
vector, scale parameters for the location component |
a0 |
vector, parameters of scale component |
b0 |
vector, parameters of scale component |
m1 |
means of hyperdistribution of m0 |
s21 |
variances of hyperdistribution of m0 |
a1 |
shape parameters of hyperdistribution of b0 |
b1 |
rate parameters of hyperdistribution of b0 |
strength |
strength parameter |
napprox |
number of approximating values |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, also return the estimated density (default TRUE) |
discount |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
C++ function to estimate Pitman-Yor multivariate mixtures via importance conditional sampler - PRODUCT KERNEL
Description
C++ function to estimate Pitman-Yor multivariate mixtures via importance conditional sampler - PRODUCT KERNEL
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
vector, scale parameters for the location component |
a0 |
vector, parameters of scale component |
b0 |
vector, parameters of scale component |
m1 |
means of hyperdistribution of m0 |
s21 |
variances of hyperdistribution of m0 |
tau1 |
shape parameters of hyperdistribution of k0 |
tau2 |
rate parameters of hyperdistribution of k0 |
a1 |
shape parameters of hyperdistribution of b0 |
b1 |
rate parameters of hyperdistribution of b0 |
strength |
strength parameter |
napprox |
number of approximating values |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, also return the estimated density (default TRUE) |
discount |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
C++ function to estimate Pitman-Yor univariate mixtures via slice sampler - LOCATION SCALE
Description
C++ function to estimate Pitman-Yor univariate mixtures via slice sampler - LOCATION SCALE
Arguments
data |
a vector of observations |
grid |
vector to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
tuning parameter of variance of location component |
a0 |
parameter of scale component |
b0 |
parameter of scale component |
m1 |
mean of hyperdistribution of m0 |
s21 |
variance of hyperdistribution of m0 |
tau1 |
shape parameter of hyperdistribution of k0 |
tau2 |
rate parameter of hyperdistribution of k0 |
a1 |
shape parameter of hyperdistribution of b0 |
b1 |
rate parameter of hyperdistribution of b0 |
mass |
mass parameter |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, also return the estimated density (default TRUE) |
sigma_PY |
second parameter of PY |
print_message |
print the status |
hyper |
if TRUE use hyperpriors, default TRUE |
indep |
if TRUE, use the independent slice-efficient sampler |
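In a slice sampler, each observation i receives a uniform slice variable u_i, and only components whose weight exceeds u_i can be allocated to it. It is therefore enough to generate stick-breaking weights until the unallocated stick mass drops below min(u_i), since every later component has a smaller weight. A sketch of that step, assuming the Pitman-Yor stick-breaking construction V_j ~ Beta(1 - sigma_PY, mass + j * sigma_PY) with mass acting as the strength parameter; py_sticks_until is hypothetical and the package's C++ code may organise this step differently.

```r
# Hypothetical helper (not a BNPmix function): grows Pitman-Yor stick-breaking
# weights until the leftover stick mass falls below u_min, ASSUMING
#   V_j ~ Beta(1 - sigma_PY, mass + j * sigma_PY)
py_sticks_until <- function(u_min, mass = 1, sigma_PY = 0, max_k = 1e5) {
  stopifnot(u_min > 0, sigma_PY >= 0, sigma_PY < 1, mass > -sigma_PY)
  w <- numeric(0)
  remaining <- 1  # stick mass not yet assigned to any component
  j <- 0
  while (remaining >= u_min && j < max_k) {
    j <- j + 1
    v <- rbeta(1, 1 - sigma_PY, mass + j * sigma_PY)
    w[j] <- remaining * v
    remaining <- remaining * (1 - v)
  }
  list(weights = w, leftover = remaining)
}

set.seed(3)
st <- py_sticks_until(u_min = 1e-3, mass = 1, sigma_PY = 0.2)
length(st$weights)  # number of components the slice step needs
```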
C++ function to estimate Pitman-Yor univariate mixtures via slice sampler - LOCATION
Description
C++ function to estimate Pitman-Yor univariate mixtures via slice sampler - LOCATION
Arguments
data |
a vector of observations |
grid |
vector to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
s20 |
variance of location component |
a0 |
parameter of scale component |
b0 |
parameter of scale component |
m1 |
hyperparameter, mean of distribution of m0 |
k1 |
hyperparameter, scale factor of distribution of m0 |
a1 |
hyperparameter, shape of distribution of s20 |
b1 |
hyperparameter, rate of distribution of s20 |
mass |
mass parameter |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, also return the estimated density (default TRUE) |
sigma_PY |
second parameter of PY |
print_message |
print the status |
hyper |
if TRUE use hyperpriors, default TRUE |
C++ function to estimate Pitman-Yor multivariate mixtures via slice sampler - LOCATION SCALE
Description
C++ function to estimate Pitman-Yor multivariate mixtures via slice sampler - LOCATION SCALE
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
tuning parameter of variance of location component |
S0 |
parameter of scale component |
n0 |
parameter of scale component |
m1 |
mean of hyperprior distribution of m0 |
S1 |
covariance of hyperprior distribution of m0 |
tau1 |
shape parameter of hyperprior distribution of k0 |
tau2 |
rate parameter of hyperprior distribution of k0 |
theta1 |
df of hyperprior distribution of S0 |
Theta1 |
matrix of hyperprior distribution of S0 |
mass |
mass parameter |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, also return the estimated density (default TRUE) |
sigma_PY |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
indep |
if TRUE, use the independent slice-efficient sampler |
C++ function to estimate Pitman-Yor multivariate mixtures via slice sampler - LOCATION
Description
C++ function to estimate Pitman-Yor multivariate mixtures via slice sampler - LOCATION
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
S20 |
variance of location component |
S0 |
parameter of scale component |
n0 |
parameter of scale component |
m1 |
mean of hyperdistribution of m0 |
k1 |
scale factor of hyperdistribution of m0 |
theta1 |
df of hyperdistribution of S20 |
Theta1 |
matrix of hyperdistribution of S20 |
mass |
mass parameter |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, also return the estimated density (default TRUE) |
sigma_PY |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
indep |
if TRUE, use the independent slice-efficient sampler |
C++ function to estimate Pitman-Yor multivariate mixtures via slice sampler - PRODUCT KERNEL
Description
C++ function to estimate Pitman-Yor multivariate mixtures via slice sampler - PRODUCT KERNEL
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
vector, scale parameters for the location component |
a0 |
vector, parameters of scale component |
b0 |
vector, parameters of scale component |
m1 |
means of hyperdistribution of m0 |
s21 |
variances of hyperdistribution of m0 |
tau1 |
shape parameters of hyperdistribution of k0 |
tau2 |
rate parameters of hyperdistribution of k0 |
a1 |
shape parameters of hyperdistribution of b0 |
b1 |
rate parameters of hyperdistribution of b0 |
strength |
strength parameter |
napprox |
number of approximating values |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, also return the estimated density (default TRUE) |
discount |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
C++ function to estimate Pitman-Yor multivariate mixtures via slice sampler - PRODUCT KERNEL
Description
C++ function to estimate Pitman-Yor multivariate mixtures via slice sampler - PRODUCT KERNEL
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
vector, scale parameters for the location component |
a0 |
vector, parameters of scale component |
b0 |
vector, parameters of scale component |
m1 |
means of hyperdistribution of m0 |
s21 |
variances of hyperdistribution of m0 |
a1 |
shape parameters of hyperdistribution of b0 |
b1 |
rate parameters of hyperdistribution of b0 |
strength |
strength parameter |
napprox |
number of approximating values |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, also return the estimated density (default TRUE) |
discount |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
C++ function to estimate Pitman-Yor multivariate mixtures via slice sampler - LOCATION SCALE
Description
C++ function to estimate Pitman-Yor multivariate mixtures via slice sampler - LOCATION SCALE
Arguments
data |
a matrix of observations |
grid |
matrix of points to evaluate the density |
niter |
number of iterations |
nburn |
number of burn-in iterations |
m0 |
expectation of location component |
k0 |
tuning parameter of variance of location component |
S0 |
parameter of scale component |
n0 |
parameter of scale component |
m1 |
means of hyperdistribution of m0 |
s21 |
variances of hyperdistribution of m0 |
tau1 |
shape parameters of hyperdistribution of k0 |
tau2 |
rate parameters of hyperdistribution of k0 |
a1 |
shape parameters of hyperdistribution of b0 |
b1 |
rate parameters of hyperdistribution of b0 |
mass |
mass parameter |
nupd |
number of iterations to show current updating |
out_param |
if TRUE, also return the location and scale parameter lists |
out_dens |
if TRUE, also return the estimated density (default TRUE) |
sigma_PY |
second parameter of PY |
print_message |
print the status |
light_dens |
if TRUE return only the posterior mean of the density |
hyper |
if TRUE use hyperpriors, default TRUE |
indep |
if TRUE, use the independent slice-efficient sampler |
C++ function - clean the partition matrix
Description
C++ function - clean the partition matrix
Arguments
M |
a matrix (r x n), where r is the number of replications and n is the number of observations |
Examples
{
M <- matrix(c(1,1,1,3,1,1,4,4,1,1,3,1,1,3,1,1), ncol = 4)
clean_partition(M)
}
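One plausible reading of "cleaning" a partition matrix is relabelling each row so that cluster labels are consecutive integers in order of first appearance, which makes label-switched but identical partitions compare equal. The base-R sketch below implements that reading on the example matrix; relabel_rows is hypothetical and is not the package's C++ clean_partition, whose exact labelling convention may differ.

```r
# Hypothetical helper (not BNPmix's clean_partition): relabels each row of M
# so its cluster labels are 1, 2, ... in order of first appearance.
relabel_rows <- function(M) {
  out <- t(apply(M, 1, function(r) match(r, unique(r))))
  dim(out) <- dim(M)  # keep the r x n shape even when n == 1
  out
}

M <- matrix(c(1,1,1,3,1,1,4,4,1,1,3,1,1,3,1,1), ncol = 4)
relabel_rows(M)  # row 3, (1, 4, 3, 1), becomes (1, 2, 3, 1)
```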
set generic
Description
set generic
Usage
dBNPdens(object, x)
Evaluate estimated univariate densities at a given point
Description
The method dBNPdens
provides an approximated evaluation of estimated univariate densities at a given point, for a BNPdens
class object.
Usage
## S3 method for class 'BNPdens'
dBNPdens(object, x)
Arguments
object |
a BNPdens class object. |
x |
the point at which to evaluate the density. |
Value
a numeric value
Examples
data_toy <- c(rnorm(100, -3, 1), rnorm(100, 3, 1))
grid <- seq(-7, 7, length.out = 50)
est_model <- PYdensity(y = data_toy, mcmc = list(niter = 200, nburn = 100),
output = list(grid = grid))
x <- 1.4
dBNPdens(est_model, x)
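Because the estimated density is stored only at the grid points, evaluating it elsewhere requires some form of interpolation. The sketch below uses linear interpolation with stats::approx on a known two-component density standing in for a posterior mean; interp_density is hypothetical, and this is an assumption about how such an approximation can be done, not a description of the internals of dBNPdens.

```r
# Hypothetical sketch: linear interpolation of a density known only on a grid.
interp_density <- function(grid, dens_values, x) {
  # rule = 1 returns NA outside the grid range instead of extrapolating
  stats::approx(x = grid, y = dens_values, xout = x, rule = 1)$y
}

grid <- seq(-7, 7, length.out = 50)
dens_values <- 0.5 * dnorm(grid, -3, 1) + 0.5 * dnorm(grid, 3, 1)
val <- interp_density(grid, dens_values, 1.4)
exact <- 0.5 * dnorm(1.4, -3, 1) + 0.5 * dnorm(1.4, 3, 1)
c(interpolated = val, exact = exact)
```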
set generic
Description
set generic
Usage
partition(object, ...)
Estimate the partition of the data
Description
The partition method estimates the partition of the data based on the output generated by a Bayesian nonparametric mixture model, according to a specified criterion, for a BNPdens class object.
Usage
## S3 method for class 'BNPdens'
partition(object, dist = "VI", max_k = NULL, ...)
Arguments
object |
an object of class BNPdens. |
dist |
a loss function defined on the space of partitions; it can be variation of information ("VI") or Binder's loss ("Binder"). |
max_k |
maximum number of clusters passed to the |
... |
additional arguments to be passed. |
Details
This method returns point estimates for the clustering of the data induced by a nonparametric mixture model. This result is achieved by exploiting two different loss functions on the space of partitions: variation of information (dist = 'VI') and Binder's loss (dist = 'Binder'). The function is based on the mcclust.ext code by Sara Wade (Wade and Ghahramani, 2018).
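To make the two losses concrete, here is a base-R sketch of both distances between a pair of label vectors: variation of information computed from label entropies (base-2 logarithms assumed), and Binder's loss with equal misclassification costs counted as the number of disagreeing pairs (normalisations vary across references). vi_dist and binder_dist are hypothetical helpers, not the mcclust.ext or BNPmix implementations.

```r
# Hypothetical helpers (not the mcclust.ext or BNPmix implementations).
# Variation of information between two label vectors, using base-2 logs.
vi_dist <- function(c1, c2) {
  stopifnot(length(c1) == length(c2))
  p12 <- table(c1, c2) / length(c1)  # joint distribution of the two labelings
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log2(p))
  }
  # VI = H(c1) + H(c2) - 2 I(c1, c2) = 2 H(c1, c2) - H(c1) - H(c2)
  2 * entropy(as.vector(p12)) - entropy(rowSums(p12)) - entropy(colSums(p12))
}

# Binder's loss with equal costs: number of pairs of observations that are
# clustered together in one partition and apart in the other.
binder_dist <- function(c1, c2) {
  stopifnot(length(c1) == length(c2))
  disagree <- outer(c1, c1, "==") != outer(c2, c2, "==")
  sum(disagree) / 2  # each unordered pair appears twice in the matrix
}

vi_dist(c(1, 1, 2, 2), c(2, 2, 1, 1))      # relabelling only: 0
vi_dist(c(1, 1, 1, 1), c(1, 2, 3, 4))      # log2(4) = 2
binder_dist(c(1, 1, 2, 2), c(1, 2, 1, 2))  # 4 disagreeing pairs
```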
Value
The method returns a list containing a matrix with nrow(data) columns and 3 rows. Each row reports the cluster labels for each observation according to one of three different approaches. The first and second rows are the output of an agglomerative clustering procedure obtained by applying the function hclust to the dissimilarity matrix, using the complete and average linkage, respectively. The number of clusters is between 1 and max_k and is chosen according to a lower bound on the expected loss, as described in Wade and Ghahramani (2018).
The third row reports the partition visited by the MCMC with the minimum distance dist from the dissimilarity matrix.
In addition, the list reports a vector with three scores representing the lower bound on the expected loss for the three partitions.
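The agglomerative step can be sketched in base R, assuming the dissimilarity matrix is one minus the posterior similarity matrix (the proportion of MCMC iterations in which two observations share a cluster). For simplicity the number of clusters k is fixed by hand here rather than chosen through the lower bound on the expected loss that the method uses; psm and cluster_from_psm are hypothetical helpers.

```r
# Hypothetical helper: posterior similarity matrix from an (r x n) matrix of
# MCMC cluster labels; entry (i, j) is the share of iterations in which
# observations i and j are in the same cluster.
psm <- function(labels) {
  labels <- as.matrix(labels)
  n <- ncol(labels)
  S <- matrix(0, n, n)
  for (it in seq_len(nrow(labels))) {
    S <- S + outer(labels[it, ], labels[it, ], "==")
  }
  S / nrow(labels)
}

# Hypothetical helper: complete-linkage clustering of 1 - S, cut into k groups
cluster_from_psm <- function(S, k) {
  hc <- stats::hclust(stats::as.dist(1 - S), method = "complete")
  stats::cutree(hc, k = k)
}

labels <- rbind(c(1, 1, 2, 2), c(1, 1, 2, 2), c(1, 2, 2, 2))
S <- psm(labels)
cluster_from_psm(S, k = 2)  # observations 1-2 and 3-4 end up in separate clusters
```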
References
Wade, S., Ghahramani, Z. (2018). Bayesian cluster analysis: Point estimation and credible balls. Bayesian Analysis, 13, 559-626.
Examples
data_toy <- c(rnorm(10, -3, 1), rnorm(10, 3, 1))
grid <- seq(-7, 7, length.out = 50)
fit <- PYdensity(y = data_toy, mcmc = list(niter = 100,
nburn = 10, nupd = 100), output = list(grid = grid))
class(fit)
partition(fit)
Density plot for BNPdens class
Description
Extension of the plot method to the BNPdens class. The method plot.BNPdens returns suitable plots for a BNPdens object. See details.
Usage
## S3 method for class 'BNPdens'
plot(
x,
dimension = c(1, 2),
col = "#0037c4",
show_points = F,
show_hist = F,
show_clust = F,
bin_size = NULL,
wrap_dim = NULL,
xlab = "",
ylab = "",
band = T,
conf_level = c(0.025, 0.975),
...
)
Arguments
x |
an object of class BNPdens; |
dimension |
if |
col |
the color of the lines; |
show_points |
if |
show_hist |
if |
show_clust |
if |
bin_size |
if |
wrap_dim |
bivariate vector, if |
xlab |
label of the horizontal axis; |
ylab |
label of the vertical axis; |
band |
if |
conf_level |
bivariate vector, order of the quantiles for the posterior credible bands. Default c(0.025, 0.975). |
... |
additional arguments to be passed. |
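The conf_level argument gives the quantile orders of pointwise posterior credible bands. Assuming the MCMC draws of the density are stored as a (grid points x iterations) matrix, such bands can be computed as below; pointwise_band and the simulated draws are hypothetical and only illustrate the calculation, not how plot.BNPdens is implemented.

```r
# Hypothetical sketch: pointwise credible bands from MCMC density draws.
# `dens_draws` is ASSUMED to be a (grid points x iterations) matrix.
pointwise_band <- function(dens_draws, conf_level = c(0.025, 0.975)) {
  # apply() returns a 2 x (grid points) matrix of quantiles
  q <- apply(dens_draws, 1, stats::quantile, probs = conf_level, names = FALSE)
  data.frame(mean = rowMeans(dens_draws), lower = q[1, ], upper = q[2, ])
}

set.seed(5)
grid <- seq(-3, 3, length.out = 30)
# Illustrative draws: normal densities with a slightly uncertain location
draws <- sapply(1:200, function(i) dnorm(grid, rnorm(1, 0, 0.1), 1))
band <- pointwise_band(draws)
head(band)
```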
Details
If the BNPdens object is generated by PYdensity, the function returns the univariate or bivariate estimated density plot.
If the BNPdens object is generated by PYregression, the function returns the scatterplot of the response variable jointly with the covariates (up to four), coloured according to the estimated partition.
If x is a BNPdens object generated by DDPdensity, the function returns a wrapped plot with one density per group.
The plots can be enhanced in several ways. For univariate densities: if show_hist = TRUE, the plot also shows the histogram of the data; if show_points = TRUE, the plot also shows the observed points along the x-axis; if show_points = TRUE and show_clust = TRUE, the points are colored according to the partition estimated with the partition function.
For multivariate densities: if show_points = TRUE, the plot also shows the scatterplot of the data; if show_points = TRUE and show_clust = TRUE, the points are colored according to the estimated partition.
Value
A ggplot2 object.
Examples
# PYdensity example
data_toy <- c(rnorm(100, -3, 1), rnorm(100, 3, 1))
grid <- seq(-7, 7, length.out = 50)
est_model <- PYdensity(y = data_toy,
mcmc = list(niter = 200, nburn = 100, nupd = 100),
output = list(grid = grid))
class(est_model)
plot(est_model)
# PYregression example
x_toy <- c(rnorm(100, 3, 1), rnorm(100, 3, 1))
y_toy <- c(x_toy[1:100] * 2 + 1, x_toy[101:200] * 6 + 1) + rnorm(200, 0, 1)
grid_x <- c(0, 1, 2, 3, 4, 5)
grid_y <- seq(0, 35, length.out = 50)
est_model <- PYregression(y = y_toy, x = x_toy,
mcmc = list(niter = 200, nburn = 100),
output = list(grid_x = grid_x, grid_y = grid_y))
summary(est_model)
plot(est_model)
# DDPdensity example
data_toy <- c(rnorm(50, -4, 1), rnorm(100, 0, 1), rnorm(50, 4, 1))
group_toy <- c(rep(1,100), rep(2,100))
grid <- seq(-7, 7, length.out = 50)
est_model <- DDPdensity(y = data_toy, group = group_toy,
mcmc = list(niter = 200, nburn = 100, napprox_unif = 50),
output = list(grid = grid))
summary(est_model)
plot(est_model)
BNPdens print method
Description
The print.BNPdens method prints the type of a BNPdens object.
Usage
## S3 method for class 'BNPdens'
print(x, ...)
Arguments
x |
an object of class BNPdens. |
... |
additional arguments. |
Examples
data_toy <- c(rnorm(100, -3, 1), rnorm(100, 3, 1))
grid <- seq(-7, 7, length.out = 50)
est_model <- PYdensity(y = data_toy, mcmc = list(niter = 100,
nburn = 10, napprox = 10), output = list(grid = grid))
class(est_model)
print(est_model)
BNPdens summary method
Description
The summary.BNPdens method provides summary information on BNPdens objects.
Usage
## S3 method for class 'BNPdens'
summary(object, ...)
Arguments
object |
an object of class BNPdens. |
... |
additional arguments |
Examples
data_toy <- c(rnorm(100, -3, 1), rnorm(100, 3, 1))
grid <- seq(-7, 7, length.out = 50)
est_model <- PYdensity(y = data_toy, mcmc = list(niter = 100,
nburn = 10, napprox = 10), output = list(grid = grid))
class(est_model)
summary(est_model)