Type: | Package |
Title: | A Bayesian Nonparametric Algorithm for Time Series Clustering |
Version: | 2.0 |
Date: | 2019-08-19 |
Author: | Martell-Juarez, D.A. & Nieto-Barajas, L.E. |
Maintainer: | David Alejandro Martell Juarez <alex91599@gmail.com> |
Depends: | R(≥ 3.6.0), mvtnorm, MASS |
Description: | Performs the algorithm for time series clustering described in Nieto-Barajas and Contreras-Cristan (2014). |
NeedsCompilation: | no |
Repository: | CRAN |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Packaged: | 2019-08-19 21:10:45 UTC; alex9 |
Date/Publication: | 2019-08-19 21:40:03 UTC |
A Bayesian Nonparametric Algorithm for Time Series Clustering
Description
This package performs the algorithm for time series clustering described in Nieto-Barajas and Contreras-Cristan (2014). The package contains functions to work with annual, monthly and quarterly time series data.
The main functions to accomplish the above are:
1) tseriesca
2) tseriescm
3) tseriescq
Details
Package: | BNPTSclust |
Type: | Package |
Version: | 2.0 |
Date: | 2019-08-19 |
License: | GPL2, GPL3 |
For a comprehensive guide on using the package, refer to the vignette included with it.
Author(s)
Martell-Juarez, D.A. and Nieto-Barajas, L.E.
Maintainer: David Alejandro Martell Juarez <alex91599@gmail.com>
References
Nieto-Barajas, L.E. and Contreras-Cristan, A. (2014). A Bayesian Nonparametric Approach for Time Series Clustering. Bayesian Analysis, 9(1), 147–170.
Cluster groups plotting function.
Description
Function that plots the time series clusters generated by any of the functions: "tseriesca", "tseriescm" or "tseriescq".
Usage
clusterplots(L, data)
Arguments
L |
output list from the functions: "tseriesca", "tseriescm" or "tseriescq". |
data |
Data frame with the time series information. |
Details
See the examples in the documentation files of "tseriesca", "tseriescm" or "tseriescq" for an example of this function's usage.
Value
The function produces the plots of the time series clusters directly.
Author(s)
Martell-Juarez, D.A.
Univariate ties function
Description
Computes the distinct observations and frequencies in a numeric vector.
Usage
comp11(y)
Arguments
y |
Numeric vector. |
Details
The function's code is the same as that of the "comp1" function from the "BNPdensity" package; only the output differs. This function is for internal use.
Value
jstar |
variable that rearranges "y" into a vector with its unique values. |
nstar |
frequency of each distinct observation in "y". |
rstar |
number of distinct observations in "y". |
gn |
variable that indicates the group number to which every entry in "y" belongs. |
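The quantities listed above (distinct values, their frequencies, their number and the group label of each entry) can be illustrated with a short base-R sketch. The helper `ties_sketch` is hypothetical: it is not the package's `comp11`, it does not reproduce the `jstar` output, and it assumes that distinct values are numbered in order of first appearance in `y`.

```r
# Hypothetical illustration of the ties computation; not the package's comp11().
# Assumption: distinct values are numbered in order of first appearance in y.
ties_sketch <- function(y) {
  ystar <- unique(y)                              # distinct observations
  gn    <- match(y, ystar)                        # group number of each entry of y
  nstar <- tabulate(gn, nbins = length(ystar))    # frequency of each distinct value
  list(ystar = ystar, nstar = nstar, rstar = length(ystar), gn = gn)
}

out <- ties_sketch(c(3, 1, 3, 2, 1, 3))
out$nstar   # 3 2 1
out$gn      # 1 2 1 3 2 1
```

Here `rstar` is simply the length of the vector of distinct values, so it always equals the number of groups appearing in `gn`.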
Note
For internal use.
Author(s)
Martell-Juarez, D.A., Barrios, E., Nieto-Barajas, L. and Pruenster, I.
Function that creates the design matrices required by the clustering algorithm.
Description
Function that generates the design matrices of the clustering algorithm according to the components the user wants to consider for clustering, namely the level, the polynomial trend and/or the seasonal components. It also returns the number of parameters that are and are not considered for clustering.
Usage
designmatrices(level, trend, seasonality, deg, T, n, fun)
Arguments
level |
Variable that indicates if the level of the time series will be considered for clustering. If level = 0, then it is omitted. If level = 1, then it is taken into account. |
trend |
Variable that indicates if the polynomial trend of the model will be considered for clustering. If trend = 0, then it is omitted. If trend = 1, then it is taken into account. |
seasonality |
Variable that indicates if the seasonal components of the model will be considered for clustering. If seasonality = 0, then they are omitted. If seasonality = 1, then they are taken into account. |
deg |
Degree of the polynomial trend of the model. |
T |
Number of periods of the time series. |
n |
Number of time series. |
fun |
Clustering function being used. |
Value
Z |
Design matrix of the parameters not considered for clustering. |
X |
Design matrix of the parameters considered for clustering. |
p |
Number of parameters not considered for clustering. |
d |
Number of parameters considered for clustering. |
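The split between X (components used for clustering) and Z (components not used) can be pictured with the hypothetical sketch below. It is not the package's `designmatrices`: the column layout is an assumption (a column of ones for the level, t, t^2, ..., t^deg for the trend, and indicators for seasons 2..s with season 1 as baseline), it ignores `n`, and it takes the season length `s` directly where the package presumably derives the periodicity from `fun` (1 for annual, 4 for quarterly, 12 for monthly data).

```r
# Hypothetical sketch of splitting level / trend / seasonal columns into
# X (considered for clustering) and Z (not considered). Not designmatrices().
# Assumed layout: level = intercept column, trend = t, t^2, ..., t^deg,
# seasonality = indicators for seasons 2..s (season 1 as baseline).
design_sketch <- function(level, trend, seasonality, deg, T, s = 12) {
  t <- seq_len(T)
  season <- ((t - 1) %% s) + 1                      # season index 1..s
  blocks <- list(
    level  = matrix(1, nrow = T, ncol = 1),
    trend  = outer(t, seq_len(deg), `^`),
    season = outer(season, seq_len(s)[-1], `==`) * 1
  )
  keep <- c(level, trend, seasonality) == 1         # accepts 0/1 or FALSE/TRUE
  bind <- function(b) if (length(b)) do.call(cbind, b) else matrix(0, T, 0)
  X <- bind(blocks[keep])    # considered for clustering
  Z <- bind(blocks[!keep])   # not considered for clustering
  list(Z = Z, X = X, p = ncol(Z), d = ncol(X))
}

dm <- design_sketch(level = 0, trend = 1, seasonality = 1, deg = 2, T = 24)
c(p = dm$p, d = dm$d)   # p = 1 (level), d = 2 + 11 = 13
```

With s = 1 (annual data) the seasonal block has zero columns, so only the level and trend blocks are distributed between X and Z.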
Note
For internal use.
Author(s)
Martell-Juarez, D.A.
Diagnostic plots function.
Description
Function that produces diagnostic plots to assess the convergence of the Markov chains generated by any of the functions: "tseriesca", "tseriescm" or "tseriescq".
Usage
diagplots(L)
Arguments
L |
output list from the functions: "tseriesca", "tseriescm" or "tseriescq". |
Details
See the examples in the documentation files of "tseriesca", "tseriescm" or "tseriescq" for an example of this function's usage.
Value
The function produces three kinds of plots to assess the convergence of the generated Markov chain: trace plots, histograms and ergodic mean plots.
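The ergodic mean plot tracks the running average of a chain. The one-line base-R sketch below computes that quantity; it is an illustration of the idea, not necessarily how `diagplots` computes it internally.

```r
# Ergodic (running) mean of a chain: average of the first k draws, k = 1, 2, ...
ergodic_mean <- function(chain) cumsum(chain) / seq_along(chain)

ergodic_mean(c(2, 4, 6))   # 2 3 4
```

With the stored example output, `plot(ergodic_mean(tseriesca.out$rhosample), type = "l")` would draw the running mean of the rho chain, while `plot(tseriesca.out$rhosample, type = "l")` and `hist(tseriesca.out$rhosample)` give a trace plot and a histogram. A running mean that flattens out is consistent with, but does not prove, convergence.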
Author(s)
Martell-Juarez, D.A.
GDP per person employed from 1990 to 2012
Description
This data set contains the yearly GDP per person employed from 1990 to 2012 for 121 countries.
Usage
data(gdp)
Format
Data frame with 20 rows and 121 columns.
Source
http://data.worldbank.org/indicator/SL.GDP.PCAP.EM.KD
House price statistics in Scotland from 2004 to 2014.
Description
This data set contains the average house price in each local authority area of Scotland from the 1st quarter of 2004 to the 4th quarter of 2014.
Usage
data(houses)
Format
Data frame with 44 rows and 33 columns.
Source
http://www.ros.gov.uk/public/news/quarterly_statistics.html
References
http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
Scaling data function.
Description
This function scales the time series data to the interval [0,1], as proposed in Nieto-Barajas and Contreras-Cristan (2014) for the time series clustering algorithm. It also obtains the time periods of the data set provided.
Usage
scaleandperiods(data,scale)
Arguments
data |
Data frame with the time series information. |
scale |
Flag that indicates if the time series data should be scaled to the [0,1] interval with a linear transformation as proposed by Nieto-Barajas and Contreras-Cristan (2014). If TRUE, then the time series are scaled to the [0,1] interval. Its value comes directly from the "scale" argument of the clustering functions. |
Details
The function assumes that the time periods of the data appear as row names.
Value
periods |
array with the time periods of the data. |
mydata |
data frame with the time series data scaled in [0,1]. |
cts |
Variable that indicates whether any time series were removed because they were constant over time. If no time series were removed, cts = 0; otherwise, cts contains the column(s) of the removed time series. |
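The behaviour described above can be approximated with the hypothetical sketch below. It is not the package's `scaleandperiods`; it assumes that the linear transformation is per-series min-max scaling, that constant series are dropped before scaling (since min-max scaling is undefined for them), that periods are stored as row names, and that the data contain no missing values.

```r
# Hypothetical sketch, not the package's scaleandperiods().
# Assumptions: min-max scaling per column, constant columns dropped first,
# time periods stored as row names, no missing values.
scale_sketch <- function(data) {
  periods <- rownames(data)
  rng <- vapply(data, function(x) max(x) - min(x), numeric(1))
  cts <- unname(which(rng == 0))                   # constant series
  if (length(cts) > 0) data <- data[, -cts, drop = FALSE]
  mydata <- as.data.frame(lapply(data, function(x) (x - min(x)) / (max(x) - min(x))))
  rownames(mydata) <- periods
  list(periods = periods, mydata = mydata, cts = if (length(cts) > 0) cts else 0)
}

df <- data.frame(s1 = c(1, 3, 5), s2 = c(2, 2, 2), s3 = c(10, 0, 5),
                 row.names = c("2001", "2002", "2003"))
res <- scale_sketch(df)
res$cts         # 2 (column s2 is constant and removed)
res$mydata$s3   # 1.0 0.0 0.5
```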
Note
For internal use.
Author(s)
Martell-Juarez, D.A.
Mexican stock exchange market prices
Description
This data set contains the monthly adjusted closing prices of 58 shares traded on the Mexican stock exchange from September 2006 to August 2011.
Usage
data(stocks)
Format
Data frame with 60 rows and 58 columns.
Source
http://www.dowjones.com/factiva/
References
This is the data set used by Nieto-Barajas, L.E. & Contreras-Cristan, A. (2014) as the application in their paper.
Function for annual time series clustering.
Description
Function that performs the time series clustering algorithm described in Nieto-Barajas and Contreras-Cristan (2014) for annual time series data.
Usage
tseriesca(data, maxiter = 500, burnin = floor(0.1 * maxiter),
thinning = 5, scale = TRUE, level = FALSE, trend = TRUE, deg = 2,
c0eps = 2, c1eps = 1, c0beta = 2, c1beta = 1, c0alpha = 2, c1alpha = 1,
priora = TRUE, pia = 0.5, q0a = 1, q1a = 1, priorb = TRUE, q0b = 1,
q1b = 1, a = 0.25, b = 0, indlpml = FALSE)
Arguments
data |
Data frame with the time series information. |
maxiter |
Maximum number of iterations for Gibbs sampling. |
burnin |
Burn-in period, i.e. the number of initial Gibbs sampling iterations discarded from the Markov chain. |
thinning |
Thinning interval, i.e. how many Gibbs sampling iterations are skipped when forming the Markov chain. |
scale |
Flag that indicates if the time series data should be scaled to the [0,1] interval with a linear transformation as proposed by Nieto-Barajas and Contreras-Cristan (2014). If TRUE, then the time series are scaled to the [0,1] interval. |
level |
Flag that indicates if the level of the time series will be considered for clustering. If TRUE, then it is taken into account. |
trend |
Flag that indicates if the polynomial trend of the model will be considered for clustering. If TRUE, then it is taken into account. |
deg |
Degree of the polynomial trend of the model. |
c0eps |
Shape parameter of the hyper-prior distribution on sig2eps. |
c1eps |
Rate parameter of the hyper-prior distribution on sig2eps. |
c0beta |
Shape parameter of the hyper-prior distribution on sig2beta. |
c1beta |
Rate parameter of the hyper-prior distribution on sig2beta. |
c0alpha |
Shape parameter of the hyper-prior distribution on sig2alpha. |
c1alpha |
Rate parameter of the hyper-prior distribution on sig2alpha. |
priora |
Flag that indicates if a prior on parameter "a" is to be assigned. If TRUE, a prior on "a" is assigned. |
pia |
Mixing proportion of the prior distribution on parameter "a". |
q0a |
Shape parameter of the continuous part of the prior distribution on parameter "a". |
q1a |
Shape parameter of the continuous part of the prior distribution on parameter "a". |
priorb |
Flag that indicates if a prior on parameter "b" is to be assigned. If TRUE, a prior on "b" is assigned. |
q0b |
Shape parameter of the prior distribution on parameter "b". |
q1b |
Shape parameter of the prior distribution on parameter "b". |
a |
Initial/fixed value of parameter "a". |
b |
Initial/fixed value of parameter "b". |
indlpml |
Flag that indicates if the log pseudo marginal likelihood (LPML) is to be calculated. If TRUE, the LPML is calculated. |
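To see how maxiter, burnin and thinning interact, the hypothetical helper below lists which iterations would be stored. It assumes that, after discarding the first burnin iterations, every thinning-th iteration is kept; the exact indexing used by the package may differ by one, so treat the counts as approximate.

```r
# Hypothetical helper: which Gibbs iterations end up in the stored chain.
# Assumption: after discarding the first `burnin` iterations, every
# `thinning`-th iteration is kept.
kept_iterations <- function(maxiter, burnin = floor(0.1 * maxiter), thinning = 5) {
  start <- burnin + thinning
  if (start > maxiter) return(numeric(0))
  seq(from = start, to = maxiter, by = thinning)
}

length(kept_iterations(500))    # 90 draws with the default settings
length(kept_iterations(4000))   # 720 draws for maxiter = 4000
```

Under this convention the defaults leave only 90 draws, which is one reason the examples below use maxiter = 4000.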
Details
It is assumed that the time series data are organized in a data frame with one series per column and the time periods as row names.
Value
mstar |
Number of groups of the chosen cluster configuration. |
gnstar |
Array that contains the group number to which each time series belongs. |
HM |
Heterogeneity Measure of the chosen cluster configuration. |
arrho |
Acceptance rate of the parameter "rho". |
ara |
Acceptance rate of the parameter "a". |
arb |
Acceptance rate of the parameter "b". |
sig2epssample |
Matrix whose columns contain the posterior samples of the sig2eps_i obtained by Gibbs sampling, one column per sig2eps_i. |
sig2alphasample |
Matrix whose columns contain the posterior samples of the sig2alpha_i obtained by Gibbs sampling, one column per sig2alpha_i. |
sig2betasample |
Matrix whose columns contain the posterior samples of the sig2beta_i obtained by Gibbs sampling, one column per sig2beta_i. |
sig2thesample |
Vector that contains the sample of sig2the's posterior distribution after Gibbs sampling. |
rhosample |
Vector that contains the sample of rho's posterior distribution after Gibbs sampling. |
asample |
Vector that contains the sample of a's posterior distribution after Gibbs sampling. |
bsample |
Vector that contains the sample of b's posterior distribution after Gibbs sampling. |
msample |
Vector that contains the sample of the number of groups at each Gibbs sampling iteration. |
lpml |
If indlpml = TRUE, lpml contains the value of the LPML of the chosen model. |
scale |
Flag that indicates if the time series data were scaled to the [0,1] interval with a linear transformation. It is used as an input by the plotting functions. |
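The LPML is commonly estimated as the sum over observations of log CPO_i, where CPO_i is the harmonic mean of the likelihood f(y_i | theta) over posterior draws. The sketch below implements that standard estimator on the log scale for numerical stability; it illustrates the general technique and is not necessarily how the package computes lpml. The input layout (draws in rows, observations in columns) is an assumption.

```r
# Sketch of the standard CPO-based LPML estimate (harmonic-mean form).
# loglik: M x n matrix, loglik[m, i] = log f(y_i | theta^(m)) for posterior
# draw m and observation i. Not necessarily how the package computes lpml.
lpml_sketch <- function(loglik) {
  M <- nrow(loglik)
  log_sum_exp <- function(v) { mx <- max(v); mx + log(sum(exp(v - mx))) }
  # log CPO_i = -log( mean over m of exp(-loglik[m, i]) ), computed stably
  log_cpo <- apply(loglik, 2, function(l) log(M) - log_sum_exp(-l))
  sum(log_cpo)
}

ll <- log(matrix(c(0.25, 0.5, 0.2, 0.2), nrow = 2))  # 2 draws, 2 observations
lpml_sketch(ll)   # log(1/3) + log(0.2) = log(1/15)
```

Larger LPML values indicate better predictive fit, so the value can be compared across runs that differ, for example, in the level, trend or seasonality settings.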
Author(s)
Martell-Juarez, D.A. and Nieto-Barajas, L.E.
Examples
## Do not run
#
# data(gdp)
# tseriesca.out <- tseriesca(gdp,maxiter = 4000,level=FALSE,trend=TRUE,
# c0eps = 0.1,c1eps = 0.1,c0beta = 0.1,
# c1beta = 0.1,c0alpha = 0.1,
# c1alpha= 0.1)
# Make sure that chain convergence is always assessed. Run the following
# code to show the cluster and diagnostic plots:
data(gdp)
data(tseriesca.out)
attach(tseriesca.out)
clusterplots(tseriesca.out,gdp)
diagplots(tseriesca.out)
Output of tseriesca function for the GDP per person employed dataset
Description
This object contains the output of the function tseriesca for the example described in its documentation file.
Usage
data(tseriesca.out)
Details
See function tseriesca for an explanation of how the output was obtained.
Examples
data(tseriesca.out)
Function for monthly time series clustering.
Description
Function that performs the time series clustering algorithm described in Nieto-Barajas and Contreras-Cristan (2014) for monthly time series data.
Usage
tseriescm(data, maxiter = 500, burnin = floor(0.1 * maxiter),
thinning = 5, scale = TRUE, level = FALSE, trend = TRUE, seasonality = TRUE,
deg = 2, c0eps = 2, c1eps = 1, c0beta = 2, c1beta = 1, c0alpha = 2,
c1alpha = 1, priora = TRUE, pia = 0.5, q0a = 1, q1a = 1, priorb = TRUE,
q0b = 1, q1b = 1, a = 0.25, b = 0, indlpml = FALSE)
Arguments
data |
Data frame with the time series information. |
maxiter |
Maximum number of iterations for Gibbs sampling. |
burnin |
Burn-in period, i.e. the number of initial Gibbs sampling iterations discarded from the Markov chain. |
thinning |
Thinning interval, i.e. how many Gibbs sampling iterations are skipped when forming the Markov chain. |
scale |
Flag that indicates if the time series data should be scaled to the [0,1] interval with a linear transformation as proposed by Nieto-Barajas and Contreras-Cristan (2014). If TRUE, then the time series are scaled to the [0,1] interval. |
level |
Flag that indicates if the level of the time series will be considered for clustering. If TRUE, then it is taken into account. |
trend |
Flag that indicates if the polynomial trend of the model will be considered for clustering. If TRUE, then it is taken into account. |
seasonality |
Flag that indicates if the seasonal components of the model will be considered for clustering. If TRUE, then they are taken into account. |
deg |
Degree of the polynomial trend of the model. |
c0eps |
Shape parameter of the hyper-prior distribution on sig2eps. |
c1eps |
Rate parameter of the hyper-prior distribution on sig2eps. |
c0beta |
Shape parameter of the hyper-prior distribution on sig2beta. |
c1beta |
Rate parameter of the hyper-prior distribution on sig2beta. |
c0alpha |
Shape parameter of the hyper-prior distribution on sig2alpha. |
c1alpha |
Rate parameter of the hyper-prior distribution on sig2alpha. |
priora |
Flag that indicates if a prior on parameter "a" is to be assigned. If TRUE, a prior on "a" is assigned. |
pia |
Mixing proportion of the prior distribution on parameter "a". |
q0a |
Shape parameter of the continuous part of the prior distribution on parameter "a". |
q1a |
Shape parameter of the continuous part of the prior distribution on parameter "a". |
priorb |
Flag that indicates if a prior on parameter "b" is to be assigned. If TRUE, a prior on "b" is assigned. |
q0b |
Shape parameter of the prior distribution on parameter "b". |
q1b |
Shape parameter of the prior distribution on parameter "b". |
a |
Initial/fixed value of parameter "a". |
b |
Initial/fixed value of parameter "b". |
indlpml |
Flag that indicates if the log pseudo marginal likelihood (LPML) is to be calculated. If TRUE, the LPML is calculated. |
Details
It is assumed that the time series data are organized in a data frame with one series per column and the time periods as row names.
Value
mstar |
Number of groups of the chosen cluster configuration. |
gnstar |
Array that contains the group number to which each time series belongs. |
HM |
Heterogeneity Measure of the chosen cluster configuration. |
arrho |
Acceptance rate of the parameter "rho". |
ara |
Acceptance rate of the parameter "a". |
arb |
Acceptance rate of the parameter "b". |
sig2epssample |
Matrix whose columns contain the posterior samples of the sig2eps_i obtained by Gibbs sampling, one column per sig2eps_i. |
sig2alphasample |
Matrix whose columns contain the posterior samples of the sig2alpha_i obtained by Gibbs sampling, one column per sig2alpha_i. |
sig2betasample |
Matrix whose columns contain the posterior samples of the sig2beta_i obtained by Gibbs sampling, one column per sig2beta_i. |
sig2thesample |
Vector that contains the sample of sig2the's posterior distribution after Gibbs sampling. |
rhosample |
Vector that contains the sample of rho's posterior distribution after Gibbs sampling. |
asample |
Vector that contains the sample of a's posterior distribution after Gibbs sampling. |
bsample |
Vector that contains the sample of b's posterior distribution after Gibbs sampling. |
msample |
Vector that contains the sample of the number of groups at each Gibbs sampling iteration. |
lpml |
If indlpml = TRUE, lpml contains the value of the LPML of the chosen model. |
scale |
Flag that indicates if the time series data were scaled to the [0,1] interval with a linear transformation. It is used as an input by the plotting functions. |
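Since msample records the number of groups at each stored iteration, its relative frequencies approximate the posterior distribution of the number of groups. The hypothetical helper below tabulates them; note that mstar refers to the chosen cluster configuration and need not coincide with the most frequent value.

```r
# Hypothetical helper: posterior frequencies of the number of groups.
summarize_groups <- function(msample) {
  probs <- table(msample) / length(msample)   # relative frequency of each m
  list(probs = probs, mode = as.integer(names(which.max(probs))))
}

s <- summarize_groups(c(3, 3, 4, 3, 5, 4))
s$mode   # 3
```

With the stored example output this would be called as `summarize_groups(tseriescm.out$msample)`.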
Author(s)
Martell-Juarez, D.A. and Nieto-Barajas, L.E.
Examples
## Do not run
#
# data(stocks)
# tseriescm.out <- tseriescm(stocks,maxiter=4000,level=FALSE,trend=TRUE,
# seasonality=TRUE,priorb=FALSE,b=0)
#
# Make sure that chain convergence is always assessed. Run the following
# code to show the cluster and diagnostic plots:
data(stocks)
data(tseriescm.out)
attach(tseriescm.out)
clusterplots(tseriescm.out,stocks)
diagplots(tseriescm.out)
Output of tseriescm function for the Mexican stock exchange market prices dataset
Description
This object contains the output of the function tseriescm for the example described in its documentation file.
Usage
data(tseriescm.out)
Details
See function tseriescm for an explanation of how the output was obtained.
Examples
data(tseriescm.out)
Function for quarterly time series clustering.
Description
Function that performs the time series clustering algorithm described in Nieto-Barajas and Contreras-Cristan (2014) for quarterly time series data.
Usage
tseriescq(data, maxiter = 500, burnin = floor(0.1 * maxiter),
thinning = 5, scale = TRUE, level = FALSE, trend = TRUE, seasonality = TRUE,
deg = 2, c0eps = 2, c1eps = 1, c0beta = 2, c1beta = 1, c0alpha = 2,
c1alpha = 1, priora = TRUE, pia = 0.5, q0a = 1, q1a = 1, priorb = TRUE,
q0b = 1, q1b = 1, a = 0.25, b = 0, indlpml = FALSE)
Arguments
data |
Data frame with the time series information. |
maxiter |
Maximum number of iterations for Gibbs sampling. |
burnin |
Burn-in period, i.e. the number of initial Gibbs sampling iterations discarded from the Markov chain. |
thinning |
Thinning interval, i.e. how many Gibbs sampling iterations are skipped when forming the Markov chain. |
scale |
Flag that indicates if the time series data should be scaled to the [0,1] interval with a linear transformation as proposed by Nieto-Barajas and Contreras-Cristan (2014). If TRUE, then the time series are scaled to the [0,1] interval. |
level |
Flag that indicates if the level of the time series will be considered for clustering. If TRUE, then it is taken into account. |
trend |
Flag that indicates if the polynomial trend of the model will be considered for clustering. If TRUE, then it is taken into account. |
seasonality |
Flag that indicates if the seasonal components of the model will be considered for clustering. If TRUE, then they are taken into account. |
deg |
Degree of the polynomial trend of the model. |
c0eps |
Shape parameter of the hyper-prior distribution on sig2eps. |
c1eps |
Rate parameter of the hyper-prior distribution on sig2eps. |
c0beta |
Shape parameter of the hyper-prior distribution on sig2beta. |
c1beta |
Rate parameter of the hyper-prior distribution on sig2beta. |
c0alpha |
Shape parameter of the hyper-prior distribution on sig2alpha. |
c1alpha |
Rate parameter of the hyper-prior distribution on sig2alpha. |
priora |
Flag that indicates if a prior on parameter "a" is to be assigned. If TRUE, a prior on "a" is assigned. |
pia |
Mixing proportion of the prior distribution on parameter "a". |
q0a |
Shape parameter of the continuous part of the prior distribution on parameter "a". |
q1a |
Shape parameter of the continuous part of the prior distribution on parameter "a". |
priorb |
Flag that indicates if a prior on parameter "b" is to be assigned. If TRUE, a prior on "b" is assigned. |
q0b |
Shape parameter of the prior distribution on parameter "b". |
q1b |
Shape parameter of the prior distribution on parameter "b". |
a |
Initial/fixed value of parameter "a". |
b |
Initial/fixed value of parameter "b". |
indlpml |
Flag that indicates if the log pseudo marginal likelihood (LPML) is to be calculated. If TRUE, the LPML is calculated. |
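The arguments above give the shape and rate of the hyper-priors on sig2eps, sig2beta and sig2alpha but do not name the distribution family. Assuming an inverse gamma hyper-prior (that is, the precision 1/sig2 follows a gamma distribution with shape c0 and rate c1), the sketch below shows what the default values imply compared with the vaguer values 0.1 used in the tseriesca example.

```r
# Assumption: 1/sig2 ~ Gamma(shape = c0, rate = c1), i.e. sig2 is inverse gamma
# with density proportional to sig2^(-c0 - 1) * exp(-c1 / sig2).
ig_prior_sketch <- function(c0, c1, n = 10000) {
  draws <- 1 / rgamma(n, shape = c0, rate = c1)      # prior draws of sig2
  prior_mean <- if (c0 > 1) c1 / (c0 - 1) else Inf   # finite only if c0 > 1
  list(mean = prior_mean, draws = draws)
}

set.seed(1)
ig_prior_sketch(2, 1)$mean       # 1 with the defaults c0 = 2, c1 = 1
ig_prior_sketch(0.1, 0.1)$mean   # Inf: no finite prior mean, a vaguer prior
```

Inspecting `quantile(ig_prior_sketch(c0, c1)$draws)` for candidate values is a quick way to check how informative a given choice is, under the stated assumption about the family.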
Details
It is assumed that the time series data are organized in a data frame with one series per column and the time periods as row names.
Value
mstar |
Number of groups of the chosen cluster configuration. |
gnstar |
Array that contains the group number to which each time series belongs. |
HM |
Heterogeneity Measure of the chosen cluster configuration. |
arrho |
Acceptance rate of the parameter "rho". |
ara |
Acceptance rate of the parameter "a". |
arb |
Acceptance rate of the parameter "b". |
sig2epssample |
Matrix whose columns contain the posterior samples of the sig2eps_i obtained by Gibbs sampling, one column per sig2eps_i. |
sig2alphasample |
Matrix whose columns contain the posterior samples of the sig2alpha_i obtained by Gibbs sampling, one column per sig2alpha_i. |
sig2betasample |
Matrix whose columns contain the posterior samples of the sig2beta_i obtained by Gibbs sampling, one column per sig2beta_i. |
sig2thesample |
Vector that contains the sample of sig2the's posterior distribution after Gibbs sampling. |
rhosample |
Vector that contains the sample of rho's posterior distribution after Gibbs sampling. |
asample |
Vector that contains the sample of a's posterior distribution after Gibbs sampling. |
bsample |
Vector that contains the sample of b's posterior distribution after Gibbs sampling. |
msample |
Vector that contains the sample of the number of groups at each Gibbs sampling iteration. |
lpml |
If indlpml = TRUE, lpml contains the value of the LPML of the chosen model. |
scale |
Flag that indicates if the time series data were scaled to the [0,1] interval with a linear transformation. It is used as an input by the plotting functions. |
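Because gnstar gives the group number of each time series, the series belonging to each cluster can be listed with base R. The helper below is hypothetical and assumes that gnstar is aligned with the columns of the data frame passed to the clustering function, which may not hold if constant series were removed during scaling.

```r
# Hypothetical helper: list the series (columns) in each cluster.
# Assumption: gnstar is aligned with the columns of `data`
# (e.g. no constant series were removed during scaling).
members_by_group <- function(data, gnstar) {
  stopifnot(length(gnstar) == ncol(data))
  split(colnames(data), gnstar)
}

toy <- data.frame(A = 1:4, B = c(2, 5, 1, 3), C = 4:1)
members_by_group(toy, gnstar = c(1, 2, 1))   # group 1: "A" "C"; group 2: "B"
```

With the stored example this would be `members_by_group(houses, tseriescq.out$gnstar)`, provided the alignment assumption holds.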
Author(s)
Martell-Juarez, D.A. and Nieto-Barajas, L.E.
Examples
## Do not run
#
# data(houses)
# tseriescq.out <- tseriescq(houses,maxiter=4000,level=FALSE,trend=TRUE,
# seasonality=TRUE,priora=TRUE)
#
# Make sure that chain convergence is always assessed. Run the following
# code to show the cluster and diagnostic plots:
data(houses)
data(tseriescq.out)
attach(tseriescq.out)
clusterplots(tseriescq.out,houses)
diagplots(tseriescq.out)
Output of tseriescq function for the House price statistics in Scotland dataset
Description
This object contains the output of the function tseriescq for the example described in its documentation file.
Usage
data(tseriescq.out)
Details
See function tseriescq for an explanation of how the output was obtained.
Examples
data(tseriescq.out)