Title: | Sequential Change Point Detection for High-Dimensional VAR Models |
Version: | 0.2.0 |
Description: | Implements the algorithm introduced in Tian, Y., and Safikhani, A. (2024) <doi:10.5705/ss.202024.0182>, "Sequential Change Point Detection in High-dimensional Vector Auto-regressive Models". This package provides tools for detecting change points in the transition matrices of VAR models, effectively identifying shifts in temporal and cross-correlations within high-dimensional time series data. |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Imports: | MASS, corpcor, Matrix, glmnet, doParallel, stats |
Suggests: | ggplot2 |
License: | GPL-2 | file LICENSE |
URL: | https://github.com/Helloworld9293/VARcpDetectOnline |
BugReports: | https://github.com/Helloworld9293/VARcpDetectOnline/issues |
NeedsCompilation: | no |
Packaged: | 2025-02-12 18:06:25 UTC; AAA |
Author: | Yuhan Tian [aut, cre], Abolfazl Safikhani [aut] |
Maintainer: | Yuhan Tian <yuhan.tian@ufl.edu> |
Depends: | R (≥ 3.5.0) |
Repository: | CRAN |
Date/Publication: | 2025-02-13 08:40:02 UTC |
VAR_cpDetect_Online: Sequential change point Detection for Vector Auto-Regressive Models
Description
This function performs sequential change point detection in high-dimensional time series data modeled as a Vector Auto-Regressive (VAR) process, targeting changes in the transition matrices that encode temporal and cross-correlations.
Usage
VAR_cpDetect_Online(
data,
n0,
w,
alpha,
h,
RLmode = TRUE,
needRefine = TRUE,
refineSize = 1/5,
needRefineCorrection = TRUE
)
Arguments
data |
A matrix where rows represent different dimensions (features) and columns represent observations. The first |
n0 |
Integer. The size of the historical data (number of columns in |
w |
Integer. The size of the sliding window used for calculating test statistics; referred to as the pre-specified detection delay. |
alpha |
Numeric. The desired false alarm rate, where 1/alpha represents the targeted average run length (ARL), which should exceed the length of the data to be monitored. |
h |
Integer. The order of the VAR process. |
RLmode |
Logical. If |
needRefine |
Logical. If |
refineSize |
Numeric. The proportion of the new window size to the original window size, used during refinement. |
needRefineCorrection |
Logical. If |
Details
This function fits a VAR model to the historical data using the l1 penalty and calculates test statistics for the sliding window to detect change points. If refinement is enabled, a second step narrows down the change point location. Optionally, a correction step can verify the detected change points.
Value
A list containing:
RL
The index (ignoring historical data) of the last observation read by the algorithm when the first alarm is issued. This is returned only if
RLmode = TRUE
.cp_refined
The refined estimate for the location (ignoring historical data) of the change point. This is returned only if
RLmode = TRUE
andneedRefine = TRUE
.alarm_locations
A vector of indices (ignoring historical data) where alarms were raised. This is returned only if
RLmode = FALSE
.cp_locations
A vector of refined change point locations (ignoring historical data), corresponding 1-to-1 with the
alarm_locations
. This is returned only ifRLmode = FALSE
andneedRefine = TRUE
.
Examples
library(MASS)
set.seed(2024)
As <- list(matrix(c(0.5, 0.2, 0.1, 0.4), 2, 2))
As_new <- list(matrix(c(-0.5, 0.2, 0.1, -0.4), 2, 2))
Sig <- diag(2)
data_IC <- generateVAR(n = 400, As = As, Sig = Sig, h = 1)
data_OC <- generateVAR(n = 100, As = As_new, Sig = Sig, h = 1,
isOldProvided = TRUE, oldxs = data_IC[, ncol(data_IC)])
data <- cbind(data_IC, data_OC)
result <- VAR_cpDetect_Online(data, n0 = 300, w = 20, alpha = 1/200, h = 1,
RLmode = TRUE, needRefine = TRUE, refineSize = 1/5,
needRefineCorrection = TRUE)
print(result)
Apply Thresholding to VAR Coefficients
Description
Applies a thresholding rule to a coefficient matrix by setting entries below a certain threshold to zero. Two types of thresholding are available: "soft" and "hard".
Usage
applyThreshold(a_mat, nr, nc, p, type = "soft")
Arguments
a_mat |
Numeric matrix. The coefficient matrix to be thresholded. |
nr |
Integer. The number of rows in the original data. |
nc |
Integer. The number of variables (columns) in the original data. |
p |
Integer. The order of the VAR model. |
type |
Character. The type of threshold to apply; either |
Value
The thresholded coefficient matrix.
Compute VAR Model Residuals
Description
Computes the residuals from a VAR model by subtracting the fitted values (obtained from the estimated coefficient matrices) from the original time series data.
Usage
computeResiduals(data, A)
Arguments
data |
A numeric matrix of the original time series (observations in rows). |
A |
List. A list of VAR coefficient matrices (one for each lag). |
Value
A numeric matrix of residuals.
Cross-Validated VAR Estimation using Elastic Net
Description
This internal function performs cross validation for VAR estimation using the elastic net penalty. It prepares the data, calls the elastic net CV routine, reshapes the estimated coefficients, applies optional thresholding, computes residuals, and estimates the error covariance.
Usage
cvVAR(data, p, opt = NULL)
Arguments
data |
A numeric matrix with time series data (observations in rows, variables in columns). |
p |
Integer. The order of the VAR model. |
opt |
List. A list of options (see |
Value
A list with components:
mu |
Vector of means of the original series. |
A |
List of VAR coefficient matrices (one for each lag). |
fit |
The complete elastic net CV fit (if requested). |
lambda |
The optimal lambda value chosen by CV. |
mse |
The minimum mean squared error from CV. |
mse_sd |
Standard deviation of the MSE. |
time |
Elapsed time for the ENET estimation. |
series |
The transformed series (after centering/scaling). |
residuals |
Residuals from the VAR model. |
sigma |
Estimated covariance matrix of the residuals. |
Cross Validation for Elastic Net VAR Estimation
Description
This internal function performs cross validation using elastic net (ENET)
estimation via the glmnet
package. It supports parallel processing if requested.
Usage
cvVAR_ENET(X, y, nvar, opt)
Arguments
X |
A numeric matrix of predictors. |
y |
Numeric vector of responses. |
nvar |
Integer. The number of variables in the original VAR (number of columns in data). |
opt |
List. A list of options including:
|
Value
An object of class cv.glmnet
as returned by glmnet::cv.glmnet
.
Construct Lagged Design Matrix for VAR
Description
Duplicates the original data matrix to create a lagged predictor matrix for VAR estimation.
Usage
duplicateMatrix(data, p)
Arguments
data |
A numeric matrix with time series data (observations in rows). |
p |
Integer. The order of the VAR model (number of lags). |
Value
A numeric matrix with duplicated columns corresponding to lagged observations.
Estimate Covariance Matrix from Residuals
Description
Estimates the covariance (or variance) matrix of the residuals using shrinkage estimation.
This function utilizes corpcor::cov.shrink
for covariance estimation.
Usage
estimateCovariance(res, ...)
Arguments
res |
A numeric matrix of residuals from the VAR model. |
... |
Additional arguments passed to |
Value
A numeric covariance matrix.
Fit VAR Model with Elastic Net via Cross Validation
Description
Estimates a (possibly high-dimensional) VAR model using penalized least squares with an elastic net penalty and cross validation. This function is adapted from the sparsevar package (https://github.com/svazzole/sparsevar/tree/master), which is distributed under the GNU General Public License v2. The code has been modified to specifically implement the elastic net penalty (penalty = "ENET") and cross validation (method = "cv").
Usage
fitVAR(data, p = 1, ...)
Arguments
data |
A numeric matrix or data frame with time series data (observations in rows, variables in columns). |
p |
Integer. The order of the VAR model. |
... |
Additional options for estimation. Global options include:
|
Value
A list with the following components:
mu |
A vector of means for each variable. |
A |
A list (of length |
fit |
(Optional) The complete results of the penalized least squares estimation. |
lambda |
The chosen lambda value (by cross validation). |
mse |
The minimum mean squared error from cross validation. |
mse_sd |
The standard deviation of the mean squared error. |
time |
Elapsed time for the estimation. |
series |
The (possibly transformed) input time series. |
residuals |
The residuals of the VAR model. |
sigma |
The estimated variance/covariance matrix of the residuals. |
References
The original source code is adapted from the sparsevar package, which is distributed under the GNU General Public License v2.
Generate VAR Data
Description
This function generates Vector Auto-Regressive (VAR) data based on the provided parameters.
Usage
generateVAR(n, As, Sig, h, isOldProvided = FALSE, oldxs = NULL)
Arguments
n |
Integer. The length of the VAR data to be generated. |
As |
List. A list containing the transition matrices for the VAR process. |
Sig |
Matrix. The covariance matrix of errors. |
h |
Integer. The order of the VAR process. |
isOldProvided |
Logical. If TRUE, the VAR data will be generated based on the last observations from the previous segment of data. Defaults to FALSE. |
oldxs |
Matrix. A |
Value
A data matrix of dimensions p
by n
.
Examples
# Example usage
As <- list(matrix(c(0.5, 0.2, 0.1, 0.4), 2, 2))
Sig <- diag(2)
data <- generateVAR(n = 100, As = As, Sig = Sig, h = 1)
Identify the Beginning of the Alarm Clusters
Description
This function clusters alarms into groups and identifies the starting points of the alarm clusters.
If the next alarm occurs within a specified window size (w
) from the current alarm,
it will be considered part of the current cluster. Otherwise, a new cluster will be formed.
Usage
get_cps(alarms, w)
Arguments
alarms |
A numeric vector. The alarms raised during the monitoring process. |
w |
An integer. The window size used to group alarms into clusters. |
Value
A numeric vector containing the starting points of the alarm clusters.
If the next alarm is within w
observations of the current alarm,
the next alarm will be considered part of the current alarm cluster.
Otherwise, a new cluster is formed and the next alarm is considered the beginning
of a new alarm cluster.
Examples
# Example usage:
alarms <- c(10, 15, 30, 35, 60)
change_points <- get_cps(alarms, w = 10)
S&P 500 Daily Log Returns and Corresponding Dates
Description
This dataset contains daily log returns for 186 stocks in the S&P 500 index from February 6, 2004, to March 2, 2016. The daily log returns are calculated using the adjusted daily closing prices. The dataset also contains the corresponding dates for each log return.
Usage
data(sp500)
Format
A list with two elements:
sp500$log_daily_return
A matrix with dimensions 3037 (rows, trading days) by 186 (columns, stocks).
sp500$date
A vector of length 3037, containing the dates for each trading day.
Details
The dataset is provided as an .RData
file containing:
-
sp500$log_daily_return
: A matrix of daily log returns with 3037 rows (trading days) and 186 columns (stocks). -
sp500$date
: A vector of length 3037 containing the dates for each daily log return.
Source
Data from the S&P 500 stock index (2004-2016).
See Also
Examples
# Example Usage: Applying Change Point Detection to S&P 500 Data
# This is an example of how to apply the change point detection method
# (using the VAR_cpDetect_Online function) on the daily log return
# dataset from the S&P 500 (stored in the sp500 dataset). The code
# below calculates the average return volatility for all stocks, applies
# the change point detection algorithm, and plots the results with detected
# change points shown as vertical red and black lines.
# Load the dataset
data(sp500)
# Set parameters
library(ggplot2)
set.seed(2024)
n_sp <- nrow(sp500$log_daily_return)
p_sp <- ncol(sp500$log_daily_return)
# Calculate average return volatility for all data points
volatility_sum <- rep(0, (n_sp - 21))
for(col in 1:p_sp){
temp <- as.numeric(sp500$log_daily_return[, col])
temp1 <- rep(0, (n_sp - 21))
for(row in 1:(n_sp - 21)){
temp1[row] <- sd(temp[(row):(row + 21)])
}
volatility_sum <- volatility_sum + temp1
}
volatility_ave <- volatility_sum / p_sp
# Apply change point detection method
n0 <- 200
w <- 22
alpha <- 1 / 5000
res <- VAR_cpDetect_Online(t(sp500$log_daily_return), n0, w, alpha, 1, FALSE, TRUE, 5 / w, TRUE)
res_sp <- res$alarm_locations + n0
res_sp_cps <- res$cp_locations + n0
# Get the estimated starting points of each alarm cluster
cps_est_sp <- unique(res_sp_cps[which(res_sp %in% get_cps(res_sp, w))])
# Prepare data for plotting
y_values <- c(volatility_ave)
x_values <- sp500$date[1:(n_sp - 21)]
df <- data.frame(y_values, x_values)
plot_sp <- ggplot(df, aes(y = y_values, x = x_values)) +
geom_line() +
theme(legend.position = "none") +
labs(title = "", x = "", y = "") +
scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
geom_vline(xintercept = sp500$date[res_sp], linetype = "solid", color = "red", alpha = .1) +
geom_vline(xintercept = sp500$date[cps_est_sp], linetype = "solid", color = "black")
# Print the detected change points
sp500$date[cps_est_sp] # The dates for the starting of the alarm clusters
plot_sp
Split Coefficient Matrix into VAR Lags
Description
Splits a matrix of estimated coefficients into a list of matrices, each corresponding to one lag of the VAR model.
Usage
splitMatrix(M, p)
Arguments
M |
A numeric matrix of coefficients. |
p |
Integer. The order of the VAR model (number of lags). |
Value
A list of p
matrices, each of dimension (number of variables) x (number of variables).
Transform Data for VAR Estimation
Description
Transforms the input time series data into the design matrices required for VAR estimation. This includes centering, optional scaling, and constructing the lagged predictor matrix.
Usage
transformData(data, p, opt)
Arguments
data |
A numeric matrix or data frame with time series data (observations in rows, variables in columns). |
p |
Integer. The order of the VAR model (number of lags). |
opt |
List. Options for data transformation. Supported options include:
|
Value
A list with the following components:
X |
The design matrix (via the Kronecker product) for lagged predictors. |
y |
A vectorized response corresponding to the lagged data. |
series |
The (centered and possibly scaled) original time series matrix. |
mu |
A row vector of the column means used for centering. |