Offline Changepoint Detection

PELT: Pruned Exact Linear Time

PELT is the gold standard for offline multiple changepoint detection:

library(RegimeChange)

# Generate data with multiple changepoints
set.seed(42)
data <- c(
  rnorm(100, 0, 1),   # Regime 1
  rnorm(100, 3, 1),   # Regime 2
  rnorm(100, 1, 2),   # Regime 3
  rnorm(100, 4, 0.5)  # Regime 4
)
true_cps <- c(100, 200, 300)

# Detect with PELT
result_pelt <- detect_regimes(data, method = "pelt", penalty = "BIC")
print(result_pelt)
#> 
#> Regime Change Detection Results
#> ================================
#> 
#> Method: pelt 
#> Change type: both 
#> Mode: offline 
#> 
#> Data: n = 400 observations
#> 
#> Changepoints detected: 3 
#> Locations: 100, 200, 300 
#> 
#> 95% Confidence Intervals:
#>   CP 1: 100 [66, 122]
#>   CP 2: 200 [180, 222]
#>   CP 3: 300 [280, 320]
#> 
#> Segments:
#>   Segment 1: [1, 100] (n=100) | mean=0.033, sd=1.041
#>   Segment 2: [101, 200] (n=100) | mean=2.913, sd=0.904
#>   Segment 3: [201, 300] (n=100) | mean=0.979, sd=2.034
#>   Segment 4: [301, 400] (n=100) | mean=4.016, sd=0.438

Penalty Selection

The penalty controls the trade-off between fit and complexity:

# Different penalties
result_bic <- detect_regimes(data, method = "pelt", penalty = "BIC")
result_aic <- detect_regimes(data, method = "pelt", penalty = "AIC")
result_mbic <- detect_regimes(data, method = "pelt", penalty = "MBIC")

cat("BIC:", result_bic$n_changepoints, "changepoints\n")
#> BIC: 3 changepoints
cat("AIC:", result_aic$n_changepoints, "changepoints\n")
#> AIC: 81 changepoints
cat("MBIC:", result_mbic$n_changepoints, "changepoints\n")
#> MBIC: 3 changepoints

BIC: Bayesian Information Criterion (balanced, default)
AIC: Akaike Information Criterion (more changepoints)
MBIC: Modified BIC (fewer changepoints)
Manual: Use a numeric value for custom penalty

Minimum Segment Length

Prevent very short segments:

result <- detect_regimes(data, method = "pelt", min_segment = 30)

Binary Segmentation

A fast greedy approach:

result_binseg <- detect_regimes(data, method = "binseg", n_changepoints = 5)
print(result_binseg)
#> 
#> Regime Change Detection Results
#> ================================
#> 
#> Method: binseg 
#> Change type: both 
#> Mode: offline 
#> 
#> Data: n = 400 observations
#> 
#> Changepoints detected: 3 
#> Locations: 100, 200, 300 
#> 
#> 95% Confidence Intervals:
#>   CP 1: 100 [19, 340]
#>   CP 2: 200 [20, 377]
#>   CP 3: 300 [3, 388]
#> 
#> Segments:
#>   Segment 1: [1, 100] (n=100) | mean=0.033, sd=1.041
#>   Segment 2: [101, 200] (n=100) | mean=2.913, sd=0.904
#>   Segment 3: [201, 300] (n=100) | mean=0.979, sd=2.034
#>   Segment 4: [301, 400] (n=100) | mean=4.016, sd=0.438
#> 
#> Information Criteria:
#>   BIC: 1147.98
#>   AIC: 1116.05

Binary segmentation finds changepoints recursively but doesn’t guarantee global optimum.

Wild Binary Segmentation

More robust than standard binary segmentation:

result_wbs <- detect_regimes(data, method = "wbs", M = 100)
print(result_wbs)
#> 
#> Regime Change Detection Results
#> ================================
#> 
#> Method: wbs 
#> Change type: both 
#> Mode: offline 
#> 
#> Data: n = 400 observations
#> 
#> Changepoints detected: 3 
#> Locations: 100, 200, 300 
#> 
#> 95% Confidence Intervals:
#>   CP 1: 100 [20, 277]
#>   CP 2: 200 [40, 361]
#>   CP 3: 300 [20, 380]
#> 
#> Segments:
#>   Segment 1: [1, 100] (n=100) | mean=0.033, sd=1.041
#>   Segment 2: [101, 200] (n=100) | mean=2.913, sd=0.904
#>   Segment 3: [201, 300] (n=100) | mean=0.979, sd=2.034
#>   Segment 4: [301, 400] (n=100) | mean=4.016, sd=0.438
#> 
#> Information Criteria:
#>   BIC: 1147.98

WBS uses random intervals making it more robust to closely-spaced changepoints.

Detecting Different Types of Changes

Mean Changes

result_mean <- detect_regimes(data, type = "mean")

Variance Changes

# Data with variance change
set.seed(123)
var_data <- c(rnorm(100, 0, 1), rnorm(100, 0, 3))

result_var <- detect_regimes(var_data, type = "variance")
print(result_var)
#> 
#> Regime Change Detection Results
#> ================================
#> 
#> Method: pelt 
#> Change type: variance 
#> Mode: offline 
#> 
#> Data: n = 200 observations
#> 
#> Changepoints detected: 4 
#> Locations: 96, 134, 164, 173 
#> 
#> 95% Confidence Intervals:
#>   CP 1: 96 [78, 111]
#>   CP 2: 134 [116, 156]
#>   CP 3: 164 [146, 186]
#>   CP 4: 173 [103, 194]
#> 
#> Segments:
#>   Segment 1: [1, 96] (n=96) | mean=0.069, sd=0.886
#>   Segment 2: [97, 134] (n=38) | mean=-0.470, sd=2.228
#>   Segment 3: [135, 164] (n=30) | mean=-0.266, sd=3.892
#>   Segment 4: [165, 173] (n=9) | mean=0.245, sd=1.208
#>   Segment 5: [174, 200] (n=27) | mean=-0.228, sd=2.806

Mean and Variance Changes

result_both <- detect_regimes(data, type = "both")

Visualization

# Basic plot with changepoints
plot(result_pelt, type = "data")

# Segment-colored plot
plot(result_pelt, type = "segments")

Segment Analysis

Access segment information:

# Get segment details
for (i in seq_along(result_pelt$segments)) {
  seg <- result_pelt$segments[[i]]
  cat(sprintf("Segment %d: [%d, %d] - Mean: %.2f, SD: %.2f\n", 
              i, seg$start, seg$end, seg$params$mean, seg$params$sd))
}
#> Segment 1: [1, 100] - Mean: 0.03, SD: 1.04
#> Segment 2: [101, 200] - Mean: 2.91, SD: 0.90
#> Segment 3: [201, 300] - Mean: 0.98, SD: 2.03
#> Segment 4: [301, 400] - Mean: 4.02, SD: 0.44

Uncertainty Quantification

Get confidence intervals using bootstrap:

result_ci <- detect_regimes(data, method = "pelt", 
                            uncertainty = TRUE, bootstrap_reps = 100)

if (length(result_ci$confidence_intervals) > 0) {
  print(result_ci$confidence_intervals[[1]])
}
#> $estimate
#> [1] 100
#> 
#> $lower
#> 2.5% 
#>   80 
#> 
#> $upper
#> 97.5% 
#>   120 
#> 
#> $se
#> [1] 9.906003

Evaluation Against Ground Truth

eval_result <- evaluate(result_pelt, true_changepoints = true_cps)
print(eval_result)
#> 
#> Changepoint Detection Evaluation
#> =================================
#> 
#> Detection Performance:
#>   True changepoints: 3
#>   Detected: 3
#>   Matched: 3
#>   Precision: 1.000
#>   Recall: 1.000
#>   F1 Score: 1.000
#> 
#> Localization Accuracy:
#>   Hausdorff Distance: 0.00
#>   Mean Absolute Error: 0.00
#>   RMSE: 0.00
#> 
#> Segmentation Quality:
#>   Rand Index: 1.000
#>   Adjusted Rand Index: 1.000
#>   Covering Metric: 1.000

Key metrics: - Hausdorff distance: Maximum error in changepoint location - F1 score: Balance of precision and recall - Adjusted Rand Index: Segmentation agreement corrected for chance

Comparing Methods

comparison <- compare_methods(
  data = data,
  methods = c("pelt", "binseg", "wbs"),
  true_changepoints = true_cps
)
print(comparison)
#> 
#> Changepoint Detection Method Comparison
#> ========================================
#> 
#>  method n_changepoints   time_sec f1 hausdorff adj_rand
#>    pelt              3 0.04858112  1         0        1
#>  binseg              3 0.02454591  1         0        1
#>     wbs              3 0.39748216  1         0        1
#> 
#> True changepoints: 100, 200, 300 
#> 
#> Detected changepoints by method:
#>   pelt: 100, 200, 300
#>   binseg: 100, 200, 300
#>   wbs: 100, 200, 300

Best Practices

Start with PELT using BIC penalty
Validate segment length - use min_segment to avoid short segments
Compare multiple methods when stakes are high
Use bootstrap CI for critical applications
Visualize results to sanity-check detection

Tips for Difficult Cases

Closely Spaced Changepoints

Use WBS instead of PELT:

detect_regimes(data, method = "wbs", M = 200)

Small Change Magnitudes

Lower the penalty:

detect_regimes(data, method = "pelt", penalty = "AIC")

Many Changepoints

Use ensemble methods:

detect_regimes(data, method = "ensemble", 
               methods = c("pelt", "wbs", "binseg"))