Type: | Package |
Title: | Labeled Optimal Partitioning |
Version: | 2024.6.19 |
Author: | Toby Dylan Hocking |
Maintainer: | Toby Dylan Hocking <toby.hocking@r-project.org> |
Description: | Change-point detection algorithm with label constraints and a penalty for each change outside of labels. Read TD Hocking, A Srivastava (2023) <doi:10.1007/s00180-022-01238-z> for details. |
License: | GPL-3 |
LinkingTo: | Rcpp |
URL: | https://github.com/tdhock/LOPART |
BugReports: | https://github.com/tdhock/LOPART/issues |
Imports: | data.table, Rcpp |
Suggests: | ggplot2, testthat |
RoxygenNote: | 7.3.1 |
NeedsCompilation: | yes |
Packaged: | 2024-06-20 11:31:04 UTC; tdhock |
Repository: | CRAN |
Date/Publication: | 2024-06-20 21:30:07 UTC |
Labeled Optimal PARTitioning
Description
Compute an optimal segmentation (change in Gaussian mean model, square loss), which is consistent with the given labels, and with a penalty for each changepoint outside of labeled regions.
Usage
LOPART(
x,
labels,
penalty_unlabeled,
n_updates = length(x),
penalty_labeled = penalty_unlabeled
)
Arguments
x |
numeric vector of data to fit a Gaussian mean model. |
labels |
data frame with at least three columns: start, end, changes. start/end should be indices of x, from 1 to length(x). changes should be either 0 or 1. The prediced changepoints are guaranteed to be consistent with these labels. |
penalty_unlabeled |
non-negative penalty constant (larger for fewer changes, smaller for more changes). penalty=0 means a change in every unlabeled region, penalty=Inf means no changes in unlabeled regions. |
n_updates |
how many dynamic programming updates to compute? Must be at least 1 and at most length(x). |
penalty_labeled |
non-negative penalty constant to use for changes in positive labels. |
Details
Provides a high-level interface to LOPART_interface R function and LOPART C code.
Value
list with named elements, all of which are data tables. loss has one row with loss/cost values. cost is the output from LOPART_interface. changes has one row for each predicted changepoint (e.g. change=1.5 means a change between data points 1 and 2). segments has one row for each segment.
Author(s)
Toby Dylan Hocking
Examples
set.seed(2)
library(data.table)
signal <- c(
rnorm(25, mean = 10),
rnorm(25, mean = 7),
rnorm(25, mean = 8),
rnorm(25, mean = 5))
#outliers
signal[86] <- 10
labels.dt <- data.table(
start = c(20, 45, 80),
end = c(30, 55, 90),
changes = c(1, 1, 0))
signal.dt <- data.table(
signal,
position=seq_along(signal))
label.colors <- c(
"1"="#ff7d7d",
"0"="#f6c48f")
sig.color <- "grey50"
if(require(ggplot2)){
gg.data <- ggplot()+
geom_rect(aes(
xmin=start, xmax=end,
fill=paste(changes),
ymin=-Inf, ymax=Inf),
alpha=0.5,
data=labels.dt)+
geom_point(aes(
position, signal),
color=sig.color,
data=signal.dt)+
scale_x_continuous(
"position",
breaks=seq(0, 100, by=10))+
scale_fill_manual("label", values=label.colors)+
theme_bw()+
theme(panel.spacing=grid::unit(0, "lines"))
print(gg.data)
}
label.list <- list(
OPART=labels.dt[0],
LOPART=labels.dt)
seg.dt.list <- list()
change.dt.list <- list()
cost.dt.list <- list()
for(model.name in names(label.list)){
label.dt <- data.table(label.list[[model.name]])
fit <- LOPART::LOPART(signal, label.dt, 10)
Algorithm <- factor(model.name, names(label.list))
tau.dt <- fit$cost[, .(
cost_candidates,
tau=0:(.N-1),
change=seq_along(cost_candidates)-0.5
)]
cost.dt.list[[model.name]] <- data.table(Algorithm, tau.dt)
seg.dt.list[[model.name]] <- data.table(Algorithm, fit$segments)
change.dt.list[[model.name]] <- data.table(Algorithm, fit$changes)
}
seg.dt <- do.call(rbind, seg.dt.list)
change.dt <- do.call(rbind, change.dt.list)
cost.dt <- do.call(rbind, cost.dt.list)
algo.sizes <- c(
OPART=1,
LOPART=0.5)
algo.colors <- c(
OPART="deepskyblue",
LOPART="black")
algo.shapes <- c(
OPART=1,
LOPART=2)
if(require(ggplot2)){
gg.data+
scale_size_manual(values=algo.sizes)+
scale_color_manual(values=algo.colors)+
geom_vline(aes(
xintercept=change,
size=Algorithm,
color=Algorithm),
data=change.dt)+
geom_segment(aes(
start-0.5, mean,
size=Algorithm,
color=Algorithm,
xend=end+0.5, yend=mean),
data=seg.dt)
}
if(require(ggplot2)){
ggplot()+
geom_rect(aes(
xmin=start, xmax=end,
fill=paste(changes),
ymin=-Inf, ymax=Inf),
alpha=0.5,
data=labels.dt)+
scale_fill_manual("label", values=label.colors)+
theme_bw()+
theme(panel.spacing=grid::unit(0, "lines"))+
scale_x_continuous(
"position",
breaks=seq(0, 100, by=10))+
geom_point(aes(
change, cost_candidates,
color=Algorithm, shape=Algorithm),
data=cost.dt)+
scale_color_manual(values=algo.colors)+
scale_shape_manual(values=algo.shapes)
}
abbrev.vec <- c(
data="data and models",
cost="cost of last change")
yfac <- function(l){
factor(abbrev.vec[[l]], abbrev.vec)
}
COST <- function(dt){
data.table(y.var=yfac("cost"), dt)
}
DATA <- function(dt){
data.table(y.var=yfac("data"), dt)
}
if(require(ggplot2)){
ggplot()+
geom_rect(aes(
xmin=start, xmax=end,
fill=paste(changes),
ymin=-Inf, ymax=Inf),
alpha=0.5,
data=labels.dt)+
scale_fill_manual("label", values=label.colors)+
theme_bw()+
theme(panel.spacing=grid::unit(0, "lines"))+
facet_grid(y.var ~ ., scales="free")+
geom_vline(aes(
xintercept=change,
size=Algorithm,
color=Algorithm),
data=change.dt)+
geom_segment(aes(
start-0.5, mean,
size=Algorithm,
color=Algorithm,
xend=end+0.5, yend=mean),
data=DATA(seg.dt))+
geom_point(aes(
position, signal),
color=sig.color,
shape=1,
data=DATA(signal.dt))+
scale_size_manual(values=algo.sizes)+
scale_color_manual(values=algo.colors)+
scale_shape_manual(values=algo.shapes)+
ylab("")+
scale_x_continuous(
"position",
breaks=seq(0, 100, by=10))+
geom_point(aes(
change, cost_candidates,
color=Algorithm, shape=Algorithm),
data=COST(cost.dt))
}
Labeled Optimal Partitioning interface
Description
Low-level interface to LOPART C code
Usage
LOPART_interface(
input_data,
input_label_start,
input_label_end,
input_label_changes,
n_updates,
penalty_unlabeled,
penalty_labeled = 0
)
Arguments
input_data |
numeric vector of N data to segment |
input_label_start |
integer vector of label start positions in 0, ..., N-2 |
input_label_end |
integer vector of label end positions in 1, ..., N-1 |
input_label_changes |
integer vector of 0/1, number of labeled changes |
n_updates |
number of dynamic programming updates to perform, usually should be number of input_data N, but can be less if you want to analyze/plot the cost/candidates at previous data. |
penalty_unlabeled |
non-negative numeric scalar (bigger for fewer changes in unlabeled regions, smaller for more changes) |
penalty_labeled |
non-negative numeric scalar (penalty for each change in a positive label). |
Details
Avoid using this function and instead use the LOPART function.
Value
data frame with four columns: cost_candidates is the cost of each last segment start considered (from 1 to N) for the computation of the optimal cost up to the last data point (Inf means infeasible); cost_optimal is the optimal cost vector computed using dynamic programming; mean is the last segment mean of the optimal model ending at that data point; last_change is the optimal changepoints (negative numbers are not used).
Author(s)
Toby Dylan Hocking