| Type: | Package | 
| Title: | Approximate and Exact Optimal Transport Methods | 
| Version: | 1.2 | 
| Date: | 2025-01-06 | 
| Maintainer: | Eric Dunipace <edunipace@mail.harvard.edu> | 
| Description: | R and C++ functions to perform exact and approximate optimal transport. All C++ methods can be linked to other R packages via their header files. | 
| License: | GPL (== 3.0) | 
| Imports: | Rcpp (≥ 1.0.3), stats | 
| LinkingTo: | Rcpp, RcppEigen, RcppCGAL, BH | 
| BugReports: | https://github.com/ericdunipace/approxOT/issues | 
| Suggests: | testthat (≥ 2.1.0), transport | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| SystemRequirements: | C++17 | 
| URL: | https://github.com/ericdunipace/approxOT | 
| NeedsCompilation: | yes | 
| Packaged: | 2025-01-09 00:44:28 UTC; eifer | 
| Author: | Eric Dunipace | 
| Repository: | CRAN | 
| Date/Publication: | 2025-01-09 12:30:06 UTC | 
An R package to perform exact and approximate optimal transport.
Description
R and C++ functions to perform exact and approximate optimal transport. All C++ methods are linkable to other R packages via their header files.
Author(s)
Eric Dunipace
See Also
Useful links:
- Report bugs at https://github.com/ericdunipace/approxOT/issues 
Transform transportation plan to transportation matrix
Description
Transform transportation plan to transportation matrix
Usage
## S3 method for class 'transport.plan'
as.matrix(x, ...)
Arguments
| x | An object of class 'transport.plan'. See the output of transport_plan(). | 
| ... | Unused arguments | 
Value
A matrix specifying the minimal joint distribution between samples. Margins will be equal to the marginal distributions of the samples
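The relationship between a sparse plan and its matrix can be illustrated in base R. This is a minimal sketch, assuming the plan is a list with slots "from", "to", and "mass" (the slot names this manual gives for transport plans); the helper `plan_to_matrix` is hypothetical and is not a function of this package.

```r
# Hypothetical helper, not part of approxOT: expand a sparse plan
# (from, to, mass) into a dense n1 x n2 transportation matrix.
plan_to_matrix <- function(plan, n1, n2) {
  gamma <- matrix(0, nrow = n1, ncol = n2)
  for (k in seq_along(plan$mass)) {
    i <- plan$from[k]
    j <- plan$to[k]
    gamma[i, j] <- gamma[i, j] + plan$mass[k]
  }
  gamma
}

# Toy plan between 3 source points and 2 target points
plan <- list(from = c(1, 2, 2, 3),
             to   = c(1, 1, 2, 2),
             mass = c(1/3, 1/6, 1/6, 1/3))
gamma <- plan_to_matrix(plan, n1 = 3, n2 = 2)

rowSums(gamma)  # marginal distribution of the first sample
colSums(gamma)  # marginal distribution of the second sample
```

The row and column sums recover the two marginal distributions, which is the property described in the Value text.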
Examples
set.seed(203987)
n <- 5
d <- 2
x <- matrix(rnorm(d*n), nrow=d, ncol=n)
y <- matrix(rnorm(d*n), nrow=d, ncol=n)
# approximate transport plan from x to itself using Hilbert sorting
trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2, 
                         observation.orientation =  "colwise", 
                         method = "hilbert")
trans_matrix <- as.matrix(trans_plan)
print(trans_matrix)
Transform transportation matrix to transportation plan
Description
Transform transportation matrix to transportation plan
Usage
as.transport.plan(transport_matrix, ...)
Arguments
| transport_matrix | A matrix that is a transportation matrix, i.e. the minimal joint distribution for two samples. | 
| ... | Unused arguments | 
Value
An object of class 'transport.plan'. See the output of transport_plan().
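The reverse direction, from a matrix to a sparse plan, can be sketched in base R by keeping only the nonzero entries. The helper `matrix_to_plan` below is hypothetical and only assumes the "from", "to", "mass" layout described in this manual; it does not set the 'transport.plan' class.

```r
# Hypothetical helper, not part of approxOT: keep only the nonzero
# entries of a transportation matrix as (from, to, mass) triplets.
matrix_to_plan <- function(gamma) {
  idx <- which(gamma > 0, arr.ind = TRUE)
  list(from = unname(idx[, "row"]),
       to   = unname(idx[, "col"]),
       mass = gamma[idx])
}

gamma <- matrix(c(0.5, 0.0,
                  0.0, 0.25,
                  0.0, 0.25), nrow = 3, byrow = TRUE)
plan <- matrix_to_plan(gamma)
str(plan)
```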
Examples
set.seed(203987)
n <- 5
d <- 2
x <- matrix(stats::rnorm(d*n), nrow=d, ncol=n)
y <- matrix(stats::rnorm(d*n), nrow=d, ncol=n)
# approximate transport plan from x to itself using Hilbert sorting
trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2, 
                         observation.orientation =  "colwise", 
                         method = "hilbert")
trans_matrix <- as.matrix(trans_plan$tplan)
tplan2 <- as.transport.plan(trans_matrix)
all.equal(tplan2, trans_plan$tplan)
Calculate cost matrix
Description
Calculate cost matrix
Usage
cost_calc(X, Y, ground_p)
Arguments
| X | matrix of values in first sample. Observations should be by column, not rows. | 
| Y | matrix of values in second sample. Observations should be by column, not rows. | 
| ground_p | power of the Lp norm to use in cost calculation. | 
Value
matrix of costs
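For intuition, a cost matrix of this kind can be written in plain R. The sketch below assumes each entry is the Lp norm of the difference between one column of X and one column of Y (consistent with the "Euclidean distance" comment in the example for ground_p = 2); `lp_cost` is a hypothetical reference function, not the package's C++ routine.

```r
# Hypothetical base-R reference, not the package's C++ routine.
# Observations are columns, matching the convention of cost_calc().
lp_cost <- function(X, Y, ground_p = 2) {
  n <- ncol(X)
  m <- ncol(Y)
  cost <- matrix(0, nrow = n, ncol = m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      # Lp norm of the difference between observation i and observation j
      cost[i, j] <- sum(abs(X[, i] - Y[, j])^ground_p)^(1 / ground_p)
    }
  }
  cost
}

set.seed(1)
X <- matrix(rnorm(3 * 4), 3, 4)
Y <- matrix(rnorm(3 * 5), 3, 5)
C <- lp_cost(X, Y, ground_p = 2)
dim(C)  # one row per column of X, one column per column of Y
```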
Examples
X <- matrix(rnorm(10*100), 10, 100)
Y <- matrix(rnorm(10*100), 10, 100)
# the Euclidean distance
cost <- cost_calc(X, Y, ground_p = 2)
Convert the 2-dimensional index to 1-dimensional index
Description
Convert the 2-dimensional index to 1-dimensional index
Usage
dist_2d_to_1d(i, j, n, m)
Arguments
| i | Index of row | 
| j | Index of column | 
| n | Total number of rows | 
| m | Total number of columns | 
Value
a 1d index for easy matrix entry
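The idea of a linear matrix index can be illustrated with base R's own convention, which is column-major and 1-based. Whether dist_2d_to_1d() uses the same convention is an assumption; `index_2d_to_1d` below is a hypothetical stand-in.

```r
# Hypothetical illustration of column-major, 1-based linear indexing,
# the convention base R uses for matrices. Whether dist_2d_to_1d()
# follows the same convention is an assumption.
index_2d_to_1d <- function(i, j, n, m) {
  stopifnot(i >= 1, i <= n, j >= 1, j <= m)
  i + (j - 1) * n
}

M <- matrix(seq_len(12), nrow = 3, ncol = 4)
k <- index_2d_to_1d(i = 2, j = 3, n = 3, m = 4)
M[k] == M[2, 3]  # the linear index addresses the same entry
```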
One-dimensional optimal transport for measures with more general mass
Description
One-dimensional optimal transport for measures with more general mass
Usage
general_1d_transport(
  X,
  Y,
  a = NULL,
  b = NULL,
  method = c("hilbert", "univariate")
)
Arguments
| X | Data for sample one. Should be a vector if method is "univariate" or a matrix if method is "hilbert" | 
| Y | Data for sample two. Should be a vector if method is "univariate" or a matrix if method is "hilbert" | 
| a | Empirical measure for sample one. | 
| b | Empirical measure for sample two. | 
| method | One of "hilbert" or "univariate" | 
Value
An optimal transportation plan as a list with slots "from", "to", and "mass"
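For the univariate case, transport between two weighted samples can be sketched in base R: sort both samples, then sweep through them, moving as much mass as possible between the current pair of points. This is a generic illustration and not the package's implementation; `ot_1d` and its defaults are hypothetical.

```r
# Hypothetical base-R sketch of univariate transport with general masses.
# Not the package implementation; names are illustrative.
ot_1d <- function(x, y,
                  a = rep(1 / length(x), length(x)),
                  b = rep(1 / length(y), length(y))) {
  ox <- order(x)
  oy <- order(y)
  a <- a[ox]
  b <- b[oy]
  from <- integer(0)
  to <- integer(0)
  mass <- numeric(0)
  i <- 1
  j <- 1
  tol <- 1e-12
  while (i <= length(a) && j <= length(b)) {
    m <- min(a[i], b[j])  # move as much mass as both points allow
    if (m > tol) {
      from <- c(from, ox[i])
      to <- c(to, oy[j])
      mass <- c(mass, m)
    }
    a[i] <- a[i] - m
    b[j] <- b[j] - m
    if (a[i] <= tol) i <- i + 1  # source point exhausted
    if (b[j] <= tol) j <- j + 1  # target point exhausted
  }
  list(from = from, to = to, mass = mass)
}

x <- c(0.3, -1.2, 2.0)
y <- c(1.0, -0.5)
plan <- ot_1d(x, y)
str(plan)
```

The samples may have different sizes, as in the package example above; the sketch only requires that both weight vectors have the same total mass.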
Examples
set.seed(23423)
n <- 100
d <- 10
x <- matrix(stats::rnorm((n + 11)*d), n + 11 , d)
y <- matrix(stats::rnorm(n*d), n, d)
trans <- general_1d_transport(t(x), t(y))
Get order along the Hilbert curve
Description
Get order along the Hilbert curve
Usage
hilbert.projection(X, Sigma = NULL)
Arguments
| X | matrix of values. Observations are unique by rows. | 
| Sigma | Covariance of the data. If provided, uses a Mahalanobis distance. | 
Value
Index of orders
Examples
X <- matrix(rnorm(10*3), 3, 10)
idx <- hilbert.projection(X)
print(idx)
Returns orders along the Hilbert space-filling Curve
Description
Returns orders along the Hilbert space-filling Curve
Usage
hilbert_proj_(A)
Arguments
| A | a matrix of data-values of class Eigen::MatrixXd | 
Value
An integer vector of orders
Check if an object is a transport.plan
Description
Check if an object is a transport.plan
Usage
is.transport.plan(tplan)
Arguments
| tplan | An object of class 'transport.plan'. See the output of transport_plan(). | 
Value
Logical
Examples
set.seed(203987)
n <- 5
d <- 2
x <- matrix(rnorm(d*n), nrow=d, ncol=n)
y <- matrix(rnorm(d*n), nrow=d, ncol=n)
# approximate transport plan from x to itself using Hilbert sorting
trans_plan <- transport_plan(X=x, Y=x, ground_p = 2, p = 2, 
                         observation.orientation =  "colwise", 
                         method = "hilbert")
print(is.transport.plan(trans_plan))
Round transportation matrix to feasible set
Description
Round transportation matrix to feasible set
Usage
round_transport_matrix(transport_matrix, mass_x, mass_y)
Arguments
| transport_matrix | A transportation matrix returned by an approximate method | 
| mass_x | The distribution of the first margin | 
| mass_y | The distribution of the second margin | 
Value
Returns a transportation matrix projected to the feasible set.
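One widely used way to project an approximate plan onto the feasible set is to shrink rows, then shrink columns, and finally add a rank-one correction that restores the missing mass. The base-R sketch below shows that scheme; whether round_transport_matrix() follows exactly these steps is an assumption, and `round_to_feasible` is a hypothetical name. It assumes strictly positive row and column sums and margins with equal total mass.

```r
# Hypothetical base-R sketch of a standard rounding scheme; whether
# round_transport_matrix() uses exactly these steps is an assumption.
round_to_feasible <- function(gamma, mass_x, mass_y) {
  # 1. shrink rows whose sums exceed the target row margin
  rs <- rowSums(gamma)
  gamma <- gamma * pmin(mass_x / rs, 1)
  # 2. shrink columns whose sums exceed the target column margin
  cs <- colSums(gamma)
  gamma <- sweep(gamma, 2, pmin(mass_y / cs, 1), `*`)
  # 3. spread the remaining deficit as a rank-one correction
  err_x <- mass_x - rowSums(gamma)
  err_y <- mass_y - colSums(gamma)
  if (sum(err_x) > 0) {
    gamma <- gamma + outer(err_x, err_y) / sum(err_x)
  }
  gamma
}

set.seed(42)
gamma0 <- matrix(runif(12), 3, 4)
gamma0 <- gamma0 / sum(gamma0)  # approximate plan with inexact margins
mass_x <- rep(1 / 3, 3)
mass_y <- rep(1 / 4, 4)
gamma1 <- round_to_feasible(gamma0, mass_x, mass_y)

rowSums(gamma1)  # matches mass_x
colSums(gamma1)  # matches mass_y
```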
Return the dual potentials for the Sinkhorn distance
Description
Return the dual potentials for the Sinkhorn distance
Usage
sinkhorn_pot(
  mass_x,
  mass_y,
  p = 2,
  cost = NULL,
  cost_a = NULL,
  cost_b = NULL,
  ...
)
Arguments
| mass_x | The empirical distribution of the first sample | 
| mass_y | The empirical distribution of the second sample | 
| p | The power to raise the cost by | 
| cost | The cost matrix between first and second samples | 
| cost_a | The cost matrix for the first sample | 
| cost_b | The cost matrix for the second sample | 
| ... | Additional arguments including | 
Value
A list with slots "f" and "g", the dual potentials for the first and second margins (rows and columns), respectively.
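To show what such potentials look like, here is a minimal base-R sketch of plain Sinkhorn iterations on a small problem. The regularization parameter `eps`, the iteration count `niter`, and the function name `sinkhorn_potentials` are illustrative assumptions; they are not documented arguments of sinkhorn_pot(), and the package may use a different scaling or a debiased variant.

```r
# Hypothetical base-R sketch of Sinkhorn iterations; eps and niter are
# illustrative names, not documented arguments of sinkhorn_pot().
sinkhorn_potentials <- function(mass_x, mass_y, cost, eps = 1, niter = 200) {
  K <- exp(-cost / eps)  # Gibbs kernel
  u <- rep(1, length(mass_x))
  v <- rep(1, length(mass_y))
  for (it in seq_len(niter)) {
    u <- mass_x / as.vector(K %*% v)          # match the row margins
    v <- mass_y / as.vector(crossprod(K, u))  # match the column margins
  }
  list(f = eps * log(u), g = eps * log(v))
}

x <- c(0, 1, 2)
y <- c(0.5, 1.5)
cost <- outer(x, y, function(s, t) (s - t)^2)
a <- rep(1 / 3, 3)
b <- rep(1 / 2, 2)
eps <- 1
pot <- sinkhorn_potentials(a, b, cost, eps = eps)

# Plan implied by the potentials: exp((f_i + g_j - C_ij) / eps)
gamma <- exp((outer(pot$f, pot$g, `+`) - cost) / eps)
rowSums(gamma)  # close to a after convergence
colSums(gamma)  # equal to b after the last column update
```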
Function returning supported optimal transportation methods.
Description
Function returning supported optimal transportation methods.
Usage
transport_options()
Details
The currently supported methods are
- exact, networkflow: Utilize the networkflow algorithm to solve the exact optimal transport problem 
- shortsimplex: Use the shortsimplex algorithm to solve the exact optimal transport problem 
- sinkhorn: Use Sinkhorn's algorithm to solve the approximate optimal transport problem 
- sinkhorn_log: Use Sinkhorn's algorithm on a log-scale for added stability to solve the approximate optimal transport problem 
- greenkhorn: Use the Greenkhorn algorithm to solve the approximate optimal transport problem 
- hilbert: Use Hilbert sorting to perform approximate optimal transport 
- rank: Use the average covariate ranks to perform approximate optimal transport 
- univariate: Use appropriate optimal transport methods for univariate data 
- swapping: Utilize the swapping algorithm to perform approximate optimal transport 
- sliced: Use the sliced optimal transport distance 
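As an illustration of the "sliced" idea in the list above, the following base-R sketch averages one-dimensional Wasserstein distances over random projection directions. It is a generic Monte Carlo version restricted to equal sample sizes and uniform weights; `sliced_wasserstein` is a hypothetical helper and is not how the package necessarily implements the method.

```r
# Hypothetical base-R sketch of a sliced Wasserstein distance for two
# samples of equal size with uniform weights (observations in rows).
sliced_wasserstein <- function(X, Y, p = 2, nsim = 200) {
  stopifnot(nrow(X) == nrow(Y), ncol(X) == ncol(Y))
  d <- ncol(X)
  total <- 0
  for (s in seq_len(nsim)) {
    theta <- rnorm(d)
    theta <- theta / sqrt(sum(theta^2))  # random direction on the unit sphere
    px <- sort(as.vector(X %*% theta))   # sorted one-dimensional projections
    py <- sort(as.vector(Y %*% theta))
    total <- total + mean(abs(px - py)^p)
  }
  (total / nsim)^(1 / p)
}

set.seed(123)
X <- matrix(rnorm(50 * 3), 50, 3)
Y <- matrix(rnorm(50 * 3, mean = 1), 50, 3)
sw <- sliced_wasserstein(X, Y, p = 2, nsim = 100)
sw
```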
Value
Returns a vector of supported transport methods
Optimal transport plans
Description
Optimal transport plans
Usage
transport_plan(
  X,
  Y,
  a = NULL,
  b = NULL,
  p = 2,
  ground_p = 2,
  observation.orientation = c("rowwise", "colwise"),
  method = transport_options(),
  ...
)
Arguments
| X | The covariate data of the first sample. | 
| Y | The covariate data of the second sample. | 
| a | Optional. Empirical measure of the first sample | 
| b | Optional. Empirical measure of the second sample | 
| p | The power of the Wasserstein distance | 
| ground_p | The power of the Lp norm | 
| observation.orientation | Are observations by row ("rowwise") or column ("colwise"). | 
| method | Which transportation method to use. See transport_options(). | 
| ... | Additional arguments for various methods | 
Value
a list with slots "tplan" and "cost". "tplan" is the optimal transport plan and "cost" is the optimal transport distance.
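The link between a plan and its transport cost can be sketched in base R: weight each pairwise cost, raised to the power p, by the mass moved, sum, and take the p-th root. That the "cost" slot is computed exactly this way is an assumption; `plan_cost` is a hypothetical helper.

```r
# Hypothetical helper: p-Wasserstein cost of a sparse plan given a
# matrix of pairwise ground costs.
plan_cost <- function(plan, cost, p = 2) {
  idx <- cbind(plan$from, plan$to)  # (row, column) pairs that carry mass
  sum(plan$mass * cost[idx]^p)^(1 / p)
}

cost <- matrix(c(0, 1,
                 1, 0,
                 2, 1), nrow = 3, byrow = TRUE)
plan <- list(from = c(1, 2, 3),
             to   = c(1, 2, 2),
             mass = c(0.5, 0.25, 0.25))
total <- plan_cost(plan, cost, p = 2)
total
```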
Examples
set.seed(203987)
n <- 100
d <- 10
x <- matrix(stats::rnorm(d*n), nrow=d, ncol=n)
y <- matrix(stats::rnorm(d*n), nrow=d, ncol=n)
# approximate transport plan from x to itself using Hilbert sorting
transx <- transport_plan(X=x, Y=x, ground_p = 2, p = 2, 
                         observation.orientation =  "colwise", 
                         method = "hilbert")
Optimal transport plans given a pre-specified cost
Description
Optimal transport plans given a pre-specified cost
Usage
transport_plan_given_C(
  mass_x,
  mass_y,
  p = 2,
  cost = NULL,
  method = "exact",
  cost_a = NULL,
  cost_b = NULL,
  ...
)
Arguments
| mass_x | The empirical measure of the first sample | 
| mass_y | The empirical measure of the second sample. | 
| p | The power of the Wasserstein distance | 
| cost | Specify the cost matrix in advance. | 
| method | The transportation method to use, one of "exact", "networkflow", "shortsimplex", "sinkhorn", "greenkhorn" | 
| cost_a | The cost matrix for the first sample with itself. Only used for unbiased Sinkhorn | 
| cost_b | The cost matrix for the second sample with itself. Only used for unbiased Sinkhorn | 
| ... | Additional arguments for various methods | 
Value
A transportation plan as an object of class "transport.plan", which is a list with slots "from", "to", and "mass".
Examples
n <- 32
d <- 5
set.seed(293897)
A <- matrix(stats::rnorm(n*d),nrow=d,ncol=n)
B <- matrix(stats::rnorm(n*d),nrow=d,ncol=n)
transp.meth <- "sinkhorn"
niter <- 1e2
test <- transport_plan_given_C(mass_x = rep(1/n, n),
                               mass_y = rep(1/n, n),
                               p = 2, cost = cost_calc(A, B, 2),
                               method = transp.meth, niter = niter)
Multimarginal optimal transport plans
Description
Multimarginal optimal transport plans
Usage
transport_plan_multimarg(
  ...,
  p = 2,
  ground_p = 2,
  observation.orientation = c("rowwise", "colwise"),
  method = c("hilbert", "univariate", "sliced"),
  nsim = 1000
)
Arguments
| ... | Either data matrices as separate arguments or a list of data matrices. Arguments after the data must be specified by name. | 
| p | The power of the Wasserstein distance to use | 
| ground_p | The power of the Euclidean distance to use | 
| observation.orientation | Are observations by rows or columns | 
| method | One of "hilbert", "univariate", or "sliced" | 
| nsim | Number of simulations to use for the sliced method | 
Value
transport plan
Examples
set.seed(23423)
n <- 100
d <- 10
p <- ground_p <- 2 #euclidean cost, p = 2
x <- matrix(stats::rnorm((n + 11)*d), n + 11 , d)
y <- matrix(stats::rnorm(n*d), n, d)
z <- matrix(stats::rnorm((n +455)*d), n +455, d)
# make data a list
data <- list(x,y,z)
tplan <- transport_plan_multimarg(data, p = p, ground_p = ground_p,
observation.orientation = "rowwise", method = "hilbert")
# transposed data works too
datat <- lapply(data, t)
tplan2 <- transport_plan_multimarg(datat, p = p, ground_p = ground_p,
observation.orientation = "colwise",method = "hilbert")
Calculate the Wasserstein distance
Description
Calculate the Wasserstein distance
Usage
wasserstein(
  X = NULL,
  Y = NULL,
  a = NULL,
  b = NULL,
  cost = NULL,
  tplan = NULL,
  p = 2,
  ground_p = 2,
  method = transport_options(),
  cost_a = NULL,
  cost_b = NULL,
  ...
)
Arguments
| X | The covariate data of the first sample. | 
| Y | The covariate data of the second sample. | 
| a | Optional. Empirical measure of the first sample | 
| b | Optional. Empirical measure of the second sample | 
| cost | Specify the cost matrix in advance. | 
| tplan | Give a transportation plan with slots "from", "to", and "mass", like that returned by the transport_plan() function. | 
| p | The power of the Wasserstein distance | 
| ground_p | The power of the Lp norm | 
| method | Which transportation method to use. See transport_options(). | 
| cost_a | The cost matrix for the first sample with itself. Only used for unbiased Sinkhorn | 
| cost_b | The cost matrix for the second sample with itself. Only used for unbiased Sinkhorn | 
| ... | Additional arguments for various methods: | 
Value
The p-Wasserstein distance, a numeric value
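In the univariate case with equal sample sizes and uniform weights, the p-Wasserstein distance has a closed form: sort both samples and pair them in order. The base-R sketch below computes it directly; `wasserstein_1d` is a hypothetical helper for intuition, and its agreement with the package's "univariate" method is expected but not verified here.

```r
# Hypothetical base-R check for the univariate case with equal sample
# sizes and uniform weights: sort both samples and pair them in order.
wasserstein_1d <- function(z, w, p = 2) {
  stopifnot(length(z) == length(w))
  mean(abs(sort(z) - sort(w))^p)^(1 / p)
}

set.seed(11289374)
n <- 100
z <- stats::rnorm(n)
w <- stats::rnorm(n)
wasserstein_1d(z, w, p = 2)

# Shifting a sample by a constant moves every point by that constant
wasserstein_1d(z, z + 1, p = 2)
```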
Examples
set.seed(11289374)
n <- 100
z <- stats::rnorm(n)
w <- stats::rnorm(n)
uni <- approxOT::wasserstein(X = z, Y = w, 
p = 2, ground_p = 2, 
observation.orientation = "colwise", 
method = "univariate")