Title: | Distributional Synthetic Controls Estimation |
Version: | 0.1.1 |
Description: | The method of synthetic controls is a widely-adopted tool for evaluating causal effects of policy changes in settings with observational data. In many settings where it is applicable, researchers want to identify causal effects of policy changes on a treated unit at an aggregate level while having access to data at a finer granularity. This package implements a simple extension of the synthetic controls estimator, developed in Gunsilius (2023) <doi:10.3982/ECTA18260>, that takes advantage of this additional structure and provides nonparametric estimates of the heterogeneity within the aggregate unit. The idea is to replicate the quantile function associated with the treated unit by a weighted average of quantile functions of the control units. The package contains tools for aggregating and plotting the resulting distributional estimates, as well as for carrying out inference on them. |
License: | MIT + file LICENSE |
BugReports: | https://github.com/Davidvandijcke/DiSCos/issues |
URL: | http://www.davidvandijcke.com/DiSCos/, https://github.com/Davidvandijcke/DiSCos |
LazyData: | TRUE |
Imports: | CVXR, pracma, Rdpack, parallel, evmix, utils, extremeStat, MASS |
Depends: | data.table, R (≥ 2.10), ggplot2 |
RdMacros: | Rdpack |
Suggests: | haven, latex2exp, knitr, rmarkdown, maps, testthat (≥ 3.0.0), quadprog |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.2 |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2024-07-23 03:12:09 UTC; davidvandijcke |
Author: | David Van Dijcke |
Maintainer: | David Van Dijcke <dvdijcke@umich.edu> |
Repository: | CRAN |
Date/Publication: | 2024-07-23 03:30:03 UTC |
Distributional Synthetic Controls
Description
This function implements the distributional synthetic controls (DiSCo) method from Gunsilius (2023), as well as the alternative mixture of distributions approach.
Usage
DiSCo(
df,
id_col.target,
t0,
M = 1000,
G = 1000,
num.cores = 1,
permutation = FALSE,
q_min = 0,
q_max = 1,
CI = FALSE,
boots = 500,
replace = TRUE,
uniform = FALSE,
cl = 0.95,
graph = FALSE,
qmethod = NULL,
qtype = 7,
seed = NULL,
simplex = FALSE,
mixture = FALSE,
grid.cat = NULL
)
Arguments
df |
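Data frame or data table containing the distributional data for the target and control units. The data table should contain the following columns: id_col (unit identifier), time_col (time period), and y_col (outcome variable), as in the dube example data. |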
id_col.target |
Variable indicating the name of the target unit, as specified in the id_col column of the data table. This variable can be any type, as long as it is the same type as the id_col column of the data table. |
t0 |
Integer indicating period of treatment. |
M |
Integer indicating the number of control quantiles to use in the DiSCo method. Default is 1000. |
G |
Integer indicating the number of grid points for the grid on which the estimated functions are evaluated. Default is 1000. |
num.cores |
Integer, number of cores to use for parallel computation. Default is 1. If the |
permutation |
Logical, indicating whether to use the permutation method for computing the optimal weights. Default is FALSE. |
q_min |
Numeric, minimum quantile to use. Set this together with q_max. |
q_max |
Numeric, maximum quantile to use. Set this together with q_min. |
CI |
Logical, indicating whether to compute confidence intervals for the counterfactual quantiles. Default is FALSE. The confidence intervals are computed using the bootstrap procedure described in Van Dijcke et al. (2024). |
boots |
Integer, number of bootstrap samples to use for computing confidence intervals. Default is 500. |
replace |
Logical, indicating whether to sample with replacement when computing the bootstrap samples. Default is TRUE. |
uniform |
Logical, indicating whether to construct uniform bootstrap confidence intervals. Default is FALSE. If FALSE, the confidence intervals are pointwise. |
cl |
Numeric, confidence level for the (two-sided) confidence intervals. |
graph |
Logical, indicating whether to plot the permutation graph as in Figure 3 of the paper. Default is FALSE. |
qmethod |
Character, indicating the method to use for computing the quantiles of the target distribution. The default is NULL, which uses the |
qtype |
Integer, indicating the type of quantile to compute when using |
seed |
Integer, seed for the random number generator. This needs to be set explicitly in the function call, since it will invoke |
simplex |
Logical, indicating whether to constrain the optimal weights to the unit simplex. Default is FALSE, which only constrains the weights to sum to 1 but allows them to be negative. |
mixture |
Logical, indicating whether to use the mixture of distributions approach instead. See Section 4.3 in Gunsilius (2023). This approach minimizes the distance between the CDFs instead of the quantile functions, and is preferred for categorical variables. When working with such variables, one should also provide a list of support points in the grid.cat parameter. |
grid.cat |
List, containing the discrete support points for a discrete grid to be used with the mixture of distributions approach. This is useful for constructing synthetic distributions for categorical variables. Default is NULL, which uses a continuous grid based on the other parameters. |
Details
For every time period, the DiSCo function calls DiSCo_iter, which implements the DiSCo method for a single time period, as well as the mixture of distributions approach. The corresponding results for each time period can be accessed in the results.periods list of the output of the DiSCo function. The DiSCo function returns the average weight for each unit across all periods, calculated as a uniform mean, as well as the counterfactual target distribution for each period, produced as the weighted average of the control distributions using these averaged weights.
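The averaging step described above can be illustrated with a minimal base-R sketch. It does not call the package: the weight matrix `w_by_period` and the control quantile matrix `ctrl_q` are made-up inputs chosen for illustration, and the package's internal data layout may differ.

```r
# Hypothetical per-period weights for 3 control units (rows) in 2 periods (columns).
w_by_period <- cbind(c(0.5, 0.3, 0.2), c(0.3, 0.3, 0.4))

# Uniform mean of the weights across periods.
w_avg <- rowMeans(w_by_period)

# Hypothetical control quantile functions in one period, evaluated on a common
# probability grid: one column per control unit.
evgrid <- seq(0, 1, length.out = 5)
ctrl_q <- cbind(qunif(evgrid, 0, 1), qunif(evgrid, 0, 2), qunif(evgrid, 1, 3))

# Counterfactual quantile function: weighted average of the control quantiles.
q_counterfactual <- as.vector(ctrl_q %*% w_avg)
print(w_avg)
print(q_counterfactual)
```

Because the averaged weights still sum to one, the counterfactual stays on the same scale as the control quantile functions.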
Value
A list containing the following elements:
- results.periods: A list containing, for each time period, the elements described in the return argument of DiSCo_iter, as well as the following additional elements:
  - DiSco:
    - quantile: The counterfactual quantiles for the target unit.
    - weights: The optimal weights for the target unit.
    - cdf: The counterfactual CDF for the target unit.
- weights: A numeric vector containing the synthetic control weights for the control units, averaged over time. When mixture is TRUE, these are the weights for the mixture of distributions; otherwise they are the weights for the quantile-based approach.
- CI: A list containing the confidence intervals for the counterfactual quantiles and CDFs, if CI is TRUE. Each element contains three named subelements called upper, lower, and se, which are the upper and lower confidence bands and the standard error of the estimate, respectively. They are G x T matrices where G is the specified number of grid points and T is the number of time periods. The elements are:
  - cdf: The bootstrapped CDF.
  - quantile: The bootstrapped quantile.
  - quantile_diff: The bootstrapped quantile difference.
  - cdf_diff: The bootstrapped CDF difference.
- bootmat: A list containing the raw bootstrapped samples for the counterfactual quantiles and CDFs, if CI is TRUE. These are not meant to be accessed directly, but are used by DiSCoTEA to compute aggregated standard errors. Advanced users may wish to access these directly for further analysis. The element names should be self-explanatory.
- control_ids: A list containing the control unit IDs used for each time period, which can be used to identify the weights associated with each control, as the returned weights have the same order as the control IDs.
- perm: A permut object containing the results of the permutation method, if permutation is TRUE. Call summary on this object to print the overall results of the permutation test.
- evgrid: A numeric vector containing the grid points on which the quantiles were evaluated.
- params: A list containing the parameters used in the function call.
References
Gunsilius FF (2023). “Distributional synthetic controls.” Econometrica, 91(3), 1105–1117.
Van Dijcke D, Gunsilius F, Wright AL (2024). “Return to Office and the Tenure Distribution.” Working Paper 2024-56, University of Chicago, Becker Friedman Institute for Economics.
Store aggregated treatment effects
Description
S3 object holding aggregated treatment effects
Usage
DiSCoT(
agg,
treats,
ses,
grid,
ci_lower,
ci_upper,
t0,
call,
cl,
N,
J,
agg_df,
perm,
plot
)
Arguments
agg |
aggregation method |
treats |
list of treatment effects |
ses |
list of standard errors |
grid |
grid |
ci_lower |
list of lower confidence intervals |
ci_upper |
list of upper confidence intervals |
t0 |
start time |
call |
call |
cl |
confidence level |
N |
number of observations |
J |
number of treated units |
agg_df |
dataframe of aggregated treatment effects and their confidence intervals |
perm |
list of permutation results |
plot |
a ggplot object containing the plot for the aggregated treatment effects using the |
Value
S3 object of class DiSCoT with associated summary and print methods.
Aggregate treatment effects from DiSCo function.
Description
Function to aggregate treatment effects from the output of the DiSCo function, plot the distribution of the aggregation statistic over time, and report summary tables.
Usage
DiSCoTEA(
disco,
agg = "quantileDiff",
graph = TRUE,
t_plot = NULL,
savePlots = FALSE,
xlim = NULL,
ylim = NULL,
samples = c(0.25, 0.5, 0.75)
)
Arguments
disco |
Output of the DiSCo function. |
agg |
String indicating the aggregation statistic to be used. Options include
|
graph |
Boolean indicating whether to plot graphs (default is TRUE). |
t_plot |
Optional vector of time periods ( |
savePlots |
Boolean indicating whether to save the plots to the current working directory (default is FALSE). The plot names will be |
xlim |
Optional vector of length 2 indicating the x-axis limits of the plot. Useful for zooming in on relevant parts of the distribution for fat-tailed distributions. |
ylim |
Optional vector of length 2 indicating the y-axis limits of the plot. |
samples |
Numeric vector indicating the range of quantiles of the aggregation statistic ( |
Details
This function takes the output of the DiSCo function and computes aggregate treatment effects using a user-specified aggregation statistic. The default is the difference between the counterfactual and the observed quantile functions (quantileDiff). If graph is set to TRUE, the function will plot the distribution of the aggregation statistic over time. The S3 class returned by the function has a summary method that prints a selection of aggregated effects (specified by the samples parameter) for the chosen agg method, by post-treatment year. This summary call only prints effects if the agg parameter requested a distribution difference (quantileDiff or cdfDiff). The other aggregations are meant to be inspected visually.
If the permutation parameter was set to TRUE in the original DiSCo call, the summary table will include the results of the permutation test. If the original DiSCo call was restricted to a range of quantiles smaller than [0,1] (i.e. q_min > 0 or q_max < 1), the samples parameter is ignored and only the aggregated differences for the quantile range specified in the original call are returned.
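As a rough illustration of the quantileDiff idea, the base-R sketch below computes a pointwise quantile difference and averages it within the ranges delimited by a `samples`-style vector. The two quantile functions are invented, the sign convention (observed minus counterfactual) is an assumption, and the binning rule is only one plausible reading of how the ranges are formed; the package may aggregate differently.

```r
# Common probability grid and two hypothetical quantile functions for one
# post-treatment period (uniform distributions keep the arithmetic transparent).
evgrid <- seq(0, 1, length.out = 101)
q_observed       <- qunif(evgrid, min = 2, max = 5)
q_counterfactual <- qunif(evgrid, min = 0, max = 2)

# Pointwise quantile difference (assumed sign: observed minus counterfactual).
q_diff <- q_observed - q_counterfactual

# Average the difference within the quantile ranges delimited by `samples`.
samples <- c(0.25, 0.5, 0.75)
bins <- cut(evgrid, breaks = c(0, samples, 1), include.lowest = TRUE)
agg_effects <- tapply(q_diff, bins, mean)
print(agg_effects)
```

Each entry of `agg_effects` is the mean difference over one quantile range, which is the kind of number a summary table by quantile range would report.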
Value
A DiSCoT object, which is an S3 class that stores a list of treatment effects, their standard errors, the corresponding confidence intervals (if specified), and a dataframe with treatment effects aggregated according to the agg input. The S3 class also has a summary method that prints a selection of aggregated effects (specified by the samples parameter) for the chosen agg method, by post-treatment year, as well as the permutation test results, if specified.
DiSCo_CI
Description
Function for computing the confidence intervals in the DiSCo method using the bootstrap approach described in Van Dijcke et al. (2024).
Usage
DiSCo_CI(
redraw,
controls,
target,
T_max,
T0,
grid,
mc.cores = 1,
evgrid = seq(from = 0, to = 1, length.out = 1001),
qmethod = NULL,
qtype = 7,
M = 1000,
mixture = FALSE,
simplex = FALSE,
replace = TRUE
)
Arguments
redraw |
Integer indicating the current bootstrap redraw |
controls |
A list containing the raw data for the control group |
target |
A list containing the raw data for the target group |
T_max |
Index of last time period |
T0 |
Index of the last pre-treatment period |
grid |
Grid to recompute the CDF on if |
mc.cores |
Number of cores to use for parallelization |
qmethod |
Character, indicating the method to use for computing the quantiles of the target distribution. The default is NULL, which uses the |
qtype |
Integer, indicating the type of quantile to compute when using |
M |
Integer indicating the number of control quantiles to use in the DiSCo method. Default is 1000. |
mixture |
Logical, indicating whether to use the mixture of distributions approach instead. See Section 4.3 in Gunsilius (2023). This approach minimizes the distance between the CDFs instead of the quantile functions, and is preferred for categorical variables. When working with such variables, one should also provide a list of support points in the grid.cat parameter. |
simplex |
Logical, indicating whether to constrain the optimal weights to the unit simplex. Default is FALSE, which only constrains the weights to sum to 1 but allows them to be negative. |
replace |
Logical, indicating whether to sample with replacement when computing the bootstrap samples. Default is TRUE. |
Value
A list with the following components:
- weights: The bootstrapped weights.
- disco_boot: A list containing the bootstrapped counterfactuals, with the following elements, each of which contains named elements called upper and lower, which are G x T matrices where G is the specified number of grid points and T is the number of time periods.
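To make the upper, lower, and se terminology concrete, here is a minimal base-R sketch of pointwise percentile-bootstrap bands for a single quantile function. It is a simplification under stated assumptions: the data `y` are simulated, only one unit and one period are resampled, and the weights are not re-estimated in each draw as the full procedure would require.

```r
set.seed(1)
y      <- rnorm(500)                      # hypothetical draws for one unit
evgrid <- seq(0.05, 0.95, by = 0.05)      # probability grid
boots  <- 200                             # number of bootstrap resamples
cl     <- 0.95                            # confidence level

# Each column holds the quantile function of one resample (with replacement).
boot_q <- replicate(
  boots,
  quantile(sample(y, replace = TRUE), probs = evgrid, names = FALSE)
)

# Pointwise bands and standard errors, one value per grid point.
alpha <- 1 - cl
lower <- apply(boot_q, 1, quantile, probs = alpha / 2)
upper <- apply(boot_q, 1, quantile, probs = 1 - alpha / 2)
se    <- apply(boot_q, 1, sd)
```

Stacking such vectors over time periods would give matrices with one row per grid point and one column per period, matching the G x T layout described above.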
DiSCo_CI_iter
Description
Function for computing the confidence intervals in the DiSCo method in a single period
Usage
DiSCo_CI_iter(
t,
controls_t,
target_t,
grid,
T0,
M = 1000,
evgrid = seq(from = 0, to = 1, length.out = 1001),
qmethod = NULL,
qtype = 7,
mixture = FALSE,
simplex = FALSE,
replace = TRUE
)
Arguments
t |
Time period |
controls_t |
List of control unit data for given period |
target_t |
List of target unit data for given period |
grid |
Grid to recompute the CDF on if |
T0 |
Index of the last pre-treatment period |
M |
Integer indicating the number of control quantiles to use in the DiSCo method. Default is 1000. |
qmethod |
Character, indicating the method to use for computing the quantiles of the target distribution. The default is NULL, which uses the |
qtype |
Integer, indicating the type of quantile to compute when using |
mixture |
Logical, indicating whether to use the mixture of distributions approach instead. See Section 4.3 in Gunsilius (2023). This approach minimizes the distance between the CDFs instead of the quantile functions, and is preferred for categorical variables. When working with such variables, one should also provide a list of support points in the grid.cat parameter. |
simplex |
Logical, indicating whether to constrain the optimal weights to the unit simplex. Default is FALSE, which only constrains the weights to sum to 1 but allows them to be negative. |
replace |
Logical, indicating whether to sample with replacement when computing the bootstrap samples. Default is TRUE. |
Value
The resampled counterfactual barycenter of the target unit
Function for computing barycenters in the DiSCo method at every time period
Description
Compute barycenters in the DiSCo method at every time period, as in Definition 1, Step 4 in Gunsilius (2023).
Usage
DiSCo_bc(controls.q, weights, evgrid = seq(from = 0, to = 1, length.out = 101))
Arguments
controls.q |
List with matrices of control quantile functions |
weights |
Vector of optimal synthetic control weights, computed using the DiSCo_weights_reg function. |
Value
The quantile function of the barycenter associated with the "weights" evaluated at the vector "evgrid"
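A minimal base-R sketch of the barycenter computation, assuming raw draws for each control unit and a fixed weight vector. The inputs `controls` and `weights` are invented for illustration, and `stats::quantile` stands in for whatever quantile routine the package uses internally.

```r
set.seed(42)
# Hypothetical raw draws for three control units with unequal sample sizes.
controls <- list(rnorm(200, 0), rnorm(300, 1), rnorm(250, 2))
weights  <- c(0.2, 0.3, 0.5)              # assumed to be given; they sum to 1
evgrid   <- seq(0, 1, length.out = 101)

# One column of empirical quantiles per control unit, all on the same grid.
controls_q <- sapply(controls, quantile, probs = evgrid, names = FALSE)

# Barycenter: weighted average of the control quantile functions.
barycenter_q <- as.vector(controls_q %*% weights)
```

With non-negative weights, the result is itself a non-decreasing function of the probability level, so it is a valid quantile function.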
References
Gunsilius FF (2023). “Distributional synthetic controls.” Econometrica, 91(3), 1105–1117.
Estimate DiSCo in a single period
Description
This function implements the DiSCo method for a single time period, as well as the mixture of distributions approach. Its return values contain valuable period-specific estimation outputs.
Usage
DiSCo_iter(
yy,
df,
evgrid,
id_col.target,
M,
G,
T0,
qmethod = NULL,
qtype = 7,
q_min = 0,
q_max = 1,
simplex = FALSE,
controls.id,
grid.cat,
mixture
)
Arguments
yy |
Integer indicating the current year being processed. |
df |
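Data frame or data table containing the distributional data for the target and control units. The data table should contain the following columns: id_col (unit identifier), time_col (time period), and y_col (outcome variable), as in the dube example data. |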
evgrid |
A vector of grid points on which to evaluate the quantile functions. |
id_col.target |
Variable indicating the name of the target unit, as specified in the id_col column of the data table. This variable can be any type, as long as it is the same type as the id_col column of the data table. |
M |
Integer indicating the number of control quantiles to use in the DiSCo method. Default is 1000. |
G |
Integer indicating the number of grid points for the grid on which the estimated functions are evaluated. Default is 1000. |
T0 |
Integer indicating the last pre-treatment period starting from 1. |
qmethod |
Character, indicating the method to use for computing the quantiles of the target distribution. The default is NULL, which uses the |
qtype |
Integer, indicating the type of quantile to compute when using |
q_min |
Numeric, minimum quantile to use. Set this together with q_max. |
q_max |
Numeric, maximum quantile to use. Set this together with q_min. |
simplex |
Logical, indicating whether to constrain the optimal weights to the unit simplex. Default is FALSE, which only constrains the weights to sum to 1 but allows them to be negative. |
controls.id |
List of strings specifying the column names for the control units' identifiers. |
grid.cat |
List, containing the discrete support points for a discrete grid to be used with the mixture of distributions approach. This is useful for constructing synthetic distributions for categorical variables. Default is NULL, which uses a continuous grid based on the other parameters. |
mixture |
Logical, indicating whether to use the mixture of distributions approach instead. See Section 4.3 in Gunsilius (2023). This approach minimizes the distance between the CDFs instead of the quantile functions, and is preferred for categorical variables. When working with such variables, one should also provide a list of support points in the grid.cat parameter. |
Details
This function is part of the DiSCo method and is called for each time period. It calculates the optimal weights for the DiSCo method and the mixture of distributions approach for a single time period. The function processes data for both the target and control units, computes the quantile functions, and evaluates these on a specified grid. It is designed to be used within the DiSCo function, which aggregates results across multiple time periods.
Value
A list with the following elements:
- DiSCo_weights: Weights calculated using the DiSCo method.
- mixture:
  - weights: Optimal weights for the mixture approach.
  - distance: Value of the objective function for the mixture approach.
  - mean: Weighted mixture of the controls' CDFs.
- target:
  - cdf: Empirical CDF of the target. Only computed when mixture=TRUE.
  - grid: Grid on which the quantile and CDF functions were evaluated.
  - data: Original data for the target unit.
  - quantiles: Quantiles for the target unit, evaluated on the specified grid.
- controls:
  - data: Original data for the control units.
  - cdf: Empirical CDFs of the control units. Only computed when mixture=TRUE.
  - quantiles: Quantiles for the control units, evaluated on the specified grid.
- controls.q: Quantiles for the control units, evaluated on the specified grid.
DiSCo_mixture
Description
The alternative mixture of distributions approach in the paper
Usage
DiSCo_mixture(controls1, target, grid.min, grid.max, grid.rand, M, simplex)
Arguments
controls1 |
A list of controls |
target |
The target unit |
grid.min |
Minimal value of the grid on which the CDFs are evaluated. |
grid.max |
Maximal value of the grid on which the CDFs are evaluated. |
grid.rand |
Random grid on which the CDFs are evaluated. |
M |
Integer indicating the number of control quantiles to use in the DiSCo method. Default is 1000. |
simplex |
Logical, indicating whether to constrain the optimal weights to the unit simplex. Default is FALSE, which only constrains the weights to sum to 1 but allows them to be negative. |
Value
A list containing the following elements:
- cdf: A matrix containing the CDFs of the target and control units evaluated on the grid.
- distance.opt: The optimal value of the Wasserstein distance.
- mean: The optimal value of the Wasserstein barycenter.
- target.order: The target unit, ordered.
- weights.opt: The optimal weights.
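The mixture idea (match CDFs rather than quantile functions) can be sketched in base R for a categorical outcome with two controls. Everything here is illustrative: the data are simulated, the distance is a simple mean absolute CDF gap on the support points, and a one-dimensional grid search replaces the package's solver, which may use a different objective and optimizer.

```r
set.seed(7)
# Hypothetical categorical outcomes on the support points 1..4.
target   <- sample(1:4, 400, replace = TRUE, prob = c(0.1, 0.2, 0.3, 0.4))
controls <- list(
  sample(1:4, 300, replace = TRUE),
  sample(1:4, 500, replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1))
)
grid <- 1:4                               # discrete support points

# Empirical CDFs evaluated on the support: vector for target, matrix for controls.
cdf_target   <- ecdf(target)(grid)
cdf_controls <- sapply(controls, function(x) ecdf(x)(grid))

# Distance between the mixture CDF and the target CDF for weights (w1, 1 - w1).
dist_fun <- function(w1) mean(abs(cdf_controls %*% c(w1, 1 - w1) - cdf_target))

# Grid search over the simplex for two controls.
w1_grid <- seq(0, 1, by = 0.01)
w1_opt  <- w1_grid[which.min(sapply(w1_grid, dist_fun))]
w_opt   <- c(w1_opt, 1 - w1_opt)
cdf_mix <- as.vector(cdf_controls %*% w_opt)   # weighted mixture of control CDFs
```

Because a mixture of CDFs is again a CDF on the same support, this approach never produces values outside the set of categories, which is why it suits categorical variables.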
DiSCo_mixture_solve
Description
The solver for the alternative mixture of distributions approach in the paper
Usage
DiSCo_mixture_solve(
c_len,
CDF.matrix,
grid.min,
grid.max,
grid.rand,
M,
simplex
)
Arguments
c_len |
The number of controls |
CDF.matrix |
The matrix of CDFs |
grid.min |
Minimal value of the grid on which the CDFs are evaluated. |
grid.max |
Maximal value of the grid on which the CDFs are evaluated. |
grid.rand |
Random grid on which the CDFs are evaluated. |
M |
Integer indicating the number of control quantiles to use in the DiSCo method. Default is 1000. |
simplex |
Logical, indicating whether to constrain the optimal weights to the unit simplex. Default is FALSE, which only constrains the weights to sum to 1 but allows them to be negative. |
Value
A list containing the following elements:
- distance.opt: The optimal value of the Wasserstein distance.
- mean: The optimal value of the Wasserstein barycenter.
- target.order: The target unit, ordered.
- weights.opt: The optimal weights.
DiSCo_per
Description
Function to implement permutation test for Distributional Synthetic Controls
Usage
DiSCo_per(
results.periods,
T0,
ww = 0,
peridx = 0,
evgrid = seq(from = 0, to = 1, length.out = 101),
graph = TRUE,
num.cores = 1,
weights = NULL,
qmethod = NULL,
qtype = qtype,
q_min = 0,
q_max = 1,
M = 1000,
simplex = FALSE,
mixture = FALSE
)
Arguments
results.periods |
List of period-specific results from DiSCo |
T0 |
Integer indicating first year of treatment as counted from 1 (e.g., if treatment year 2002 was the 5th year in the sample, this parameter should be 5). |
ww |
Optional vector of weights indicating the relative importance of each time period. If not specified, each time period is weighted equally. |
peridx |
Optional integer indicating number of permutations. If not specified, by default equal to the number of units in the sample. |
graph |
Logical, indicating whether to plot the permutation graph as in Figure 3 of the paper. Default is TRUE. |
num.cores |
Integer, number of cores to use for parallel computation. Default is 1. If the |
weights |
Optional vector of weights to use for the "true" treated unit. |
qmethod |
Character, indicating the method to use for computing the quantiles of the target distribution. The default is NULL, which uses the |
qtype |
Integer, indicating the type of quantile to compute when using |
q_min |
Numeric, minimum quantile to use. Set this together with q_max. |
q_max |
Numeric, maximum quantile to use. Set this together with q_min. |
M |
Integer indicating the number of control quantiles to use in the DiSCo method. Default is 1000. |
simplex |
Logical, indicating whether to constrain the optimal weights to the unit simplex. Default is FALSE, which only constrains the weights to sum to 1 but allows them to be negative. |
mixture |
Logical, indicating whether to use the mixture of distributions approach instead. See Section 4.3 in Gunsilius (2023). This approach minimizes the distance between the CDFs instead of the quantile functions, and is preferred for categorical variables. When working with such variables, one should also provide a list of support points in the grid.cat parameter. |
Details
This program iterates through all units and computes the optimal weights on the other units for replicating the unit of iteration's outcome variable, assuming that it is the treated unit. See Algorithm 1 in Gunsilius (2023) for more details. The only modification is that, following Abadie, Diamond, and Hainmueller (2010), we calculate the p-value from the ratio of post- to pre-treatment root mean squared Wasserstein distances rather than from the level in each period.
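The ratio-based p-value can be sketched in base R as follows. The distance matrix `dist2` and the number of pre-treatment periods `n_pre` are invented, and counting the treated unit itself in the denominator is an assumption borrowed from common placebo-test practice, not a statement about the package's exact formula.

```r
# Hypothetical squared Wasserstein distances between each unit and its own
# synthetic control. Rows: units (first row is the truly treated unit);
# columns: time periods.
dist2 <- rbind(
  treated  = c(0.10, 0.12, 0.90, 1.10),
  placebo1 = c(0.20, 0.18, 0.25, 0.22),
  placebo2 = c(0.15, 0.20, 0.30, 0.28),
  placebo3 = c(0.30, 0.25, 0.20, 0.35)
)
n_pre <- 2                                 # number of pre-treatment periods
pre   <- seq_len(n_pre)
post  <- setdiff(seq_len(ncol(dist2)), pre)

# Root mean of the squared distances over a set of periods.
rms <- function(d2) sqrt(mean(d2))

# Post/pre ratio for every unit, then the share of units whose ratio is at
# least as large as that of the treated unit.
ratio   <- apply(dist2, 1, function(d2) rms(d2[post]) / rms(d2[pre]))
p_value <- mean(ratio >= ratio["treated"])
print(ratio)
print(p_value)
```

Dividing by the pre-treatment fit penalizes placebo units that were poorly matched before treatment, so a large post-treatment gap only counts as evidence when the pre-treatment fit was good.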
Value
List of matrices containing synthetic time path of the outcome variable for the target unit together with the time paths of the control units
References
Gunsilius FF (2023). “Distributional synthetic controls.” Econometrica, 91(3), 1105–1117.
DiSCo_per_iter
Description
This function performs one iteration of the permutation test
Usage
DiSCo_per_iter(
c_df,
c_df.q,
t_df,
T0,
peridx,
evgrid,
idx,
grid_df,
M = 1000,
ww = 0,
qmethod = NULL,
qtype = 7,
q_min = 0,
q_max = 1,
simplex = FALSE,
mixture = FALSE
)
Arguments
c_df |
List of control units |
c_df.q |
List of quantiles of control units |
t_df |
List of target unit |
idx |
Index of permuted target unit |
grid_df |
Grids to evaluate CDFs on, only needed when |
M |
Integer indicating the number of control quantiles to use in the DiSCo method. Default is 1000. |
qmethod |
Character, indicating the method to use for computing the quantiles of the target distribution. The default is NULL, which uses the |
qtype |
Integer, indicating the type of quantile to compute when using |
q_min |
Numeric, minimum quantile to use. Set this together with q_max. |
q_max |
Numeric, maximum quantile to use. Set this together with q_min. |
simplex |
Logical, indicating whether to constrain the optimal weights to the unit simplex. Default is FALSE, which only constrains the weights to sum to 1 but allows them to be negative. |
mixture |
Logical, indicating whether to use the mixture of distributions approach instead. See Section 4.3 in Gunsilius (2023). This approach minimizes the distance between the CDFs instead of the quantile functions, and is preferred for categorical variables. When working with such variables, one should also provide a list of support points in the grid.cat parameter. |
Value
List of squared Wasserstein distances between the target unit and the control units
DiSCo_per_rank
Description
This function ranks the squared Wasserstein distances and returns the p-values for each time period
Usage
DiSCo_per_rank(distt, distp, T0)
Arguments
distt |
List of squared Wasserstein distances between the target unit and the control units |
distp |
List of squared Wasserstein distances between the control units |
Value
List of p-values for each time period
DiSCo_weights_reg
Description
Function for obtaining the weights in the DiSCo method at every time period
Usage
DiSCo_weights_reg(
controls,
target,
M = 500,
qmethod = NULL,
qtype = 7,
simplex = FALSE,
q_min = 0,
q_max = 1
)
Arguments
controls |
List with matrices of control distributions |
target |
Matrix containing the target distribution |
M |
Integer indicating the number of control quantiles to use in the DiSCo method. Default is 500. |
qmethod |
Character, indicating the method to use for computing the quantiles of the target distribution. The default is NULL, which uses the |
qtype |
Integer, indicating the type of quantile to compute when using |
simplex |
Logical, indicating whether to constrain the optimal weights to the unit simplex. Default is FALSE, which only constrains the weights to sum to 1 but allows them to be negative. |
q_min |
Numeric, minimum quantile to use. Set this together with q_max. |
q_max |
Numeric, maximum quantile to use. Set this together with q_min. |
Details
Estimate the optimal weights for the distributional synthetic controls method by solving the convex minimization problem in Eq. (2) in Gunsilius (2023), using a regression of the simulated target quantile on the simulated control quantiles, as in Eq. (3):
\underset{\vec{\lambda} \in \Delta^J}{\operatorname{argmin}}\left\|\mathbb{Y}_t \vec{\lambda}_t-\vec{Y}_{1 t}\right\|_2^2
For the constrained optimization we rely on the package pracma.
The control distributions can be given in list form, where each list element contains a vector of observations for the given control unit, or in matrix form, where each column corresponds to one unit and each row is one observation. The list form is useful because the number of draws for each control group can be different. The target must be given as a vector.
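The regression of target quantiles on control quantiles can be sketched in base R. This is not the package's solver: instead of a constrained least-squares routine, the sketch solves the sum-to-one problem (weights allowed to be negative) in closed form through its KKT system. The simulated controls, the name `true_w`, and the grid of 99 probability levels are all assumptions made so that the true weights are known.

```r
set.seed(123)
evgrid <- seq(0.01, 0.99, length.out = 99)    # probability levels in (0, 1)

# Hypothetical controls in list form with different numbers of draws.
controls <- list(rnorm(2000), rexp(1500), runif(1800, -1, 3))

# Matrix of control quantiles: one row per probability level, one column per unit.
Y <- sapply(controls, quantile, probs = evgrid, names = FALSE)

# Build target quantiles from known weights so the answer can be checked.
true_w <- c(0.2, 0.5, 0.3)
y1 <- as.vector(Y %*% true_w)

# Minimize ||Y w - y1||^2 subject to sum(w) = 1 via the KKT linear system:
#   [ 2 Y'Y  1 ] [ w  ]   [ 2 Y'y1 ]
#   [ 1'     0 ] [ mu ] = [ 1      ]
J   <- ncol(Y)
KKT <- rbind(cbind(2 * crossprod(Y), rep(1, J)), c(rep(1, J), 0))
rhs <- c(2 * crossprod(Y, y1), 1)
w_hat <- solve(KKT, rhs)[seq_len(J)]
print(round(w_hat, 6))
```

Restricting the weights to the unit simplex adds non-negativity constraints, which no longer admit a closed form and call for a quadratic programming or constrained least-squares solver.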
Value
Vector of optimal synthetic control weights
References
Gunsilius FF (2023). “Distributional synthetic controls.” Econometrica, 91(3), 1105–1117.
bootCounterfactuals
Description
Function for computing the bootstrapped counterfactuals in the DiSCo method
Usage
bootCounterfactuals(result_t, t, mixture, weights, evgrid, grid)
Arguments
result_t |
A list containing the results of the DiSCo_CI_iter function |
t |
The current time period |
mixture |
Logical, indicating whether to use the mixture of distributions approach instead. See Section 4.3 in Gunsilius (2023). This approach minimizes the distance between the CDFs instead of the quantile functions, and is preferred for categorical variables. When working with such variables, one should also provide a list of support points in the grid.cat parameter. |
grid |
Grid to recompute the CDF on if |
Value
A list containing the bootstrapped counterfactuals
checks
Description
Carry out checks on the inputs.
Usage
checks(
df,
id_col.target,
t0,
M,
G,
num.cores,
permutation,
q_min,
q_max,
CI,
boots,
cl,
graph,
qmethod,
seed
)
Arguments
df |
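Data frame or data table containing the distributional data for the target and control units. The data table should contain the following columns: id_col (unit identifier), time_col (time period), and y_col (outcome variable), as in the dube example data. |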
id_col.target |
Variable indicating the name of the target unit, as specified in the id_col column of the data table. This variable can be any type, as long as it is the same type as the id_col column of the data table. |
t0 |
Integer indicating period of treatment. |
M |
Integer indicating the number of control quantiles to use in the DiSCo method. Default is 1000. |
G |
Integer indicating the number of grid points for the grid on which the estimated functions are evaluated. Default is 1000. |
num.cores |
Integer, number of cores to use for parallel computation. Default is 1. If the |
permutation |
logical, whether to use permutation or not |
q_min |
Numeric, minimum quantile to use. Set this together with q_max. |
q_max |
Numeric, maximum quantile to use. Set this together with q_min. |
CI |
Logical, indicating whether to compute confidence intervals for the counterfactual quantiles. Default is FALSE. The confidence intervals are computed using the bootstrap procedure described in Van Dijcke et al. (2024). |
boots |
Integer, number of bootstrap samples to use for computing confidence intervals. Default is 500. |
cl |
Numeric, confidence level for the (two-sided) confidence intervals. |
graph |
Logical, indicating whether to plot the permutation graph as in Figure 3 of the paper. Default is FALSE. |
qmethod |
Character, indicating the method to use for computing the quantiles of the target distribution. The default is NULL, which uses the |
seed |
Integer, seed for the random number generator. This needs to be set explicitly in the function call, since it will invoke |
citation
Description
Print the citation for the relevant paper.
Usage
citation()
Data from (Dube 2019)
Description
As used in the empirical application of Gunsilius (2023).
Usage
dube
Format
dube
A data frame with 652,870 rows and 3 columns:
- id_col
State FIPS
- time_col
Year
- y_col
adj0contpov variable in Dube (2019). Captures the distribution of equalized family income from wages and salary, defined as multiples of the federal poverty threshold.
...
ex_gmm
Description
Example data for the DiSCo command. Returns simulated target and control data that are mixtures of Gaussian distributions.
Usage
ex_gmm(Ts = 2, num.con = 30, numdraws = 1000)
Arguments
Ts |
an integer indicating the number of time periods |
num.con |
an integer indicating the number of control units |
numdraws |
an integer indicating the number of draws |
Value
target |
a vector. |
control |
a matrix. |
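To make "mixtures of Gaussian distributions" concrete, the following base R sketch draws simulated data for one target and several control units. It is only an illustration: the helper name sim_gmm_sketch, the component means, standard deviations, and weights are all assumptions, and it does not reproduce the internals or the exact return format of ex_gmm.

```r
# Hypothetical helper (not part of the package): draw n values from a
# Gaussian mixture with the given component means, sds, and weights.
sim_gmm_sketch <- function(n, means, sds, weights) {
  stopifnot(length(means) == length(sds), length(means) == length(weights))
  weights <- weights / sum(weights)  # normalize the mixture weights
  # Pick a mixture component for every draw, then sample from it.
  comp <- sample(seq_along(means), size = n, replace = TRUE, prob = weights)
  rnorm(n, mean = means[comp], sd = sds[comp])
}

set.seed(1)
num_con <- 3       # number of control units (illustrative value)
numdraws <- 1000   # number of draws per unit (illustrative value)

# One column of draws per control unit, each with a shifted mixture.
controls <- sapply(seq_len(num_con), function(j) {
  sim_gmm_sketch(numdraws, means = c(-2, 0, 2) + j, sds = c(1, 0.5, 1),
                 weights = c(0.3, 0.4, 0.3))
})

# A single vector of draws for the target unit.
target <- sim_gmm_sketch(numdraws, means = c(-1, 1), sds = c(1, 1),
                         weights = c(0.5, 0.5))
```

Here target is a numeric vector and controls is a numdraws-by-num_con matrix, which mirrors the shapes listed under Value.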
getGrid
Description
Set up a grid for the estimation of the quantile functions and CDFs
Usage
getGrid(target, controls, G)
Arguments
target |
A vector containing the data for the target unit |
controls |
A list containing the data for the control units |
G |
The number of grid points |
Value
A list containing the following elements:
- grid.min: The minimum value of the grid
- grid.max: The maximum value of the grid
- grid.rand: A vector containing the grid points
- grid.ord: A vector containing the grid points, ordered
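The returned elements can be pictured with a small base R sketch. This is a hypothetical re-implementation for illustration only: the name grid_sketch is made up, and drawing the grid points uniformly over the pooled support is an assumption, not something the documentation states.

```r
# Hypothetical re-implementation for illustration; not the package source.
grid_sketch <- function(target, controls, G) {
  all_values <- c(target, unlist(controls))  # pool target and control data
  grid.min <- min(all_values)
  grid.max <- max(all_values)
  # Assumption: grid points are drawn uniformly over the pooled support.
  grid.rand <- runif(G, min = grid.min, max = grid.max)
  list(grid.min = grid.min, grid.max = grid.max,
       grid.rand = grid.rand, grid.ord = sort(grid.rand))
}

set.seed(1)
g <- grid_sketch(target = rnorm(50),
                 controls = list(rnorm(50), rnorm(50, mean = 1)),
                 G = 100)
```

Keeping both the unordered and the ordered points lets later steps evaluate CDFs on an increasing grid without re-sorting.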
Check if a vector is integer
Description
Check if a vector is integer
Usage
is.integer(x)
Arguments
x |
a vector |
Value
TRUE if x is integer, FALSE otherwise
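The documentation does not say whether the check is by storage type or by value. The sketch below assumes a value-based check (every element is a whole number) and contrasts it with base R's is.integer, which only inspects the storage type. The name is_whole_sketch is hypothetical.

```r
# Hypothetical value-based check: TRUE if every element is a finite whole number.
is_whole_sketch <- function(x) {
  is.numeric(x) && all(is.finite(x)) && all(x == round(x))
}

is_whole_sketch(c(1, 2, 3))   # TRUE: whole numbers stored as doubles
is_whole_sketch(c(1, 2.5))    # FALSE: 2.5 is not a whole number
base::is.integer(1)           # FALSE: base R checks storage type, and 1 is a double
```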
mclapply.hack
Description
This function mimics forking (done with mclapply on Mac or Linux) for the Windows environment. It is designed to be used just like mclapply. Credit goes to Nathan VanHoudnos.
Usage
mclapply.hack(..., verbose = FALSE, mc.cores = 1)
Arguments
verbose |
Should users be warned this is hack-y? Defaults to FALSE. |
mc.cores |
Number of cores to use. Defaults to 1. |
See Also
mclapply
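A common way to get mclapply-like behaviour on every platform is a socket cluster from the parallel package. The sketch below shows that general pattern; the helper name par_lapply_sketch is hypothetical and this is not the code behind mclapply.hack.

```r
library(parallel)

# Hypothetical helper: an lapply that can run in parallel through a socket
# cluster, which (unlike forking) is also available on Windows.
par_lapply_sketch <- function(X, FUN, ..., mc.cores = 1) {
  if (mc.cores <= 1) return(lapply(X, FUN, ...))  # serial fallback
  cl <- makeCluster(mc.cores)
  on.exit(stopCluster(cl), add = TRUE)            # always release the workers
  parLapply(cl, X, FUN, ...)
}

res <- par_lapply_sketch(1:4, function(i) i^2, mc.cores = 1)
```

With socket clusters the workers start with empty sessions, so any objects or packages the function needs must be passed in or loaded on the workers explicitly.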
Compute the empirical quantile function
Description
Compute the empirical quantile function
Usage
myQuant(X, q, qtype = 7, qmethod = NULL, ...)
Arguments
X |
A vector containing the data |
q |
A vector containing the quantiles |
Value
A vector containing the empirical quantile function
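As a minimal illustration of an empirical quantile function, the sketch below wraps stats::quantile. It assumes that qtype corresponds to the type argument of stats::quantile (the default of 7 matches in both); the name quant_sketch is hypothetical and the qmethod argument is not modelled.

```r
# Hypothetical wrapper: evaluate the empirical quantile function of X at q.
# Assumption: qtype plays the role of stats::quantile's `type` argument.
quant_sketch <- function(X, q, qtype = 7) {
  as.numeric(stats::quantile(X, probs = q, type = qtype, names = FALSE))
}

quant_sketch(c(1, 2, 3, 4, 5), q = c(0, 0.5, 1))  # 1 3 5
```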
parseBoots
Description
Function for parsing the bootstrapped counterfactuals in the DiSCo method
Usage
parseBoots(CI_temp, cl, q_disco, cdf_disco, q_obs, cdf_obs, uniform = TRUE)
Arguments
CI_temp |
A list containing the bootstrapped counterfactuals |
cl |
The confidence level |
q_disco |
The estimated quantiles around which to center |
cdf_disco |
The estimated cdfs around which to center |
q_obs |
The observed quantiles |
cdf_obs |
The observed cdfs |
uniform |
Logical, indicating whether to use uniform (TRUE) or pointwise (FALSE) confidence intervals |
Value
A list containing the confidence intervals for the quantiles and cdfs
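The difference between uniform and pointwise intervals can be sketched generically in base R. The function below is a textbook-style construction under stated assumptions (bootstrap draws stored as a grid-by-replication matrix, bands centered on the point estimate); the name ci_sketch is hypothetical and this is not the procedure implemented in parseBoots.

```r
# boot: G x B matrix of bootstrap estimates; est: length-G point estimate.
ci_sketch <- function(boot, est, cl = 0.95, uniform = TRUE) {
  alpha <- 1 - cl
  if (uniform) {
    # Largest absolute deviation from the estimate within each bootstrap draw.
    sup_dev <- apply(abs(boot - est), 2, max)
    crit <- as.numeric(quantile(sup_dev, probs = cl))
    list(lower = est - crit, upper = est + crit)  # constant-width band
  } else {
    # Separate percentile interval at every grid point.
    list(lower = apply(boot, 1, quantile, probs = alpha / 2),
         upper = apply(boot, 1, quantile, probs = 1 - alpha / 2))
  }
}

set.seed(1)
est <- seq(0, 1, length.out = 20)
boot <- est + matrix(rnorm(20 * 200, sd = 0.1), nrow = 20)  # simulated draws

pw <- ci_sketch(boot, est, uniform = FALSE)
un <- ci_sketch(boot, est, uniform = TRUE)
```

Uniform bands are typically wider than pointwise ones because they are meant to cover the whole curve simultaneously rather than one grid point at a time.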
permut
Description
Object to hold results of permutation test
Usage
permut(distp, distt, p_overall, J_1, q_min, q_max, plot)
Arguments
distp |
List of squared Wasserstein distances between the control units |
distt |
List of squared Wasserstein distances between the target unit and the control units |
p_overall |
Overall p-value |
J_1 |
Number of control units |
q_min |
Minimum quantile |
q_max |
Maximum quantile |
plot |
ggplot object containing plot of squared Wasserstein distances over time for all permutations. |
Value
A list of class permut, with the same elements as the input arguments.
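For intuition about the squared Wasserstein distances and the overall p-value stored in this object, the sketch below approximates the squared 2-Wasserstein distance between two samples through their quantile functions and computes a simple rank-based permutation p-value. Both helper names are hypothetical, and the p-value formula is a generic one; the package may define p_overall differently.

```r
# Approximate squared 2-Wasserstein distance between two one-dimensional
# samples by averaging squared quantile differences over [q_min, q_max].
w2sq_sketch <- function(x, y, M = 1000, q_min = 0, q_max = 1) {
  q <- seq(q_min, q_max, length.out = M)
  mean((quantile(x, q, names = FALSE) - quantile(y, q, names = FALSE))^2)
}

# Generic permutation p-value: share of placebo distances at least as large
# as the target's distance, counting the target itself.
perm_p_sketch <- function(dist_target, dist_placebo) {
  (1 + sum(dist_placebo >= dist_target)) / (1 + length(dist_placebo))
}

set.seed(1)
x <- rnorm(500)
d_shift <- w2sq_sketch(x, x + 1)            # a pure shift of 1 gives distance 1
p <- perm_p_sketch(5, c(1, 2, 3, 6))        # (1 + 1) / (1 + 4) = 0.4
```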
Plot distribution of treatment effects over time
Description
Plot distribution of treatment effects over time
Usage
plotDistOverTime(
cdf_centered,
grid_cdf,
t_start,
t_max,
CI,
ci_lower,
ci_upper,
ylim = c(0, 1),
xlim = NULL,
cdf = TRUE,
xlab = "Distribution Difference",
ylab = "CDF",
obsLine = NULL,
savePlots = FALSE,
plotName = NULL,
lty = 1,
lty_obs = 1,
t_plot = NULL
)
Arguments
cdf_centered |
list of centered distributional statistics |
grid_cdf |
grid |
t_start |
start time |
t_max |
maximum time |
CI |
logical indicating whether to plot confidence intervals |
ci_lower |
lower confidence interval |
ci_upper |
upper confidence interval |
ylim |
y limits |
xlim |
x limits |
cdf |
logical indicating whether to plot CDF or quantile difference |
xlab |
x label |
ylab |
y label |
obsLine |
optional additional line to plot. Default is NULL which means no line is plotted. |
savePlots |
logical indicating whether to save plots |
plotName |
name of plot to save |
lty |
line type for the main line passed as cdf_centered |
lty_obs |
line type for the optional additional line passed as obsLine |
t_plot |
optional vector of times to plot. Default is NULL which means all times are plotted. |
Value
plot of distribution of treatment effects over time
print.permut
Description
Print permutation test results
Usage
## S3 method for class 'permut'
print(x, ...)
Arguments
x |
Object of class permut |
... |
Additional arguments |
Value
Prints permutation test results
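The way an S3 print method attaches to a results object can be sketched as follows. The class name permut_sketch and the printed layout are hypothetical, chosen so the example does not clash with the package's own permut class; only two of the documented elements are used.

```r
# Hypothetical constructor: a list with a class attribute, as in S3.
new_permut_sketch <- function(p_overall, J_1) {
  structure(list(p_overall = p_overall, J_1 = J_1), class = "permut_sketch")
}

# print() dispatches to this method for objects of class "permut_sketch".
print.permut_sketch <- function(x, ...) {
  cat("Permutation test\n")
  cat("  Number of control units:", x$J_1, "\n")
  cat("  Overall p-value:", format(x$p_overall, digits = 3), "\n")
  invisible(x)  # print methods conventionally return their input invisibly
}

res <- new_permut_sketch(p_overall = 0.04, J_1 = 25)
print(res)
```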
summary.DiSCoT
Description
Summary of DiSCoT object
Usage
## S3 method for class 'DiSCoT'
summary(object, ...)
Arguments
object |
DiSCoT object |
... |
Additional arguments |
Value
summary of DiSCoT object
summary.permut
Description
Summarize permutation test results
Usage
## S3 method for class 'permut'
summary(object, ...)
Arguments
object |
Object of class permut |
... |
Additional arguments |
Value
Prints permutation test results