Version: | 1.1.1 |
Title: | Diff, Patch and Merge for Data.frames |
Description: | Diff, patch and merge for data frames. Document changes in data sets and use them to apply patches. Changes to data can be made visible by using render_diff(). The 'V8' package is used to wrap the 'daff.js' 'JavaScript' library which is included in the package. |
License: | MIT + file LICENSE |
Imports: | V8 (≥ 0.6), jsonlite, utils |
URL: | https://github.com/edwindj/daff |
Suggests: | testthat |
RoxygenNote: | 7.2.3 |
Encoding: | UTF-8 |
NeedsCompilation: | no |
Packaged: | 2024-02-15 07:44:21 UTC; edwin |
Author: | Paul Fitzpatrick [aut] (JavaScript original, http://paulfitz.github.io/daff/), Edwin de Jonge |
Maintainer: | Edwin de Jonge <edwindjonge@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-02-15 09:40:02 UTC |
Data diff, patch and merge for R
Description
Daff calculates the differences between two data.frames. This difference can be stored and later used to
patch the original data. Differences can also be visualized with render_diff, which shows what changed.
Details
Storing the difference between data sets allows for tracking or incorporating manual changes to data sets.
Ideally, changes to data should be scripted so that they are reproducible, but there are situations where
this is not possible or where changes happen outside your control. daff can help track these changes.
actions
diff_data | Find differences in values between data.frames |
patch_data | Apply a patch generated with diff_data to a data.frame |
merge_data | Merge two diverged data.frames originating from the same parent |
daff.js
Daff wraps the daff.js library which offers more functionality.
Do a data diff
Description
Find differences with a reference data set. The resulting diff can be applied with patch_data,
stored for documentation purposes with write_diff, or visualized with render_diff.
Usage
diff_data(
data_ref,
data,
always_show_header = TRUE,
always_show_order = FALSE,
columns_to_ignore = c(),
count_like_a_spreadsheet = TRUE,
ids = c(),
ignore_whitespace = FALSE,
never_show_order = FALSE,
ordered = TRUE,
padding_strategy = c("auto", "smart", "dense", "sparse"),
show_meta = TRUE,
show_unchanged = FALSE,
show_unchanged_columns = FALSE,
show_unchanged_meta = FALSE,
unchanged_column_context = 1L,
unchanged_context = 1L
)
Arguments
data_ref |
reference data.frame to compare against |
data |
data.frame to compare with the reference |
always_show_header |
|
always_show_order |
|
columns_to_ignore |
|
count_like_a_spreadsheet |
|
ids |
|
ignore_whitespace |
|
never_show_order |
|
ordered |
|
padding_strategy |
|
show_meta |
|
show_unchanged |
|
show_unchanged_columns |
|
show_unchanged_meta |
|
unchanged_column_context |
|
unchanged_context |
|
Value
difference object
See Also
differs_from
Examples
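library(daff)
# change a single value in a copy of iris
x <- iris
x[1,1] <- 10
# print the difference, using x as the reference
diff_data(x, iris)
# keep the diff object for later use
dd <- diff_data(x, iris)
#write_diff(dd, "diff.csv")
summary(dd)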
Differs from
Description
This is the same function as diff_data, but with its arguments reversed.
This order is more convenient in dplyr and magrittr pipelines.
Usage
differs_from(data, data_ref, ...)
Arguments
data |
|
data_ref |
|
... |
not further specified |
Value
difference object
See Also
diff_data
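Because the changed data comes first, differs_from can sit at the end of a pipeline. The following is a minimal sketch, assuming R 4.1 or later for the native pipe operator; the magrittr pipe should behave the same way. The variable names are illustrative only.

```r
library(daff)

# x is a copy of iris with one changed value
x <- iris
x[1, 1] <- 10

# same as diff_data(iris, x): iris is the reference, x holds the changes
d <- x |> differs_from(iris)

# applying the diff to the reference reproduces the change
patched <- patch_data(iris, d)
patched[1, 1]
```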
Merge two tables based on a parent version
Description
merge_data provides a three-way merge: given two versions that are based on a common
parent version, this function merges tables a and b.
Usage
merge_data(parent, a, b)
Arguments
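parent |
data.frame holding the common parent version |
a |
first data.frame derived from parent |
b |
second data.frame derived from parent |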
Details
If a and b both change the same table cell to different values, this results in a
conflict. In that case a warning is generated with the number of conflicts.
In the data.frame returned by a conflicting merge, columns with conflicting values are of type
character and contain all three values coded as
(parent) a /// b
Value
merged data.frame. When a merge has conflicts, the columns with conflicting changes
are of type character and contain all three values.
See Also
Examples
parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[2,1] <- 11
# successful merge
merge_data(parent, a, b)
parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[1,1] <- 11
# conflicting merge (both a and b change same cell)
merged <- merge_data(parent, a, b)
merged #note the conflict
#find out which rows contain a conflict
which_conflicts(merged)
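In a script it can be useful to record the conflict warning instead of letting it print to the console. The sketch below uses base R's withCallingHandlers to collect the warning text while keeping the merged result. The variable conflict_messages is a name chosen for illustration, and the exact wording of the warning is not assumed.

```r
library(daff)

parent <- a <- b <- iris[1:3, ]
a[1, 1] <- 10
b[1, 1] <- 11  # same cell as in a, so the merge conflicts

# collect warning messages while still returning the merged data.frame
conflict_messages <- character()
merged <- withCallingHandlers(
  merge_data(parent, a, b),
  warning = function(w) {
    conflict_messages <<- c(conflict_messages, conditionMessage(w))
    invokeRestart("muffleWarning")  # suppress the default warning output
  }
)

conflict_messages        # text of the warning(s) raised by the merge
which_conflicts(merged)  # rows that still need manual resolution
```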
Patch data
Description
Patch data with a diff generated by diff_data
Usage
patch_data(data, patch)
Arguments
data |
data.frame to be patched |
patch |
generated with diff_data |
Value
data.frame that has been patched.
Examples
library(daff)
x <- iris
#change a value
x[1,1] <- 1000
patch <- diff_data(iris, x)
print(patch)
# apply patch
iris_patched <- patch_data(iris, patch)
iris_patched[1,1] == 1000
Render a data_diff to html
Description
Converts a data_diff object to HTML code, and opens the resulting HTML code
in a browser window if view is TRUE and R is running interactively.
Usage
render_diff(
diff,
file = tempfile(fileext = ".html"),
view = interactive(),
fragment = FALSE,
pretty = TRUE,
title,
summary = !fragment,
use.DataTables = !fragment
)
Arguments
diff |
|
file |
|
view |
|
fragment |
|
pretty |
|
title |
|
summary |
|
use.DataTables |
|
Value
generated html
See Also
data_diff
Examples
y <- iris[1:3,]
x <- y
x <- head(x,2) # remove a row
x[1,1] <- 10 # change a value
x$hello <- "world" # add a column
x$Species <- NULL # remove a column
patch <- diff_data(y, x)
render_diff(patch, title="compare x and y", pretty = TRUE)
#apply patch
y_patched <- patch_data(y, patch)
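In non-interactive scripts, for example when building a report, it may be preferable to render without opening a browser. The sketch below is based on the usage signature above: it sets view = FALSE and passes an explicit file. The name out_file is hypothetical, and the comment about the file receiving the HTML is an assumption drawn from the file argument rather than a documented guarantee.

```r
library(daff)

y <- iris[1:3, ]
x <- y
x[1, 1] <- 10  # change a value

patch <- diff_data(y, x)

# hypothetical output location; any writable path would do
out_file <- tempfile(fileext = ".html")

# view = FALSE avoids opening a browser window;
# the HTML is assumed to be written to out_file as well as returned
html <- render_diff(patch, file = out_file, view = FALSE,
                    title = "compare x and y")
```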
Return which rows of a merged data.frame contain conflicts
Description
Returns which rows of a merged data.frame contain conflicts.
Usage
which_conflicts(merged)
Arguments
merged |
merged data.frame, as returned by merge_data |
Value
integer vector with row positions containing conflicts.
See Also
Examples
parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[2,1] <- 11
# successful merge
merge_data(parent, a, b)
parent <- a <- b <- iris[1:3,]
a[1,1] <- 10
b[1,1] <- 11
# conflicting merge (both a and b change same cell)
merged <- merge_data(parent, a, b)
merged #note the conflict
#find out which rows contain a conflict
which_conflicts(merged)
Write or read a diff to or from a file
Description
The diff information is stored in the Coopy highlighter diff format: https://paulfitz.github.io/daff-doc/spec.html
Usage
write_diff(diff, file = "diff.csv")
read_diff(file)
Arguments
diff |
generated with diff_data |
file |
filename or connection |
Details
Note that type information of the target data.frame is lost when writing a patch to disk.
Using a stored diff to patch a data.frame will use the column types of the source
data.frame to determine the target column types. Newly introduced columns may become character columns.
Names of the reference and comparison dataset are also lost when writing a data_diff object to disk.
Value
diff object that can be used in patch_data
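A round trip through a file can be sketched as follows. The temporary file name is an assumption for illustration. Because type information is lost on disk, the patched value is converted explicitly before it is compared.

```r
library(daff)

x <- iris
x[1, 1] <- 1000  # change a value

patch <- diff_data(iris, x)

# store the diff on disk (hypothetical temporary location)
diff_file <- tempfile(fileext = ".csv")
write_diff(patch, diff_file)

# later: read the stored diff back and apply it to the original data
stored <- read_diff(diff_file)
iris_patched <- patch_data(iris, stored)

# convert explicitly, since column types are derived from the source data.frame
as.numeric(as.character(iris_patched[1, 1]))
```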