Type: | Package |
Title: | Vector Look-Ups and Safer Sampling |
Version: | 0.2.3 |
Author: | Magnus Thor Torfason |
Maintainer: | Magnus Thor Torfason <m@zulutime.net> |
Description: | A collection of utility functions that facilitate looking up vector values from a lookup table, annotate values in a table for clearer viewing, and support a safer approach to vector sampling, sequence generation, and aggregation. |
License: | MIT + file LICENSE |
URL: | https://github.com/torfason/zmisc/, https://torfason.github.io/zmisc/ |
Suggests: | desc, dplyr, haven, knitr, labelled, purrr, rmarkdown, roxygen2, rprojroot, stringr, testthat, tibble |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
Language: | en-US |
RoxygenNote: | 7.2.3 |
NeedsCompilation: | no |
Packaged: | 2023-08-22 08:08:10 UTC; magnus |
Repository: | CRAN |
Date/Publication: | 2023-08-22 11:30:02 UTC |
zmisc: Vector Look-Ups and Safer Sampling
Description
A collection of utility functions that facilitate looking up vector values from a lookup table, annotate values in a table for clearer viewing, and support a safer approach to vector sampling, sequence generation, and aggregation.
For more information, see vignette("zmisc").
See Also
Useful links:
https://github.com/torfason/zmisc/
https://torfason.github.io/zmisc/
Apply a function to each column of a data.frame
Description
Thin wrapper around lapply() that checks that the input is a table before applying the function to each column, and converts the result back to a table afterwards. If the tibble package is available and the input is a tibble, the result will be a tibble; otherwise, it will be a plain data.frame.
Usage
ddply_helper(d, fun)
Arguments
d |
A data.frame or tibble. |
fun |
A function to apply to each column of d. |
Value
A data.frame or tibble with the function applied to each column.
Examples
df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6)
)
sum_fun <- function(x) sum(x)
result <- ddply_helper(df, sum_fun)
print(result)
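The wrapper pattern described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the name apply_columns_sketch is hypothetical, and the sketch always returns a plain data.frame (it leaves out the tibble branch).

```r
# Hypothetical stand-in for the wrapper pattern; not the zmisc source.
apply_columns_sketch <- function(d, fun) {
  # Refuse anything that is not a table
  stopifnot(is.data.frame(d))
  # A data.frame is a list of columns, so lapply() visits each column
  result <- lapply(d, fun)
  # Convert the list of results back to a plain data.frame
  as.data.frame(result, stringsAsFactors = FALSE)
}

df <- data.frame(col1 = c(1, 2, 3), col2 = c(4, 5, 6))
doubled <- apply_columns_sketch(df, function(x) x * 2)
print(doubled)
```

The function passed in may return a column of any length, as long as all columns end up with compatible lengths for as.data.frame().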
Verify that x is a valid labelled variable
Description
Verify that x is a valid labelled variable satisfying the (minimal) specification inherent in the parameter documentation of the haven::labelled() function for haven_labelled objects.
Usage
ll_assert_labelled(x)
Arguments
x |
A labelled variable |
Value
Invisibly returns x if the check is successful.
See Also
Other labelled light: ll_labelled(), ll_to_character(), ll_val_labels(), ll_var_label(), threadbare()
Create a labelled variable
Description
The labelled_light (ll) collection is a minimal implementation of core functions for creating and managing haven_labelled variables with minimal dependencies. These functions, prefixed with ll_, rely only on base R and operate only on objects of type haven_labelled. All functions check internally that the variables have the correct class and the correct structure for labelled variables, satisfying the (minimal) specification inherent in the parameter documentation of the haven::labelled() function.
The constructor, ll_labelled(), creates a labelled variable satisfying that specification.
Usage
ll_labelled(x = double(), labels = NULL, label = NULL)
Arguments
x |
A vector to label. Must be either numeric (integer or double) or character. |
labels |
A named vector or |
label |
A short, human-readable description of the vector. |
Value
A valid labelled variable.
See Also
Other labelled light: ll_assert_labelled(), ll_to_character(), ll_val_labels(), ll_var_label(), threadbare()
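To make the structure of a labelled variable concrete, the following base-R sketch builds one by hand and checks it. The checker is_labelled_sketch is hypothetical, and its rules are an assumed reading of the haven::labelled() parameter documentation (numeric or character data, value labels that are a named vector of the same type, and a single-string variable label). For brevity the object carries only the haven_labelled class.

```r
# Hypothetical checker; an assumed reading of the minimal specification,
# not the zmisc source.
is_labelled_sketch <- function(x) {
  labels <- attr(x, "labels", exact = TRUE)
  label  <- attr(x, "label", exact = TRUE)
  inherits(x, "haven_labelled") &&
    typeof(x) %in% c("integer", "double", "character") &&
    # Value labels: absent, or a named vector of the same type as the data
    (is.null(labels) ||
       (!is.null(names(labels)) && typeof(labels) == typeof(x))) &&
    # Variable label: absent, or a single string
    (is.null(label) || (is.character(label) && length(label) == 1))
}

# A hand-built labelled variable: data, value labels, and a variable label
x <- structure(
  c(1, 2, 1),
  labels = c(yes = 1, no = 2),
  label  = "Answer to the question",
  class  = "haven_labelled"
)

is_labelled_sketch(x)        # passes the checks
is_labelled_sketch(c(1, 2))  # a plain vector does not
```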
Get the character representation of a labelled variable
Description
Returns a character representation of a labelled variable, using the value labels to look up the label for a given value.
The default behavior of this function is similar to labelled::to_character(). The options, however, are slightly different. Most importantly, instead of specifying NA handling using parameters, the function relies on the default parameter to determine what happens for unlabelled values, allowing users to choose between including the original values of x instead of the labels, returning NA, or returning a specific string value. Also, the default behavior is to drop any variable label attribute, in line with the default as.character() method.
Usage
ll_to_character(x, default = x, preserve_var_label = FALSE)
Arguments
x |
A labelled variable |
default |
Vector providing a default label for any values not found in
the |
preserve_var_label |
Should any |
See Also
Other labelled light: ll_assert_labelled(), ll_labelled(), ll_val_labels(), ll_var_label(), threadbare()
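The role of the default parameter can be illustrated with a base-R sketch that works on a plain vector and a named vector of value labels. The helper to_character_sketch is hypothetical and only mirrors the three choices described above (original values, NA, or a fixed string); it is not the package's implementation.

```r
# Hypothetical helper operating on plain vectors; not the zmisc source.
to_character_sketch <- function(values, labels, default = values) {
  # Recycle a length-1 default to the length of the input
  default <- rep_len(as.character(default), length(values))
  # match() gives the position of each value's label, or NA if unlabelled
  idx <- match(values, labels)
  out <- names(labels)[idx]
  # Unlabelled positions fall back to the default
  out[is.na(idx)] <- default[is.na(idx)]
  out
}

values <- c(1, 2, 3, NA)
labels <- c(one = 1, two = 2)

keep_values <- to_character_sketch(values, labels)                       # originals
use_na      <- to_character_sketch(values, labels, default = NA)         # NA
use_string  <- to_character_sketch(values, labels, default = "unknown")  # fixed string
print(keep_values)
print(use_na)
print(use_string)
```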
Get or set value labels of a labelled variable
Description
Gets or sets the value labels (labels attribute) of a labelled vector. The getters/setters should be used rather than manipulating attributes directly, since these functions perform checks to ensure that the result, and the resulting labelled variable, are valid.
Usage
ll_val_labels(x, always = FALSE)
ll_val_labels(x) <- value
Arguments
x |
A labelled variable |
always |
Always return at least an empty vector of the correct type, even if the attribute is not set. |
See Also
Other labelled light: ll_assert_labelled(), ll_labelled(), ll_to_character(), ll_var_label(), threadbare()
Get or set variable label of a labelled variable
Description
Gets or sets the variable label (label attribute) of a labelled vector. The getters/setters should be used rather than manipulating attributes directly, since these functions perform checks to ensure that the result, and the resulting labelled variable, are valid.
Usage
ll_var_label(x)
ll_var_label(x) <- value
Arguments
x |
A labelled variable |
See Also
Other labelled light: ll_assert_labelled(), ll_labelled(), ll_to_character(), ll_val_labels(), threadbare()
Lookup values from a lookup table
Description
The lookup() function implements lookup of certain strings (such as variable names) from a lookup table which maps keys onto values (such as variable labels or descriptions).
The lookup table can be in the form of a two-column data.frame, in the form of a named vector, or in the form of a list. If the table is in the form of a data.frame, the lookup columns should be named name (for the key) and value (for the value). If the lookup table is in the form of a named vector or list, the name is used for the key, and the returned value is taken from the values in the vector or list.
Original values are returned if they are not found in the lookup table. Alternatively, a default can be specified for values that are not found. Note that an NA in x will never be found and will be replaced with the default value. To specify different defaults for values that are not found and for NA values in x, the default must be crafted manually to achieve this.
Any names in x are not included in the result.
The lookuper() function returns a function equivalent to the lookup() function, except that instead of taking a lookup table as an argument, the lookup table is embedded in the function itself.
This can be very useful, in particular when using the lookup function as an argument to other functions that expect a function which maps character -> character but do not offer a good way to pass additional arguments to that function.
Usage
lookup(x, lookup_table, default = x)
lookuper(lookup_table, default = NULL)
Arguments
x |
A string vector whose elements are to be looked up. |
lookup_table |
The lookup table to use. |
default |
If a value is not found in the lookup table, the value will be
taken from |
Value
The lookup() function returns a string vector based on x, with values replaced with the lookup values from lookup_table. Any values not found in the lookup table are taken from default.
The lookuper() function returns a function that takes character vectors as its argument x, and returns either the corresponding values from the underlying lookup table, or the original values from x for those elements that are not found in the lookup table (or looks them up from the default).
Examples
fruit_lookup_vector <- c(a="Apple", b="Banana", c="Cherry")
lookup(letters[1:5], fruit_lookup_vector)
lookup(letters[1:5], fruit_lookup_vector, default = NA)
mtcars_lookup_data_frame <- data.frame(
  name = c("mpg", "hp", "wt"),
  value = c("Miles/(US) gallon", "Gross horsepower", "Weight (1000 lbs)"))
lookup(names(mtcars), mtcars_lookup_data_frame)
lookup_fruits <- lookuper(list(a="Apple", b="Banana", c="Cherry"))
lookup_fruits(letters[1:5])
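The idea of embedding the lookup table in the returned function can be sketched with a closure in base R. The name make_lookuper_sketch is hypothetical and the body is an assumption about one reasonable way to get the documented behavior (match keys, fall back to the original value or to a default); it is not the package's implementation.

```r
# Hypothetical closure-based sketch; not the zmisc source.
make_lookuper_sketch <- function(lookup_table, default = NULL) {
  # Standardise the table once, when the closure is created
  if (is.data.frame(lookup_table)) {
    keys   <- as.character(lookup_table$name)
    values <- as.character(lookup_table$value)
  } else {
    keys   <- names(lookup_table)
    values <- as.character(unlist(lookup_table, use.names = FALSE))
  }
  # The returned function only needs x; the table lives in its environment
  function(x) {
    fallback <- if (is.null(default)) x else rep_len(default, length(x))
    idx <- match(x, keys)
    out <- values[idx]
    out[is.na(idx)] <- fallback[is.na(idx)]
    unname(as.character(out))
  }
}

fruit_sketch <- make_lookuper_sketch(list(a = "Apple", b = "Banana", c = "Cherry"))
found <- fruit_sketch(letters[1:5])
print(found)
```

Because the table is captured when the closure is created, the resulting function can be handed to any caller that expects a one-argument character -> character function.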
Embed factor levels and value labels in values.
Description
This function adds level/label information as an annotation to either factors or labelled variables. This function is called notate() rather than annotate() to avoid conflict with ggplot2::annotate(). It is a generic that can operate either on individual vectors or on a data.frame.
When printing labelled variables from a tibble in a console, both the numeric value and the text label are shown, but no variable labels. When using the View() function, only variable labels are shown but no value labels. For factors, there is no way to view the integer levels and values at the same time.
In order to allow the viewing of both variable and value labels at the same time, this function converts both factor and labelled variables to character, including both numeric levels (labelled values) and character values (labelled labels) in the output.
Usage
notate(x)
Arguments
x |
The object (either vector or |
Value
The processed data.frame, suitable for viewing, in particular through the View() function.
Examples
d <- data.frame(
  chr = letters[1:4],
  fct = factor(c("alpha", "bravo", "chrly", "delta")),
  lbl = ll_labelled(c(1, 2, 3, NA),
                    labels = c(one=1, two=2),
                    label = "A labelled vector")
)
dn <- notate(d)
dn
# View(dn)
Helper function to standardize the lookup_table.
Description
Preprocessing the lookup table to convert it to a list can take some time, so when possible, we want to do it only once. Therefore, we offload it to a helper function.
Usage
standardize_lookup_table(lookup_table)
Arguments
lookup_table |
The unstandardized lookup table (must still be one of the
formats specified for the |
Value
The lookup table as a list.
Return a threadbare version of a vector
Description
A bare object is an R object that has no class attributes (see rlang::is_bare_character()). A threadbare object is an atomic object (i.e. not a list(), see is.atomic()), with no attributes at all. The function returns an error if a list is passed.
Usage
threadbare(x)
Arguments
x |
A vector, possibly classed, but not a list object, to strip of all attributes. |
Value
A vector with the same core values as x, but with no attributes() at all, not even names().
See Also
Other labelled light: ll_assert_labelled(), ll_labelled(), ll_to_character(), ll_val_labels(), ll_var_label()
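What stripping all attributes means in practice can be shown with a short base-R sketch. The function threadbare_sketch is a hypothetical stand-in written from the description above, not the package's implementation.

```r
# Hypothetical stand-in written from the description; not the zmisc source.
threadbare_sketch <- function(x) {
  # Lists are not atomic, so they are rejected
  if (!is.atomic(x)) stop("x must be an atomic vector, not a list")
  # Dropping all attributes removes class, names, levels, and so on
  attributes(x) <- NULL
  x
}

f <- factor(c("b", "a", "b"))
codes <- threadbare_sketch(f)                 # the underlying integer codes
plain <- threadbare_sketch(c(a = 1, b = 2))   # names are dropped as well
print(codes)
print(plain)
```

Note that for a factor the result is the vector of integer codes, because the levels themselves are stored as an attribute.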
Utility function to output an error
Description
This function is used to capture errors, typically inside a tryCatch() statement, and output them in a clean and readable way. The function provides line-wrapping, with a configurable width. When printing the error message, it prefixes the text with "#E> " to make it easier to look for the error.
Usage
wrap_error(e, wrap = 50)
Arguments
e |
The error to wrap. |
wrap |
How many characters per line before wrapping. |
Value
The original error is returned invisibly.
Examples
tryCatch(stop("This is an error"), error=wrap_error)
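A handler of this kind can be approximated in base R with strwrap(). The sketch below is hypothetical: wrap_error_sketch is not the package's function, and placing the prefix on every wrapped line is an assumption about the output format.

```r
# Hypothetical handler; prefixing every wrapped line is an assumption.
wrap_error_sketch <- function(e, wrap = 50) {
  # strwrap() splits the message into lines and prepends the prefix
  lines <- strwrap(conditionMessage(e), width = wrap, prefix = "#E> ")
  cat(lines, sep = "\n")
  # Hand the original condition back without printing it
  invisible(e)
}

res <- tryCatch(
  stop("This is a rather long error message that will not fit on a single short line"),
  error = wrap_error_sketch
)
```

Returning the condition invisibly keeps the console output limited to the wrapped message while still letting the caller inspect the error object afterwards.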
Sample from a vector in a safe way
Description
The zample() function duplicates the functionality of sample(), with the exception that it does not attempt the (sometimes dangerous) user-friendliness of switching the interpretation of the first element to a number if the length of the vector is 1. zample() always treats its first argument as a vector containing elements that should be sampled, so your code won't break in unexpected ways when the input vector happens to be of length 1.
Usage
zample(x, size = length(x), replace = FALSE, prob = NULL)
Arguments
x |
The vector to sample from |
size |
The number of elements to sample from |
replace |
Should elements be replaced after sampling (defaults to FALSE) |
prob |
A vector of probability weights (defaults to equal probabilities) |
Details
If what you really want is to sample from an interval between 1 and n, you can use sample(n) or sample.int(n) (but make sure to only pass vectors of length one to those functions).
Value
The resulting sample
Examples
# For vectors of length 2 or more, zample() and sample() are identical
set.seed(42); zample(7:11)
set.seed(42); sample(7:11)
# For vectors of length 1, zample() will still sample from the vector,
# whereas sample() will "magically" switch to interpreting the input
# as a number n, and sampling from the vector 1:n.
set.seed(42); zample(7)
set.seed(42); sample(7)
# The other arguments work in the same way as for sample()
set.seed(42); zample(7:11, size=13, replace=TRUE, prob=(5:1)^3)
set.seed(42); sample(7:11, size=13, replace=TRUE, prob=(5:1)^3)
# Of course, sampling more than the available elements without
# setting replace=TRUE will result in an error
set.seed(42); tryCatch(zample(7, size=2), error=wrap_error)
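One common way to get this behavior in base R is to sample positions with sample.int() and then index the vector. The sketch below uses that idiom; zample_sketch is a hypothetical name and this is not claimed to be the package's implementation.

```r
# Hypothetical sketch using the index-sampling idiom; not the zmisc source.
zample_sketch <- function(x, size = length(x), replace = FALSE, prob = NULL) {
  # Sample positions 1..length(x), then pick those elements from x
  x[sample.int(length(x), size = size, replace = replace, prob = prob)]
}

# A length-1 input is still treated as a vector of one element
single <- zample_sketch(7)
print(single)

# For longer vectors the draw matches sample() under the same seed
set.seed(42); a <- zample_sketch(7:11)
set.seed(42); b <- sample(7:11)
print(identical(a, b))
```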
Generate sequence in a safe way
Description
The zeq() function creates an increasing integer sequence, but differs from the standard one in that it will not silently generate a decreasing sequence when the second argument is smaller than the first. If the second argument is one smaller than the first, it will generate an empty sequence; if the difference is greater, the function will throw an error.
Usage
zeq(from, to)
Arguments
from |
The lower bound of the sequence |
to |
The higher bound of the sequence |
Value
A sequence ranging from from to to
Examples
# For increasing sequences, zeq() and seq() are identical
zeq(11,15)
zeq(11,11)
# If second argument equals first-1, an empty sequence is returned
zeq(11,10)
# If second argument is less than first-1, the function throws an error
tryCatch(zeq(11,9), error=wrap_error)
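The three cases described above (increasing, empty, error) can be written out as a small base-R sketch. The function zeq_sketch is hypothetical and only encodes the documented rules; it is not the package's implementation.

```r
# Hypothetical sketch of the documented rules; not the zmisc source.
zeq_sketch <- function(from, to) {
  stopifnot(length(from) == 1, length(to) == 1)
  # More than one below `from`: refuse instead of counting down
  if (to < from - 1) stop("`to` must not be smaller than `from - 1`")
  # Exactly one below `from`: the empty sequence
  if (to == from - 1) return(integer(0))
  seq.int(from, to)
}

# The pitfall being avoided: 1:0 silently counts down
print(1:0)
# The sketch returns an empty sequence instead, so a loop body never runs
n <- 0
iterations <- 0
for (i in zeq_sketch(1, n)) iterations <- iterations + 1
print(iterations)
```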
Return the single (unique) value found in a vector
Description
The zingle() function returns the first element in a vector, but only if all the other elements are identical to the first one (the vector only has a zingle value). If the elements are not all identical, it throws an error. The vector must contain at least one non-NA value, or the function errors out as well. This is especially useful in aggregations, when all values in a given group should be identical, but you want to make sure.
Usage
zingle(x, na.rm = FALSE)
Arguments
x |
Vector of elements that should all be identical |
na.rm |
Should |
Details
Optionally takes a na.rm parameter, similarly to sum, mean and other aggregate functions. If TRUE, NA values will be removed prior to comparing the elements, so the function will accept input values that contain a combination of the single value and any NA values (but at least one non-NA value is required).
Only values are tested for equality. Any names are simply ignored, and the result is an unnamed value. This is in line with how other aggregation functions handle names.
Value
The zingle element in the vector
Examples
# If all elements are identical, all is good.
# The value of the element is returned.
zingle(c("Alpha", "Alpha", "Alpha"))
# If any elements differ, an error is thrown
tryCatch(zingle(c("Alpha", "Beta", "Alpha")), error=wrap_error)
if (require("dplyr", quietly=TRUE, warn.conflicts=FALSE)) {
  d <- tibble::tribble(
    ~id, ~name, ~fouls,
    1, "James", 3,
    2, "Jack", 2,
    1, "James", 4
  )
  # If the data is of the correct format, all is good
  d %>%
    dplyr::group_by(id) %>%
    dplyr::summarise(name=zingle(name), total_fouls=sum(fouls))
}
if (require("dplyr", quietly=TRUE, warn.conflicts=FALSE)) {
  # If a name does not match its ID, we should get an error
  d[1,"name"] <- "Jammes"
  tryCatch({
    d %>%
      dplyr::group_by(id) %>%
      dplyr::summarise(name=zingle(name), total_fouls=sum(fouls))
  }, error=wrap_error)
}
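The rules for a single-valued vector, including the na.rm handling, can be sketched in base R as follows. The function zingle_sketch is hypothetical and written from the description in this section; it is not the package's implementation, and its error messages are placeholders.

```r
# Hypothetical sketch written from the description; not the zmisc source.
zingle_sketch <- function(x, na.rm = FALSE) {
  # Names play no part in the comparison or the result
  x <- unname(x)
  if (na.rm) x <- x[!is.na(x)]
  # At least one non-NA value is required
  if (length(x) == 0 || all(is.na(x))) stop("need at least one non-NA value")
  # More than one distinct value (NA counts as distinct) is an error
  if (length(unique(x)) != 1) stop("not all elements are identical")
  x[[1]]
}

same   <- zingle_sketch(c("Alpha", "Alpha", "Alpha"))
leaner <- zingle_sketch(c("Alpha", NA, "Alpha"), na.rm = TRUE)
print(same)
print(leaner)

# Base-R aggregation: one name per id, failing loudly on a mismatch
players <- data.frame(id = c(1, 2, 1), name = c("James", "Jack", "James"))
per_id <- tapply(players$name, players$id, zingle_sketch)
print(per_id)
```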