Type: Package
Title: Vector Look-Ups and Safer Sampling
Version: 0.2.3
Author: Magnus Thor Torfason
Maintainer: Magnus Thor Torfason <m@zulutime.net>
Description: A collection of utility functions that facilitate looking up vector values from a lookup table, annotate values in a table for clearer viewing, and support a safer approach to vector sampling, sequence generation, and aggregation.
License: MIT + file LICENSE
URL: https://github.com/torfason/zmisc/, https://torfason.github.io/zmisc/
Suggests: desc, dplyr, haven, knitr, labelled, purrr, rmarkdown, roxygen2, rprojroot, stringr, testthat, tibble
VignetteBuilder: knitr
Encoding: UTF-8
Language: en-US
RoxygenNote: 7.2.3
NeedsCompilation: no
Packaged: 2023-08-22 08:08:10 UTC; magnus
Repository: CRAN
Date/Publication: 2023-08-22 11:30:02 UTC

zmisc: Vector Look-Ups and Safer Sampling

Description

A collection of utility functions that facilitate looking up vector values from a lookup table, annotate values in a table for clearer viewing, and support a safer approach to vector sampling, sequence generation, and aggregation.

For more information, see vignette("zmisc").

See Also

Useful links:

https://github.com/torfason/zmisc/

https://torfason.github.io/zmisc/


Apply a function to each column of a data.frame

Description

Thin wrapper around lapply() that checks that the input is a table before applying the function to each column, and converts the result back to a table afterwards. If the tibble package is available and the input is a tibble, the result will be a tibble; otherwise, it will be a plain data.frame.

Usage

ddply_helper(d, fun)

Arguments

d

A data.frame or tibble.

fun

A function to apply to each column of d.

Value

A data.frame or tibble with the function applied to each column.

Examples

df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(4, 5, 6)
)
sum_fun <- function(x) sum(x)
result <- ddply_helper(df, sum_fun)
print(result)
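
The column-wise pattern described above can be approximated in base R. This is a minimal sketch rather than the package's implementation: the name apply_columns is hypothetical, and the tibble handling mentioned in the description is left out.

```r
# Hypothetical stand-in for the behaviour described above (base R only).
apply_columns <- function(d, fun) {
  # Refuse anything that is not a data.frame before touching the columns.
  stopifnot(is.data.frame(d))
  # lapply() walks over the columns and returns a named list.
  out <- lapply(d, fun)
  # Convert the list of results back into a data.frame.
  as.data.frame(out, stringsAsFactors = FALSE)
}

df <- data.frame(col1 = c(1, 2, 3), col2 = c(4, 5, 6))
doubled <- apply_columns(df, function(x) x * 2)
```

Because the result is rebuilt with as.data.frame(), fun should return vectors of equal length for every column.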


Verify that x is a valid labelled variable

Description

Verify that x is a valid labelled variable satisfying the (minimal) specification inherent in the parameter documentation of the haven::labelled() function for haven_labelled objects.

Usage

ll_assert_labelled(x)

Arguments

x

A labelled variable

Value

Invisibly returns x if the check is successful.

See Also

Other labelled light: ll_labelled(), ll_to_character(), ll_val_labels(), ll_var_label(), threadbare()


Create a labelled variable

Description

The labelled_light (ll) collection is a minimal implementation of core functions for creating and managing haven_labelled variables with minimal dependencies. These functions, prefixed with ll_, rely only on base R and operate only on objects of class haven_labelled. All functions check internally that the variables have the correct class and the correct structure for labelled variables, satisfying the (minimal) specification inherent in the parameter documentation of the haven::labelled() function.

The constructor, ll_labelled(), creates a labelled variable satisfying that specification.

Usage

ll_labelled(x = double(), labels = NULL, label = NULL)

Arguments

x

A vector to label. Must be either numeric (integer or double) or character.

labels

A named vector or NULL. The vector should be the same type as x. Unlike factors, labels don't need to be exhaustive: only a fraction of the values might be labelled.

label

A short, human-readable description of the vector.

Value

A valid labelled variable.
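
To make the argument rules above concrete, here is a rough base-R sketch of a constructor that enforces them. It is an illustration under stated assumptions, not the code behind ll_labelled(): the name make_labelled is hypothetical, the class vector is simplified to "haven_labelled" alone, and the type check is stricter than it may need to be (integer and double are not treated as interchangeable).

```r
# Hypothetical constructor sketch; simplified class vector (assumption).
make_labelled <- function(x = double(), labels = NULL, label = NULL) {
  # x must be numeric (integer or double) or character.
  stopifnot(is.numeric(x) || is.character(x))
  if (!is.null(labels)) {
    # Value labels must be named and of the same type as x.
    stopifnot(!is.null(names(labels)), typeof(labels) == typeof(x))
  }
  if (!is.null(label)) {
    # The variable label is a single string.
    stopifnot(is.character(label), length(label) == 1)
  }
  structure(x, labels = labels, label = label, class = "haven_labelled")
}

v <- make_labelled(c(1, 2, 3, NA),
                   labels = c(one = 1, two = 2),
                   label = "A labelled vector")
```

Note that the value 3 has no label here, which is allowed because labels need not be exhaustive.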

See Also

Other labelled light: ll_assert_labelled(), ll_to_character(), ll_val_labels(), ll_var_label(), threadbare()


Get the character representation of a labelled variable

Description

Returns a character representation of a labelled variable, using the value labels to look up the label for a given value.

The default behavior of this function is similar to labelled::to_character(). The options, however, are slightly different. Most importantly, instead of specifying NA handling using parameters, the function relies on the default parameter to determine what happens for unlabelled values. This allows users to include the original values of x instead of the labels, return NA, or return a specific string value. Also, the default behavior is to drop any variable label attribute, in line with the default as.character() method.

Usage

ll_to_character(x, default = x, preserve_var_label = FALSE)

Arguments

x

A labelled variable

default

Vector providing a default label for any values not found in the val_labels (unlabelled values). Must be of length 1 or of the same length as x. Useful possibilities are x (use values where labels are not found), NA (return NA for such values), and "" (an empty string). Missing (NA) values in x, however, are never replaced with the default; they remain NA.

preserve_var_label

Should any var_label in x be preserved, or should it be dropped from the result (ensuring that the result is bare and without any attributes).
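
The effect of the default argument can be illustrated with plain vectors. The sketch below is an assumption-laden approximation in base R, not the package's code: to_chr is a hypothetical name, and the values and value labels are passed as two separate bare vectors instead of a labelled variable.

```r
# Hypothetical approximation of the label lookup, using bare vectors.
to_chr <- function(x, val_labels, default = x) {
  # Recycle the default to the length of x and make it character.
  default <- rep_len(as.character(default), length(x))
  # Position of each value among the labelled values (NA if unlabelled).
  idx <- match(x, val_labels)
  out <- ifelse(is.na(idx), default, names(val_labels)[idx])
  # Missing values in x stay missing, whatever the default is.
  out[is.na(x)] <- NA_character_
  out
}

x <- c(1, 2, 3, NA)
val_labels <- c(one = 1, two = 2)

res_values <- to_chr(x, val_labels)                # unlabelled 3 kept as "3"
res_na     <- to_chr(x, val_labels, default = NA)  # unlabelled 3 becomes NA
res_empty  <- to_chr(x, val_labels, default = "")  # unlabelled 3 becomes ""
```

In all three calls the trailing NA in x remains NA, which mirrors the rule stated for the default argument.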

See Also

Other labelled light: ll_assert_labelled(), ll_labelled(), ll_val_labels(), ll_var_label(), threadbare()


Get or set value labels of a labelled variable

Description

Gets or sets the value labels (labels attribute) of a labelled vector. The getters/setters should be used rather than manipulating attributes directly, since these functions perform checks to ensure that the result, and the resulting labelled variable, are valid.

Usage

ll_val_labels(x, always = FALSE)

ll_val_labels(x) <- value

Arguments

x

A labelled variable

always

Always return at least an empty vector of the correct type, even if the attribute is not set.
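
As a rough picture of what the getter reads, the sketch below fetches the labels attribute directly and imitates the always argument. It is a guess at the behaviour described above, not the package's implementation: get_val_labels is a hypothetical name, the class vector is simplified, and none of the validity checks are reproduced.

```r
# Hypothetical getter sketch; no validity checks (assumption).
get_val_labels <- function(x, always = FALSE) {
  labs <- attr(x, "labels", exact = TRUE)
  if (is.null(labs) && always) {
    # Empty vector of the same base type as x, with no attributes.
    labs <- unclass(x)[0]
    attributes(labs) <- NULL
  }
  labs
}

with_labels <- structure(c(1, 2, 3),
                         labels = c(one = 1, two = 2),
                         class = "haven_labelled")
no_labels <- structure(c(1, 2, 3), class = "haven_labelled")

labs_present <- get_val_labels(with_labels)
labs_absent  <- get_val_labels(no_labels)
labs_always  <- get_val_labels(no_labels, always = TRUE)
```

Reading the attribute by hand like this skips validation, which is why the documented getters and setters are preferable in real code.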

See Also

Other labelled light: ll_assert_labelled(), ll_labelled(), ll_to_character(), ll_var_label(), threadbare()


Get or set variable label of a labelled variable

Description

Gets or sets the variable label (label attribute) of a labelled vector. The getters/setters should be used rather than manipulating attributes directly, since these functions perform checks to ensure that the result, and the resulting labelled variable, are valid.

Usage

ll_var_label(x)

ll_var_label(x) <- value

Arguments

x

A labelled variable

See Also

Other labelled light: ll_assert_labelled(), ll_labelled(), ll_to_character(), ll_val_labels(), threadbare()


Lookup values from a lookup table

Description

The lookup() function implements lookup of certain strings (such as variable names) from a lookup table which maps keys onto values (such as variable labels or descriptions).

The lookup table can be in the form of a two-column data.frame, in the form of a named vector, or in the form of a list. If the table is in the form of a data.frame, the lookup columns should be named name (for the key) and value (for the value). If the lookup table is in the form of a named vector or list, the name is used for the key, and the returned value is taken from the values in the vector or list.

Original values are returned if they are not found in the lookup table. Alternatively, a default can be specified for values that are not found. Note that an NA in x will never be found and will be replaced with the default value. To specify different defaults for values that are not found and for NA values in x, the default must be crafted manually to achieve this.

Any names in x are not included in the result.

The lookuper() function returns a function equivalent to the lookup() function, except that instead of taking a lookup table as an argument, the lookup table is embedded in the function itself.

This can be very useful, in particular when using the lookup function as an argument to other functions that expect a function which maps character->character but do not offer a good way to pass additional arguments to that function.

Usage

lookup(x, lookup_table, default = x)

lookuper(lookup_table, default = NULL)

Arguments

x

A string vector whose elements are to be looked up.

lookup_table

The lookup table to use.

default

If a value is not found in the lookup table, the value will be taken from default. This must be a character vector of length 1 or the same length as x. Useful values include x (the default setting), NA, or "" (an empty string).

Value

The lookup() function returns a string vector based on x, with values replaced with the lookup values from lookup_table. Any values not found in the lookup table are taken from default.

The lookuper() function returns a function that takes a character vector as its argument x and returns the corresponding values from the underlying lookup table. Elements that are not found in the lookup table are returned unchanged, or taken from default if one was specified.

Examples

fruit_lookup_vector <- c(a="Apple", b="Banana", c="Cherry")
lookup(letters[1:5], fruit_lookup_vector)
lookup(letters[1:5], fruit_lookup_vector, default = NA)

mtcars_lookup_data_frame <- data.frame(
  name = c("mpg", "hp", "wt"),
  value = c("Miles/(US) gallon", "Gross horsepower", "Weight (1000 lbs)"))
lookup(names(mtcars), mtcars_lookup_data_frame)

lookup_fruits <- lookuper(list(a="Apple", b="Banana", c="Cherry"))
lookup_fruits(letters[1:5])
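
The description mentions two points that are easier to see in code: a default can be crafted by hand to treat NA inputs differently from unknown keys, and lookuper() embeds the table in a closure. The following base-R sketch illustrates both under simplifying assumptions. The names lookup_sketch and lookuper_sketch are hypothetical, only named character vectors are supported, and this is not the package's implementation.

```r
# Hypothetical base-R stand-in for lookup(), named character vectors only.
lookup_sketch <- function(x, lookup_table, default = x) {
  default <- rep_len(default, length(x))
  # NA where the key is absent from the table (or where x itself is NA).
  idx <- match(x, names(lookup_table))
  out <- unname(lookup_table[idx])
  out[is.na(idx)] <- default[is.na(idx)]
  out
}

# Closure that embeds the table, in the spirit of lookuper().
lookuper_sketch <- function(lookup_table, default = NULL) {
  function(x) {
    lookup_sketch(x, lookup_table, if (is.null(default)) x else default)
  }
}

fruits <- c(a = "Apple", b = "Banana", c = "Cherry")
x <- c("a", "d", NA)

# One default for NA inputs, another for keys that are simply not in the table.
crafted_default <- ifelse(is.na(x), "(missing)", "(unknown)")
res <- lookup_sketch(x, fruits, default = crafted_default)

# The returned function needs only x, so it fits APIs that expect f(x).
lookup_fruits_sketch <- lookuper_sketch(fruits)
res_closure <- lookup_fruits_sketch(letters[1:5])
```

The crafted default works because default may have the same length as x, so each position can carry its own fallback.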


Embed factor levels and value labels in values.

Description

This function adds level/label information as an annotation to either factors or labelled variables. This function is called notate() rather than annotate() to avoid conflict with ggplot2::annotate(). It is a generic that can operate either on individual vectors or on a data.frame.

When printing labelled variables from a tibble in a console, both the numeric value and the text label are shown, but no variable labels. When using the View() function, only variable labels are shown but no value labels. For factors, there is no way to view the integer levels and values at the same time.

In order to allow the viewing of both variable and value labels at the same time, this function converts both factor and labelled variables to character, including both numeric levels (labelled values) and character values (labelled labels) in the output.

Usage

notate(x)

Arguments

x

The object (either a vector or a data.frame of vectors) that one desires to annotate and/or view.

Value

The processed data.frame, suitable for viewing, in particular through the View() function.

Examples

d <- data.frame(
  chr = letters[1:4],
  fct = factor(c("alpha", "bravo", "chrly", "delta")),
  lbl = ll_labelled(c(1, 2, 3, NA),
                    labels = c(one=1, two=2),
                    label = "A labelled vector")
)
dn <- notate(d)
dn
# View(dn)
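
The idea of embedding the integer level next to the character value can be sketched for a single factor in base R. The "[code] value" format used here is an assumption chosen for illustration and may not match what notate() actually prints; the variable names are hypothetical.

```r
# A factor whose level order differs from alphabetical order.
f <- factor(c("low", "high", "low"), levels = c("low", "high"))

# Combine the integer code and the level text into one character value.
# The "[%d] %s" layout is an illustrative assumption.
embedded <- sprintf("[%d] %s", as.integer(f), as.character(f))
```

After this step both pieces of information survive in a plain character vector, which any viewer can display.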


Helper function to standardize the lookup_table.

Description

Preprocessing the lookup table to convert it to a list can take some time, so when possible, we want to do it only once. Therefore, we offload it to a helper function.

Usage

standardize_lookup_table(lookup_table)

Arguments

lookup_table

The unstandardized lookup table (must still be one of the formats specified for the lookup() function).

Value

The lookup table as a list.
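
A possible shape for this conversion, based only on the three input formats documented for lookup(), is sketched below. The function name to_lookup_list is hypothetical, and the real helper may perform additional checks or handle edge cases differently.

```r
# Hypothetical conversion of the three documented formats to a named list.
to_lookup_list <- function(lookup_table) {
  if (is.data.frame(lookup_table)) {
    # A data.frame must carry the key in `name` and the value in `value`.
    stopifnot(all(c("name", "value") %in% names(lookup_table)))
    values <- as.character(lookup_table$value)
    names(values) <- as.character(lookup_table$name)
    as.list(values)
  } else {
    # Named vectors and lists: names are keys, elements are values.
    stopifnot(!is.null(names(lookup_table)))
    as.list(lookup_table)
  }
}

from_vector <- to_lookup_list(c(a = "Apple", b = "Banana"))
from_frame  <- to_lookup_list(data.frame(name = c("mpg", "hp"),
                                         value = c("Miles/(US) gallon",
                                                   "Gross horsepower")))
```

Doing this once and reusing the list is what makes the closure returned by lookuper() cheap to call repeatedly.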


Return a threadbare version of a vector

Description

A bare object is an R object that has no class attributes (see rlang::is_bare_character()). A threadbare object is an atomic object (i.e. not a list(), see is.atomic()), with no attributes at all. The function returns an error if a list is passed.

Usage

threadbare(x)

Arguments

x

A vector, possibly classed, but not a list object, to strip of all attributes.

Value

A vector with the same core values as x, but with no attributes() at all, not even names().
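
The difference between bare and threadbare can be shown with a short base-R sketch. The function strip_all is a hypothetical stand-in written from the description above, not the package's code.

```r
# Hypothetical stand-in: drop every attribute from an atomic vector.
strip_all <- function(x) {
  if (!is.atomic(x)) stop("x must be an atomic vector, not a list")
  attributes(x) <- NULL
  x
}

named   <- c(a = 1, b = 2)
stripped_named  <- strip_all(named)                 # names are gone
stripped_factor <- strip_all(factor(c("x", "y")))   # only integer codes remain
```

Unlike unclass(), which removes only the class, this also discards names, levels, and any labels.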

See Also

Other labelled light: ll_assert_labelled(), ll_labelled(), ll_to_character(), ll_val_labels(), ll_var_label()


Utility function to output an error

Description

This function is used to capture errors, typically inside a tryCatch() statement and output them in a clean and readable way. The function provides line-wrapping, with a configurable width. When printing the error message, it prefixes the text with "#E> " to make it easier to look for the error.

Usage

wrap_error(e, wrap = 50)

Arguments

e

The error to wrap.

wrap

How many characters per line before wrapping.

Value

The original error is returned invisibly.

Examples

tryCatch(stop("This is an error"), error=wrap_error)
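
A handler with the behaviour described above could look roughly like the following. This is a sketch under assumptions, not the package's implementation: format_error is a hypothetical name, and the exact wrapping rules of wrap_error() may differ from those of strwrap().

```r
# Hypothetical handler: wrap the message, prefix each line, return the error.
format_error <- function(e, wrap = 50) {
  lines <- strwrap(conditionMessage(e), width = wrap, prefix = "#E> ")
  cat(lines, sep = "\n")
  invisible(e)
}

# Used as an error handler, the same way as in the example above.
res <- tryCatch(stop("This is an error"), error = format_error)
```

Returning the condition invisibly keeps the console clean while still letting the caller inspect the error object.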


Sample from a vector in a safe way

Description

The zample() function duplicates the functionality of sample(), with the exception that it does not attempt the (sometimes dangerous) user-friendliness of switching the interpretation of the first argument to a number if the length of the vector is 1. zample() always treats its first argument as a vector containing elements that should be sampled, so your code won't break in unexpected ways when the input vector happens to be of length 1.

Usage

zample(x, size = length(x), replace = FALSE, prob = NULL)

Arguments

x

The vector to sample from

size

The number of elements to sample from x (defaults to length(x))

replace

Should elements be replaced after sampling (defaults to FALSE)

prob

A vector of probability weights (defaults to equal probabilities)

Details

If what you really want is to sample from an interval between 1 and n, you can use sample(n) or sample.int(n) (but make sure to only pass vectors of length one to those functions).

Value

The resulting sample

Examples

# For vectors of length 2 or more, zample() and sample() are identical
set.seed(42); zample(7:11)
set.seed(42); sample(7:11)

# For vectors of length 1, zample() will still sample from the vector,
# whereas sample() will "magically" switch to interpreting the input
# as a number n, and sampling from the vector 1:n.
set.seed(42); zample(7)
set.seed(42); sample(7)

# The other arguments work in the same way as for sample()
set.seed(42); zample(7:11, size=13, replace=TRUE, prob=(5:1)^3)
set.seed(42); sample(7:11, size=13, replace=TRUE, prob=(5:1)^3)

# Of course, sampling more than the available elements without
# setting replace=TRUE will result in an error
set.seed(42); tryCatch(zample(7, size=2), error=wrap_error)
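
One common base-R way to avoid the length-1 pitfall is to sample positions with sample.int() and then index into the vector. The sketch below shows that technique; safe_sample is a hypothetical name, and this is not necessarily how zample() is implemented.

```r
# Hypothetical helper: sample indices, never the values themselves.
safe_sample <- function(x, size = length(x), replace = FALSE, prob = NULL) {
  x[sample.int(length(x), size = size, replace = replace, prob = prob)]
}

set.seed(42)
pitfall  <- sample(7)          # a permutation of 1:7, seven values
safe_one <- safe_sample(7)     # always the single element 7
safe_all <- safe_sample(7:11)  # a permutation of 7:11
```

Because the indices always run from 1 to length(x), the interpretation of x never depends on its length.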


Generate sequence in a safe way

Description

The zeq() function creates an increasing integer sequence, but differs from the standard seq() in that it will not silently generate a decreasing sequence when the second argument is smaller than the first. If the second argument is one smaller than the first, it generates an empty sequence; if the difference is greater, the function throws an error.

Usage

zeq(from, to)

Arguments

from

The lower bound of the sequence

to

The upper bound of the sequence

Value

A sequence ranging from from to to

Examples

# For increasing sequences, zeq() and seq() are identical
zeq(11,15)
zeq(11,11)

# If second argument equals first-1, an empty sequence is returned
zeq(11,10)

# If second argument is less than first-1, the function throws an error
tryCatch(zeq(11,9), error=wrap_error)
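
The rule described above (increasing, empty, or error) can be written in a few lines of base R. The sketch below is an illustration only: safe_seq is a hypothetical name, inputs are coerced to integer as a simplifying assumption, and the package's own argument checks are not reproduced.

```r
# Hypothetical helper implementing the increasing / empty / error rule.
safe_seq <- function(from, to) {
  from <- as.integer(from)
  to <- as.integer(to)
  if (to < from - 1L) stop("`to` must not be less than `from` - 1")
  # seq_len(0) is empty, so to == from - 1 yields integer(0).
  from - 1L + seq_len(to - from + 1L)
}

x <- character(0)
visited <- 0
# 1:length(x) would give c(1, 0) here and run the loop body twice.
for (i in safe_seq(1, length(x))) visited <- visited + 1

colon_pitfall <- 1:length(x)
```

The loop over an empty vector is the typical case where a silently decreasing sequence causes bugs.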


Return the single (unique) value found in a vector

Description

The zingle() function returns the first element in a vector, but only if all the other elements are identical to the first one (the vector only has a zingle value). If the elements are not all identical, it throws an error. The vector must contain at least one non-NA value, or the function errors out as well. This is especially useful in aggregations, when all values in a given group should be identical, but you want to make sure.

Usage

zingle(x, na.rm = FALSE)

Arguments

x

Vector of elements that should all be identical

na.rm

Should NA elements be removed prior to comparison

Details

Optionally takes an na.rm parameter, similarly to sum, mean and other aggregate functions. If TRUE, NA values will be removed prior to comparing the elements, so the function will accept input values that contain a combination of the single value and any NA values (but at least one non-NA value is required).

Only values are tested for equality. Any names are simply ignored, and the result is an unnamed value. This is in line with how other aggregation functions handle names.

Value

The zingle element in the vector

Examples

# If all elements are identical, all is good.
# The value of the element is returned.
zingle(c("Alpha", "Alpha", "Alpha"))

# If any elements differ, an error is thrown
tryCatch(zingle(c("Alpha", "Beta", "Alpha")), error=wrap_error)

if (require("dplyr", quietly=TRUE, warn.conflicts=FALSE)) {
  d <- tibble::tribble(
    ~id, ~name, ~fouls,
    1, "James", 3,
    2, "Jack",  2,
    1, "James", 4
  )

  # If the data is of the correct format, all is good
  d %>%
    dplyr::group_by(id) %>%
    dplyr::summarise(name=zingle(name), total_fouls=sum(fouls))
}

if (require("dplyr", quietly=TRUE, warn.conflicts=FALSE)) {
  # If a name does not match its ID, we should get an error
  d[1,"name"] <- "Jammes"
  tryCatch({
    d %>%
      dplyr::group_by(id) %>%
      dplyr::summarise(name=zingle(name), total_fouls=sum(fouls))
  }, error=wrap_error)
}
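
For readers who want to see the rules from the Description and Details in one place, here is a base-R sketch of the check. It is written from the documented behaviour and is not the package's implementation; single_value is a hypothetical name and the error messages are invented for illustration.

```r
# Hypothetical stand-in for the "exactly one distinct value" check.
single_value <- function(x, na.rm = FALSE) {
  x <- unname(x)                       # names play no role in the comparison
  if (na.rm) x <- x[!is.na(x)]
  if (length(x) == 0 || all(is.na(x))) {
    stop("x must contain at least one non-NA value")
  }
  if (length(unique(x)) != 1) {
    stop("x contains more than one distinct value")
  }
  x[1]
}

same     <- single_value(c(first = "Alpha", second = "Alpha"))
with_nas <- single_value(c(3, NA, 3), na.rm = TRUE)
```

With na.rm = FALSE, an NA next to a real value counts as a second distinct value in this sketch, so the call fails.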