Version: | 0.17.0 |
Depends: | R (≥ 2.14.0) |
Imports: | utils, R.methodsS3 (≥ 1.8.1), R.oo (≥ 1.24.0), R.utils (≥ 2.10.1), digest (≥ 0.6.13) |
Title: | Fast and Light-Weight Caching (Memoization) of Objects and Results to Speed Up Computations |
Author: | Henrik Bengtsson [aut, cre, cph] |
Maintainer: | Henrik Bengtsson <henrikb@braju.com> |
Description: | Memoization can be used to speed up repetitive and computationally expensive function calls. The first time a function that implements memoization is called, the results are stored in a cache memory. The next time the function is called with the same set of parameters, the results are retrieved almost instantly from the cache, which avoids repeating the calculations. With this package, any R object can be cached in a key-value storage where the key can be an arbitrary set of R objects. The cache memory is persistent (on the file system). |
License: | LGPL-2.1 | LGPL-3 [expanded from: LGPL (≥ 2.1)] |
LazyLoad: | TRUE |
URL: | https://github.com/HenrikBengtsson/R.cache |
BugReports: | https://github.com/HenrikBengtsson/R.cache/issues |
Encoding: | UTF-8 |
NeedsCompilation: | no |
Packaged: | 2025-05-02 21:22:23 UTC; henrik |
Repository: | CRAN |
Date/Publication: | 2025-05-02 22:20:02 UTC |
Package R.cache
Description
Memoization can be used to speed up repetitive and computationally expensive function calls. The first time a function that implements memoization is called, the results are stored in a cache memory. The next time the function is called with the same set of parameters, the results are retrieved almost instantly from the cache, which avoids repeating the calculations. With this package, any R object can be cached in a key-value storage where the key can be an arbitrary set of R objects. The cache memory is persistent (on the file system).
Installation and updates
To install this package and all of its dependent packages, do:
install.packages("R.cache")
To get started
- loadCache, saveCache: Methods for loading and saving objects from and to the cache.
- getCacheRootPath, setCacheRootPath: Methods for getting and setting the directory where cache files are stored.
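As a minimal sketch of how these four functions fit together, the following redirects the cache to a throwaway directory and then stores and retrieves one object. The directory name "R.cache-quickstart" and the contents of the key are placeholders chosen for illustration only.

```r
library(R.cache)

# Use a throwaway cache root so that the sketch does not touch a real cache.
# The directory name is an arbitrary choice for this example.
root <- file.path(tempdir(), "R.cache-quickstart")
dir.create(root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(root)
print(getCacheRootPath())

# The key can be any set of R objects; here a list identifying the computation.
key <- list(task = "quickstart", n = 10L)

# Nothing has been cached for this key yet, so loadCache() returns NULL
# and the value is computed and saved.
value <- loadCache(key)
if (is.null(value)) {
  value <- cumsum(seq_len(10L))
  saveCache(value, key = key)
}

# A later lookup with the same key retrieves the stored object from file.
cached <- loadCache(key)
print(cached)
```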
How to cite this package
Whenever using this package, please cite [1] as
Bengtsson, H. The R.oo package - Object-Oriented Programming with References Using Standard R Code, Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), ISSN 1609-395X, Hornik, K.; Leisch, F. & Zeileis, A. (ed.), 2003
Wishlist
Here is a list of features that would be useful, but which I have too little time to add myself. Contributions are appreciated.
Add functionality to identify cache files that are no longer of use. For now, there is an extra header field for arbitrary comments which can be used, but maybe more formal fields would be useful, e.g. keywords, user, etc.?
If you consider implementing any of the above, make sure it is not already implemented by downloading the latest "devel" version!
Related work
See also the filehash package, and the cache() function in the Biobase package of Bioconductor.
License
The releases of this package are licensed under LGPL version 2.1 or newer.
References
[1] H. Bengtsson, The R.oo package - Object-Oriented Programming with References Using Standard R Code, In Kurt Hornik, Friedrich Leisch and Achim Zeileis, editors, Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), March 20-22, Vienna, Austria. https://www.r-project.org/conferences/DSC-2003/Proceedings/
Author(s)
Henrik Bengtsson
Loads an object from a file connection
Description
Loads an object from a file connection similar to load(), but without resetting file connections (to position zero).
WARNING: This is an internal function that should not be called by anything but the internal code of the R.cache package.
Usage
.baseLoad(con, envir=parent.frame())
Arguments
con |
A |
envir |
An |
Details
The reason why it is not possible to use load() is that it resets the file position of the connection before trying to load the object.
This happens because a regular file connection passed to load() gets coerced via gzcon(), which is the function that resets the file position.
The workaround is to create a local copy of base::load() and modify it by dropping the gzcon() coercion. This is possible because this function, that is .baseLoad(), is always called with a gzfile() connection.
Value
Returns (invisibly) a character vector of the names of objects loaded.
See Also
This function is used by loadCache() and readCacheHeader().
Non-documented objects
Description
This page contains aliases for all "non-documented" objects that R CMD check detects in this package.
Almost all of them are generic functions that have specific documentation for the corresponding method coupled to a specific class. Other functions are re-defined by setMethodS3() to default methods. Neither of these two classes is undocumented in reality.
The rest are deprecated methods.
Author(s)
Henrik Bengtsson
Options used by R.cache
Description
Below are all R options specific to the R.cache package.
WARNING: Note that the names and the default values of
these options may change in future versions of the package.
Please use with care until further notice.
Options for controlling caching
- R.cache.compress: If TRUE, saveCache() will write compressed cache files, otherwise not. (Default: FALSE)
- R.cache.enabled: If TRUE, loadCache() is reading from and saveCache() is writing to the cache, otherwise not. (Default: TRUE)
- R.cache.rootPath: A character string specifying the default cache root path. If not set, environment variable R_CACHE_ROOTPATH is considered.
- R.cache.touchOnLoad: If TRUE, loadCache() will update the "last-modified" timestamp of the cache file (to the current time), otherwise not. (Default: FALSE)
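The options above are ordinary R options, so they can be changed with options(). The sketch below temporarily turns the cache off via R.cache.enabled and then restores the previous setting. The cache directory name and the key are placeholders, and the behavior while disabled is inferred from the description of R.cache.enabled above.

```r
library(R.cache)

# Throwaway cache root for this sketch (arbitrary directory name).
root <- file.path(tempdir(), "R.cache-options-demo")
dir.create(root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(root)

key <- list("options-demo")
saveCache("cached value", key = key)

# Temporarily disable the cache; options() returns the previous settings.
old <- options(R.cache.enabled = FALSE)
while_disabled <- loadCache(key)  # documented not to read from the cache
options(old)                      # restore the previous setting

# With the cache enabled again, the stored object is found.
after_restore <- loadCache(key)
print(after_restore)
```

Saving and restoring the old value, rather than hard-coding TRUE, keeps the sketch neutral with respect to whatever the session had configured before.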
Creates a copy of an existing function such that its results are memoized
Description
Creates a copy of an existing function such that its results are memoized.
Usage
## Default S3 method:
addMemoization(fcn, envir=parent.frame(), ...)
Arguments
fcn |
A |
envir |
The |
... |
Additional arguments for controlling the memoization,
i.e. all arguments of |
Details
The new function is set up such that the memoized call is done in the environment of the caller (the parent frame of the function).
If the function returns NULL, that particular function call is not memoized.
Value
Returns a function.
Author(s)
Henrik Bengtsson
See Also
The returned function utilizes memoizedCall() internally.
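A minimal usage sketch, assuming a hypothetical slow function slow_square() and a throwaway cache directory; neither name is part of the package.

```r
library(R.cache)

# Throwaway cache root for this sketch (arbitrary directory name).
root <- file.path(tempdir(), "R.cache-addMemoization-demo")
dir.create(root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(root)

# A stand-in for an expensive computation (hypothetical example function).
slow_square <- function(x) {
  Sys.sleep(0.2)
  x^2
}

# Create a memoized copy; the original function is left unchanged.
square <- addMemoization(slow_square)

y1 <- square(12)  # nothing cached yet: evaluated, and the result is saved
y2 <- square(12)  # same argument: eligible to be answered from the cache
print(c(y1, y2))
```

Because a function that returns NULL is not memoized, a wrapper like this only pays off for functions that return a non-NULL value.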
Removes all files in a cache file directory
Description
Removes all files in a cache file directory.
Usage
## Default S3 method:
clearCache(path=getCachePath(...), ..., recursive=FALSE, prompt=TRUE && interactive())
Arguments
path |
A |
... |
Arguments passed to |
recursive |
If |
prompt |
If |
Details
If the specified directory does not exist, an exception is thrown.
Value
Returns (invisibly) a character vector of pathnames of the files removed. If no files were removed, NULL is returned.
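The following sketch clears a single cache subdirectory without prompting. The subdirectory name "clearCache-demo" and the keys are placeholders. It relies on the usage shown above, where additional arguments such as dirs are forwarded to getCachePath(), and it assumes that prompt=FALSE skips the confirmation step.

```r
library(R.cache)

# Throwaway cache root for this sketch (arbitrary directory name).
root <- file.path(tempdir(), "R.cache-clearCache-root")
dir.create(root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(root)

# Store two objects in a dedicated cache subdirectory.
saveCache(1:3, key = list("a"), dirs = "clearCache-demo")
saveCache(letters, key = list("b"), dirs = "clearCache-demo")

# Remove the files in that subdirectory only, without asking for confirmation.
removed <- clearCache(dirs = "clearCache-demo", prompt = FALSE)
print(removed)

# The cache file for the first key can no longer be located.
still_there <- findCache(key = list("a"), dirs = "clearCache-demo")
```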
Author(s)
Henrik Bengtsson
Evaluates an R expression with memoization
Description
Evaluates an R expression with memoization such that the same objects are assigned to the current environment and the same result is returned, if any.
Usage
evalWithMemoization(expr, key=NULL, ..., envir=parent.frame(),
drop=c("srcref", "srcfile", "wholeSrcref"), force=FALSE)
Arguments
expr |
The |
key |
Additional objects to uniquely identify the evaluation. |
... |
|
envir |
The |
drop |
|
force |
If |
Value
Returns the value of the evaluated expr expression, if any.
Author(s)
Henrik Bengtsson
See Also
Internally, eval() is used to evaluate the expression.
Examples
for (kk in 1:5) {
  cat(sprintf("Iteration #%d:\n", kk))
  res <- evalWithMemoization({
    cat("Evaluating expression...")
    a <- 1
    b <- 2
    c <- 4
    Sys.sleep(1)
    cat("done\n")
    b
  })
  print(res)
  # Sanity checks
  stopifnot(a == 1 && b == 2 && c == 4)
  # Clean up
  rm(a, b, c)
} # for (kk ...)
## OUTPUTS:
## Iteration #1:
## Evaluating expression...done
## [1] 2
## Iteration #2:
## [1] 2
## Iteration #3:
## [1] 2
## Iteration #4:
## [1] 2
## Iteration #5:
## [1] 2
############################################################
# WARNING
############################################################
# If the expression being evaluated depends on
# "input" objects, then these must be specified
# explicitly as "key" objects.
for (ii in 1:2) {
  for (kk in 1:3) {
    cat(sprintf("Iteration #%d:\n", kk))
    res <- evalWithMemoization({
      cat("Evaluating expression...")
      a <- kk
      Sys.sleep(1)
      cat("done\n")
      a
    }, key=list(kk=kk))
    print(res)
    # Sanity checks
    stopifnot(a == kk)
    # Clean up
    rm(a)
  } # for (kk ...)
} # for (ii ...)
## OUTPUTS:
## Iteration #1:
## Evaluating expression...done
## [1] 1
## Iteration #2:
## Evaluating expression...done
## [1] 2
## Iteration #3:
## Evaluating expression...done
## [1] 3
## Iteration #1:
## [1] 1
## Iteration #2:
## [1] 2
## Iteration #3:
## [1] 3
Locates a cache file
Description
Locates a cache file from a key object.
Usage
## Default S3 method:
findCache(key=NULL, ...)
Arguments
key |
An optional object from which a hexadecimal hash code will be generated and appended to the filename. |
... |
Additional argument passed to |
Value
Returns the pathname as a character string, or NULL if no cached data exists.
Author(s)
Henrik Bengtsson
See Also
generateCache().
loadCache().
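A short sketch of the two possible return values, using a placeholder key and a throwaway cache directory (both are assumptions made for illustration).

```r
library(R.cache)

# Throwaway cache root for this sketch (arbitrary directory name).
root <- file.path(tempdir(), "R.cache-findCache-demo")
dir.create(root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(root)

key <- list(dataset = "demo", version = 1L)

# Nothing has been cached under this key yet, so no file is located.
before <- findCache(key = key)

# After saving, the same key resolves to the pathname of the cache file.
saveCache(1:5, key = key)
after <- findCache(key = key)
print(basename(after))
```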
Generates a cache pathname from a key object
Description
Generates a cache pathname from a key object.
Usage
## Default S3 method:
generateCache(key, suffix=".Rcache", ...)
Arguments
key |
A |
suffix |
A |
... |
Arguments passed to |
Value
Returns the pathname as a character string.
Author(s)
Henrik Bengtsson
See Also
findCache().
Internally, the generic function getChecksum() is used to calculate the checksum of argument key.
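To illustrate the relationship between the key, its checksum, and the generated pathname, here is a small sketch with a placeholder key and a throwaway cache directory. The comparison against getChecksum() is printed rather than assumed, since the exact filename layout is inferred from the description above.

```r
library(R.cache)

# Throwaway cache root for this sketch (arbitrary directory name).
root <- file.path(tempdir(), "R.cache-generateCache-demo")
dir.create(root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(root)

key <- list("generateCache-demo", 42)

# Only the pathname is generated here; no cache file is written.
pathname <- generateCache(key)
print(basename(pathname))
print(file.exists(pathname))

# Compare the filename with the checksum of the key.
checksum <- getChecksum(key)
print(checksum)
print(startsWith(basename(pathname), checksum))
```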
Gets the path to the file cache directory
Description
Gets the path to the file cache directory. If missing, the directory is created.
Usage
## Default S3 method:
getCachePath(dirs=NULL, path=NULL, rootPath=getCacheRootPath(), ...)
Arguments
dirs |
A |
path , rootPath |
(Advanced) |
... |
Not used. |
Value
Returns the path as a character string.
If the user does not have write permissions to the path, then an error is thrown.
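A minimal sketch of requesting a cache subdirectory, assuming that dirs takes a character vector of nested subdirectory names below the cache root. The names "myproject" and "stage1" are placeholders.

```r
library(R.cache)

# Throwaway cache root for this sketch (arbitrary directory name).
root <- file.path(tempdir(), "R.cache-getCachePath-demo")
dir.create(root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(root)

# Ask for a nested cache directory; it is created if it is missing.
path <- getCachePath(dirs = c("myproject", "stage1"))
print(path)
print(dir.exists(path))
```

Keeping each project or processing stage in its own subdirectory makes it possible to clear one part of the cache without touching the rest.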
Author(s)
Henrik Bengtsson
See Also
Gets the root path to the file cache directory
Description
Gets the root path to the file cache directory.
Usage
## Default S3 method:
getCacheRootPath(defaultPath=NULL, ...)
Arguments
defaultPath |
The default path, if no user-specified directory has been given. |
... |
Not used. |
Value
Returns the path as a character string.
Author(s)
Henrik Bengtsson
See Also
To set the directory where cache files are stored, see setCacheRootPath().
Examples
print(getCacheRootPath())
Generates a deterministic checksum for an R object
Description
Generates a deterministic checksum for an R object such that (i) if the same object is used again, then the same checksum is obtained, and (ii) if another object is used, then a different checksum is obtained with extremely high probability. In other words, it is highly unlikely that two different objects have the same checksum.
Usage
## Default S3 method:
getChecksum(object, ...)
Arguments
object |
The object for which a checksum should be calculated. |
... |
Additional arguments passed to |
Details
Because getChecksum() is a generic function, it is possible to provide custom methods for specific classes of objects. This means that, if a certain class specifies fields that carry auxiliary data, then these can be excluded from the checksum calculation.
For instance, assume that all objects of class 'TimestampedObject' contain timestamps specifying when each object was created. Then a custom getChecksum() method for this class can first drop the timestamp and then call the default getChecksum() function.
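The 'TimestampedObject' scenario can be sketched as follows. The constructor, the field names value and timestamp, and the way the default method is reached (by calling the generic on the unclassed list) are all assumptions made for this illustration; the package does not define such a class.

```r
library(R.cache)

# Hypothetical constructor for the 'TimestampedObject' class described above.
TimestampedObject <- function(value, timestamp = Sys.time()) {
  structure(list(value = value, timestamp = timestamp),
            class = "TimestampedObject")
}

# Custom S3 method: drop the auxiliary timestamp, then fall back to the
# default checksum calculation by calling the generic on the plain list.
getChecksum.TimestampedObject <- function(object, ...) {
  object <- unclass(object)
  object$timestamp <- NULL
  getChecksum(object, ...)
}

a <- TimestampedObject(1:3, timestamp = as.POSIXct("2020-01-01", tz = "UTC"))
b <- TimestampedObject(1:3, timestamp = as.POSIXct("2021-06-15", tz = "UTC"))
d <- TimestampedObject(4:6, timestamp = as.POSIXct("2020-01-01", tz = "UTC"))

# Same payload, different timestamps: the checksums agree.
print(identical(getChecksum(a), getChecksum(b)))
# Different payload: the checksums differ.
print(identical(getChecksum(a), getChecksum(d)))
```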
Value
Returns the checksum represented as a character string.
Author(s)
Henrik Bengtsson
See Also
Internally, the digest method is used to calculate the checksum.
Loads data from file cache
Description
Loads data from file cache, which is unique for an optional key object.
Usage
## Default S3 method:
loadCache(key=NULL, sources=NULL, suffix=".Rcache", removeOldCache=TRUE, pathname=NULL,
dirs=NULL, ..., onError=c("warning", "error", "message", "quiet", "print"))
Arguments
key |
An optional object from which a hexadecimal hash code will be generated and appended to the filename. |
sources |
Optional source objects. If the cache object has a timestamp older than one of the source objects, it will be ignored and removed. |
suffix |
A |
removeOldCache |
If |
pathname |
The pathname to the cache file. If specified,
arguments |
dirs |
A |
... |
Not used. |
onError |
A |
Details
The hash code calculated from the key object is a 32-character hexadecimal MD5 hash code. For more details, see getChecksum().
Value
Returns an R object, or NULL if the cache does not exist.
Author(s)
Henrik Bengtsson
See Also
saveCache().
Examples
simulate <- function(mean, sd) {
  # 1. Try to load cached data, if already generated
  key <- list(mean, sd)
  data <- loadCache(key)
  if (!is.null(data)) {
    cat("Loaded cached data\n")
    return(data)
  }
  # 2. If not available, generate it.
  cat("Generating data from scratch...")
  data <- rnorm(1000, mean=mean, sd=sd)
  Sys.sleep(1) # Emulate slow algorithm
  cat("ok\n")
  saveCache(data, key=key, comment="simulate()")
  data
}
data <- simulate(2.3, 3.0)
data <- simulate(2.3, 3.5)
data <- simulate(2.3, 3.0) # Will load cached data
# Clean up
file.remove(findCache(key=list(2.3,3.0)))
file.remove(findCache(key=list(2.3,3.5)))
Calls a function with memoization
Description
Calls a function with memoization, that is, caches the results to be retrieved if the function is called again with the exact same arguments.
Usage
## Default S3 method:
memoizedCall(what, ..., envir=parent.frame(), force=FALSE, sources=NULL, dirs=NULL)
Arguments
what |
The |
... |
Arguments passed to the function. |
envir |
The |
force |
If |
sources , dirs |
Details
If the function returns NULL, that particular function call is not memoized.
Value
Returns the result of the function call.
Author(s)
Henrik Bengtsson
See Also
Internally, loadCache() is used to load memoized results, if available. If not available, then do.call() is used to evaluate the function call, and saveCache() is used to save the results to cache.
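The load, evaluate, save sequence described above can be observed with a function that counts how often it actually runs. The function slow_sum(), the counter, and the cache directory are hypothetical. The sketch also assumes that force=TRUE makes memoizedCall() ignore any cached result and evaluate the call again.

```r
library(R.cache)

# Throwaway cache root for this sketch (arbitrary directory name).
root <- file.path(tempdir(), "R.cache-memoizedCall-demo")
dir.create(root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(root)

# Hypothetical slow function that records how many times it is evaluated.
calls <- 0L
slow_sum <- function(x) {
  calls <<- calls + 1L
  Sys.sleep(0.2)
  sum(x)
}

r1 <- memoizedCall(slow_sum, x = 1:10)  # nothing cached: evaluated and saved
r2 <- memoizedCall(slow_sum, x = 1:10)  # same call: eligible for a cache hit

# Assumption: force=TRUE skips the cache lookup and evaluates the function.
calls_before <- calls
r3 <- memoizedCall(slow_sum, x = 1:10, force = TRUE)
print(c(r1, r2, r3))
print(calls - calls_before)
```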
Reads the header of a cache file
Description
Reads the header of a cache file.
Usage
## Default S3 method:
readCacheHeader(file, ...)
Arguments
file |
A filename or a |
... |
Not used. |
Value
Returns a named list structure with elements identifier, version, comment (optional), sources (optional), and timestamp.
Author(s)
Henrik Bengtsson
See Also
findCache().
loadCache().
saveCache().
Examples
data <- 1:120
key <- list(some=1, vari=2, ables=3)
saveCache(key=key, data, comment="A simple example of a cached object.")
header <- readCacheHeader(findCache(key=key))
print(header)
# Clean up
file.remove(findCache(key=key))
Saves data to file cache
Description
Saves data to file cache, which is unique for an optional key object.
Usage
## Default S3 method:
saveCache(object, key=NULL, sources=NULL, suffix=".Rcache", comment=NULL, pathname=NULL,
dirs=NULL, compress=NULL, ...)
Arguments
object |
The object to be saved to file. |
key |
An optional object from which a hexadecimal hash code will be generated and appended to the filename. |
sources |
Source objects used for comparison of timestamps when cache is loaded later. |
suffix |
A |
comment |
An optional |
pathname |
(Advanced) An optional |
dirs |
A |
compress |
If |
... |
Additional argument passed to |
Value
Returns (invisibly) the pathname of the cache file.
Compression
The saveCache() method saves a compressed cache file (with filename extension *.gz) if argument compress is TRUE.
The loadCache() method locates (via findCache()) and loads such cache files as well.
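A small sketch of a compressed round trip, using a placeholder key and a throwaway cache directory (both chosen for illustration).

```r
library(R.cache)

# Throwaway cache root for this sketch (arbitrary directory name).
root <- file.path(tempdir(), "R.cache-compress-demo")
dir.create(root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(root)

x <- seq_len(10000)
key <- list("compress-demo")

# Write a compressed cache file for this key.
saveCache(x, key = key, compress = TRUE)

# findCache() locates the compressed file; note the filename extension.
pathname <- findCache(key = key)
print(basename(pathname))

# loadCache() reads the compressed file without any extra arguments.
y <- loadCache(key)
print(identical(x, y))
```

Compression can also be switched on for all calls through the R.cache.compress option instead of passing compress=TRUE each time.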
Author(s)
Henrik Bengtsson
See Also
For more details on how the hash code is generated, etc., see loadCache().
Examples
## Not run: For an example, see ?loadCache
Sets the path to the file cache directory
Description
Sets the path to the file cache directory.
Usage
## Default S3 method:
setCachePath(dirs=NULL, path=NULL, ...)
Arguments
dirs |
A |
path |
The path to override the path according to the
|
... |
Not used. |
Value
Returns nothing.
Author(s)
Henrik Bengtsson
See Also
getCachePath().
Sets the root path to the file cache directory
Description
Sets the root path to the file cache directory.
Usage
## Default S3 method:
setCacheRootPath(path=NULL, ...)
Arguments
path |
The path. |
... |
Not used. |
Value
Returns (invisibly) the old root path.
Author(s)
Henrik Bengtsson
See Also
Interactively offers the user to set up the default root path
Description
Interactively offers the user to set up the default root path.
Usage
## Default S3 method:
setupCacheRootPath(defaultPath=NULL, ...)
Arguments
defaultPath |
Default root path to set. |
... |
Not used. |
Details
If the cache root path is already set, it is used and nothing is done.
If the "default" root path (defaultPath) exists, it is used, otherwise, if running interactively, the user is asked to approve the usage (and creation) of the default root path.
In all other cases, the cache root path is set to a session-specific
temporary directory.
Value
Returns (invisibly) the root path, or NULL if running a non-interactive session.
Author(s)
Henrik Bengtsson
See Also
Internally, setCacheRootPath() is used to set the cache root path. The interactive() function is used to test whether R is running interactively or not.