predictsr

predictsr fetches the latest version of the open-access PREDICTS database extract from the Natural History Museum Data Portal.

The publicly available PREDICTS (Projecting Responses of Ecological Diversity In Changing Terrestrial Systems) dataset comprises 4,318,808 measurements of 54,863 species, from 35,736 sampling locations in 101 countries, published across two releases: one in 2016 and another in 2022. The database remains under active development, and further releases are planned.

The data were collated from spatial comparisons of local-scale biodiversity exposed to different intensities and types of anthropogenic pressures, from terrestrial sites around the world, and are described in Hudson et al. (2016) doi:10.1002/ece3.2579.

This package accesses the latest version of the open-access database as a dataframe, and is under active development.

Installation

Installation from CRAN proceeds as usual:

install.packages("predictsr")

You can also install the development version of predictsr from GitHub with:

# install.packages("pak")
pak::pak("Biodiversity-Futures-Lab/predictsr")

Then as usual just load the package:

library(predictsr)

Usage

Most users will want to use the LoadPredictsData function to pull in the PREDICTS database extract. This downloads the database automatically, and saves it for you to a location you specify. It also writes some metadata to disk so you don’t re-download it unnecessarily.

You need to provide the file path where predictsr will save the data:

df_predicts <- LoadPredictsData("/home/connor/predicts.rds")

The first time you run this, it downloads the PREDICTS extract (the 2016 and 2022 releases by default) and writes an associated metadata file. Subsequent calls check the metadata (using SHA-based invalidation) and load the saved dataframe without downloading anything new.
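The caching behaviour described above can be pictured with a minimal sketch. It is not predictsr's implementation: `load_with_cache`, its `fetch` argument, and the `.meta` sidecar file are hypothetical names, and base R's `tools::md5sum` stands in for whatever SHA digest the package actually uses.

```r
# Hypothetical helper, NOT part of predictsr.
# Caches the result of `fetch()` at `path` and records a checksum in a
# sidecar file; later calls reuse the cached file while the checksum matches.
load_with_cache <- function(path, fetch) {
  meta_path <- paste0(path, ".meta")  # assumed sidecar naming

  if (file.exists(path) && file.exists(meta_path)) {
    stored <- readLines(meta_path, n = 1)
    current <- unname(tools::md5sum(path))  # MD5 as a stand-in for SHA
    if (identical(stored, current)) {
      return(readRDS(path))  # cache hit: nothing is fetched
    }
  }

  # Cache miss or checksum mismatch: fetch again and refresh the metadata.
  data <- fetch()
  saveRDS(data, path)
  writeLines(unname(tools::md5sum(path)), meta_path)
  data
}
```

Under these assumptions, calling `load_with_cache(path, fetch)` twice with the same `path` runs `fetch()` only once, unless the cached file has been altered or removed in between.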

Extra functions

As well as the LoadPredictsData function, several other functions are available.

Under the hood, LoadPredictsData uses the GetPredictsData function to pull in the database extract:

predicts <- GetPredictsData()

which by default will read in the 2016 and 2022 data, as a dataframe. This may be handy if you want to use your own method of caching (e.g. targets).
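As a rough illustration of handing the caching over to targets, a pipeline file might look like the sketch below. It assumes the targets package is installed and that the file is saved as `_targets.R` in the project root; the target name `predicts` is an arbitrary choice for this example.

```r
# _targets.R: a minimal sketch of a pipeline that caches the extract.
library(targets)

# Load predictsr in the process that builds each target.
tar_option_set(packages = "predictsr")

list(
  # targets stores the result and only reruns this step when the
  # command (or anything it depends on) changes.
  tar_target(predicts, GetPredictsData())
)
```

With this file in place, `targets::tar_make()` builds the pipeline and `targets::tar_read(predicts)` returns the stored dataframe in later sessions.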

To read in the site-level summaries of the data:

summaries <- GetSitelevelSummaries()

which will also, by default, read in the 2016 and 2022 data, as a dataframe.

To read in the descriptions of the columns of the database extract:

columns <- GetColumnDescriptions()

which will give you a dataframe describing each of the columns in the database extract.

Notes

The Natural History Museum cannot warrant the quality or accuracy of the data.

This package builds upon the NHM data portal API, which by default has no API rate limits. Please respect this and be responsible with your access.

Copyright

These data are provided subject to the terms on the Natural History Museum Data Portal. The 2016 release is licensed under CC BY-NC 4.0, and the 2022 release is licensed under a CC NC license.

Note that this stipulates that you may not use the data for commercial purposes, and that you must provide attribution to the original source of the data.