Help for package easyPubMed

Type:

Package

Title:

Search and Retrieve Scientific Publication Records from PubMed

Version:

3.1.6

Date:

2025-08-25

Maintainer:

Damiano Fantini <damiano.fantini@gmail.com>

Description:

Query NCBI Entrez and retrieve PubMed records in XML or text format. Process PubMed records by extracting and aggregating data from selected fields. A large number of records can be easily downloaded via this simple-to-use interface to the NCBI PubMed API.

URL:

https://www.data-pulse.com/dev_site/easypubmed/

Depends:

R (≥ 3.5)

Imports:

methods, utils, rlang

Suggests:

knitr, rmarkdown

VignetteBuilder:

knitr

LazyData:

true

Encoding:

UTF-8

License:

GPL-3

RoxygenNote:

7.3.2

NeedsCompilation:

Packaged:

2025-08-25 14:58:55 UTC; dami

Author:

Damiano Fantini [aut, cre]

Repository:

CRAN

Date/Publication:

2025-08-25 18:40:02 UTC

Retrieve and Process Scientific Publication Records from PubMed

Description

Query NCBI Entrez and retrieve PubMed records in XML or TXT format. PubMed records can be downloaded and saved as XML or text files. Data integrity is enforced during download, which makes it possible to retrieve and save very large numbers of records. PubMed records can be processed to extract publication- and author-specific information.

Details

This software is based on the information included in the Entrez Programming Utilities Help manual authored by Eric Sayers, PhD and available on the NCBI Bookshelf (NBK25500). This R library is NOT endorsed, supported, or maintained by NCBI, NOR is it affiliated with NCBI.

Author(s)

Damiano Fantini damiano.fantini@gmail.com

References

Tutorials and Help Webpage: https://www.data-pulse.com/dev_site/easypubmed/
NCBI PubMed Help Manual: https://pubmed.ncbi.nlm.nih.gov/help/
Entrez Programming Utilities Help (NBK25500): https://www.ncbi.nlm.nih.gov/books/NBK25500/

Examples

## Example 01: retrieve data in XML format, extract info, show
# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  my_query_string <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
  epm <- epm_query(my_query_string)
  epm <- epm_fetch(epm)
  epm <- epm_parse(epm, max_authors = 5, max_references = 10)
  processed_data <- get_epm_data(epm)
  utils::head(processed_data)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)

## Not run: 
## Example 02: retrieve data in medline format
my_query_string <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
epm <- epm_query(my_query_string)
epm <- epm_fetch(epm, format = 'medline')
medline_data <- get_epm_raw(epm)
first_record <- medline_data[[1]] 
cat(first_record, sep = '\n')


## Additional Examples: show easyPubMed Vignette
library(easyPubMed)
vignette("easyPubMed_demo")


## End(Not run)

Parse and Format Author Names and Affiliations.

Description

Extract Author Information from a slice of a raw XML PubMed record. Last Name, First Name, Address and Email are returned. Only the first address of each author is returned. A collapsed version of the author list is also returned.

Usage

EPM_auth_parse(x, max_authors = 15, autofill = TRUE)

Arguments

x

String (character vector of length 1) including an XML Author List section from a PubMed record.

max_authors

Numeric, maximum number of authors to include. See details for additional information.

autofill

Logical, shall non-missing address information be propagated to fill missing address information for other authors in the same publication.

Details

The value of the 'max_authors' argument controls which author information is extracted from the input. If 'max_authors' is set to '0', no author information is extracted. If 'max_authors' is set to '-1' (or any negative number), only information corresponding to the last author is extracted. If 'max_authors' is set to '+1', only the first author's information is extracted. If 'max_authors' is set to any other positive integer, only information for the indicated number of authors is extracted. In this case, information for both the first and the last author will be included.

Value

list including 2 elements: 'authors' is a data.frame including one row for each author and n=4 columns: lastname, forename, address and email; 'collapsed' is a list including 2 elements (each element is a string): authors and address.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

aff <- paste0('<Author><LastName>Doe</LastName><ForeName>John</ForeName>', 
              '<Affiliation>Univ A</Affiliation></Author>',
              '<Author><LastName>Doe</LastName><ForeName>Jane</ForeName>', 
              '<Affiliation>jane_doe@univ_a.edu</Affiliation></Author>',
              '<Author><LastName>Foo</LastName><ForeName>Bar</ForeName>', 
              '<Affiliation>Univ B</Affiliation></Author>')
easyPubMed:::EPM_auth_parse(aff)
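
The 'max_authors' rule described in the Details above can be sketched in base R as follows. This is only an illustration, not the package's implementation: 'select_author_idx()' is a hypothetical helper, and the reading "leading (max_authors - 1) authors plus the last author" is an assumption taken from the wording of this manual.

```r
# Hypothetical helper (NOT part of easyPubMed): returns the positions of the
# authors to keep, given the number of authors `n` and `max_authors`.
select_author_idx <- function(n, max_authors) {
  if (n < 1 || max_authors == 0) return(integer(0))  # nothing to extract
  if (max_authors < 0) return(n)                     # last author only
  if (max_authors >= n) return(seq_len(n))           # keep every author
  if (max_authors == 1) return(1L)                   # first author only
  # Otherwise (assumption): leading (max_authors - 1) authors plus the last one
  c(seq_len(max_authors - 1), n)
}

select_author_idx(n = 10, max_authors = 3)
select_author_idx(n = 10, max_authors = -1)
```

With ten authors and 'max_authors = 3', the sketch keeps positions 1, 2 and 10, so both the first and the last author are retained.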

Check Metadata from Imported XML Files.

Description

Analyze the Metadata from different XML files that were imported using easyPubMed and identify which records / files can be merged together and which ones to exclude. Only files with the same unique ID can be merged together at this step. The goal is to re-build a consistent easyPubMed object.

Usage

EPM_check_guide(x)

Arguments

x

Data.frame including information from the imported XML files. The following column names are expected: 'index', 'file', 'JobUniqueId', 'JobQuery', 'JobBatch'.

Value

Data.frame identical to 'x' with an additional (numeric) column ('pass' column).

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

gx <- data.frame(
  index = c(1, 2, 3, 4, 5),
  JobUniqueId = rep('xyz0x', 5),
  JobQuery = rep('test_query', 1),
  JobBatch = c(1, 2, 3, 4, 3),
  JobBatchNum = rep(4, 5),
  stringsAsFactors = FALSE)
easyPubMed:::EPM_check_guide(gx)

Custom XML Tag Matching.

Description

Extract text from a string containing XML or HTML tags. Text included between tags of interest will be returned. If multiple tagged substrings are found, they will be returned as different elements of a list or character vector.

Usage

EPM_custom_grep(xml_data, tag, xclass = NULL, format = "list")

Arguments

xml_data

String (character vector of length 1), this is a string including PubMed records or string including XML/HTML tags.

tag

String (character vector of length 1), the tag of interest (e.g., "Title") (should NOT include < > chars).

xclass

String (character vector of length 1), a tag decorator of interest (e.g., "EIdType=\"doi\""). Can be NULL.

format

String. Must be a value in c("list", "char"). Indicates the type of output. Defaults to "list".

Value

List or vector where each element corresponds to an in-tag substring.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

x <- "This string includes <Ti>an XML Tag</Ti>."
easyPubMed:::EPM_custom_grep(x, tag = "Ti")
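
The tag matching described above can be approximated with a regular expression in base R. The sketch below is illustrative only and is not the code used by the package; 'extract_tag()' is a hypothetical name, and the pattern assumes that tags are not nested and that the enclosed text contains no line breaks.

```r
# Hypothetical helper (NOT part of easyPubMed): return the text enclosed by
# every occurrence of <tag ...>...</tag> in a single string.
extract_tag <- function(xml_data, tag) {
  # Non-greedy match; optional attributes are allowed after the tag name
  pattern <- paste0("<", tag, "(\\s[^>]*)?>(.*?)</", tag, ">")
  hits <- regmatches(xml_data, gregexpr(pattern, xml_data, perl = TRUE))[[1]]
  # Drop the opening and closing tags, keeping only the enclosed text
  sub(pattern, "\\2", hits, perl = TRUE)
}

x <- "A <Ti>first</Ti> and a <Ti>second</Ti> tag."
extract_tag(x, tag = "Ti")
```

The result is a character vector with one element per tagged substring, comparable to what 'format = "char"' is documented to return.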

Parse and Format a Pubmed Date Field.

Description

Extract Date Information from a slice of a raw XML PubMed record. Day, month and year are returned. Months are recoded as numeric if needed (e.g., 'Oct' and 'October' are converted to 10). If month and/or day information is missing, the missing value is imputed to 1. If the year is missing, NA is returned.

Usage

EPM_date_parse(x)

Arguments

x

String (character vector of length 1) including an XML date field from a PubMed record.

Value

list including n=3 numeric elements: day, month and year.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

dt0 <- '<Year>2021</Year><Month>03</Month><Day>12</Day>'
easyPubMed:::EPM_date_parse(dt0)
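
The month recoding and imputation rules from the Description can be sketched in base R. This is a simplified illustration, not the package's code: 'parse_month()' and 'parse_date_fields()' are hypothetical helpers, and mapping an unrecognized month name to 1 is an assumption.

```r
# Hypothetical helpers (NOT part of easyPubMed).
parse_month <- function(m) {
  if (is.na(m) || !nzchar(m)) return(1)            # missing month: impute 1
  if (grepl("^[0-9]+$", m)) return(as.numeric(m))  # already numeric, e.g. "03"
  # Compare the first three letters with the built-in month abbreviations
  idx <- match(tolower(substr(m, 1, 3)), tolower(month.abb))
  if (is.na(idx)) 1 else idx                       # assumption: unknown -> 1
}

parse_date_fields <- function(year = NA, month = NA, day = NA) {
  list(day   = if (is.na(day)) 1 else as.numeric(day),
       month = parse_month(month),
       year  = if (is.na(year)) NA_real_ else as.numeric(year))
}

parse_date_fields(year = "2021", month = "October")
```

Here the month is recoded to 10 and the missing day is imputed to 1, while a missing year would be returned as NA.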

Decode an XML String into the Corresponding Metadata.

Description

Decode an XML String including a list of meta information associated with an easyPubMed object whose contents were written to a text file on a local disk. This meta information is used to keep track of easyPubMed query jobs and/or to re-build objects starting from XML files saved on a local disk.

Usage

EPM_decode_xml_meta(x)

Arguments

x

String corresponding to the XML-decorated text including metadata from an easyPubMed object/query job.

Value

String, chunk of XML-decorated text including meta information.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

xml <- paste0('<EPMxJobData><EPMxJobUniqueId>EPMJ_20231017151112_mi7xvol743', 
              'rvz5ry5z3n8qm0ww</EPMxJobUniqueId><EPMxJobBatchNum>4</EPMxJo', 
              'bBatchNum><EPMxJobBatch>1</EPMxJobBatch><EPMxQuery>Test_Quer', 
              'y</EPMxQuery><EPMxQBatchInitDate>1937/01/22</EPMxQBatchInitD', 
              'ate><EPMxQBatchEndDate>1980/08/01</EPMxQBatchEndDate><EPMxQB', 
              'atchDiffDays>15897</EPMxQBatchDiffDays><EPMxQBatchExpCount>2', 
              '13</EPMxQBatchExpCount><EPMxMaxRecordsPerBatch>1000</EPMxMax', 
              'RecordsPerBatch><EPMxExpCount>2083</EPMxExpCount><EPMxExpNum', 
              'OfBatches>4</EPMxExpNumOfBatches><EPMxAllRecordsCovered>TRUE', 
              '</EPMxAllRecordsCovered><EPMxExpMissedRecords>0</EPMxExpMiss', 
              'edRecords><EPMxQueryDate>2023-10-17 15:11:12</EPMxQueryDate>', 
              '<EPMxRawFormat>xml</EPMxRawFormat><EPMxRawEncoding>UTF-8</EP', 
              'MxRawEncoding><EPMxRawDate>2023-10-17 15:14:12</EPMxRawDate>', 
              '<EPMxLibVersion>3.01</EPMxLibVersion></EPMxJobData>')
easyPubMed:::EPM_decode_xml_meta(xml)

Detect PubMed Record Identifiers.

Description

Parse a list of pubmed records in XML or Medline format, extract and return the corresponding PubMed record identifiers (PMID).

Usage

EPM_detect_pmid(x, format = "xml", as.list = TRUE)

Arguments

x

list including PubMed record data (either in 'xml' or 'medline' format).

format

string (character of length 1) indicating the format of each element in x (either 'xml' or 'medline').

as.list

logical (of length 1). Shall results be returned as a list.

Value

list of PubMed record identifiers.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

x <- list(A='First record: <PMID>Rec_1A</PMID> Lorem ipsum dolor sit amet', 
          B='Another record: <Ti>Title</Ti><PMID>Rec_2</PMID> Lorem ipsum ')
easyPubMed:::EPM_detect_pmid(x, format = 'xml')
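
A minimal base R sketch of PMID detection is shown below. It is not the package's implementation: 'find_pmid()' is a hypothetical helper, and it assumes that XML records carry the identifier in a 'PMID' tag and that Medline records carry it on a line starting with 'PMID-'. Only the first identifier found in each record is returned.

```r
# Hypothetical helper (NOT part of easyPubMed): extract the first PMID found
# in each element of a list of raw records.
find_pmid <- function(x, format = "xml") {
  pattern <- if (format == "xml") {
    "<PMID[^>]*>([^<]+)</PMID>"   # tag-based identifier (XML)
  } else {
    "PMID- *([0-9]+)"             # field-based identifier (Medline)
  }
  vapply(x, function(rec) {
    m <- regmatches(rec, regexec(pattern, rec))[[1]]
    if (length(m) < 2) NA_character_ else m[2]
  }, character(1))
}

recs <- list(A = "<PMID Version=\"1\">12345</PMID><Ti>Title</Ti>",
             B = "PMID- 67890\nTI  - Title")
find_pmid(recs["A"], format = "xml")
find_pmid(recs["B"], format = "medline")
```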

Submit a Query to the NCBI EFetch Server.

Description

Submit a Query to the NCBI EFetch Server and capture the response.

Usage

EPM_efetch_basic_q(params)

Arguments

params

List including the information for querying the NCBI EFetch Server.

Details

The input list must include the elements listed below.

'web_env'. String, unique value returned by the NCBI ESearch server.
'format'. String corresponding to the desired response data format (e.g., "xml").
'query_key'. Integer, key value returned by the NCBI ESearch server.
'retstart'. Integer, numeric index of the first record to be requested.
'retmax'. Integer, maximum number of records to be retrieved from the server.
'encoding'. String, encoding of the data (e.g., "UTF-8").

Value

Character vector including the response from the server.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  x <- easyPubMed:::EPM_esearch_basic_q(params = list(q = "easyPubMed"))
  x <- easyPubMed:::EPM_esearch_parse(x)
  my_params <- list(web_env = x$web_env, 
                    query_key = x$query_key, 
                    format = "uilist")
  easyPubMed:::EPM_efetch_basic_q(params = my_params)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
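
To illustrate how the elements of 'params' listed in the Details could map to an EFetch request, the sketch below assembles a request URL without contacting the server. 'build_efetch_url()' is a hypothetical helper and the exact URL produced by the package may differ; the parameter names ('WebEnv', 'query_key', 'retstart', 'retmax', 'rettype', 'retmode') are those documented in the Entrez Programming Utilities Help. The 'encoding' element is not part of the URL and is assumed to be applied when the response is read.

```r
# Hypothetical helper (NOT part of easyPubMed): build an EFetch URL from a
# list of parameters. No request is sent.
build_efetch_url <- function(params) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
  # XML is requested via retmode; text formats via rettype + retmode=text
  fmt <- if (identical(params$format, "xml")) {
    "retmode=xml"
  } else {
    paste0("rettype=", params$format, "&retmode=text")
  }
  paste0(base, "?db=pubmed",
         "&WebEnv=", params$web_env,
         "&query_key=", as.integer(params$query_key),
         "&retstart=", as.integer(params$retstart),
         "&retmax=", as.integer(params$retmax),
         "&", fmt)
}

build_efetch_url(list(web_env = "EXAMPLE_WEBENV", query_key = 1,
                      format = "uilist", retstart = 0, retmax = 500))
```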

Encode Metadata to an XML String.

Description

Encode a list of meta information from an easyPubMed object into an XML string. This meta information is used to keep track of easyPubMed query jobs and/or to re-build objects starting from XML files saved on a local disk.

Usage

EPM_encode_meta_to_xml(meta, job_list, i, encoding)

Arguments

meta

List including metadata associated with an easyPubMed query job. It corresponds to the contents of the 'meta' slot of an easyPubMed object.

job_list

Data.frame that defines the list of sub-queries of an easyPubMed query job. It corresponds to the 'job_list' data.frame included in the 'misc' slot of an easyPubMed object.

i

Integer, index of the batch (query sub-job) being written to file.

encoding

String, this is the Encoding of the contents/text being retrieved from the Entrez server (typically, 'UTF-8').

Value

String, chunk of XML-decorated text including meta information.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

tmp_meta <- list(max_records_per_batch = 1000, 
                 exp_count = 10, 
                 exp_num_of_batches = 1, 
                 all_records_covered = TRUE, 
                 exp_missed_records = 0, 
                 query_date = "2023-10-16 23:13:29", 
                 UID = 'EPMJ_20231017141741_c4das',
                 EPM_version = "3.01")
tmp_jobs <- data.frame(query_string = 'my test query', 
                       init_date = '1990/01/01', 
                       end_date = '2023/01/01', 
                       diff_days = 12053,
                       exp_count = 10, 
                       stringsAsFactors = FALSE)
easyPubMed:::EPM_encode_meta_to_xml(meta = tmp_meta, job_list = tmp_jobs, 
                                    i = 1, encoding = 'UTF-8' )

Submit a Query to the NCBI ESearch Server.

Description

Submit a Query to the NCBI ESearch Server and capture the response.

Usage

EPM_esearch_basic_q(params)

Arguments

params

List including the information for querying the NCBI ESearch Server.

Details

The params list must include the elements listed below.

'q'. String corresponding to the Query to be submitted to the server.
'api_key'. (Optional) String corresponding to the NCBI API key.

Value

Character vector including the response from the server.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  my_q <- 'easyPubMed'
  my_params <- list(q = my_q)
  easyPubMed:::EPM_esearch_basic_q(params = my_params)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
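
The sketch below shows one way the 'q' and 'api_key' elements could be turned into an ESearch URL. It is an illustration under stated assumptions, not the package's code: 'build_esearch_url()' is a hypothetical helper, and the use of 'usehistory=y' (so that the server returns a WebEnv and a query key) is an assumption based on the Entrez Programming Utilities Help. No request is sent.

```r
# Hypothetical helper (NOT part of easyPubMed): build an ESearch URL.
build_esearch_url <- function(params) {
  base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
  # Percent-encode the query so that spaces, quotes and brackets are URL-safe
  url <- paste0(base, "?db=pubmed&usehistory=y&term=",
                utils::URLencode(params$q, reserved = TRUE))
  # The API key is optional and appended only when provided
  if (!is.null(params$api_key)) {
    url <- paste0(url, "&api_key=", params$api_key)
  }
  url
}

build_esearch_url(list(q = 'Damiano Fantini[AU] AND "2018"[PDAT]'))
```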

Retrieve Results via an Esearch and Efetch sequence.

Description

Submit a Query to the NCBI ESearch Server, capture the response and retrieve the corresponding PubMed records from the NCBI EFetch Server. Up to the first n=10,000 records returned by the query will be retrieved (as per the NCBI policy). This does not include a timeout limit to complete the operation.

Usage

EPM_esearch_efetch_seq(
  query_string,
  api_key = NULL,
  batch_size = 500,
  encoding = "UTF-8",
  format = "xml",
  max_restart_attempts = 10
)

Arguments

query_string

String (character vector of length 1), corresponding to the query string.

api_key

String (character vector of length 1), corresponding to the NCBI API key. Can be NULL.

batch_size

Integer, max number of records to be retrieved as a batch. This corresponds to the "retmax" NCBI parameter.

encoding

String (character vector of length 1), encoding of the resulting records (e.g., "UTF-8").

format

String (character vector of length 1), desired format of the Pubmed records. This must be one of the values in c("xml", "medline", "uilist").

max_restart_attempts

Integer, max number of attempts in case of a failed iteration.

Value

Character vector including the response from the server.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  qry <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
  easyPubMed:::EPM_esearch_efetch_seq(query_string = qry, format = "uilist")
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
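
The interplay between 'batch_size' and the 10,000-record limit can be illustrated by computing the 'retstart' and 'retmax' values of each request. This is a sketch only: 'batch_plan()' is a hypothetical helper and the package may organize its iterations differently.

```r
# Hypothetical helper (NOT part of easyPubMed): list the (retstart, retmax)
# pairs needed to page through `count` records, capped at `hard_cap`.
batch_plan <- function(count, batch_size = 500, hard_cap = 10000) {
  n <- min(count, hard_cap)                 # only the first 10,000 records
  if (n < 1) {
    return(data.frame(retstart = integer(0), retmax = integer(0)))
  }
  retstart <- seq(0, n - 1, by = batch_size)
  # The last batch may be smaller than batch_size
  data.frame(retstart = retstart, retmax = pmin(batch_size, n - retstart))
}

batch_plan(count = 1234)
```

For 1,234 expected records and the default batch size, three requests are planned, starting at records 0, 500 and 1000.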

Parse Responses from the NCBI ESearch Server.

Description

Parse Responses from the NCBI ESearch Server and return a list of information that can be used for retrieving PubMed records from the NCBI EFetch Server.

Usage

EPM_esearch_parse(x)

Arguments

x

String (character vector of length 1), this is the xml string returned by the NCBI ESearch Server.

Details

The output list includes the following items.

'web_env'. String, unique identifier for fetching PubMed records corresponding to the current query.
'query_key'. Integer, unique numeric key for fetching PubMed records corresponding to the current query.
'count'. Integer, expected number of records returned by the current query.
'query_translation'. String, translation of the Query string provided by the user.

Value

List including information extracted from the NCBI ESearch Server response.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  my_q <- 'easyPubMed'
  my_params <- list(q = my_q)
  x <- easyPubMed:::EPM_esearch_basic_q(params = my_params)
  easyPubMed:::EPM_esearch_parse(x)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)

Generate a Unique Query Key.

Description

Generate a pseudo-random key that uniquely identifies easyPubMed objects. The key is a 46-char string that includes the current date and time and a list of randomly selected characters, numbers and special characters. The unique key is typically saved in the 'meta' slot of an easyPubMed object, and is also written to local files when records are downloaded and saved in XML format. This function takes NO arguments.

Usage

EPM_init_unique_key()

Value

string, a 46-char unique key.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

easyPubMed:::EPM_init_unique_key()
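
A key with a comparable layout can be generated in base R as sketched below. This is not the package's implementation: 'make_job_key()' is a hypothetical helper, and the layout (an 'EPMJ_' prefix, a 14-digit timestamp, an underscore and 26 random characters) is an assumption inferred from the example identifier shown elsewhere in this manual. The sketch draws only lowercase letters and digits.

```r
# Hypothetical helper (NOT part of easyPubMed): build a 46-character key
# from a timestamp and a random suffix.
make_job_key <- function(n_random = 26) {
  stamp <- format(Sys.time(), "%Y%m%d%H%M%S")   # 14 digits: date + time
  pool <- c(letters, 0:9)                       # assumption: a-z and 0-9 only
  suffix <- paste(sample(pool, n_random, replace = TRUE), collapse = "")
  paste0("EPMJ_", stamp, "_", suffix)           # 5 + 14 + 1 + 26 = 46 chars
}

key <- make_job_key()
nchar(key)
```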

Split A PubMed Retrieval Job into Manageable Batches.

Description

Assess the number of PubMed records expected from a user-provided query and split the job into multiple sub-queries if the number is bigger than "max_records_per_batch" (typically, n=10,000). Sub-queries are split according to the "Create Date" of PubMed records. This does not support splitting jobs that return more than "max_records_per_batch" (typically, n=10,000) records with the same "Create Date" (i.e., "[CRDT]").

Usage

EPM_job_split(
  query_string,
  api_key = NULL,
  max_records_per_batch = 9999,
  verbose = FALSE
)

Arguments

query_string

String (character vector of length 1), corresponding to the query string.

api_key

String (character vector of length 1), corresponding to the NCBI API key. Can be NULL.

max_records_per_batch

Integer, maximum number of records that should be expected per sub-query. This number should be in the range 1,000 to 10,000 (typically, max_records_per_batch=10,000).

verbose

logical, shall progress information be printed to console.

Value

Character vector including the response from the server.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  qry <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
  easyPubMed:::EPM_job_split(query_string = qry, verbose = TRUE)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
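
The idea of splitting a job by date can be sketched as a recursive bisection of the date range. The code below is an illustration, not the algorithm used by the package: 'split_by_date()' and 'fake_count()' are hypothetical, and the record counts are simulated locally (100 records per day) instead of being requested from the NCBI server.

```r
# Hypothetical helper (NOT part of easyPubMed): halve the date range until
# each sub-range is expected to return at most `max_records` records.
split_by_date <- function(init_date, end_date, count_fun, max_records = 9999) {
  n <- count_fun(init_date, end_date)
  # Stop when the batch is small enough, or when a single day cannot be split
  if (n <= max_records || init_date >= end_date) {
    return(data.frame(init_date = init_date, end_date = end_date,
                      exp_count = n))
  }
  mid <- init_date + floor(as.numeric(end_date - init_date) / 2)
  rbind(split_by_date(init_date, mid, count_fun, max_records),
        split_by_date(mid + 1, end_date, count_fun, max_records))
}

# Simulated counter (assumption): exactly 100 records per day
fake_count <- function(from, to) (as.numeric(to - from) + 1) * 100

jobs <- split_by_date(as.Date("2020-01-01"), as.Date("2020-12-31"), fake_count)
jobs
```

A range that cannot be split further (a single day) is returned as is, which mirrors the limitation stated in the Description for records sharing the same "Create Date".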

Parse and Format PubMed MeSH Terms.

Description

Extract MeSH Information from a slice of a raw XML PubMed record. Both MeSH codes and MeSH terms are returned.

Usage

EPM_mesh_parse(x)

Arguments

x

String (character vector of length 1) including an XML Mesh term field/section from a PubMed record.

Value

list including n=2 elements (character vectors): mesh_codes and mesh_terms.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

msh <- paste0('<MeshHeading><DescriptorName UI=\"D000465\" >',
              'Algorithms</DescriptorName></MeshHeading>')
easyPubMed:::EPM_mesh_parse(msh)
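
For illustration, MeSH codes and terms can be pulled out of a 'DescriptorName' element with a regular expression, as sketched below. This is not the package's code: 'parse_mesh()' is a hypothetical helper, and it assumes that the code is stored in the 'UI' attribute, as in the example above.

```r
# Hypothetical helper (NOT part of easyPubMed): extract the UI attribute
# (code) and the enclosed text (term) of each DescriptorName element.
parse_mesh <- function(x) {
  pattern <- '<DescriptorName[^>]*UI="([^"]+)"[^>]*>([^<]*)</DescriptorName>'
  hits <- regmatches(x, gregexpr(pattern, x))[[1]]
  list(mesh_codes = sub(pattern, "\\1", hits),   # first capture group: code
       mesh_terms = sub(pattern, "\\2", hits))   # second capture group: term
}

msh <- paste0('<MeshHeading><DescriptorName UI="D000465" >',
              'Algorithms</DescriptorName></MeshHeading>')
parse_mesh(msh)
```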

Map Job Batches to Filenames.

Description

Build Filenames Matching job sub-tasks. Each filename corresponds to a series of records returned by a specific job batch. The associated filename indicates where the corresponding records will be written on the local disc (if requested by the user).

Usage

EPM_prep_outfile(job_list, path, prefix)

Arguments

job_list

data.frame. This is the 'job_list' data.frame included in the 'misc' slot of an 'easyPubMed' object.

path

folder on the local computer where files will be saved. It must be an existing directory.

prefix

string used as common prefix for all files written as part of the same PubMed record download job.

Value

character vector pointing to the target files where Pubmed records will be written.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

test_df <- data.frame(query_string = c('ANY', 'ANY'), 
                      init_date = c('2020/01/01', '2020/01/10'), 
                      end_date = c('2020/01/11', '2020/01/20'), 
                      diff_days = c(10, 10), 
                      exp_count = c(100, 100))
easyPubMed:::EPM_prep_outfile(test_df, path = '.', prefix = 'my_test_job')

Import PubMed Records Saved Locally in XML Format.

Description

Read the contents of an XML file and import Metadata and PubMed records for use by easyPubMed. The XML file must be generated by easyPubMed (ver >= 3) via the 'epm_fetch()' function or via the 'fetchEPMData()' method. XML files downloaded from the Web or using other software are currently unsupported. This function can only process one file.

Usage

EPM_read_xml(x)

Arguments

x

Path to an XML file on the local machine.

Value

List including four elements: 'guide' (data.frame), 'meta' (list), 'job_info' (data.frame) and 'contents' (named list).

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

## Not run: 
  x <- epm_query(query_string = 'easyPubMed', verbose = TRUE)
  x <- epm_fetch(x = x, write_to_file = TRUE, store_contents = FALSE, 
                 outfile_prefix = 'qpm_qry_', verbose = TRUE)
  y <- easyPubMed:::EPM_read_xml(x = 'qpm_qry__batch_01.txt')
  try(unlink('qpm_qry__batch_01.txt'), silent = TRUE)
  y

## End(Not run)

Parse and Format References.

Description

Extract Reference Information from a raw XML string, typically extracted from a PubMed record. Users can select the type of identifier to extract and return, as well as the maximum number of references to be returned.

Usage

EPM_reference_parse(x, max_references = 100, id_type = "pmid")

Arguments

x

String (character vector of length 1) including a List of references obtained from a PubMed record.

max_references

Numeric (of length 1). Maximum number of references to extract/include. This should be an integer '>=0'.

id_type

String (character vector of length 1). Type of identifier to be used for references. One of the following values is expected: c('pmid', 'doi', 'pmc').

Value

data.frame including one row for each author and n=4 columns: lastname, forename, address and email.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

ref <- paste0('<xml><Reference><Citation>',
              '<ArticleId IdType=\"pubmed\">25822800</ArticleId>',
              '<ArticleId IdType=\"pmc\">PMC4739640</ArticleId>',
              '</Citation></Reference></xml>')
easyPubMed:::EPM_reference_parse(ref)
easyPubMed:::EPM_reference_parse(ref, id_type = 'pmc')

Submit a Query and Retrieve Results from PubMed.

Description

Submit a Query to the NCBI ESearch Server, capture the response and retrieve the corresponding PubMed records from the NCBI EFetch Server. Up to the first n=10,000 records returned by the query will be retrieved (as per the NCBI policy). The operation must be completed within a user-defined timeout window otherwise it will be killed.

Usage

EPM_retrieve_data(
  query_string,
  api_key = NULL,
  format = "xml",
  encoding = "UTF-8",
  timeout = 600,
  batch_size = 500,
  max_restart_attempts = 10
)

Arguments

query_string

String (character vector of length 1), corresponding to the query string.

api_key

String (character vector of length 1), corresponding to the NCBI API key. Can be NULL.

format

String (character vector of length 1), desired format of the Pubmed records. This must be one of the values in c("xml", "medline", "uilist").

encoding

String (character vector of length 1), encoding of the resulting records (e.g., "UTF-8").

timeout

Integer, time allowed for completing the operation (in seconds).

batch_size

Integer, max number of records to be retrieved as a batch. This corresponds to the "retmax" NCBI parameter.

max_restart_attempts

Integer, max number of attempts in case of a failed iteration.

Value

Character vector including the response from the server.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  qry <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
  easyPubMed:::EPM_retrieve_data(qry, format = "uilist")
}, silent = TRUE)
setTimeLimit(elapsed = Inf)

Submit a Query and Read the Response from the Server.

Description

Submit a request to a server (typically, the Entrez Eutils server) and capture the response.

Usage

EPM_submit_q(qurl)

Arguments

qurl

String (character vector of length 1), corresponding to the query URL to the remote server.

Value

Character vector including the response from the server.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  qry <- paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/", 
                "esearch.fcgi?db=pubmed&term=easyPubMed")
  easyPubMed:::EPM_submit_q(qry)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)

Validate Parameters of a PubMed Retrieval Job.

Description

Check and correct (if needed) the parameters of an easyPubMed retrieval job.

Usage

EPM_validate_fetch_params(params)

Arguments

params

list of user-provided parameters.

Details

The following elements are expected and/or parsed from the 'params' list:

'encoding'. String, e.g. "UTF-8".
'format'. String, must be one of the following values: c('uilist', 'medline', 'xml').
'store_contents'. Logical, shall retrieved contents be stored in the object. If 'FALSE', the 'write_to_file' argument must be 'TRUE'.
'write_to_file'. Logical, shall retrieved contents be written to a file (or list of files). If 'FALSE', the 'store_contents' argument must be 'TRUE'.
'outfile_path'. String, path to the folder where files will be written. This argument is evaluated only if 'write_to_file' is 'TRUE'.
'outfile_prefix'. String, prefix of the files that will be written locally. This argument is evaluated only if 'write_to_file' is 'TRUE'.
'api_key'. String, NCBI API key. Can be NULL.
'max_records_per_batch'. Integer scalar (numeric vector of length 1), this is the maximum number of records retrieved per batch. It defaults to 10,000.
'verbose'. Logical, shall details about the progress of the operation be printed to console.

Value

list including the vetted parameters.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

prms <- list(
  encoding  = 'UTF-8', 
  format = 'xml', 
  api_key = NULL,
  store_contents = TRUE, 
  write_to_file = FALSE, 
  verbose = TRUE)
easyPubMed:::EPM_validate_fetch_params(prms)
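
The constraint linking 'store_contents' and 'write_to_file' can be sketched as a small validation routine. This is an illustration rather than the package's code: 'check_fetch_params()' is a hypothetical helper, the default values are assumptions, and raising an error (instead of correcting the value) is a design choice made for this sketch only.

```r
# Hypothetical helper (NOT part of easyPubMed): fill in assumed defaults and
# check that the retrieved records end up somewhere.
check_fetch_params <- function(params) {
  defaults <- list(encoding = "UTF-8", format = "xml",
                   store_contents = TRUE, write_to_file = FALSE,
                   verbose = FALSE)
  # User-supplied values override the assumed defaults
  out <- utils::modifyList(defaults, params)
  # Stops with an error if the format is not one of the allowed values
  out$format <- match.arg(out$format, c("uilist", "medline", "xml"))
  if (!isTRUE(out$store_contents) && !isTRUE(out$write_to_file)) {
    stop("'store_contents' and 'write_to_file' cannot both be FALSE")
  }
  out
}

str(check_fetch_params(list(format = "medline", verbose = TRUE)))
```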

Validate Parameters of a PubMed Record Parsing Job.

Description

Check and correct (if needed) the parameters of an easyPubMed Record Parsing job.

Usage

EPM_validate_parse_params(params)

Arguments

params

list of user-provided parameters.

Details

The following elements are expected and/or parsed from the 'params' list:

'max_authors'. Numeric, maximum number of authors to retrieve. If this is set to -1, only the last author is extracted. If this is set to 1, only the first author is returned. If this is set to 2, the first and the last authors are extracted. If this is set to any other positive number (i), up to the leading (i-1) authors are retrieved together with the last author. If this is set to a number larger than the number of authors in a record, all authors are returned. Note that at least 1 author has to be retrieved, therefore a value of 0 is not accepted (coerced to -1).
'autofill_address'. Logical, shall author affiliations be propagated within each record to fill missing values.
'compact_output'. Logical, shall record data be returned in a compact format where each row is a single record and author names are collapsed together. If 'FALSE', each row corresponds to a single author of the publication and the record-specific data are recycled for all included authors.
'include_abstract'. Logical, shall abstract text be included in the output 'data.frame'.
'max_references'. Numeric, maximum number of references to return (from each PubMed record).
'ref_id_type'. String, must be one of the following values: c('pmid', 'doi').
'verbose'. Logical, shall details about the progress of the operation be printed to console.

Value

list including the vetted parameters.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples


prms <- list(
  max_authors  = 12, 
  autofill_address = TRUE, 
  compact_output = FALSE,
  include_abstract = TRUE, 
  max_references = 100, 
  ref_id_type = 'doi',
  verbose = TRUE)
easyPubMed:::EPM_validate_parse_params(prms)

Write PubMed Records to Local Files.

Description

Write a list of PubMed records to a local file. If the destination file already exists, it will be overwritten. The original format of the PubMed records should be declared and will be preserved in the output file. Format conversion is NOT supported.

Usage

EPM_write_to_file(x, to, format, addon = NULL, verbose = FALSE)

Arguments

x

List including raw PubMed records.

to

Path to the destination file on the local disc.

format

String, format of the raw PubMed records that will be saved to the destination file (e.g., 'xml').

addon

String, optional chunk of text in XML format to be written to the destination file (header). This argument is only used when 'format' is set to 'xml'. It can be NULL.

verbose

Logical, shall details about the progress of the operation be printed to console.

Value

integer in the range c(0, 1). A result of 0 indicates that an error occurred while writing the file. A result of 1 indicates that the operation was completed successfully.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

test <- list('Record #1', 'Record #2')
outfile <- './test_file.txt'
file.exists(outfile)
easyPubMed:::EPM_write_to_file(x = test, to = outfile, format = 'xml')
file.exists(outfile)
readLines(outfile)
unlink(outfile)
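
The 0/1 return convention described above can be sketched in base R. This is a minimal illustration, not the package's implementation: 'write_records_sketch()' is a hypothetical helper, and writing the optional 'addon' header before the records is an assumption.

```r
# Hypothetical helper (not part of easyPubMed): write records to a file and
# report success (1) or failure (0) instead of raising an error.
write_records_sketch <- function(x, to, addon = NULL) {
  tryCatch({
    lines <- as.character(unlist(x, use.names = FALSE))
    # Assumption: an optional header is written before the records.
    if (!is.null(addon)) lines <- c(addon, lines)
    writeLines(lines, con = to)  # an existing file is overwritten
    1L
  }, error = function(e) 0L)
}

tmp <- tempfile(fileext = ".txt")
status <- write_records_sketch(list("Record #1", "Record #2"), to = tmp)
status
readLines(tmp)
unlink(tmp)
```

Returning a status code rather than stopping lets a caller that writes many batches decide whether to retry or skip a failed file.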

Harmonize the Elements of a Vector by Adding Leading Zeros.

Description

Coerce a vector to character and then harmonize the number of characters (nchar) of each element by adding a suitable number of leading zeros (or another user-specified character).

Usage

EPM_zerofill(x, fillchar = "0")

Arguments

x

vector (numeric or character).

fillchar

string corresponding to a single character. This character is going to be added (one or more times) in front of each element of the input vector.

Value

character vector whose elements all have the same size (number of characters).

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Example 1
easyPubMed:::EPM_zerofill(c(1, 100, 1000))
# Example 2
easyPubMed:::EPM_zerofill(c('Hey,', 'hello', 'there!'), '_')
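
The padding logic described above can be reproduced in a few lines of base R. The sketch below is an independent re-implementation for illustration; 'pad_left_sketch()' is a hypothetical name and the internal function may differ in its details.

```r
# Hypothetical re-implementation, for illustration only.
pad_left_sketch <- function(x, fillchar = "0") {
  x <- as.character(x)
  width <- max(nchar(x))
  # strrep() repeats the fill character as many times as each element needs.
  paste0(strrep(fillchar, width - nchar(x)), x)
}

pad_left_sketch(c(1, 100, 1000))
# "0001" "0100" "1000"
pad_left_sketch(c("Hey,", "hello", "there!"), "_")
# "__Hey," "_hello" "there!"
```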

Retrieve Text Between XML Tags

Description

Extract the text enclosed between matching XML (or HTML) tags in a string. Each tagged substring is returned as a separate element of a list or character vector.

Usage

custom_grep(xml_data, tag, format = "list")

Arguments

xml_data

String (of class character and length 1): corresponds to the PubMed record or any string including XML/HTML tags.

tag

String (of class character and length 1): the tag of interest (does NOT include < > chars).

format

c("list", "char"): specifies the format for the output.

Details

The 'custom_grep()' function is now obsolete. This is a helper function that will be replaced by 'easyPubMed:::EPM_custom_grep()', an internal function that won't be exported. The 'custom_grep()' function will be retired in 2026.

Value

List or vector where each element corresponds to an in-tag substring.

Author(s)

Damiano Fantini damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

try({
  ## extract substrings based on regular expressions
  string_01 <- paste0(
    "The itsy bitsy <strong>spider</strong> ", 
    "Went up the water spout. Down came the rain ", 
    "And <strong>washed the spider out</strong>.")
  print(string_01)
  custom_grep(xml_data = string_01, tag = "strong", format = "char")
  custom_grep(xml_data = string_01, tag = "strong", format = "list")
}, silent = TRUE)
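
Tag-based extraction of this kind can be approximated with base R regular expressions. The sketch below is not the package's code: 'between_tags_sketch()' is a hypothetical helper that assumes 'tag' is a plain tag name (no regex metacharacters) and that tagged text does not span multiple lines.

```r
# Hypothetical helper: return the text found between <tag> and </tag>.
between_tags_sketch <- function(xml_data, tag, format = c("list", "char")) {
  format <- match.arg(format)
  open_tag <- paste0("<", tag, "(\\s[^>]*)?>")
  # Non-greedy match so that each tagged chunk is captured separately.
  pattern <- paste0(open_tag, ".*?</", tag, ">")
  hits <- regmatches(xml_data, gregexpr(pattern, xml_data, perl = TRUE))[[1]]
  # Strip the opening and closing tags from each match.
  inner <- sub(paste0("^", open_tag), "", hits, perl = TRUE)
  inner <- sub(paste0("</", tag, ">$"), "", inner, perl = TRUE)
  if (format == "list") as.list(inner) else inner
}

s <- paste0("The itsy bitsy <strong>spider</strong> went up the water spout. ",
            "Down came the rain and <strong>washed the spider out</strong>.")
between_tags_sketch(s, tag = "strong", format = "char")
# "spider" "washed the spider out"
```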

Class easyPubMed.

Description

Class easyPubMed defines objects that represent PubMed Query jobs and the corresponding results. Briefly, these objects are initialized using information that will guide the communication with the NCBI Entrez server. Also, easyPubMed objects are used to store raw and processed data retrieved from PubMed.

Usage

## S4 method for signature 'easyPubMed'
initialize(.Object, query_string, job_info)

Arguments

.Object

The easyPubMed object being built.

query_string

String (character vector of length 1) corresponding to the user-provided text of the query to be submitted to PubMed.

job_info

List, this should be the output of 'EPM_job_split()'.

Slots

query: String (character vector of length 1) corresponding to the PubMed request submitted by the user.
meta: List including meta information about the PubMed Query job.
uilist: List including all unique identifiers corresponding to the PubMed records returned by the query. Can be empty.
raw: List including the raw data (in 'xml' or 'medline' format) retrieved from the NCBI eFetch server. Can be empty.
data: Data.frame including processed data based on the xml raw data retrieved from PubMed.
misc: List including additional information.

Author(s)

Damiano Fantini damiano.fantini@gmail.com
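
For readers less familiar with S4, the slot layout listed above can be mirrored with a toy class. The class name 'demoQueryJob' is hypothetical, and the sketch relies on the default S4 initializer rather than the package's own 'initialize' method (which takes 'query_string' and 'job_info').

```r
library(methods)

# Hypothetical class mirroring the documented slot names and types.
setClass(
  "demoQueryJob",
  slots = c(
    query  = "character",
    meta   = "list",
    uilist = "list",
    raw    = "list",
    data   = "data.frame",
    misc   = "list"
  )
)

job <- new("demoQueryJob",
           query = 'Damiano Fantini[AU] AND "2018"[PDAT]',
           meta = list(), uilist = list(), raw = list(),
           data = data.frame(), misc = list())
slotNames(job)
length(job@raw)  # no raw records stored yet
```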

Fetch Raw Records from PubMed.

Description

Fetch raw PubMed records from PubMed. Records can be downloaded in text or xml format and stored into a local object or written to local files.

Usage

epm_fetch(
  x,
  format = "xml",
  api_key = NULL,
  write_to_file = FALSE,
  outfile_path = NULL,
  outfile_prefix = NULL,
  store_contents = TRUE,
  encoding = "UTF-8",
  verbose = TRUE
)

Arguments

x

An 'easyPubMed' object.

format

String, the desired format for the raw records. This argument must take one of the following values: 'c("uilist", "medline", "xml")' and defaults to '"xml"'.

api_key

String, corresponding to the NCBI API token (if available). NCBI token strings can be requested from NCBI. Record download will be faster if a valid NCBI token is used. This argument can be 'NULL'.

write_to_file

Logical of length 1. Shall raw records be written to a file on the local machine. It defaults to 'FALSE'.

outfile_path

Path to the folder on the local machine where files will be saved (if 'write_to_file' is 'TRUE'). It must point to an already existing directory. If 'NULL', the working directory will be used.

outfile_prefix

String, prefix that will be added to the name of each file written to the local machine. This argument is parsed only when 'write_to_file' is 'TRUE'. If 'NULL', an arbitrary prefix will be added (easypubmed_job_YYYYMMDDHHMM).

store_contents

Logical of length 1. Shall raw records be stored in the 'easyPubMed' object. It defaults to 'TRUE'. It may be convenient to switch this to 'FALSE' when downloading a large number of records. If 'store_contents' is 'FALSE', 'write_to_file' must be 'TRUE'.

encoding

String, the encoding of the records retrieved from PubMed. Typically, this is 'UTF-8'.

verbose

Logical, shall details about the progress of the operation be printed to console.

Value

an easyPubMed object.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  x <- epm_fetch(x = x, format = 'uilist')
  x
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
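
The output-file arguments can be illustrated with a small base R sketch. Everything here is an assumption made for illustration: 'batch_paths_sketch()' is a hypothetical helper, the timestamp is assumed to follow the YYYYMMDDHHMM pattern mentioned for 'outfile_prefix', and the '<prefix>_batch_<NN>.txt' naming is inferred from the file name used in the 'epm_import_xml()' example.

```r
# Assumption: the default prefix embeds the current time as YYYYMMDDHHMM.
default_prefix <- paste0("easypubmed_job_", format(Sys.time(), "%Y%m%d%H%M"))

# Hypothetical helper: build batch file paths inside an existing folder.
# Assumption: files are named <prefix>_batch_<NN>.txt.
batch_paths_sketch <- function(n_batches, prefix = default_prefix,
                               outfile_path = NULL) {
  # A NULL path falls back to the working directory.
  if (is.null(outfile_path)) outfile_path <- getwd()
  if (!dir.exists(outfile_path)) {
    stop("'outfile_path' must point to an existing directory")
  }
  file.path(outfile_path,
            sprintf("%s_batch_%02d.txt", prefix, seq_len(n_batches)))
}

basename(batch_paths_sketch(3, prefix = "test", outfile_path = tempdir()))
# "test_batch_01.txt" "test_batch_02.txt" "test_batch_03.txt"
```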

Import PubMed Records from Local Files.

Description

Read one or more text files including XML-decorated raw PubMed records and rebuild an 'easyPubMed' object. The function expects all files to be generated from the same query using 'easyPubMed>3.0' and the 'epm_fetch()' function setting 'write_to_file' to 'TRUE'. This function can import a fraction or all of the files resulting from a single query. Files resulting from non-compatible fetch jobs will be dropped.

Usage

epm_import_xml(x)

Arguments

x

Character vector, the paths to text files including XML-decorated raw PubMed records saved using 'easyPubMed>3.0'.

Value

an 'easyPubMed' object including raw XML PubMed records.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  x <- epm_fetch(x = x, format = 'xml', write_to_file = TRUE, 
                 outfile_prefix = 'test', store_contents = FALSE)
  y <- epm_import_xml('test_batch_01.txt')
  tryCatch({unlink('test_batch_01.txt')}, error = function(e) { NULL }) 
  print(paste0('       Raw Record Num (fetched): ', 
               getEPMMeta(x)$raw_record_num))
  print(paste0('Raw Record Num (read & rebuilt): ', 
               getEPMMeta(y)$raw_record_num))
}, silent = TRUE)
setTimeLimit(elapsed = Inf)

Extract Information from a Raw PubMed Record.

Description

Read a raw PubMed record, identify XML tags, extract information and cast it into a structured data.frame. The expected input is an XML-tag-decorated string corresponding to a single PubMed record. Information about article title, authors, affiliations, journal name and abbreviation, publication date, references, and keywords are returned.

Usage

epm_parse(
  x,
  max_authors = 10,
  autofill_address = TRUE,
  compact_output = TRUE,
  include_abstract = TRUE,
  max_references = 150,
  ref_id_type = "doi",
  verbose = TRUE
)

Arguments

x

An 'easyPubMed' object. The object must include raw records (n>0) downloaded in the 'xml' format.

max_authors

Numeric, maximum number of authors to retrieve. If this is set to -1, only the last author is extracted. If this is set to 1, only the first author is returned. If this is set to 2, the first and the last authors are extracted. If this is set to any other positive number (n), up to the leading (n-1) authors are retrieved together with the last author. If this is set to a number larger than the number of authors in a record, all authors are returned. Note that at least 1 author has to be retrieved, therefore a value of 0 is not accepted (coerced to -1).

autofill_address

Logical, shall author affiliations be propagated within each record to fill missing values.

compact_output

Logical, shall record data be returned in a compact format where each row is a single record and author names are collapsed together. If 'FALSE', each row corresponds to a single author of the publication and the record-specific data are recycled for all included authors (legacy approach).

include_abstract

Logical, shall abstract text be included in the output data.frame. If 'FALSE', the abstract text column is populated with a missing value.

max_references

Numeric, maximum number of references to return (for each PubMed record).

ref_id_type

String, must be one of the following values: 'c('pmid', 'doi')'. Type of identifier used to describe citation references.

verbose

Logical, shall details about the progress of the operation be printed to console.

Value

an easyPubMed object including a data.frame ('data' slot) that stores information extracted from its raw XML PubMed records.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.7)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  x <- epm_fetch(x = x, format = 'xml')
  x <- epm_parse(x, include_abstract = FALSE, max_authors = 1)
  get_epm_data(x)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
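
The 'max_authors' rules described in the Arguments section can be expressed as a short selection function. This is a sketch of the documented behavior, not the package's code: 'select_authors_sketch()' is a hypothetical helper, and treating every negative value like -1 is an assumption.

```r
# Hypothetical helper illustrating the documented 'max_authors' rules.
select_authors_sketch <- function(authors, max_authors = 10) {
  n <- length(authors)
  if (n == 0) return(authors)
  if (max_authors == 0) max_authors <- -1    # 0 is coerced to -1
  if (max_authors < 0) return(authors[n])    # last author only
  if (max_authors >= n) return(authors)      # cap not reached: all authors
  if (max_authors == 1) return(authors[1])   # first author only
  # Leading (max_authors - 1) authors, plus the last author.
  c(authors[seq_len(max_authors - 1)], authors[n])
}

aut <- c("A", "B", "C", "D", "E")
select_authors_sketch(aut, -1)  # "E"
select_authors_sketch(aut, 2)   # "A" "E"
select_authors_sketch(aut, 3)   # "A" "B" "E"
```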

Extract Information from a Raw PubMed Record.

Description

Read a raw PubMed record, identify XML tags, extract information and cast it into a structured 'data.frame'. The expected input is an XML-tag-decorated string corresponding to a single PubMed record. Information about article title, authors, affiliations, journal name and abbreviation, publication date, references, and keywords are returned.

Usage

epm_parse_record(
  pubmedArticle,
  max_authors = 15,
  autofill_address = TRUE,
  compact_output = TRUE,
  include_abstract = TRUE,
  max_references = 1000,
  ref_id_type = "pmid"
)

Arguments

pubmedArticle

String, this is an XML-tag-decorated raw PubMed record.

max_authors

Numeric, maximum number of authors to retrieve (see 'epm_parse()' for how this value is interpreted).

autofill_address

Logical, shall author affiliations be propagated within each record to fill missing values.

compact_output

Logical, shall record data be returned in a compact format where each row is a single record and author names are collapsed together (see 'epm_parse()').

include_abstract

Logical, shall abstract text be included in the output data.frame. If 'FALSE', the abstract text column is populated with a missing value.

max_references

Numeric, maximum number of references to return (for each PubMed record).

ref_id_type

String, must be one of the following values: 'c('pmid', 'doi')'. Type of identifier used to describe citation references.

Value

a data.frame including information extracted from a raw XML PubMed record.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

data(epm_samples)
x <- epm_samples$bladder_cancer_2018$demo_data_03$raw[[1]]
epm_parse_record(x)
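
The difference between the compact layout and the one-row-per-author layout controlled by 'compact_output' can be shown on toy data. The column names and the "; " separator below are assumptions chosen for illustration and may not match the columns returned by the package.

```r
# Toy long-format table: one row per author (the non-compact layout).
long <- data.frame(
  pmid   = c("111", "111", "111", "222"),
  title  = c("Paper A", "Paper A", "Paper A", "Paper B"),
  author = c("Rossi M", "Smith J", "Lee K", "Chen L"),
  stringsAsFactors = FALSE
)

# Collapse authors so that each record occupies a single row.
compact <- aggregate(author ~ pmid + title, data = long,
                     FUN = paste, collapse = "; ")
compact
```

In the long layout, record-level fields such as the title are repeated on every author row; the compact layout stores them once per record.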

Search for PubMed Records.

Description

Query PubMed (Entrez) via the PubMed API eSearch utility. Calling this function results in submitting a query to the NCBI EUtils server and then capturing and parsing the response. The number of records expected to be returned by the query is determined. If this number is greater than n=10,000, the record retrieval job is automatically split into a list of smaller manageable sub-queries. This function returns an "easyPubMed" object, which includes all information required to retrieve PubMed records using the epm_fetch() function.

Usage

epm_query(query_string, api_key = NULL, verbose = TRUE)

Arguments

query_string

String (character vector of length 1), corresponding to the query string.

api_key

String (character vector of length 1), corresponding to the NCBI API key. Can be 'NULL'.

verbose

logical, shall progress information be printed to console. Defaults to 'TRUE'.

Details

This function will use "query_string" for querying PubMed. The Query Term can include one or multiple words, as well as the standard PubMed operators (AND, OR, NOT) and tags (e.g., [AU], [PDAT], [Affiliation], and so on).

Value

An easyPubMed object which includes no PubMed records.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  qry <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
  epm_query(query_string = qry, verbose = FALSE)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
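
Query strings that combine terms, field tags and boolean operators can be assembled programmatically before being passed as 'query_string'. The helper below is hypothetical and only performs string manipulation; it does not contact NCBI.

```r
# Hypothetical helper: tag each term and join terms with a boolean operator.
build_query_sketch <- function(terms, tags, operator = "AND") {
  stopifnot(length(terms) == length(tags),
            operator %in% c("AND", "OR", "NOT"))
  tagged <- paste0(terms, "[", tags, "]")
  paste(tagged, collapse = paste0(" ", operator, " "))
}

qry <- build_query_sketch(c("Damiano Fantini", '"2018"'), c("AU", "PDAT"))
qry
# Damiano Fantini[AU] AND "2018"[PDAT]
```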

Query PubMed by Full-length Title.

Description

Execute a PubMed query using a full-length publication title as query string. Tokenization and stopword removal are performed automatically. The goal is to mimic a PubMed citation matching search. Because of this approach, a query by full-length title may return more than one record.

Usage

epm_query_by_fulltitle(
  fulltitle,
  field = "[Title]",
  api_key = NULL,
  verbose = TRUE
)

Arguments

fulltitle

String (character vector of length 1) that corresponds to the full-length publication title used for querying PubMed (titles should be used as is, without adding trailing filter tags).

field

String (character vector of length 1). This indicates the PubMed record field where the full-length string (fulltitle) should be searched in. By default, this points to the 'Title' field. However, the field can be changed (always use fields supported by PubMed) as required by the user (for example, to attempt an exact-match query using a specific sentence included in the abstract of a record).

api_key

String (character vector of length 1), corresponding to the NCBI API key. Can be 'NULL'.

verbose

Logical, shall details about the progress of the operation be printed to console.

Value

an easyPubMed object.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  q <- 'Analysis of Mutational Signatures Using the mutSignatures R Library.'
  epm_query_by_fulltitle(q)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
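
The tokenization and stopword-removal step mentioned in the Description can be sketched as follows. This is an illustration under stated assumptions: 'title_to_query_sketch()' is a hypothetical helper, 'toy_stopwords' is a tiny stand-in for the 'epm_stopwords' dataset, and joining the tagged tokens with AND is an assumption about how the final query is composed.

```r
# Assumption: a tiny stand-in for the 'epm_stopwords' dataset.
toy_stopwords <- c("a", "an", "and", "of", "the", "using", "with")

# Hypothetical helper: turn a full title into a field-tagged query string.
title_to_query_sketch <- function(fulltitle, field = "[Title]",
                                  stopwords = toy_stopwords) {
  # Tokenize: lower-case, replace punctuation with spaces, split on whitespace.
  clean <- gsub("[[:punct:]]+", " ", tolower(fulltitle))
  tokens <- strsplit(clean, "\\s+")[[1]]
  # Drop empty tokens and stopwords.
  tokens <- tokens[nzchar(tokens) & !tokens %in% stopwords]
  paste(paste0(tokens, field), collapse = " AND ")
}

q <- "Analysis of Mutational Signatures Using the mutSignatures R Library."
title_to_query_sketch(q)
```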

Query PubMed by PMIDs.

Description

Query PubMed using a list of PubMed record identifiers (PMIDs) as input. The list of identifiers is automatically split into a series of manageable-sized chunks (max n=50 PMIDs per chunk).

Usage

epm_query_by_pmid(pmids, api_key = NULL, verbose = TRUE)

Arguments

pmids

Vector (character or numeric), list of PubMed record identifiers (PMIDs). Values will be coerced to character.

api_key

String (character vector of length 1), corresponding to the NCBI API key. Can be 'NULL'.

verbose

Logical, shall details about the progress of the operation be printed to console.

Value

an easyPubMed object.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  my_pmids <- c(34097668, 34097669, 34097670)
  epm_query_by_pmid(my_pmids)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
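
Splitting a long vector of PMIDs into chunks of at most 50 elements can be done with 'split()'. The helpers below are hypothetical; in particular, turning each chunk into an OR-joined query on the [PMID] field is an assumption about how the chunks might be submitted.

```r
# Hypothetical helper: split PMIDs into chunks of at most 'chunk_size'.
chunk_pmids_sketch <- function(pmids, chunk_size = 50) {
  pmids <- as.character(pmids)  # values are coerced to character
  groups <- ceiling(seq_along(pmids) / chunk_size)
  unname(split(pmids, groups))
}

# Assumption: each chunk becomes one OR-joined query on the [PMID] field.
chunk_to_query_sketch <- function(chunk) {
  paste0(chunk, "[PMID]", collapse = " OR ")
}

chunks <- chunk_pmids_sketch(34097001:34097120)
lengths(chunks)
# 50 50 20
chunk_to_query_sketch(c(34097668, 34097669))
# "34097668[PMID] OR 34097669[PMID]"
```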

Preprocessed PubMed Records and Data

Description

This dataset includes a collection of sample data obtained from PubMed records and saved in different formats. This dataset is used to demonstrate specific functionalities of the 'easyPubMed' R library. Each element in the 'epm_samples' list corresponds to a different input or intermediate object.

Usage

data("epm_samples")

Format

The dataset is formatted as a list including 4 elements:

* 'bladder_cancer_2018': List of 4

* 'bladder_cancer_40y': List of 1

* 'fx': List of 5

Examples

## Display some contents
data("epm_samples")
# Display Query String used for collecting the data
print(epm_samples$bladder_cancer_2018$demo_data_01)

PubMed Query Stopwords

Description

Collection of 133 Stopwords that can be removed from query strings to improve the accuracy of exact-match PubMed queries.

Usage

data("epm_stopwords")

Format

A character vector including all PubMed stopwords that are typically filtered out from queries.

Details

Number of stopwords included, n=133.

Examples

## Display some contents
data("epm_stopwords")
head(epm_stopwords)

Method fetchEPMData.

Description

Retrieve PubMed records for an 'easyPubMed' object.

Usage

fetchEPMData(x, params)

## S4 method for signature 'easyPubMed,list'
fetchEPMData(x, params)

Arguments

x

an easyPubMed-class object.

params

list including parameters to tune the record retrieval job. For more info, see '?easyPubMed:::EPM_validate_fetch_params'.

Retrieve PubMed Data in XML or TXT Format

Description

Retrieve PubMed records from Entrez following a search performed via the get_pubmed_ids() function. Data are downloaded in the XML or TXT format and are retrieved in batches of up to 5000 records.

Usage

fetch_pubmed_data(
  pubmed_id_list,
  retstart = 0,
  retmax = 500,
  format = "xml",
  encoding = "UTF8",
  api_key = NULL,
  verbose = TRUE
)

Arguments

pubmed_id_list

An easyPubMed object.

retstart

Integer (>=0): this argument is ignored.

retmax

Integer (>=1): this argument is ignored.

format

String: element specifying the output format. The following values are allowed: c("xml", "medline", "uilist").

encoding

String, the encoding of the records retrieved from PubMed. This argument is ignored and set to 'UTF-8'.

api_key

String, corresponding to the NCBI API token (if available). NCBI token strings can be requested from NCBI. Record download will be faster if a valid NCBI token is used. This argument can be NULL.

verbose

Logical, shall details about the progress of the operation be printed to console.

Details

The 'fetch_pubmed_data()' function is now obsolete. You should use the 'epm_fetch()' function instead. Please, have a look at the manual or the vignette. The 'fetch_pubmed_data()' function will be retired in 2026.

Value

Character vector of length >= 1. If format is set to "xml" (default), a single String including all PubMed records (decorated with XML tags) is returned. If a different format is selected, a vector of strings is returned, where each element corresponds to a line of the output document.

Author(s)

Damiano Fantini damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/
https://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/

Examples

## Example 01: retrieve PubMed record Unique Identifiers (uilist)
# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({ 
  q <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
  x <- get_pubmed_ids(pubmed_query_string = q)
  y <- fetch_pubmed_data(x, format = "uilist")
  y
}, silent = TRUE)
setTimeLimit(elapsed = Inf)

## Not run: 
## Example 02: retrieve data in XML format
q <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
x <- epm_query(query_string = q)
y <- fetch_pubmed_data(x, format = "xml")
y

## End(Not run)

Method getEPMData.

Description

Retrieve processed data from an 'easyPubMed' object.

Usage

getEPMData(x)

## S4 method for signature 'easyPubMed'
getEPMData(x)

Arguments

x

an object of class 'easyPubMed'.
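
The accessor pattern used by the 'getEPM*()' methods (an S4 generic plus a method that returns one slot) can be sketched with a toy class. The names 'demoJob', 'getDemoData()' and 'get_demo_data()' are hypothetical and exist only in this example.

```r
library(methods)

# Hypothetical class and generic, defined here only for illustration.
setClass("demoJob", slots = c(data = "data.frame"))
setGeneric("getDemoData", function(x) standardGeneric("getDemoData"))
setMethod("getDemoData", "demoJob", function(x) x@data)

# Plain-function wrapper around the method, in the style of 'get_epm_data()'.
get_demo_data <- function(x) getDemoData(x)

job <- new("demoJob",
           data = data.frame(pmid = "111", year = 2018,
                             stringsAsFactors = FALSE))
get_demo_data(job)
```

Accessing slots through a generic keeps user code independent of the internal slot layout.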

Method getEPMJobList.

Description

Retrieve the list of record retrieval sub-jobs from an 'easyPubMed' object. Record retrieval sub-jobs are stored in a 'data.frame' and each row corresponds to an independent non-overlapping PubMed query. This 'data.frame' guides the record retrieval process. The 'data.frame' is obtained from the 'misc' slot of an 'easyPubMed' object.

Usage

getEPMJobList(x)

## S4 method for signature 'easyPubMed'
getEPMJobList(x)

Arguments

x

an object of class 'easyPubMed'.

Method getEPMMeta.

Description

Retrieve meta data from an 'easyPubMed' object.

Usage

getEPMMeta(x)

## S4 method for signature 'easyPubMed'
getEPMMeta(x)

Arguments

x

an object of class 'easyPubMed'.

Method getEPMMisc.

Description

Retrieve miscellaneous information stored in an 'easyPubMed' object.

Usage

getEPMMisc(x)

## S4 method for signature 'easyPubMed'
getEPMMisc(x)

Arguments

x

an object of class 'easyPubMed'.

Method getEPMQuery.

Description

Retrieve the user-provided query string from an 'easyPubMed' object.

Usage

getEPMQuery(x)

## S4 method for signature 'easyPubMed'
getEPMQuery(x)

Arguments

x

an object of class 'easyPubMed'.

Method getEPMRaw.

Description

Retrieve the raw PubMed record data stored in an 'easyPubMed' object.

Usage

getEPMRaw(x)

## S4 method for signature 'easyPubMed'
getEPMRaw(x)

Arguments

x

an object of class 'easyPubMed'.

Method getEPMUilist.

Description

Retrieve the list of unique record identifiers (PMIDs) from an 'easyPubMed' object.

Usage

getEPMUilist(x)

## S4 method for signature 'easyPubMed'
getEPMUilist(x)

Arguments

x

an object of class 'easyPubMed'.

Get Processed Data from an easyPubMed Object.

Description

Obtain Processed Data that were extracted from a list of PubMed records. This is a wrapper function that calls the 'getEPMData()' method. This function returns contents from the 'data' slot.

Usage

get_epm_data(x)

Arguments

x

An 'easyPubMed' object.

Value

a 'data.frame' including processed data from an 'easyPubMed' object.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  x <- epm_fetch(x)
  x <- epm_parse(x, max_references = 5, max_authors = 5)
  get_epm_data(x)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)

Get Meta Data from an easyPubMed Object.

Description

Request Meta Data from an 'easyPubMed' object. This is a wrapper function that calls the 'getEPMMeta()' method. This function returns contents from the 'meta' slot.

Usage

get_epm_meta(x)

Arguments

x

An 'easyPubMed' object.

Value

a list including meta data from an 'easyPubMed' object.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive. 
setTimeLimit(elapsed = 4.9)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  get_epm_meta(x)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)

Get Raw Data from an easyPubMed Object.

Description

Request Raw Data from an 'easyPubMed' object. This is a wrapper function that calls the 'getEPMRaw()' method. This function returns contents from the 'raw' slot.

Usage

get_epm_raw(x)

Arguments

x

An 'easyPubMed' object.

Value

a list including raw data from an 'easyPubMed' object.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  x <- epm_fetch(x)
  get_epm_raw(x)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)

Get PubMed Record Identifiers from an easyPubMed Object.

Description

Request the list of unique PubMed Record Identifiers that are contained in an 'easyPubMed' object. This function is a wrapper function calling the 'getEPMUilist()' method. This function returns contents from the 'uilist' slot.

Usage

get_epm_uilist(x)

Arguments

x

An 'easyPubMed' object.

Value

a character vector including a list of unique record identifiers from an 'easyPubMed' object.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  x <- epm_query(query_string = 'Damiano Fantini[AU] AND "2018"[PDAT]')
  x <- epm_fetch(x)
  get_epm_uilist(x)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)

Simple PubMed Record Search

Description

Query PubMed (Entrez) in a simple way via the PubMed API eSearch function. Calling this function results in posting the query results on the PubMed History Server. This allows later access to the resulting data via the fetch_pubmed_data() function, or other easyPubMed functions. NOTE: this function has become obsolete. You should use the epm_query() function instead. Please, have a look at the manual or the vignette. The get_pubmed_ids() function will be retired in 2026.

Usage

get_pubmed_ids(pubmed_query_string, api_key = NULL)

Arguments

pubmed_query_string

String (character vector of length 1), corresponding to the query string used for querying PubMed.

api_key

String (character vector of length 1), corresponding to the NCBI API key. Can be NULL.

Details

This function will use the String provided as argument for querying PubMed via the eSearch function of the PubMed API. The Query Term can include one or multiple words, as well as the standard PubMed operators (AND, OR, NOT) and tags (e.g., [AU], [PDAT], [Affiliation], and so on). ESearch will post the UIDs resulting from the search operation onto the History server so that they can be used directly in a subsequent fetch_pubmed_data() call.

Value

An easyPubMed object which includes no PubMed records.

Author(s)

Damiano Fantini, damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  qry <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
  get_pubmed_ids(pubmed_query_string = qry)
}, silent = TRUE)
setTimeLimit(elapsed = Inf)

Method parseEPMData.

Description

Extract, parse and format information from raw PubMed records stored in an 'easyPubMed' object.

Usage

parseEPMData(x, params)

## S4 method for signature 'easyPubMed,list'
parseEPMData(x, params)

Arguments

x

an easyPubMed-class object

params

list including parameters to tune the record data parsing job. For more info, see '?easyPubMed:::EPM_validate_parse_params'.

Print method of the easyPubMed Class.

Description

Print method of the easyPubMed Class.

Usage

## S4 method for signature 'easyPubMed'
print(x)

Arguments

x

the 'easyPubMed' object being shown.

Method setEPMData.

Description

Attach (or replace) processed data to an 'easyPubMed' object.

Usage

setEPMData(x, y)

## S4 method for signature 'easyPubMed,data.frame'
setEPMData(x, y)

Arguments

x

an object of class 'easyPubMed'.

y

'data.frame' including processed data.
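
The setter pattern used by the 'setEPM*()' methods, dispatching on both the object and the class of the new value, can be sketched with a toy class. The names below are hypothetical, and returning the modified object (which the caller must re-assign) is an assumption based on R's copy-on-modify semantics.

```r
library(methods)

# Hypothetical class and generic, defined here only for illustration.
setClass("demoJob2", slots = c(data = "data.frame"))
setGeneric("setDemoData", function(x, y) standardGeneric("setDemoData"))
setMethod("setDemoData", signature("demoJob2", "data.frame"),
          function(x, y) {
            x@data <- y
            x  # S4 objects are copied on modify: return the updated copy
          })

job <- new("demoJob2", data = data.frame())
job <- setDemoData(job, data.frame(pmid = c("111", "222"),
                                   stringsAsFactors = FALSE))
nrow(job@data)
# 2
```

Because the method is registered for a 'data.frame' second argument only, passing a value of another class fails at dispatch time.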

Method setEPMJobList.

Description

Attach (or replace) the list of record retrieval sub-jobs to an 'easyPubMed' object. Record retrieval sub-jobs are stored in a data.frame and each row corresponds to an independent non-overlapping PubMed query. This 'data.frame' guides the record retrieval process. The 'data.frame' is written into the 'misc' slot of an 'easyPubMed' object.

Usage

setEPMJobList(x, y)

## S4 method for signature 'easyPubMed,data.frame'
setEPMJobList(x, y)

Arguments

x

an object of class 'easyPubMed'.

y

'data.frame' including the list of PubMed record retrieval sub-jobs.

Method setEPMMeta.

Description

Attach (or replace) meta data to an 'easyPubMed' object.

Usage

setEPMMeta(x, y)

## S4 method for signature 'easyPubMed,list'
setEPMMeta(x, y)

Arguments

x

an object of class 'easyPubMed'.

y

list including meta data information.

Method setEPMMisc.

Description

Attach (or replace) miscellaneous information to an 'easyPubMed' object.

Usage

setEPMMisc(x, y)

## S4 method for signature 'easyPubMed,list'
setEPMMisc(x, y)

Arguments

x

an object of class 'easyPubMed'.

y

list including miscellaneous data and information.

Method setEPMQuery.

Description

Attach (or replace) a user-provided query string to an 'easyPubMed' object.

Usage

setEPMQuery(x, y)

## S4 method for signature 'easyPubMed,character'
setEPMQuery(x, y)

Arguments

x

an object of class 'easyPubMed'.

y

string (character vector of length 1) corresponding to a PubMed query string.

Method setEPMRaw.

Description

Attach (or replace) raw PubMed record data to an 'easyPubMed' object.

Usage

setEPMRaw(x, y)

## S4 method for signature 'easyPubMed,list'
setEPMRaw(x, y)

Arguments

x

an object of class 'easyPubMed'.

y

list of PubMed records (raw data).

Method setEPMUilist.

Description

Attach (or replace) the list of unique record identifiers (PMIDs) to an 'easyPubMed' object.

Usage

setEPMUilist(x, y)

## S4 method for signature 'easyPubMed,list'
setEPMUilist(x, y)

Arguments

x

an object of class 'easyPubMed'.

y

list of unique PubMed record identifiers (PMIDs).

Show method of the easyPubMed Class.

Description

Show method of the easyPubMed Class.

Usage

## S4 method for signature 'easyPubMed'
show(object)

Arguments

object

the 'easyPubMed' object being shown.
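
A 'show' method controls what is displayed when an S4 object is auto-printed at the console. The sketch below defines one for a toy class; the class name 'demoJob3' and the printed summary are hypothetical and do not reproduce the output of the package's own method.

```r
library(methods)

# Hypothetical class with a custom 'show' method.
setClass("demoJob3", slots = c(query = "character", raw = "list"))
setMethod("show", "demoJob3", function(object) {
  cat("<demoJob3>\n")
  cat("  query:   ", object@query, "\n", sep = "")
  cat("  records: ", length(object@raw), "\n", sep = "")
})

job <- new("demoJob3",
           query = 'Damiano Fantini[AU] AND "2018"[PDAT]',
           raw = list("record 1", "record 2"))
job  # auto-printing dispatches to show()
```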

Extract Publication and Affiliation Data from PubMed Records

Description

Extract Publication Info from PubMed records and cast data into a data.frame where each row corresponds to a different author. It is possible to limit data extraction to first authors or last authors only, or get information about all authors of each PubMed record.

Usage

table_articles_byAuth(
  pubmed_data,
  included_authors = "all",
  max_chars = 500,
  autofill = TRUE,
  dest_file = NULL,
  getKeywords = TRUE,
  encoding = "UTF8"
)

Arguments

pubmed_data

PubMed Data in XML format: typically, an XML file resulting from a batch_pubmed_download() call or an XML object, result of a fetch_pubmed_data() call.

included_authors

Character: c("first", "last", "all"). Only includes information from the first, the last or all authors of a PubMed record.

max_chars

This argument is ignored. In this version of the function, the whole Abstract Text is returned.

autofill

Logical. If TRUE, missing affiliations are imputed according to the available values (from the same article).

dest_file

String (character of length 1). Name of the file that will be written for storing the output. If NULL, no file will be saved.

getKeywords

This argument is ignored. In this version of the function MeSH terms and codes (i.e., keywords) are parsed by default.

encoding

The encoding of an input/output connection can be specified by name (for example, "ASCII" or "UTF-8"), in the same way as it would be given to the function base::iconv(). See the iconv() help page to find out more about the encodings that can be used on your platform. Here, we recommend using "UTF-8".

Details

The 'table_articles_byAuth()' function is now obsolete. You should use the 'epm_parse()' function instead. Please, have a look at the manual or the vignette. The 'table_articles_byAuth()' function will be retired in 2026.

Value

Data frame including the following fields: 'c("pmid", "doi", "title", "abstract", "year", "month", "day", "jabbrv", "journal", "keywords", "mesh", "lastname", "firstname", "address", "email")'.

Author(s)

Damiano Fantini damiano.fantini@gmail.com

References

https://www.data-pulse.com/dev_site/easypubmed/

Examples

# Note: a time limit can be set in order to kill the operation when/if 
# the NCBI/Entrez server becomes unresponsive.
setTimeLimit(elapsed = 4.9)
try({
  q0 <- 'Damiano Fantini[AU] AND "2018"[PDAT]'
  q1 <- easyPubMed::get_pubmed_ids(pubmed_query_string = q0)
  q2 <- fetch_pubmed_data(pubmed_id_list = q1)
  df <- table_articles_byAuth(q2, included_authors = 'first')
  df[, c('pmid', 'lastname', 'jabbrv', 'year', 'month', 'day')]
}, silent = TRUE)
setTimeLimit(elapsed = Inf)
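
The 'autofill' option imputes missing affiliations from the values available in the same article. One possible strategy is sketched below; the rule used here (each gap takes the closest preceding value, and leading gaps take the first available value) is an assumption and may differ from the package's behavior. 'autofill_address_sketch()' is a hypothetical helper.

```r
# Hypothetical helper: fill missing affiliations within one record.
autofill_address_sketch <- function(address) {
  known <- which(!is.na(address) & nzchar(address))
  if (length(known) == 0) return(address)  # nothing to propagate
  # For each position, find the closest known value at or before it.
  idx <- findInterval(seq_along(address), known)
  idx[idx == 0] <- 1  # leading gaps take the first available value
  address[known[idx]]
}

autofill_address_sketch(c(NA, "Univ A", NA, "Univ B", NA))
# "Univ A" "Univ A" "Univ A" "Univ B" "Univ B"
```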
