Type: Package
Title: A Curated Collection of Digestive System and Gastrointestinal Disease Datasets
Version: 0.1.0
Maintainer: Renzo Caceres Rossi <arenzocaceresrossi@gmail.com>
Description: Provides an extensive and curated collection of datasets related to the digestive system, stomach, intestines, liver, pancreas, and associated diseases. This package includes clinical trials, observational studies, experimental datasets, cohort data, and case series involving gastrointestinal disorders such as gastritis, ulcers, pancreatitis, liver cirrhosis, colon cancer, colorectal conditions, Helicobacter pylori infection, irritable bowel syndrome, intestinal infections, and post-surgical outcomes. The datasets support educational, clinical, and research applications in gastroenterology, public health, epidemiology, and biomedical sciences. Designed for researchers, clinicians, data scientists, students, and educators interested in digestive diseases, the package facilitates reproducible analysis, modeling, and hypothesis testing using real-world and historical data.
License: GPL-3
Language: en
URL: https://github.com/lightbluetitan/digestivedatasets, https://lightbluetitan.github.io/digestivedatasets/
BugReports: https://github.com/lightbluetitan/digestivedatasets/issues
Encoding: UTF-8
LazyData: true
Suggests: ggplot2, testthat (≥ 3.0.0), dplyr, knitr, rmarkdown
Depends: R (≥ 4.1.0)
Imports: utils
RoxygenNote: 7.3.2
Config/testthat/edition: 3
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2025-05-31 03:55:22 UTC; renzocrossi
Author: Renzo Caceres Rossi [aut, cre]
Repository: CRAN
Date/Publication: 2025-06-03 13:00:13 UTC
DigestiveDataSets: A Curated Collection of Digestive System and Gastrointestinal Disease Datasets
Description
This package provides a wide variety of datasets focused on the digestive system, stomach, intestines, liver, pancreas, and associated diseases, including clinical trials, observational studies, experimental datasets, cohort data, and case series involving gastrointestinal disorders such as gastritis, ulcers, pancreatitis, liver cirrhosis, colon cancer, colorectal conditions, Helicobacter pylori infection, irritable bowel syndrome, intestinal infections, and post-surgical outcomes.
Author(s)
Maintainer: Renzo Caceres Rossi <arenzocaceresrossi@gmail.com>
See Also
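Useful links:
- https://github.com/lightbluetitan/digestivedatasets
- https://lightbluetitan.github.io/digestivedatasets/
- Report bugs at https://github.com/lightbluetitan/digestivedatasets/issues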
Anorexia Weight Change
Description
This dataset, anorexia_weight_change_df, is a data frame containing weight change data for young female anorexia patients. It includes pre- and post-treatment weights, along with the type of treatment administered.
Usage
data(anorexia_weight_change_df)
Format
A data frame with 72 observations and 3 variables:
- Treat
Factor indicating the treatment type (3 levels)
- Prewt
Numeric vector indicating the patient's weight before treatment (in pounds, as documented in MASS)
- Postwt
Numeric vector indicating the patient's weight after treatment (in pounds, as documented in MASS)
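As a minimal sketch of how these three columns might be used together, the helper below computes the mean weight change (post minus pre) per treatment group. The function name `mean_change_by_treatment` and the toy rows are hypothetical illustrations, not part of the package; the treatment labels "A" and "B" are placeholders for the real factor levels.

```r
# Mean weight change (post minus pre) per treatment group.
# `df` is expected to carry the documented columns Treat, Prewt, Postwt.
mean_change_by_treatment <- function(df) {
  change <- df$Postwt - df$Prewt
  tapply(change, df$Treat, mean)
}

# Made-up stand-in rows (NOT real data); "A"/"B" are placeholder levels.
toy <- data.frame(
  Treat  = factor(c("A", "A", "B", "B")),
  Prewt  = c(80, 82, 85, 79),
  Postwt = c(82, 86, 84, 79)
)
res <- mean_change_by_treatment(toy)
res

# With the package installed, the equivalent call would be:
# data(anorexia_weight_change_df, package = "DigestiveDataSets")
# mean_change_by_treatment(anorexia_weight_change_df)
```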
Details
The dataset name has been kept as 'anorexia_weight_change_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the MASS package version 7.3-65.
Recurrent Bleeding from Ulcers
Description
This dataset, bleeding_ulcers_df, is a data frame containing data from 40 experiments designed to compare a new surgery for stomach ulcer with an older surgery.
Usage
data(bleeding_ulcers_df)
Format
A data frame with 80 observations and 9 variables:
- author
Factor indicating the author of the study (20 levels)
- year
Integer indicating the year of the study
- quality
Integer representing the quality score of the experiment
- age
Integer indicating the age of the patients
- r
Integer indicating the number of recurrent bleeds
- m
Integer indicating the total number of patients
- bleed
Integer indicating bleeding events
- treat
Factor indicating treatment type (6 levels)
- table
Factor representing the experiment table (40 levels)
Details
The dataset name has been kept as 'bleeding_ulcers_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the SMPracticals package version 1.4-3.1.
Campylobacter Infections Time Series
Description
This dataset, campylobacter_infections_ts, is a time series object containing the number of cases of campylobacter infections in northern Quebec (Canada), recorded in four-week intervals from January 1990 to October 2000. Campylobacteriosis is an acute bacterial infection of the digestive system.
Usage
data(campylobacter_infections_ts)
Format
A time series object ('ts') with 140 observations:
- Start
c(1990, 1)
- End
c(2000, 10)
- Frequency
13 (observations per year)
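The Start, End, and Frequency values are consistent with 140 observations at 13 four-week periods per year. The sketch below reproduces that time index on a placeholder series of zeros (not the real case counts), which may help when subsetting the packaged object with `window()`.

```r
# Placeholder series of 140 zeros (NOT the real case counts), indexed the
# way the Format section describes: 13 four-week periods per year from 1990.
placeholder <- ts(rep(0, 140), start = c(1990, 1), frequency = 13)

start(placeholder)      # first period of 1990
end(placeholder)        # tenth period of 2000
frequency(placeholder)  # periods per year

# window() selects by (year, period) pairs, e.g. every period of 1995.
year_1995 <- window(placeholder, start = c(1995, 1), end = c(1995, 13))
length(year_1995)
```

The same `window()` call should work on campylobacter_infections_ts itself once the package is loaded, assuming the object carries the time attributes listed here.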
Details
The dataset name has been kept as 'campylobacter_infections_ts' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'ts' indicates that the dataset is a time series object. The original content has not been modified in any way.
Source
Data taken from the tscount package version 1.4.3. Original source: Ferland, R., Latour, A. and Oraichi, D., "Integer-valued GARCH process". Journal of Time Series Analysis, 2006; 27(6): 923–942.
Cholera Daily Deaths in England, 1849
Description
This dataset, cholera_deaths_1849_tbl_df, is a tibble containing daily deaths from cholera and diarrhoea in England for every day of 1849. It includes the month, cause of death, day of month, number of deaths, date, and day of week for each observation.
Usage
data(cholera_deaths_1849_tbl_df)
Format
A tibble with 730 observations and 6 variables:
- month
Character indicating the month of observation
- cause_of_death
Factor with 2 levels indicating cause of death (Cholera or Diarrhaea)
- day_of_month
Character indicating the day of the month
- deaths
Numeric value indicating the number of deaths
- date
Date object indicating the exact date
- day_of_week
Ordered factor with 7 levels indicating the day of week
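Because each day appears once per cause of death, a common first step is to total deaths by month and cause. The following is a hedged sketch using base R; `monthly_totals` and the toy rows are hypothetical, and the month labels and counts are made up for illustration.

```r
# Total deaths per month and cause, from a data frame with the documented
# columns `month`, `cause_of_death`, and `deaths`.
monthly_totals <- function(df) {
  aggregate(deaths ~ month + cause_of_death, data = df, FUN = sum)
}

# Made-up stand-in rows (NOT real counts); month labels are placeholders.
toy <- data.frame(
  month = c("Jan", "Jan", "Jan", "Feb"),
  cause_of_death = factor(c("Cholera", "Cholera", "Diarrhaea", "Cholera")),
  deaths = c(3, 5, 2, 4)
)
totals <- monthly_totals(toy)
totals

# With the package installed, the equivalent call would be:
# data(cholera_deaths_1849_tbl_df, package = "DigestiveDataSets")
# monthly_totals(cholera_deaths_1849_tbl_df)
```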
Details
The dataset name has been kept as 'cholera_deaths_1849_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'tbl_df' indicates that the dataset is a tibble. The original content has not been modified in any way.
Source
Data taken from the HistData package version 0.9-3. Original source: Bingham P., Verlander, N. Q., Cheal M. J. (2004). "John Snow, William Farr and the 1849 outbreak of cholera that affected London: a reworking of the data highlights the importance of the water supply". Public Health, 118(6), 387–394, Table 2.
Chemotherapy for Stage B/C Colon Cancer
Description
This dataset, colon_stageBC_chemo_df, is a data frame containing data from one of the first successful trials of adjuvant chemotherapy for stage B/C colon cancer. The dataset includes 1858 observations (with two records per patient: one for recurrence and one for death) and 16 clinical variables.
Usage
data(colon_stageBC_chemo_df)
Format
A data frame with 1858 observations and 16 variables:
- id
Numeric patient identifier
- study
Numeric study code
- rx
Factor with 3 levels indicating treatment group
- sex
Numeric gender code
- age
Numeric age in years
- obstruct
Numeric obstruction status
- perfor
Numeric perforation status
- adhere
Numeric adhesion status
- nodes
Numeric count of lymph nodes
- status
Numeric event status
- differ
Numeric differentiation grade
- extent
Numeric tumor extent
- surg
Numeric surgery code
- node4
Numeric node4 status
- time
Numeric follow-up time
- etype
Numeric event type
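Since every patient contributes two records, most analyses start by separating the recurrence rows from the death rows. The sketch below assumes that etype is coded 1 for recurrence and 2 for death, as in the `colon` dataset of the survival package; that coding is an assumption here and should be checked against the data. `split_by_event` and the toy rows are hypothetical.

```r
# Split the two-records-per-patient layout into one table per event type.
# ASSUMPTION: etype is coded 1 = recurrence, 2 = death (as in
# survival::colon); verify this against the actual data before relying on it.
split_by_event <- function(df) {
  list(
    recurrence = df[df$etype == 1, ],
    death      = df[df$etype == 2, ]
  )
}

# Made-up stand-in rows (NOT real data), two records per patient id.
toy <- data.frame(
  id     = c(1, 1, 2, 2),
  time   = c(400, 900, 1500, 1500),
  status = c(1, 1, 0, 0),
  etype  = c(1, 2, 1, 2)
)
parts <- split_by_event(toy)
nrow(parts$recurrence)  # one row per patient
nrow(parts$death)       # one row per patient
```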
Details
The dataset name has been kept as 'colon_stageBC_chemo_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the OncoDataSets package version 0.1.0.
Features from Colonoscopic Video
Description
This dataset, colonoscopy_features_tbl_df, is a tibble containing features extracted from 76 colonoscopic videos. Each video was recorded using both White Light (WL) and Narrow Band Imaging (NBI). The dataset includes histology results (classification ground truth), the opinion of endoscopists (4 experts and 3 beginners), and 698 features derived from patients with gastrointestinal lesions.
Usage
data(colonoscopy_features_tbl_df)
Format
A tibble with 76 observations and 7 variables:
- feature 294
Numeric feature extracted from colonoscopic videos
- feature 441
Numeric feature extracted from colonoscopic videos
- feature 472
Numeric feature extracted from colonoscopic videos
- feature 486
Numeric feature extracted from colonoscopic videos
- class_agreement
Numeric score representing agreement among endoscopists
- missinglabel_indicator
Numeric indicator for missing labels
- ground truth
Character string representing the histology-based classification
Details
The dataset name has been kept as 'colonoscopy_features_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'tbl_df' indicates that the dataset is a tibble. The original content has not been modified in any way.
Source
Data taken from the gmmsslm package version 1.1.6.
PubMed Data of miRNAs in Colorectal Cancer
Description
This dataset, crc_mirnas_pubmed_tbl_df, is a tibble containing information from PubMed abstracts related to microRNAs (miRNAs) in colorectal cancer. The data provides publication metadata, article abstracts, and associated miRNAs across 508 observations with 8 variables.
Usage
data(crc_mirnas_pubmed_tbl_df)
Format
A tibble with 508 observations and 8 variables:
- PMID
Numeric PubMed identifier
- Year
Numeric publication year
- Title
Character article title
- Abstract
Character full abstract text
- Language
Character publication language
- Type
Character article type
- Topic
Character research topic
- miRNA
Character microRNA identifiers
Details
The dataset name has been kept as 'crc_mirnas_pubmed_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'tbl_df' indicates that the dataset is a tibble. The original content has not been modified in any way.
Source
Data taken from the OncoDataSets package version 0.1.0.
Cystic Fibrosis SNP
Description
This dataset, cystic_fibrosis_snps_df, is a data frame containing genetic association data for cystic fibrosis, including a case-control indicator and 23 single nucleotide polymorphisms (SNPs) with specified inter-marker distances. The dataset contains 186 observations across 24 variables.
Usage
data(cystic_fibrosis_snps_df)
Format
A data frame with 186 observations and 24 variables:
- y
Integer case-control indicator
- loc1
Integer SNP genotype at location 1
- loc2
Integer SNP genotype at location 2
- loc3
Integer SNP genotype at location 3
- loc4
Integer SNP genotype at location 4
- loc5
Integer SNP genotype at location 5
- loc6
Integer SNP genotype at location 6
- loc7
Integer SNP genotype at location 7
- loc8
Integer SNP genotype at location 8
- loc9
Integer SNP genotype at location 9
- loc10
Integer SNP genotype at location 10
- loc11
Integer SNP genotype at location 11
- loc12
Integer SNP genotype at location 12
- loc13
Integer SNP genotype at location 13
- loc14
Integer SNP genotype at location 14
- loc15
Integer SNP genotype at location 15
- loc16
Integer SNP genotype at location 16
- loc17
Integer SNP genotype at location 17
- loc18
Integer SNP genotype at location 18
- loc19
Integer SNP genotype at location 19
- loc20
Integer SNP genotype at location 20
- loc21
Integer SNP genotype at location 21
- loc22
Integer SNP genotype at location 22
- loc23
Integer SNP genotype at location 23
Details
The dataset name has been kept as 'cystic_fibrosis_snps_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the gap.datasets package version 0.0.6. Original source: Liu JS, Sabatti C, Teng J, Keats BJB, Risch N (2001). "Bayesian Analysis of Haplotypes for Linkage Disequilibrium Mapping". Genome Research, 11:1716–1724.
Digestive Cancer Survival Times
Description
This dataset, digestive_cancer_survival_df, is a data frame containing survival times (in days) of cancer patients with advanced cancer of the stomach, bronchus, colon, ovary, or breast. All patients included in this dataset received treatment that involved supplemental ascorbate.
Usage
data(digestive_cancer_survival_df)
Format
A data frame with 17 observations and 5 variables:
- stomach
Integer values indicating survival times (in days) for patients with stomach cancer
- bronchus
Integer values indicating survival times (in days) for patients with bronchial cancer
- colon
Integer values indicating survival times (in days) for patients with colon cancer
- ovary
Integer values indicating survival times (in days) for patients with ovarian cancer
- breast
Integer values indicating survival times (in days) for patients with breast cancer
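The data are stored in wide form, one column per cancer site. For grouped summaries or plotting, a long layout is often more convenient. The sketch below uses base R `stack()`; it assumes that sites with fewer patients are padded with NA, which is an assumption about the stored object rather than something stated here. `to_long` and the toy values are hypothetical.

```r
# Reshape one-column-per-site survival times into long form (days, site).
# ASSUMPTION: shorter columns are padded with NA, which is dropped here.
to_long <- function(df) {
  long <- stack(df)                 # yields columns `values` and `ind`
  names(long) <- c("days", "site")
  long[!is.na(long$days), ]
}

# Made-up stand-in values (NOT real data) with one padded NA.
toy <- data.frame(
  stomach = c(100L, 200L, NA),
  colon   = c(300L, 400L, 500L)
)
long <- to_long(toy)
medians <- tapply(long$days, long$site, median)
medians
```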
Details
The dataset name has been kept as 'digestive_cancer_survival_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the RbyExample package version 0.0.100.
E. coli Infections Time Series
Description
This dataset, ecoli_infections_df, is a data frame containing the weekly number of reported disease cases caused by Escherichia coli in the state of North Rhine-Westphalia (Germany) from January 2001 to May 2013, excluding cases of EHEC and HUS.
Usage
data(ecoli_infections_df)
Format
A data frame with 646 observations and 3 variables:
- year
Numeric value indicating the year of observation
- week
Numeric value indicating the week of observation
- cases
Numeric value indicating the number of reported E. coli cases
Details
The dataset name has been kept as 'ecoli_infections_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the tscount package version 1.4.3.
Gastric Cancer Clinical Trial
Description
This dataset, gastric_cancer_trial_df, is a data frame containing data from a randomized clinical trial conducted by the Gastrointestinal Tumor Study Group on patients with gastric cancer. It includes survival time, event occurrence, and group assignment.
Usage
data(gastric_cancer_trial_df)
Format
A data frame with 90 observations and 3 variables:
- time
Numeric vector representing survival time
- event
Numeric vector indicating event occurrence (e.g., death or relapse)
- group
Factor with 2 levels representing treatment groups
Details
The dataset name has been kept as 'gastric_cancer_trial_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the coin package version 1.4-3.
Gastrointestinal Damage Prevention
Description
This dataset, gi_damage_prevention_df, is a data frame containing results from four randomised clinical trials on the use of misoprostol to prevent gastrointestinal damage, reported by Lanza et al. (1987–1989).
Usage
data(gi_damage_prevention_df)
Format
A data frame with 198 observations and 3 variables:
- study
Factor indicating the clinical trial (4 levels)
- treatment
Factor indicating the treatment group (2 levels: control or Misoprostol)
- classification
Ordered factor indicating the degree of gastrointestinal damage (5 levels)
Details
The dataset name has been kept as 'gi_damage_prevention_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the HSAUR3 package version 1.0-15.
Helicobacter pylori Infection in Preschoolers
Description
This dataset, helicobacter_children_tbl_df, is a tibble containing the prevalence of Helicobacter pylori infection in preschool children according to parental history of duodenal or gastric ulcer.
Usage
data(helicobacter_children_tbl_df)
Format
A tibble with 863 observations and 2 variables:
- ulcer
Factor with 2 levels indicating parental history of duodenal or gastric ulcer
- infected
Factor with 2 levels indicating Helicobacter pylori infection status
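With two binary factors, the natural summary is a 2x2 table and its sample odds ratio. The following is a minimal sketch; `odds_ratio_2x2` is a hypothetical helper, and the "No"/"Yes" level names and counts in the toy data are placeholders rather than the real values.

```r
# Cross-tabulate parental ulcer history against infection status and
# compute the sample odds ratio from the resulting 2x2 table.
odds_ratio_2x2 <- function(df) {
  tab <- table(df$ulcer, df$infected)
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}

# Made-up stand-in rows (NOT real data); level names are placeholders.
toy <- data.frame(
  ulcer    = factor(c(rep("No", 10), rep("Yes", 10))),
  infected = factor(c(rep("No", 8), rep("Yes", 2),
                      rep("No", 5), rep("Yes", 5)))
)
or <- odds_ratio_2x2(toy)
or
```

The helper relies on the factor level order, so it is worth checking `levels()` on both columns before interpreting the direction of the ratio.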
Details
The dataset name has been kept as 'helicobacter_children_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'tbl_df' indicates that the dataset is a tibble. The original content has not been modified in any way.
Source
Data taken from the pubh package version 2.0.0.
Colic Horse Surgery
Description
This dataset, horse_colic_surgery_df, is a data frame containing clinical observations of horses with colic, where the primary task is to determine if the lesion requires surgery. The data consists of 300 cases with 31 clinical variables, modified from the original UCI repository version with adjusted factor levels.
Usage
data(horse_colic_surgery_df)
Format
A data frame with 300 observations and 31 variables:
- surgery
Factor with 2 levels indicating surgical requirement
- age
Factor with 1 level (age group)
- hospitalID
Integer hospital identifier
- temp_rectal
Numeric rectal temperature
- pulse
Numeric pulse rate
- respiratory_rate
Numeric respiratory rate
- temp_extreme
Factor with 4 levels (temperature extremes)
- pulse_peripheral
Factor with 4 levels (peripheral pulse)
- capillayr_refill_time
Factor with 3 levels (capillary refill time)
- pain
Numeric pain score
- peristalsis
Numeric peristalsis measure
- abdominal_distension
Numeric distension score
- nasogastric_tube
Numeric tube measure
- nasogastric_reflux
Numeric reflux quantity
- nasogastric_reflux_PH
Numeric reflux pH
- rectal_examination
Numeric exam result
- abdomen
Numeric abdomen assessment
- cell_volume
Numeric cell volume
- protein
Numeric protein level
- abdominocentesis_appearance
Numeric appearance score
- abdomcentesis_protein
Numeric protein measure
- outcome
Factor with 3 levels (outcome status)
- surgical_lesion
Factor with 2 levels (lesion type)
- lesion_type1
Factor with 60 levels (primary lesion type)
- lesion_type2
Integer secondary lesion code
- lesion_type3
Integer tertiary lesion code
- cp_data
Factor with 2 levels (CP data)
- temp_extreme_ordered
Ordered factor with 4 levels (temperature)
- temp_extreme_num
Numeric temperature measure
- mucous_membranes_col
Factor with 6 levels (membrane color)
- mucous_membranes_group
Factor with 5 levels (membrane group)
Details
The dataset name has been kept as 'horse_colic_surgery_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way beyond factor level adjustments.
Source
Data taken from the VIM package version 6.2.2 (originally from UCI repository).
Studies on CAM for Irritable Bowel Syndrome
Description
This dataset, ibs_cam_trials_df, is a data frame containing results from 19 clinical trials examining complementary and alternative medicine (CAM) interventions for irritable bowel syndrome (IBS). The dataset includes 12 variables characterizing each trial and its outcomes.
Usage
data(ibs_cam_trials_df)
Format
A data frame with 19 observations and 12 variables:
- id
Integer trial identifier
- study
Character study name/location
- year
Integer publication year
- country
Character country where study was conducted
- ibs.crit
Character IBS diagnostic criteria used
- days
Integer study duration in days
- visits
Integer number of study visits
- jadad
Integer Jadad score for study quality
- x.a
Integer active treatment events
- n.a
Integer active treatment sample size
- x.p
Integer placebo group events
- n.p
Integer placebo group sample size
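The four count columns are enough to compute a per-trial effect size. As a sketch, the helper below returns the risk ratio of the active arm relative to placebo for each trial; `risk_ratio` and the toy counts are hypothetical, and a full meta-analysis would normally use a dedicated package rather than this one-liner.

```r
# Per-trial risk ratio: event proportion under active treatment divided
# by the event proportion under placebo.
risk_ratio <- function(df) {
  (df$x.a / df$n.a) / (df$x.p / df$n.p)
}

# Made-up stand-in counts (NOT real trial results).
toy <- data.frame(
  id  = 1:2,
  x.a = c(30L, 12L), n.a = c(50L, 40L),
  x.p = c(20L, 12L), n.p = c(50L, 40L)
)
rr <- risk_ratio(toy)
rr

# With the package installed, the equivalent call would be:
# data(ibs_cam_trials_df, package = "DigestiveDataSets")
# risk_ratio(ibs_cam_trials_df)
```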
Details
The dataset name has been kept as 'ibs_cam_trials_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the metadat package version 1.4-0.
SmartPill Intestinal Transit
Description
This dataset, intestinal_smartpill_df, is a data frame from a prospective cohort study evaluating gastric emptying, small bowel transit time, and total intestinal transit time using a SmartPill motility capsule. The study involved 8 critically ill trauma patients and 87 healthy volunteers. The capsule wirelessly transmitted pH, pressure, and temperature to a recorder attached to each subject's abdomen.
Usage
data(intestinal_smartpill_df)
Format
A data frame with 95 observations and 22 variables:
- Group
Numeric indicator of group membership
- Gender
Numeric indicator of gender
- Race
Numeric code indicating racial background
- Height
Height in centimeters
- Weight
Weight in kilograms
- Age
Age in years
- GE.Time
Gastric emptying time (minutes)
- SB.Time
Small bowel transit time (minutes)
- C.Time
Colon transit time (minutes)
- WG.Time
Whole gut transit time (minutes)
- S.Contractions
Number of contractions in the stomach
- S.Sum.of.Amplitudes
Sum of contraction amplitudes in the stomach
- S.Mean.Peak.Amplitude
Mean peak amplitude in the stomach
- S.Mean.pH
Mean pH level in the stomach
- SB.Contractions
Number of contractions in the small bowel
- SB.Sum.of.Amplitudes
Sum of contraction amplitudes in the small bowel
- SB.Mean.Peak.Amplitude
Mean peak amplitude in the small bowel
- SB.Mean.pH
Mean pH level in the small bowel
- Colon.Contractions
Number of contractions in the colon
- Colon.Sum.of.Amplitudes
Sum of contraction amplitudes in the colon
- C.Mean.Peak.Amplitude
Mean peak amplitude in the colon
- C.Mean.pH
Mean pH level in the colon
Details
The dataset name has been kept as 'intestinal_smartpill_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the medicaldata package version 0.2.0. Original source: Rauch et al., "Use of Wireless Motility Capsule to Determine Gastric Emptying and Small Intestinal Transit Times in Critically Ill Trauma Patients". Journal of Critical Care, 2012; 27(5): 534.e7–534.e12.
Satellite Tumors in GI Surgery
Description
This dataset, intestinal_surgery_df, is a data frame containing intestinal surgery data from 844 cancer patients. The data consists of pairs (n_i, s_i) where n_i is the number of satellites removed and s_i is the number of satellites found to be malignant.
Usage
data(intestinal_surgery_df)
Format
A data frame with 844 observations and 2 variables:
- n
Numeric value representing the number of satellites removed
- s
Numeric value representing the number of malignant satellites found
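Each row is one patient's pair (n, s), so the simplest summary is the pooled fraction of removed satellites that were malignant. A minimal sketch follows; `pooled_malignant_fraction` and the toy pairs are hypothetical.

```r
# Pooled fraction of removed satellites that were found to be malignant.
pooled_malignant_fraction <- function(df) {
  sum(df$s) / sum(df$n)
}

# Made-up stand-in pairs (NOT real data): n removed, s malignant.
toy <- data.frame(n = c(3, 5, 2), s = c(1, 0, 2))
frac <- pooled_malignant_fraction(toy)
frac

# Per-patient fractions, for comparison with the pooled value.
toy$s / toy$n
```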
Details
The dataset name has been kept as 'intestinal_surgery_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the deconvolveR package version 1.2-1. Original source: Efron, B. (2016). "Empirical Bayes deconvolution estimates". Biometrika, 103(1), 1–20.
Prednisone vs Placebo in Liver Cirrhosis
Description
This dataset, liver_cirrhosis_prednisone_df, is a data frame containing data from a randomized controlled trial comparing prednisone (n=251) versus placebo (n=237) in 488 liver cirrhosis patients. The dataset includes both survival and longitudinal measurements of prothrombin index development over time, with 2968 total observations across 9 variables.
Usage
data(liver_cirrhosis_prednisone_df)
Format
A data frame with 2968 observations and 9 variables:
- ID
Integer patient identifier
- Time
Numeric time measurement
- death
Integer death indicator
- obstime
Numeric observation time
- proth
Integer prothrombin index value
- Trt
Factor with 2 levels indicating treatment group (prednisone/placebo)
- start
Numeric start time
- stop
Numeric stop time
- event
Numeric event indicator
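The data are in long format, with several rows per patient. For baseline summaries such as arm sizes, it helps to reduce the table to one row per ID first. The sketch below keeps the first row for each patient; `one_row_per_patient`, the toy values, and the treatment level names are hypothetical.

```r
# Keep the first record for each patient ID (long format -> one row each).
one_row_per_patient <- function(df) {
  df[!duplicated(df$ID), ]
}

# Made-up stand-in rows (NOT real data); level names are placeholders.
toy <- data.frame(
  ID      = c(1L, 1L, 1L, 2L, 2L),
  obstime = c(0, 0.5, 1, 0, 0.7),
  proth   = c(70L, 75L, 80L, 60L, 65L),
  Trt     = factor(c("placebo", "placebo", "placebo",
                     "prednisone", "prednisone"))
)
patients <- one_row_per_patient(toy)
nrow(patients)        # number of distinct patients
table(patients$Trt)   # patients per treatment arm
```

Applied to liver_cirrhosis_prednisone_df, the row count of the reduced table would be expected to match the 488 patients mentioned in the Description.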
Details
The dataset name has been kept as 'liver_cirrhosis_prednisone_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the JSM package version 1.0.1.
Ontario Lynch Syndrome Families
Description
This dataset, lynch_ontario_families_df, is a data frame containing data from 32 Lynch Syndrome families segregating mismatch repair mutations selected from the Ontario Familial Colorectal Cancer Registry. The dataset includes 765 individuals (both probands and relatives) with 11 variables per observation.
Usage
data(lynch_ontario_families_df)
Format
A data frame with 765 observations and 11 variables:
- famID
Integer family identifier
- indID
Integer individual identifier
- fatherID
Integer father's identifier
- motherID
Integer mother's identifier
- gender
Integer gender code
- status
Integer disease status
- time
Integer time variable
- currentage
Integer current age
- mgene
Integer mutation gene status
- proband
Integer proband indicator
- relation
Integer relationship code
Details
The dataset name has been kept as 'lynch_ontario_families_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the FamEvent package version 3.2.
Norovirus Outbreak in Derbyshire
Description
This dataset, norovirus_derbyshire_df, is a data frame describing an outbreak of norovirus in the summer of 2001 in a primary school and nursery in Derbyshire, England. It contains 492 observations across 5 variables tracking illness patterns among students.
Usage
data(norovirus_derbyshire_df)
Format
A data frame with 492 observations and 5 variables:
- class
Factor with 15 levels representing school classes
- day_absent
Integer day of absence
- start_illness
Integer day when illness started
- end_illness
Integer day when illness ended
- day_vomiting
Integer day when vomiting occurred
Details
The dataset name has been kept as 'norovirus_derbyshire_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the outbreaks package version 1.9.0. Original source: O'Neill and Marks (2005).
Pancreatic Cancer Clinical Trial
Description
This dataset, pancreatic_cancer_df, is a data frame containing data from a Phase II clinical trial of patients with locally advanced or metastatic pancreatic cancer. It includes time-to-event data for disease progression and death, as well as staging information.
Usage
data(pancreatic_cancer_df)
Format
A data frame with 41 observations and 4 variables:
- stage
Factor indicating disease stage (locally advanced or metastatic)
- onstudy
Factor holding the date of enrollment in the trial (month/day/year format, as documented in asaur)
- progression
Factor holding the date of disease progression (month/day/year format)
- death
Factor holding the date of death (month/day/year format)
Details
The dataset name has been kept as 'pancreatic_cancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the asaur package version 0.50.
Mayo Clinic Primary Biliary Cirrhosis
Description
This dataset, pbc_mayo_survival_df, is a data frame containing data from a randomized controlled trial conducted at the Mayo Clinic from 1974 to 1984, studying the progression of primary biliary cirrhosis. The dataset includes both survival and longitudinal measurements with 1945 observations across 16 clinical variables.
Usage
data(pbc_mayo_survival_df)
Format
A data frame with 1945 observations and 16 variables:
- ID
Integer patient identifier
- Time
Numeric time measurement
- death
Numeric death indicator
- obstime
Numeric observation time
- serBilir
Numeric serum bilirubin measurement
- albumin
Numeric serum albumin measurement
- alkaline
Integer alkaline phosphatase level
- platelets
Integer platelet count
- drug
Factor with 2 levels indicating treatment group
- age
Numeric age in years
- gender
Factor with 2 levels indicating patient sex
- ascites
Factor with 2 levels indicating presence of ascites
- hepatom
Factor with 2 levels indicating presence of hepatomegaly
- start
Numeric start time for interval
- stop
Numeric stop time for interval
- event
Numeric event indicator
Details
The dataset name has been kept as 'pbc_mayo_survival_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the JSM package version 1.0.1.
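Examples
Because the data frame mixes survival and longitudinal measurements, it has several rows per patient. The sketch below separates the two views. It assumes that ID identifies a patient and that Time, death, drug, age, and gender do not change across a patient's rows; the helper names patient_level() and visits_per_patient() are hypothetical.

```r
# Hypothetical helper: one row per patient, keeping patient-level columns.
# Assumes these columns are constant within each ID.
patient_level <- function(df) {
  keep <- c("ID", "Time", "death", "drug", "age", "gender")
  df[!duplicated(df$ID), keep]
}

# Hypothetical helper: number of recorded rows (visits) per patient.
visits_per_patient <- function(df) table(df$ID)

if (requireNamespace("DigestiveDataSets", quietly = TRUE)) {
  data("pbc_mayo_survival_df", package = "DigestiveDataSets")
  print(nrow(patient_level(pbc_mayo_survival_df)))
  print(summary(as.vector(visits_per_patient(pbc_mayo_survival_df))))
}
```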
Indomethacin for Post-ERCP Pancreatitis
Description
This dataset, post_ercp_pancreatitis_tbl_df, is a tibble containing results from a randomized, placebo-controlled, prospective 2-arm trial of rectal indomethacin (100 mg) versus placebo to prevent post-ERCP pancreatitis in 602 participants, as reported by Elmunzer, Higgins, et al. (2012) in the New England Journal of Medicine.
Usage
data(post_ercp_pancreatitis_tbl_df)
Format
A tibble with 602 observations and 33 variables:
- id
Numeric subject identifier
- site
Factor indicating study site (4 levels)
- age
Numeric age of the participant
- risk
Numeric risk score
- gender
Factor indicating gender (2 levels)
- outcome
Factor indicating the study outcome, whether post-ERCP pancreatitis occurred (2 levels)
- sod
Factor indicating presence of sphincter of Oddi dysfunction (2 levels)
- pep
Factor indicating previous post-ERCP pancreatitis, a risk factor (2 levels)
- recpanc
Factor indicating recurrent pancreatitis (2 levels)
- psphinc
Factor indicating pancreatic sphincterotomy (2 levels)
- precut
Factor indicating precut sphincterotomy (2 levels)
- difcan
Factor indicating difficult cannulation (2 levels)
- pneudil
Factor indicating pneumatic dilation (2 levels)
- amp
Factor indicating ampullary interventions (2 levels)
- paninj
Factor indicating contrast injection into the pancreas during the procedure (2 levels)
- acinar
Factor indicating acinarization (2 levels)
- brush
Factor indicating brushing procedures (2 levels)
- asa81
Factor indicating aspirin (ASA) use at 81 mg (3 levels)
- asa325
Factor indicating aspirin (ASA) use at 325 mg (3 levels)
- asa
Factor indicating aspirin (ASA) use at any dose (3 levels)
- prophystent
Factor indicating prophylactic stent placement (2 levels)
- therastent
Factor indicating therapeutic stent use (2 levels)
- pdstent
Factor indicating pancreatic duct stent (2 levels)
- sodsom
Factor indicating whether sphincter of Oddi manometry was performed (2 levels)
- bsphinc
Factor indicating biliary sphincterotomy (2 levels)
- bstent
Factor indicating biliary stent (2 levels)
- chole
Factor indicating cholecystectomy (2 levels)
- pbmal
Factor indicating presence of pancreaticobiliary malignancy (2 levels)
- train
Factor indicating if performed by trainee (2 levels)
- status
Factor indicating patient status, inpatient or outpatient (2 levels)
- type
Factor indicating sphincter of Oddi dysfunction type (4 levels)
- rx
Factor indicating treatment group: placebo or indomethacin (2 levels)
- bleed
Numeric bleeding indicator
Details
The dataset name has been kept as 'post_ercp_pancreatitis_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'tbl_df' indicates that the dataset is a tibble. The original content has not been modified in any way.
Source
Data taken from the medicaldata package version 0.2.0.
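Examples
A minimal sketch of comparing the two trial arms. It assumes that rx is the treatment arm and that outcome records whether post-ERCP pancreatitis occurred; it does not rely on specific level labels. The helper name arm_outcome_rates() is hypothetical.

```r
# Hypothetical helper: share of each outcome level within each treatment arm.
# Rows are arms, columns are outcome levels, and each row sums to 1.
arm_outcome_rates <- function(df) {
  counts <- table(arm = df$rx, outcome = df$outcome)
  prop.table(counts, margin = 1)
}

if (requireNamespace("DigestiveDataSets", quietly = TRUE)) {
  data("post_ercp_pancreatitis_tbl_df", package = "DigestiveDataSets")
  print(round(arm_outcome_rates(post_ercp_pancreatitis_tbl_df), 3))
}
```

For a formal comparison, the underlying count table can be passed to a test such as stats::chisq.test().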
H2 Antagonists in Upper Gastrointestinal Bleeding (UGIB)
Description
This dataset, ugi_bleeding_df, is a data frame containing results from 27 studies examining the effectiveness of histamine H2 antagonists (cimetidine or ranitidine) in treating acute upper gastrointestinal hemorrhage, with 14 variables per study.
Usage
data(ugi_bleeding_df)
Format
A data frame with 27 observations and 14 variables:
- id
Integer study identifier
- trial
Character trial name/location
- year
Integer publication year
- ref
Integer reference number
- trt
Character treatment description
- ctrl
Character control description
- nti
Integer treatment group sample size
- b.xti
Integer treatment group bleeding events
- o.xti
Integer number of treatment group patients in need of operation
- d.xti
Integer treatment group deaths
- nci
Integer control group sample size
- b.xci
Integer control group bleeding events
- o.xci
Integer number of control group patients in need of operation
- d.xci
Integer control group deaths
Details
The dataset name has been kept as 'ugi_bleeding_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the metadat package version 1.4-0.
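Examples
A minimal sketch of turning the per-study counts into an effect size for the bleeding outcome, using base R only. It computes the log odds ratio and its large-sample variance, adding 0.5 to every cell of a study's 2x2 table when any cell is zero. The helper name log_odds_ratio() and the choice of continuity correction are assumptions made for illustration, not the package's or the original authors' method.

```r
# Hypothetical helper: log odds ratio and its variance per study.
# Studies with a zero cell get `add` added to all four cells.
log_odds_ratio <- function(events_trt, n_trt, events_ctrl, n_ctrl, add = 0.5) {
  cells <- cbind(events_trt, n_trt - events_trt,
                 events_ctrl, n_ctrl - events_ctrl)
  has_zero <- apply(cells == 0, 1, any)
  has_zero[is.na(has_zero)] <- FALSE      # rows with missing counts stay NA
  cells <- cells + add * has_zero         # recycled down columns, so per row
  data.frame(
    log_or = log(cells[, 1] * cells[, 4] / (cells[, 2] * cells[, 3])),
    var    = rowSums(1 / cells)
  )
}

if (requireNamespace("DigestiveDataSets", quietly = TRUE)) {
  data("ugi_bleeding_df", package = "DigestiveDataSets")
  es <- with(ugi_bleeding_df, log_odds_ratio(b.xti, nti, b.xci, nci))
  print(head(cbind(ugi_bleeding_df["trial"], es)))
}
```

The same function applies to the operation and death counts by swapping in o.xti/o.xci or d.xti/d.xci.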
View Available Datasets in DigestiveDataSets
Description
This function lists all datasets available in the 'DigestiveDataSets' package. If the 'DigestiveDataSets' package is not loaded, it stops and shows an error message. If no datasets are available, it returns a message and an empty vector.
Usage
view_datasets_digestive()
Value
A character vector with the names of the available datasets. If no datasets are found, it returns an empty character vector.
Examples
if (requireNamespace("DigestiveDataSets", quietly = TRUE)) {
library(DigestiveDataSets)
view_datasets_digestive()
}
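The returned names can also drive a quick overview of the collection. The sketch below loads each named dataset into a scratch environment and reports its dimensions. It assumes the returned strings are exactly the names accepted by data(); dataset_dims() is a hypothetical helper, not a package function.

```r
# Hypothetical helper: row and column counts for datasets in a package.
dataset_dims <- function(names, pkg = "DigestiveDataSets") {
  env <- new.env()
  utils::data(list = names, package = pkg, envir = env)
  t(vapply(names, function(nm) {
    obj <- get(nm, envir = env)
    c(rows = NROW(obj), cols = NCOL(obj))
  }, c(rows = 0L, cols = 0L)))
}

if (requireNamespace("DigestiveDataSets", quietly = TRUE)) {
  library(DigestiveDataSets)
  available <- view_datasets_digestive()
  print(dataset_dims(available))
}
```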
Obese Patient Weight Loss Data
Description
This dataset, weight_loss_df, is a data frame containing the weight, in kilograms, of an obese patient measured at 52 time points over an 8-month period as part of a weight rehabilitation programme.
Usage
data(weight_loss_df)
Format
A data frame with 52 observations and 2 variables:
- Days
Integer vector indicating the number of days since the beginning of the programme
- Weight
Numeric vector indicating the weight (in kilograms) of the patient at each time point
Details
The dataset name has been kept as 'weight_loss_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the DigestiveDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the MASS package version 7.3-65.
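Examples
A minimal sketch of modelling the weight trajectory. It fits the asymptotic decay curve Weight = b0 + b1 * 2^(-Days / th) by nonlinear least squares, following the model and starting values used in the MASS documentation for the original wtloss data. The helper name fit_weight_decay() is hypothetical.

```r
# Hypothetical helper: fit Weight = b0 + b1 * 2^(-Days / th) with nls().
# b0 is the asymptotic weight, b1 the total weight still to lose at day 0,
# and th the number of days needed to halve the remaining excess weight.
fit_weight_decay <- function(df, start = list(b0 = 90, b1 = 95, th = 120)) {
  stats::nls(Weight ~ b0 + b1 * 2^(-Days / th), data = df, start = start)
}

if (requireNamespace("DigestiveDataSets", quietly = TRUE)) {
  data("weight_loss_df", package = "DigestiveDataSets")
  fit <- fit_weight_decay(weight_loss_df)
  print(coef(fit))
}
```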