Title: | Download and Explore Datasets from UCSC Xena Data Hubs |
Version: | 1.4.8 |
Maintainer: | Shixiang Wang <w_shixiang@163.com> |
Description: | Download and explore datasets from UCSC Xena data hubs, which are a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. Databases are normalized so they can be combined, linked, filtered, explored and downloaded. |
License: | GPL-3 |
URL: | https://docs.ropensci.org/UCSCXenaTools/, https://github.com/ropensci/UCSCXenaTools |
BugReports: | https://github.com/ropensci/UCSCXenaTools/issues |
Depends: | R (≥ 3.5) |
Imports: | digest, dplyr, httr, jsonlite, magrittr, methods, readr, rlang, utils |
Suggests: | covr, DT, knitr, prettydoc, rmarkdown, shiny, shinydashboard, testthat (≥ 2.1.0) |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.2.0 |
NeedsCompilation: | no |
Packaged: | 2022-06-20 07:53:49 UTC; wsx |
Author: | Shixiang Wang |
Repository: | CRAN |
Date/Publication: | 2022-06-20 08:10:02 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
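A minimal illustration of the re-exported pipe, assuming the magrittr package is installed: lhs %>% rhs passes lhs as the first argument of rhs, so a chain of calls reads left to right instead of being nested.

```r
library(magrittr)

x <- c(1, 4, 9)

# Piped form: x is passed to sqrt(), and that result is passed to sum()
piped <- x %>% sqrt() %>% sum()

# Equivalent nested form
nested <- sum(sqrt(x))

stopifnot(identical(piped, nested))  # both are 6
```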
View Dataset or Cohort Information on the UCSC Xena Website Using a Web Browser
Description
Opens the dataset or cohort page of UCSC Xena in the user's default web browser.
Usage
XenaBrowse(x, type = c("dataset", "cohort"), multiple = FALSE)
Arguments
x |
a XenaHub object. |
type |
one of "dataset" and "cohort". |
multiple |
if |
Examples
XenaGenerate(subset = XenaHostNames == "tcgaHub") %>%
  XenaFilter(filterDatasets = "clinical") %>%
  XenaFilter(filterDatasets = "LUAD") -> to_browse
## Not run:
XenaBrowse(to_browse)
XenaBrowse(to_browse, type = "cohort")
## End(Not run)
Xena Hub Information
Description
This data.frame is useful for selecting datasets quickly, without relying on the APIs of the UCSC Xena Hubs.
Format
A tibble.
Source
Generated from UCSC Xena Data Hubs.
Examples
data(XenaData)
str(XenaData)
Get or Update Newest Data Information of UCSC Xena Data Hubs
Description
Get or Update Newest Data Information of UCSC Xena Data Hubs
Usage
XenaDataUpdate(saveTolocal = TRUE)
Arguments
saveTolocal |
logical. Whether to save the result to the local R package data directory for permanent use. |
Value
a data.frame containing information about all Xena datasets.
Author(s)
Shixiang Wang w_shixiang@163.com
Examples
## Not run:
XenaDataUpdate()
XenaDataUpdate(saveTolocal = TRUE)
## End(Not run)
Download Datasets from UCSC Xena Hubs
Description
Available datasets are listed at https://xenabrowser.net/datapages/
Usage
XenaDownload(
xquery,
destdir = tempdir(),
download_probeMap = FALSE,
trans_slash = FALSE,
force = FALSE,
max_try = 3L,
...
)
Arguments
xquery |
a tibble object generated by the XenaQuery function. |
destdir |
the location to store downloaded data. Default is the system temporary directory. |
download_probeMap |
if |
trans_slash |
logical, default is |
force |
logical. if |
max_try |
maximum number of attempts to download the data. |
... |
other arguments passed to |
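The max_try argument bounds how many times a download is attempted. The sketch below shows a generic retry loop in base R; with_retries() and flaky() are hypothetical helpers written for illustration, they are not part of UCSCXenaTools, and the package's real download logic may differ.

```r
# Hypothetical helper: call f() up to max_try times, returning the first success
with_retries <- function(f, max_try = 3L) {
  for (i in seq_len(max_try)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
  }
  stop("All ", max_try, " attempts failed")
}

# Hypothetical flaky "download" that fails twice before succeeding
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("simulated network error")
  "downloaded"
}

res <- with_retries(flaky, max_try = 3L)
res
```

With max_try = 1L the same flaky() call would give up after the first failure and raise an error.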
Value
a tibble
Author(s)
Shixiang Wang w_shixiang@163.com
Examples
## Not run:
xe = XenaGenerate(subset = XenaHostNames == "tcgaHub")
hosts(xe)
xe_query = XenaQuery(xe)
xe_download = XenaDownload(xe_query)
## End(Not run)
Filter a XenaHub Object
Description
One of the main functions in UCSCXenaTools. It filters a XenaHub object by cohorts and/or datasets. All datasets can be found at https://xenabrowser.net/datapages/.
Usage
XenaFilter(
x,
filterCohorts = NULL,
filterDatasets = NULL,
ignore.case = TRUE,
...
)
Arguments
x |
a XenaHub object |
filterCohorts |
default is |
filterDatasets |
default is |
ignore.case |
if |
... |
other arguments except |
Value
a XenaHub object
Author(s)
Shixiang Wang w_shixiang@163.com
Examples
# operate TCGA datasets
xe = XenaGenerate(subset = XenaHostNames == "tcgaHub")
xe
# get all names of clinical data
xe2 = XenaFilter(xe, filterDatasets = "clinical")
datasets(xe2)
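Chained XenaFilter() calls narrow the selection step by step. The sketch below mimics that on a plain character vector, assuming filterDatasets is matched as a regular expression against dataset names; the dataset names are illustrative only.

```r
# Illustrative dataset names (not fetched from a hub)
dataset_names <- c(
  "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix",
  "TCGA.LUAD.sampleMap/HiSeqV2",
  "TCGA.BRCA.sampleMap/BRCA_clinicalMatrix"
)

# First filter: keep clinical datasets (case-insensitive, like ignore.case = TRUE)
step1 <- dataset_names[grepl("clinical", dataset_names, ignore.case = TRUE)]

# Second filter: keep LUAD among the remaining datasets
step2 <- step1[grepl("LUAD", step1, ignore.case = TRUE)]
step2
```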
Generate and Subset a XenaHub Object from 'XenaData'
Description
Generate and Subset a XenaHub Object from 'XenaData'
Usage
XenaGenerate(XenaData = UCSCXenaTools::XenaData, subset = TRUE)
Arguments
XenaData |
a |
subset |
logical expression indicating elements or rows to keep. |
Value
a XenaHub object.
Author(s)
Shixiang Wang w_shixiang@163.com
Examples
# 1 get all datasets
XenaGenerate()
# 2 get TCGA BRCA
XenaGenerate(subset = XenaCohorts == "TCGA Breast Cancer (BRCA)")
# 3 get all datasets containing BRCA
XenaGenerate(subset = grepl("BRCA", XenaCohorts))
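The subset argument refers to columns of XenaData. As a sketch, assuming the expression is evaluated much like base R's subset(), the same three kinds of filters can be tried on a small hypothetical data.frame with XenaData-like column names (the rows are made up):

```r
# Hypothetical stand-in for XenaData
mock_xena <- data.frame(
  XenaHostNames = c("tcgaHub", "tcgaHub", "gdcHub"),
  XenaCohorts = c("TCGA Breast Cancer (BRCA)",
                  "TCGA Lung Adenocarcinoma (LUAD)",
                  "GDC TCGA Breast Cancer (BRCA)"),
  XenaDatasets = c("brca_a", "luad_a", "brca_b"),
  stringsAsFactors = FALSE
)

# Exact cohort match
exact <- subset(mock_xena, XenaCohorts == "TCGA Breast Cancer (BRCA)")
# Pattern match: any cohort containing "BRCA"
fuzzy <- subset(mock_xena, grepl("BRCA", XenaCohorts))
# All rows from one hub
tcga <- subset(mock_xena, XenaHostNames == "tcgaHub")
```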
Generate a XenaHub Object
Description
It is used to generate the original XenaHub object according to hosts, cohorts, datasets or hostName. If these arguments are not specified, all hosts and their datasets are returned as a XenaHub object. All datasets can be found at https://xenabrowser.net/datapages/.
Usage
XenaHub(
hosts = xena_default_hosts(),
cohorts = character(),
datasets = character(),
hostName = c("publicHub", "tcgaHub", "gdcHub", "icgcHub", "toilHub",
"pancanAtlasHub", "treehouseHub", "pcawgHub", "atacseqHub", "singlecellHub",
"kidsfirstHub")
)
Arguments
hosts |
a character vector specifying UCSC Xena hosts; all available hosts can be found by |
cohorts |
default is an empty character vector, meaning all cohorts are returned. |
datasets |
default is an empty character vector, meaning all datasets are returned. |
hostName |
name of host, available options can be accessed by |
Value
a XenaHub object
Author(s)
Shixiang Wang w_shixiang@163.com
Examples
## Not run:
#1 query all hosts, cohorts and datasets
xe = XenaHub()
xe
#2 query only TCGA hosts
xe = XenaHub(hostName = "tcgaHub")
xe
hosts(xe) # get hosts
cohorts(xe) # get cohorts
datasets(xe) # get datasets
samples(xe) # get samples
## End(Not run)
Class XenaHub
Description
An S4 class to represent UCSC Xena Data Hubs.
Slots
hosts
hosts of data hubs
cohorts
cohorts of data hubs
datasets
datasets of data hubs
Prepare (Load) Downloaded Datasets to R
Description
Prepare (Load) Downloaded Datasets to R
Usage
XenaPrepare(
objects,
objectsName = NULL,
use_chunk = FALSE,
chunk_size = 100,
subset_rows = TRUE,
select_cols = TRUE,
callback = NULL,
comment = "#",
na = c("", "NA", "[Discrepancy]"),
...
)
Arguments
objects |
an object, either a character vector or a data.frame. If |
objectsName |
names for the elements of the returned object, i.e. the names of the list |
use_chunk |
default is |
chunk_size |
the number of rows to include in each chunk |
subset_rows |
logical expression indicating elements or rows to keep:
missing values are taken as false. |
select_cols |
expression indicating columns to select from a data frame. 'x' can be used to represent the data frame being subset, e.g. |
callback |
a function to call on each chunk, default is |
comment |
a character string marking comment rows in files |
na |
a character vector specifying |
... |
other arguments passed to |
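With use_chunk = TRUE, a file is read in blocks of chunk_size rows and callback is applied to each block, presumably so that large files can be filtered without loading them whole. The base R sketch below shows that general pattern; read_tsv_in_chunks() is a hypothetical helper, not the code XenaPrepare() uses internally, and the data are made up.

```r
# Hypothetical helper: read a tab-separated file `chunk_size` data rows at a
# time and apply `callback` to each chunk, then row-bind the results.
read_tsv_in_chunks <- function(path, chunk_size, callback) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # The first line holds the column names
  header <- strsplit(readLines(con, n = 1), "\t", fixed = TRUE)[[1]]
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break  # end of file
    chunk <- read.delim(text = lines, header = FALSE, col.names = header,
                        stringsAsFactors = FALSE)
    pieces[[length(pieces) + 1]] <- callback(chunk)
  }
  do.call(rbind, pieces)
}

# Small example file with made-up expression values
tmp <- tempfile(fileext = ".tsv")
writeLines(c("sample\tTP53", "S1\t1.5", "S2\t-0.2", "S3\t3.1", "S4\t0.7"), tmp)

# Keep only rows with positive values, two rows per chunk
res <- read_tsv_in_chunks(tmp, chunk_size = 2,
                          callback = function(chunk) chunk[chunk$TP53 > 0, ])
res$sample
```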
Value
a list containing the file data as tibbles
Author(s)
Shixiang Wang w_shixiang@163.com
Examples
## Not run:
xe = XenaGenerate(subset = XenaHostNames == "tcgaHub")
hosts(xe)
xe_query = XenaQuery(xe)
xe_download = XenaDownload(xe_query)
dat = XenaPrepare(xe_download)
## End(Not run)
Query URL of Datasets before Downloading
Description
Query URL of Datasets before Downloading
Usage
XenaQuery(x)
Arguments
x |
a XenaHub object |
Value
a data.frame containing hosts, datasets and URLs
Author(s)
Shixiang Wang w_shixiang@163.com
Examples
xe = XenaGenerate(subset = XenaHostNames == "tcgaHub")
hosts(xe)
## Not run:
xe_query = XenaQuery(xe)
## End(Not run)
Query ProbeMap URL of Datasets
Description
Datasets without a ProbeMap are ignored.
Usage
XenaQueryProbeMap(x)
Arguments
x |
a XenaHub object |
Value
a data.frame containing hosts, datasets and URLs
Author(s)
Shixiang Wang w_shixiang@163.com
Examples
xe = XenaGenerate(subset = XenaHostNames == "tcgaHub")
hosts(xe)
## Not run:
xe_query = XenaQueryProbeMap(xe)
## End(Not run)
Scan All Rows of XenaData Using a Regular Expression
Description
XenaScan() scans all rows of XenaData for a regular expression. It can be used before XenaGenerate().
Usage
XenaScan(
XenaData = UCSCXenaTools::XenaData,
pattern = NULL,
ignore.case = TRUE
)
Arguments
XenaData |
a |
pattern |
character string containing a regular expression
(or character string for |
ignore.case |
if |
Value
a data.frame
Examples
x1 <- XenaScan(pattern = "Blood")
x2 <- XenaScan(pattern = "LUNG", ignore.case = FALSE)
x1 %>%
XenaGenerate()
x2 %>%
XenaGenerate()
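A rough base R sketch of scanning every column of a table with one regular expression, using a small hypothetical data.frame in place of XenaData. It assumes a row is kept when any of its columns matches, which may differ from the package's exact rule; scan_rows() is a hypothetical helper.

```r
# Hypothetical helper: keep rows where any column matches `pattern`
scan_rows <- function(df, pattern, ignore.case = TRUE) {
  hits <- lapply(df, function(col) {
    grepl(pattern, as.character(col), ignore.case = ignore.case)
  })
  df[Reduce(`|`, hits), , drop = FALSE]
}

# Made-up rows with XenaData-like column names
mock <- data.frame(
  XenaHostNames = c("tcgaHub", "tcgaHub", "publicHub"),
  XenaCohorts = c("TCGA Lung Adenocarcinoma (LUAD)",
                  "TCGA Breast Cancer (BRCA)",
                  "Blood cell lines"),
  XenaDatasets = c("luad/clinical", "brca/clinical", "blood/expression"),
  stringsAsFactors = FALSE
)

scan_rows(mock, "blood")                     # case-insensitive match
scan_rows(mock, "LUNG", ignore.case = FALSE) # case-sensitive: no rows
```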
Xena Shiny App
Description
Xena Shiny App
Usage
XenaShiny()
Get or Check TCGA Available ProjectID, DataType and FileType
Description
Get or Check TCGA Available ProjectID, DataType and FileType
Usage
availTCGA(which = c("all", "ProjectID", "DataType", "FileType"))
Arguments
which |
a character of |
Author(s)
Shixiang Wang w_shixiang@163.com
Examples
availTCGA("all")
Get cohorts of XenaHub object
Description
Get cohorts of XenaHub object
Usage
cohorts(x)
Arguments
x |
a XenaHub object |
Value
a character vector containing cohorts
Examples
xe = XenaGenerate(subset = XenaHostNames == "tcgaHub"); cohorts(xe)
Get datasets of XenaHub object
Description
Get datasets of XenaHub object
Usage
datasets(x)
Arguments
x |
a XenaHub object |
Value
a character vector containing datasets
Examples
xe = XenaGenerate(subset = XenaHostNames == "tcgaHub"); datasets(xe)
Easily Download TCGA Data by Several Options
Description
TCGA is a very useful database. This function downloads TCGA (including TCGA Pancan) datasets in a human-friendly way, which benefits users who are not familiar with R.
Usage
downloadTCGA(
project = NULL,
data_type = NULL,
file_type = NULL,
destdir = tempdir(),
force = FALSE,
...
)
Arguments
project |
default is |
data_type |
default is |
file_type |
default is |
destdir |
the location to store downloaded data. Default is the system temporary directory. |
force |
logical. if |
... |
other argument to |
Details
All available information about TCGA datasets can be accessed via availTCGA() and checked with showTCGA().
Value
the same as the result of XenaDownload().
Author(s)
Shixiang Wang w_shixiang@163.com
See Also
XenaQuery(), XenaFilter(), XenaDownload(), XenaPrepare(), availTCGA(), showTCGA()
Examples
## Not run:
# download RNASeq data (use UVM as example)
downloadTCGA(project = "UVM",
data_type = "Gene Expression RNASeq",
file_type = "IlluminaHiSeq RNASeqV2")
## End(Not run)
Fetch Data from UCSC Xena Hosts
Description
When you only need data for several genes or samples from a UCSC Xena dataset, it is better to use these fetch_ functions than to download the whole dataset. See the following sections for details about each function.
Usage
fetch(host, dataset)
fetch_dense_values(
host,
dataset,
identifiers = NULL,
samples = NULL,
check = TRUE,
use_probeMap = FALSE,
time_limit = 30
)
fetch_sparse_values(host, dataset, genes, samples = NULL, time_limit = 30)
fetch_dataset_samples(host, dataset, limit = NULL)
fetch_dataset_identifiers(host, dataset)
has_probeMap(host, dataset, return_url = FALSE)
Arguments
host |
a UCSC Xena host, like "https://toil.xenahubs.net".
All available hosts can be printed by |
dataset |
a UCSC Xena dataset, like "tcga_RSEM_gene_tpm".
All available datasets can be printed by running |
identifiers |
Identifiers can be probes (like "ENSG00000000419.12"), genes (like "TP53"), etc. If it is |
samples |
sample IDs, like "TCGA-02-0047-01".
If it is |
check |
if |
use_probeMap |
if |
time_limit |
time limit for getting response in seconds. |
genes |
gene names. |
limit |
number of samples, if |
return_url |
if |
Details
There are three primary data types: dense matrix (samples by probes, i.e. identifiers), sparse (sample, position, variant), and segmented (sample, position, value).
Dense matrices are sample-by-identifier matrices and can be genotypic or phenotypic. Phenotypic matrices have associated field metadata (descriptive names, codes, etc.). Genotypic matrices may have an associated probeMap, which maps probes to genomic locations. If a matrix has a HUGO probeMap, the probes themselves are gene names. Otherwise, a probeMap is used to map a gene location to a set of probes.
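To make the probeMap idea concrete, the sketch below uses a toy dense matrix (identifiers by samples) and a hypothetical probeMap table that maps probes to genes; all probe names and values are made up. It only illustrates the mapping, not how fetch_dense_values() performs it, and values_for_gene() is a hypothetical helper.

```r
# Toy dense matrix: rows are probes (identifiers), columns are samples
dense <- matrix(
  c(1.2, 0.8,
    3.4, 2.9,
    0.1, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("probe_1", "probe_2", "probe_3"), c("S1", "S2"))
)

# Hypothetical probeMap: which probes belong to which gene
probe_map <- data.frame(
  id = c("probe_1", "probe_2", "probe_3"),
  gene = c("TP53", "TP53", "RB1"),
  stringsAsFactors = FALSE
)

# Look up the probes for one gene and return their rows
values_for_gene <- function(mat, map, gene) {
  probes <- map$id[map$gene == gene]
  mat[intersect(probes, rownames(mat)), , drop = FALSE]
}

tp53 <- values_for_gene(dense, probe_map, "TP53")
colMeans(tp53)  # one simple way to summarise several probes per gene
```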
Value
a matrix, a character vector or a list.
Functions
- fetch_dense_values: fetches values from a dense matrix.
- fetch_sparse_values: fetches values from a sparse data.frame.
- fetch_dataset_samples: fetches samples from a dataset.
- fetch_dataset_identifiers: fetches identifiers from a dataset.
- has_probeMap: checks whether a dataset has a ProbeMap.
Examples
library(UCSCXenaTools)
host <- "https://toil.xenahubs.net"
dataset <- "tcga_RSEM_gene_tpm"
samples <- c("TCGA-02-0047-01", "TCGA-02-0055-01", "TCGA-02-2483-01", "TCGA-02-2485-01")
probes <- c("ENSG00000282740.1", "ENSG00000000005.5", "ENSG00000000419.12")
genes <- c("TP53", "RB1", "PIK3CA")
# Fetch samples
fetch_dataset_samples(host, dataset, 2)
# Fetch identifiers
fetch_dataset_identifiers(host, dataset)
# Fetch expression value by probes
fetch_dense_values(host, dataset, probes, samples, check = FALSE)
# Fetch expression value by gene symbol (if the dataset has probeMap)
has_probeMap(host, dataset)
fetch_dense_values(host, dataset, genes, samples, check = FALSE, use_probeMap = TRUE)
Get TCGA Common Data Sets by Project ID and Property
Description
This is the most convenient function for downloading common TCGA datasets. It is similar to the getFirehoseData function in the RTCGAToolbox package.
Usage
getTCGAdata(
project = NULL,
clinical = TRUE,
download = FALSE,
forceDownload = FALSE,
destdir = tempdir(),
mRNASeq = FALSE,
mRNAArray = FALSE,
mRNASeqType = "normalized",
miRNASeq = FALSE,
exonRNASeq = FALSE,
RPPAArray = FALSE,
ReplicateBaseNormalization = FALSE,
Methylation = FALSE,
MethylationType = c("27K", "450K"),
GeneMutation = FALSE,
SomaticMutation = FALSE,
GisticCopyNumber = FALSE,
Gistic2Threshold = TRUE,
CopyNumberSegment = FALSE,
RemoveGermlineCNV = TRUE,
...
)
Arguments
project |
default is |
clinical |
logical. if |
download |
logical. if |
forceDownload |
logical. if |
destdir |
the location to store downloaded data. Default is the system temporary directory. |
mRNASeq |
logical. if |
mRNAArray |
logical. if |
mRNASeqType |
character vector. Can be one, two or three of |
miRNASeq |
logical. if |
exonRNASeq |
logical. if |
RPPAArray |
logical. if |
ReplicateBaseNormalization |
logical. if |
Methylation |
logical. if |
MethylationType |
character vector. Can be one or two of |
GeneMutation |
logical. if |
SomaticMutation |
logical. if |
GisticCopyNumber |
logical. if |
Gistic2Threshold |
logical. if |
CopyNumberSegment |
logical. if |
RemoveGermlineCNV |
logical. if |
... |
other argument to |
Details
TCGA Common Data Sets are frequently used in biological analyses. To make these data easier to obtain, this function provides simple options for choosing datasets and behavior. All available information about TCGA datasets can be accessed via availTCGA() and checked with showTCGA().
Value
If download = TRUE, returns the data.frame from XenaDownload; otherwise returns a list including a XenaHub object and dataset information.
Author(s)
Shixiang Wang w_shixiang@163.com
Examples
###### get data, but not download
# 1 choose the project and data types you want to download
getTCGAdata(project = "LUAD", mRNASeq = TRUE, mRNAArray = TRUE,
mRNASeqType = "normalized", miRNASeq = TRUE, exonRNASeq = TRUE,
RPPAArray = TRUE, Methylation = TRUE, MethylationType = "450K",
GeneMutation = TRUE, SomaticMutation = TRUE)
# 2 only choose 'LUAD' and its clinical data
getTCGAdata(project = "LUAD")
## Not run:
###### download datasets
# 3 download clinical datasets of LUAD and LUSC
getTCGAdata(project = c("LUAD", "LUSC"), clinical = TRUE, download = TRUE)
# 4 download clinical, RPPA and gene mutation datasets of LUAD and LUSC
getTCGAdata(project = c("LUAD", "LUSC"), clinical = TRUE, RPPAArray = TRUE, GeneMutation = TRUE, download = TRUE)
## End(Not run)
Get hosts of XenaHub object
Description
Get hosts of XenaHub object
Usage
hosts(x)
Arguments
x |
a XenaHub object |
Value
a character vector containing hosts
Examples
xe = XenaGenerate(subset = XenaHostNames == "tcgaHub"); hosts(xe)
Get Samples of a XenaHub object according to 'by' and 'how' action arguments
Description
One is often interested in identifying samples or features present in each data set, shared by all data sets, or present in any of several data sets. This function identifies such samples, including samples in arbitrarily chosen data sets.
Usage
samples(
x,
i = character(),
by = c("hosts", "cohorts", "datasets"),
how = c("each", "any", "all")
)
Arguments
x |
a XenaHub object |
i |
default is an empty character vector; it is used to specify the host, cohort or dataset by |
by |
a character string specifying |
how |
a character string specifying |
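The how options correspond to simple set operations. A base R sketch with made-up sample IDs, assuming "each" keeps one vector per data set, "any" is the union and "all" is the intersection, as the description above suggests:

```r
# Made-up sample IDs for three hypothetical data sets
sample_sets <- list(
  dataset_a = c("S1", "S2", "S3"),
  dataset_b = c("S2", "S3", "S4"),
  dataset_c = c("S3", "S5")
)

# how = "each": keep the per-data-set vectors as they are
each <- sample_sets
# how = "any": samples present in at least one data set
any_samples <- Reduce(union, sample_sets)
# how = "all": samples shared by every data set
all_samples <- Reduce(intersect, sample_sets)

lengths(each)
any_samples
all_samples
```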
Value
a list containing samples
Examples
## Not run:
xe = XenaHub(cohorts = "Cancer Cell Line Encyclopedia (CCLE)")
# samples in each dataset, first host
x = samples(xe, by="datasets", how="each")[[1]]
lengths(x) # data sets in ccle cohort on first (only) host
## End(Not run)
Show TCGA data structure by Project ID or ALL
Description
This can be used to check by hand whether a data type or file type exists in one or more projects.
Usage
showTCGA(project = "all")
Arguments
project |
a character vector. Can be "all" or one or more of TCGA Project IDs. |
Value
a data.frame including project data structure information.
Author(s)
Shixiang Wang w_shixiang@163.com
Examples
showTCGA("all")
Convert camel case to snake case
Description
Convert camel case to snake case
Usage
to_snake(name)
Arguments
name |
a character vector |
Value
a character vector of the same length as name, converted to snake case
Examples
to_snake("sparseDataRange")
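One common way to convert camel case to snake case is a regular expression substitution followed by lowercasing. to_snake_sketch() below is a hypothetical stand-in, not the package's implementation, and may handle edge cases such as acronyms differently.

```r
# Hypothetical camel-to-snake converter
to_snake_sketch <- function(name) {
  # Insert "_" between a lowercase letter or digit and the next uppercase letter
  snake <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", name)
  tolower(snake)
}

to_snake_sketch(c("sparseDataRange", "fetchDenseValues"))
```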
UCSC Xena Default Hosts
Description
Return Xena default hosts
Usage
xena_default_hosts()
Value
A character vector of the current default hosts
Author(s)
Shixiang Wang w_shixiang@163.com