Title: R Toolkit for 'Databricks'
Version: 0.2.8.1
Description: Collection of utilities that improve using 'Databricks' from R. Primarily functions that wrap specific 'Databricks' APIs (https://docs.databricks.com/api), 'RStudio' connection pane support, quality of life functions to make 'Databricks' simpler to use.
License: Apache License (≥ 2)
Encoding: UTF-8
Depends: R (≥ 4.1.0)
Imports: base64enc, cli, curl, dplyr, glue, httr2 (≥ 1.0.4), ini, jsonlite, nanoarrow, purrr, R6, rlang, tibble, utils
Suggests: arrow, testthat (≥ 3.0.0), grid, huxtable, htmltools, knitr, magick, rmarkdown, withr
RoxygenNote: 7.3.2
VignetteBuilder: knitr
URL: https://github.com/databrickslabs/brickster
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2025-07-21 03:18:57 UTC; zachary.davies
Author: Zac Davies [aut, cre], Rafi Kurlansik [aut], Databricks [cph, fnd]
Maintainer: Zac Davies <zac@databricks.com>
Repository: CRAN
Date/Publication: 2025-07-21 03:50:01 UTC
Access Control Request for Group
Description
Access Control Request for Group
Usage
access_control_req_group(
group,
permission_level = c("CAN_MANAGE", "CAN_MANAGE_RUN", "CAN_VIEW")
)
Arguments
group |
Group name. There are two built-in groups: users and admins.
permission_level |
Permission level to grant. One of CAN_MANAGE, CAN_MANAGE_RUN, or CAN_VIEW.
See Also
Other Access Control Request Objects:
access_control_req_user()
Access Control Request For User
Description
Access Control Request For User
Usage
access_control_req_user(
user_name,
permission_level = c("CAN_MANAGE", "CAN_MANAGE_RUN", "CAN_VIEW", "IS_OWNER")
)
Arguments
user_name |
Email address for the user.
permission_level |
Permission level to grant. One of CAN_MANAGE, CAN_MANAGE_RUN, CAN_VIEW, or IS_OWNER.
See Also
Other Access Control Request Objects:
access_control_req_group()
Access Control Request
Description
Access Control Request
Usage
access_control_request(...)
Arguments
... |
Instances of access_control_req_user() or access_control_req_group().
See Also
db_jobs_create()
, db_jobs_reset()
, db_jobs_update()
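Examples
A minimal sketch of composing the constructors above into a single access control list (the email address and group name are illustrative):
## Not run:
acl <- access_control_request(
  access_control_req_user("user@example.com", permission_level = "IS_OWNER"),
  access_control_req_group("data-engineers", permission_level = "CAN_VIEW")
)
# pass `acl` to the access_control_list argument of db_jobs_create()
## End(Not run)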
Add Library Path
Description
Add Library Path
Usage
add_lib_path(path, after, version = FALSE)
Arguments
path |
Directory that will be added as a location in which packages are searched for. Recursively creates the directory if it doesn't exist. On Databricks, remember to use a persistent path (e.g. under /dbfs/).
after |
Location at which to append the path within .libPaths().
version |
If TRUE, the current R version is appended to the path.
Details
This function's primary use is within Databricks notebooks or hosted RStudio; however, it works anywhere.
See Also
base::.libPaths()
, remove_lib_path()
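Examples
A brief sketch; the library path is illustrative and assumes a persistent DBFS location is available:
## Not run:
add_lib_path("/dbfs/r-libraries", after = 1)
## End(Not run)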
AWS Attributes
Description
AWS Attributes
Usage
aws_attributes(
first_on_demand = 1,
availability = c("SPOT_WITH_FALLBACK", "SPOT", "ON_DEMAND"),
zone_id = NULL,
instance_profile_arn = NULL,
spot_bid_price_percent = 100,
ebs_volume_type = c("GENERAL_PURPOSE_SSD", "THROUGHPUT_OPTIMIZED_HDD"),
ebs_volume_count = 1,
ebs_volume_size = NULL,
ebs_volume_iops = NULL,
ebs_volume_throughput = NULL
)
Arguments
first_on_demand |
Number of nodes of the cluster that will be placed on on-demand instances. If this value is greater than 0, the cluster driver node will be placed on an on-demand instance. If this value is greater than or equal to the current cluster size, all nodes will be placed on on-demand instances. If this value is less than the current cluster size, first_on_demand nodes will be placed on on-demand instances and the remainder will be placed on the instance type given by availability.
availability |
One of SPOT_WITH_FALLBACK, SPOT, or ON_DEMAND. Availability type used for all nodes beyond the first_on_demand ones.
zone_id |
Identifier for the availability zone/datacenter in which the cluster resides. You have three options: an availability zone in the same region as the Databricks deployment (e.g. us-west-2a), auto to let Databricks select the zone, or leaving it unset.
instance_profile_arn |
Nodes for this cluster will only be placed on AWS instances with this instance profile. If omitted, nodes will be placed on instances without an instance profile. The instance profile must have previously been added to the Databricks environment by an account administrator. This feature may only be available to certain customer plans. |
spot_bid_price_percent |
The max price for AWS spot instances, as a percentage of the corresponding instance type’s on-demand price. For example, if this field is set to 50, and the cluster needs a new i3.xlarge spot instance, then the max price is half of the price of on-demand i3.xlarge instances. Similarly, if this field is set to 200, the max price is twice the price of on-demand i3.xlarge instances. If not specified, the default value is 100. When spot instances are requested for this cluster, only spot instances whose max price percentage matches this field will be considered. For safety, we enforce this field to be no more than 10000. |
ebs_volume_type |
Either GENERAL_PURPOSE_SSD or THROUGHPUT_OPTIMIZED_HDD.
ebs_volume_count |
The number of volumes launched for each instance. You can choose up to 10 volumes. This feature is only enabled for supported node types. Legacy node types cannot specify custom EBS volumes. For node types with no instance store, at least one EBS volume needs to be specified; otherwise, cluster creation will fail. These EBS volumes will be mounted at /ebs0, /ebs1, and so on; instance store volumes will be mounted at /local_disk0, /local_disk1, and so on.
If EBS volumes are attached, Databricks will configure Spark to use only the EBS volumes for scratch storage because heterogeneously sized scratch devices can lead to inefficient disk utilization. If no EBS volumes are attached, Databricks will configure Spark to use instance store volumes. If EBS volumes are specified, then the Spark configuration spark.local.dir will be overridden.
ebs_volume_size |
The size of each EBS volume (in GiB) launched for each instance. Custom EBS volumes cannot be specified for the legacy node types (memory-optimized and compute-optimized).
ebs_volume_iops |
The number of IOPS per EBS gp3 volume. This value must be between 3000 and 16000. The value of IOPS and throughput is calculated based on AWS documentation to match the maximum performance of a gp2 volume with the same volume size. |
ebs_volume_throughput |
The throughput per EBS gp3 volume, in MiB per second.
Details
If ebs_volume_iops
, ebs_volume_throughput
, or both are not specified, the
values will be inferred from the throughput and IOPS of a gp2 volume with the
same disk size, by using the following calculation:
Disk size | IOPS | Throughput |
Greater than 1000 | 3 times the disk size up to 16000 | 250 |
Between 170 and 1000 | 3000 | 250 |
Below 170 | 3000 | 128 |
See Also
db_cluster_create()
, db_cluster_edit()
Other Cloud Attributes:
azure_attributes()
,
gcp_attributes()
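Examples
A sketch of a typical AWS configuration to pass to the cloud_attrs argument of db_cluster_create(); all values are illustrative:
## Not run:
aws <- aws_attributes(
  first_on_demand = 1,
  availability = "SPOT_WITH_FALLBACK",
  zone_id = "us-west-2a",
  spot_bid_price_percent = 100,
  ebs_volume_type = "GENERAL_PURPOSE_SSD",
  ebs_volume_count = 1,
  ebs_volume_size = 100
)
## End(Not run)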
Azure Attributes
Description
Azure Attributes
Usage
azure_attributes(
first_on_demand = 1,
availability = c("SPOT_WITH_FALLBACK", "SPOT", "ON_DEMAND"),
spot_bid_max_price = -1
)
Arguments
first_on_demand |
Number of nodes of the cluster that will be placed on on-demand instances. If this value is greater than 0, the cluster driver node will be placed on an on-demand instance. If this value is greater than or equal to the current cluster size, all nodes will be placed on on-demand instances. If this value is less than the current cluster size, first_on_demand nodes will be placed on on-demand instances and the remainder will be placed on the instance type given by availability.
availability |
One of SPOT_WITH_FALLBACK, SPOT, or ON_DEMAND.
spot_bid_max_price |
The max bid price used for Azure spot instances. You can set this to greater than or equal to the current spot price. You can also set this to -1 (the default), which specifies that the instance cannot be evicted on the basis of price. The price for the instance will be the current price for spot instances or the price for a standard instance. You can view historical pricing and eviction rates in the Azure portal. |
See Also
db_cluster_create()
, db_cluster_edit()
Other Cloud Attributes:
aws_attributes()
,
gcp_attributes()
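Examples
A sketch of an Azure configuration to pass to the cloud_attrs argument of db_cluster_create(); values are illustrative:
## Not run:
azure <- azure_attributes(
  first_on_demand = 1,
  availability = "SPOT_WITH_FALLBACK",
  spot_bid_max_price = -1
)
## End(Not run)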
Close Databricks Workspace Connection
Description
Close Databricks Workspace Connection
Usage
close_workspace(host = db_host())
Arguments
host |
Databricks workspace URL, defaults to calling db_host().
Examples
## Not run:
close_workspace(host = db_host())
## End(Not run)
Cluster Autoscale
Description
Range defining the min and max number of cluster workers.
Usage
cluster_autoscale(min_workers, max_workers)
Arguments
min_workers |
The minimum number of workers to which the cluster can scale down when underutilized. It is also the initial number of workers the cluster will have after creation. |
max_workers |
The maximum number of workers to which the cluster can
scale up when overloaded. |
See Also
db_cluster_create()
, db_cluster_edit()
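Examples
A minimal sketch; the worker counts are illustrative:
## Not run:
autoscale <- cluster_autoscale(min_workers = 2, max_workers = 8)
## End(Not run)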
Cluster Log Configuration
Description
Path to cluster log.
Usage
cluster_log_conf(dbfs = NULL, s3 = NULL)
Arguments
dbfs |
Instance of dbfs_storage_info().
s3 |
Instance of s3_storage_info().
Details
dbfs and s3 are mutually exclusive; logs can only be sent to one destination.
See Also
Other Cluster Log Configuration Objects:
dbfs_storage_info()
,
s3_storage_info()
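Examples
A sketch of sending cluster logs to DBFS, assuming dbfs_storage_info() accepts a destination path (the path is illustrative):
## Not run:
log_conf <- cluster_log_conf(
  dbfs = dbfs_storage_info(destination = "dbfs:/cluster-logs")
)
## End(Not run)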
Condition Task
Description
Condition Task
Usage
condition_task(
left,
right,
op = c("EQUAL_TO", "GREATER_THAN", "GREATER_THAN_OR_EQUAL", "LESS_THAN",
"LESS_THAN_OR_EQUAL", "NOT_EQUAL")
)
Arguments
left |
Left operand of the condition task. Either a string value or a job state or parameter reference. |
right |
Right operand of the condition task. Either a string value or a job state or parameter reference. |
op |
Operator, one of EQUAL_TO, GREATER_THAN, GREATER_THAN_OR_EQUAL, LESS_THAN, LESS_THAN_OR_EQUAL, or NOT_EQUAL.
Details
The task evaluates a condition that can be used to control the execution of other tasks when the condition_task field is present. The condition task does not require a cluster to execute and does not support retries or notifications.
See Also
Other Task Objects:
email_notifications()
,
for_each_task()
,
libraries()
,
new_cluster()
,
notebook_task()
,
pipeline_task()
,
python_wheel_task()
,
run_job_task()
,
spark_jar_task()
,
spark_python_task()
,
spark_submit_task()
,
sql_file_task()
,
sql_query_task()
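Examples
A minimal sketch; the job parameter reference and comparison value are illustrative:
## Not run:
cond <- condition_task(
  left = "{{job.parameters.run_mode}}",
  right = "full",
  op = "EQUAL_TO"
)
## End(Not run)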
Cron Schedule
Description
Cron Schedule
Usage
cron_schedule(
quartz_cron_expression,
timezone_id = "Etc/UTC",
pause_status = c("UNPAUSED", "PAUSED")
)
Arguments
quartz_cron_expression |
Cron expression using Quartz syntax that describes the schedule for a job. See Cron Trigger for details. |
timezone_id |
Java timezone ID. The schedule for a job is resolved with respect to this timezone. See Java TimeZone for details. |
pause_status |
Indicate whether this schedule is paused or not. Either UNPAUSED (default) or PAUSED.
See Also
db_jobs_create()
, db_jobs_reset()
, db_jobs_update()
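Examples
A sketch of a daily 07:30 UTC schedule (the Quartz expression is illustrative):
## Not run:
schedule <- cron_schedule(
  quartz_cron_expression = "0 30 7 * * ?",
  timezone_id = "Etc/UTC",
  pause_status = "UNPAUSED"
)
## End(Not run)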
Cluster Action Helper Function
Description
Cluster Action Helper Function
Usage
db_cluster_action(
cluster_id,
action = c("start", "restart", "delete", "permanent-delete", "pin", "unpin"),
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
Canonical identifier for the cluster. |
action |
One of start, restart, delete, permanent-delete, pin, or unpin.
host |
Databricks workspace URL, defaults to calling db_host().
token |
Databricks workspace token, defaults to calling db_token().
perform_request |
If TRUE (default) the request is performed; if FALSE the built request is returned without being performed.
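Examples
A minimal sketch; the cluster ID is illustrative:
## Not run:
db_cluster_action(cluster_id = "0123-456789-abcdefgh", action = "start")
## End(Not run)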
Create a Cluster
Description
Create a Cluster
Usage
db_cluster_create(
name,
spark_version,
node_type_id,
num_workers = NULL,
autoscale = NULL,
spark_conf = list(),
cloud_attrs = aws_attributes(),
driver_node_type_id = NULL,
custom_tags = list(),
init_scripts = list(),
spark_env_vars = list(),
autotermination_minutes = 120,
log_conf = NULL,
ssh_public_keys = NULL,
driver_instance_pool_id = NULL,
instance_pool_id = NULL,
idempotency_token = NULL,
enable_elastic_disk = TRUE,
apply_policy_default_values = TRUE,
enable_local_disk_encryption = TRUE,
docker_image = NULL,
policy_id = NULL,
kind = c("CLASSIC_PREVIEW"),
data_security_mode = c("NONE", "SINGLE_USER", "USER_ISOLATION", "LEGACY_TABLE_ACL",
"LEGACY_PASSTHROUGH", "LEGACY_SINGLE_USER", "LEGACY_SINGLE_USER_STANDARD",
"DATA_SECURITY_MODE_STANDARD", "DATA_SECURITY_MODE_DEDICATED",
"DATA_SECURITY_MODE_AUTO"),
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
name |
Cluster name requested by the user. This doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string. |
spark_version |
The runtime version of the cluster. You can retrieve a list of available runtime versions by using db_cluster_runtime_versions().
node_type_id |
The node type for the worker nodes. You can retrieve a list of available node types by using db_cluster_list_node_types().
num_workers |
Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors, for a total of num_workers + 1 Spark nodes.
autoscale |
Instance of cluster_autoscale().
spark_conf |
Named list. An object containing a set of optional, user-specified Spark configuration key-value pairs. You can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.
cloud_attrs |
Attributes related to clusters running on a specific cloud provider. Defaults to aws_attributes(). One of aws_attributes(), azure_attributes(), or gcp_attributes().
driver_node_type_id |
The node type of the Spark driver. This field is optional; if unset, the driver node type will be set to the same value as node_type_id.
custom_tags |
Named list. An object containing a set of tags for cluster
resources. Databricks tags all cluster resources with these tags in addition
to |
init_scripts |
Instance of |
spark_env_vars |
Named list. User-specified environment variable
key-value pairs. In order to specify an additional set of
|
autotermination_minutes |
Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 120. |
log_conf |
Instance of cluster_log_conf().
ssh_public_keys |
List. SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. Up to 10 keys can be specified. |
driver_instance_pool_id |
ID of the instance pool to use for the driver node. You must also specify instance_pool_id.
instance_pool_id |
ID of the instance pool to use for cluster nodes. If driver_instance_pool_id is present, instance_pool_id is used for worker nodes only; otherwise it is used for both the driver and worker nodes.
idempotency_token |
An optional token that can be used to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster, but it will return the ID of the existing cluster instead. The existence of a cluster with the same token is not checked against terminated clusters. If you specify the idempotency token, upon failure you can retry until the request succeeds. Databricks guarantees that exactly one cluster will be launched with that idempotency token. This token should have at most 64 characters. |
enable_elastic_disk |
When enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. |
apply_policy_default_values |
Boolean (Default: TRUE), whether to use policy default values for missing cluster attributes.
enable_local_disk_encryption |
Boolean (Default: TRUE), whether encryption of disks locally attached to the cluster is enabled.
docker_image |
Instance of docker_image().
policy_id |
String, ID of a cluster policy. |
kind |
The kind of compute described by this compute specification. |
data_security_mode |
Data security mode decides what data governance model to use when accessing data from a cluster. |
host |
Databricks workspace URL, defaults to calling db_host().
token |
Databricks workspace token, defaults to calling db_token().
perform_request |
If TRUE (default) the request is performed; if FALSE the built request is returned without being performed.
Details
Create a new Apache Spark cluster. This method acquires new instances from the cloud provider if necessary. This method is asynchronous; the returned cluster_id can be used to poll the cluster state (db_cluster_get()). When this method returns, the cluster is in a PENDING state. The cluster is usable once it enters a RUNNING state.
Databricks may not be able to acquire some of the requested nodes, due to cloud provider limitations or transient network issues. If Databricks acquires at least 85% of the requested on-demand nodes, cluster creation will succeed. Otherwise the cluster will terminate with an informative error message.
Cannot specify both autoscale and num_workers; you must choose one.
See Also
Other Clusters API:
db_cluster_edit()
,
db_cluster_events()
,
db_cluster_get()
,
db_cluster_list()
,
db_cluster_list_node_types()
,
db_cluster_list_zones()
,
db_cluster_perm_delete()
,
db_cluster_pin()
,
db_cluster_resize()
,
db_cluster_restart()
,
db_cluster_runtime_versions()
,
db_cluster_start()
,
db_cluster_terminate()
,
db_cluster_unpin()
,
get_and_start_cluster()
,
get_latest_dbr()
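Examples
A sketch of creating an autoscaling cluster on AWS. The runtime version, node type, and cluster ID handling are illustrative; the response is assumed to contain the new cluster_id:
## Not run:
rsp <- db_cluster_create(
  name = "brickster-demo",
  spark_version = "14.3.x-scala2.12",
  node_type_id = "m5.xlarge",
  autoscale = cluster_autoscale(min_workers = 1, max_workers = 4),
  cloud_attrs = aws_attributes(availability = "SPOT_WITH_FALLBACK"),
  autotermination_minutes = 60
)
# poll the cluster state until it is RUNNING
db_cluster_get(cluster_id = rsp$cluster_id)
## End(Not run)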
Delete/Terminate a Cluster
Description
Delete/Terminate a Cluster
Usage
db_cluster_delete(
cluster_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
Canonical identifier for the cluster. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
The cluster must be in the RUNNING
state.
Edit a Cluster
Description
Edit the configuration of a cluster to match the provided attributes and size.
Usage
db_cluster_edit(
cluster_id,
spark_version,
node_type_id,
num_workers = NULL,
autoscale = NULL,
name = NULL,
spark_conf = NULL,
cloud_attrs = NULL,
driver_node_type_id = NULL,
custom_tags = NULL,
init_scripts = NULL,
spark_env_vars = NULL,
autotermination_minutes = NULL,
log_conf = NULL,
ssh_public_keys = NULL,
driver_instance_pool_id = NULL,
instance_pool_id = NULL,
idempotency_token = NULL,
enable_elastic_disk = NULL,
apply_policy_default_values = NULL,
enable_local_disk_encryption = NULL,
docker_image = NULL,
policy_id = NULL,
kind = c("CLASSIC_PREVIEW"),
data_security_mode = c("NONE", "SINGLE_USER", "USER_ISOLATION", "LEGACY_TABLE_ACL",
"LEGACY_PASSTHROUGH", "LEGACY_SINGLE_USER", "LEGACY_SINGLE_USER_STANDARD",
"DATA_SECURITY_MODE_STANDARD", "DATA_SECURITY_MODE_DEDICATED",
"DATA_SECURITY_MODE_AUTO"),
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
Canonical identifier for the cluster. |
spark_version |
The runtime version of the cluster. You can retrieve a
list of available runtime versions by using |
node_type_id |
The node type for the worker nodes.
|
num_workers |
Number of worker nodes that this cluster should have. A
cluster has one Spark driver and |
autoscale |
Instance of |
name |
Cluster name requested by the user. This doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string. |
spark_conf |
Named list. An object containing a set of optional,
user-specified Spark configuration key-value pairs. You can also pass in a
string of extra JVM options to the driver and the executors via
|
cloud_attrs |
Attributes related to clusters running on specific cloud
provider. Defaults to |
driver_node_type_id |
The node type of the Spark driver. This field is
optional; if unset, the driver node type will be set as the same value as
|
custom_tags |
Named list. An object containing a set of tags for cluster
resources. Databricks tags all cluster resources with these tags in addition
to |
init_scripts |
Instance of |
spark_env_vars |
Named list. User-specified environment variable
key-value pairs. In order to specify an additional set of
|
autotermination_minutes |
Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination. Defaults to 120. |
log_conf |
Instance of |
ssh_public_keys |
List. SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. Up to 10 keys can be specified. |
driver_instance_pool_id |
ID of the instance pool to use for the
driver node. You must also specify |
instance_pool_id |
ID of the instance pool to use for cluster nodes. If
|
idempotency_token |
An optional token that can be used to guarantee the idempotency of cluster creation requests. If an active cluster with the provided token already exists, the request will not create a new cluster, but it will return the ID of the existing cluster instead. The existence of a cluster with the same token is not checked against terminated clusters. If you specify the idempotency token, upon failure you can retry until the request succeeds. Databricks guarantees that exactly one cluster will be launched with that idempotency token. This token should have at most 64 characters. |
enable_elastic_disk |
When enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. |
apply_policy_default_values |
Boolean (Default: |
enable_local_disk_encryption |
Boolean (Default: |
docker_image |
Instance of |
policy_id |
String, ID of a cluster policy. |
kind |
The kind of compute described by this compute specification. |
data_security_mode |
Data security mode decides what data governance model to use when accessing data from a cluster. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
You can edit a cluster if it is in a RUNNING or TERMINATED state. If you edit a cluster while it is in a RUNNING state, it will be restarted so that the new attributes can take effect. If you edit a cluster while it is in a TERMINATED state, it will remain TERMINATED. The next time it is started using the clusters/start API, the new attributes will take effect. An attempt to edit a cluster in any other state will be rejected with an INVALID_STATE error code.
Clusters created by the Databricks Jobs service cannot be edited.
See Also
Other Clusters API:
db_cluster_create()
,
db_cluster_events()
,
db_cluster_get()
,
db_cluster_list()
,
db_cluster_list_node_types()
,
db_cluster_list_zones()
,
db_cluster_perm_delete()
,
db_cluster_pin()
,
db_cluster_resize()
,
db_cluster_restart()
,
db_cluster_runtime_versions()
,
db_cluster_start()
,
db_cluster_terminate()
,
db_cluster_unpin()
,
get_and_start_cluster()
,
get_latest_dbr()
List Cluster Activity Events
Description
List Cluster Activity Events
Usage
db_cluster_events(
cluster_id,
start_time = NULL,
end_time = NULL,
event_types = NULL,
order = c("DESC", "ASC"),
offset = 0,
limit = 50,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
The ID of the cluster to retrieve events about. |
start_time |
The start time in epoch milliseconds. If empty, returns events starting from the beginning of time. |
end_time |
The end time in epoch milliseconds. If empty, returns events up to the current time. |
event_types |
List. Optional set of event types to filter by. Default is to return all events. Event Types. |
order |
Either DESC (default) or ASC.
offset |
The offset in the result set. Defaults to 0 (no offset). When an offset is specified and the results are requested in descending order, the end_time field is required. |
limit |
Maximum number of events to include in a page of events. Defaults to 50, and maximum allowed value is 500. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Retrieve a list of events about the activity of a cluster. You can retrieve events from active clusters (running, pending, or reconfiguring) and terminated clusters within 30 days of their last termination. This API is paginated. If there are more events to read, the response includes all the parameters necessary to request the next page of events.
See Also
Other Clusters API:
db_cluster_create()
,
db_cluster_edit()
,
db_cluster_get()
,
db_cluster_list()
,
db_cluster_list_node_types()
,
db_cluster_list_zones()
,
db_cluster_perm_delete()
,
db_cluster_pin()
,
db_cluster_resize()
,
db_cluster_restart()
,
db_cluster_runtime_versions()
,
db_cluster_start()
,
db_cluster_terminate()
,
db_cluster_unpin()
,
get_and_start_cluster()
,
get_latest_dbr()
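Examples
A sketch of fetching recent lifecycle events for a cluster; the cluster ID and event types are illustrative:
## Not run:
events <- db_cluster_events(
  cluster_id = "0123-456789-abcdefgh",
  event_types = list("RUNNING", "TERMINATING"),
  order = "DESC",
  limit = 100
)
## End(Not run)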
Get Details of a Cluster
Description
Get Details of a Cluster
Usage
db_cluster_get(
cluster_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
Canonical identifier for the cluster. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Retrieve the information for a cluster given its identifier. Clusters can be described while they are running or up to 30 days after they are terminated.
See Also
Other Clusters API:
db_cluster_create()
,
db_cluster_edit()
,
db_cluster_events()
,
db_cluster_list()
,
db_cluster_list_node_types()
,
db_cluster_list_zones()
,
db_cluster_perm_delete()
,
db_cluster_pin()
,
db_cluster_resize()
,
db_cluster_restart()
,
db_cluster_runtime_versions()
,
db_cluster_start()
,
db_cluster_terminate()
,
db_cluster_unpin()
,
get_and_start_cluster()
,
get_latest_dbr()
List Clusters
Description
List Clusters
Usage
db_cluster_list(host = db_host(), token = db_token(), perform_request = TRUE)
Arguments
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Return information about all pinned clusters, active clusters, up to 150 of the most recently terminated all-purpose clusters in the past 30 days, and up to 30 of the most recently terminated job clusters in the past 30 days.
For example, if there is 1 pinned cluster, 4 active clusters, 45 terminated all-purpose clusters in the past 30 days, and 50 terminated job clusters in the past 30 days, then this API returns:
- the 1 pinned cluster
- 4 active clusters
- all 45 terminated all-purpose clusters
- the 30 most recently terminated job clusters
See Also
Other Clusters API:
db_cluster_create()
,
db_cluster_edit()
,
db_cluster_events()
,
db_cluster_get()
,
db_cluster_list_node_types()
,
db_cluster_list_zones()
,
db_cluster_perm_delete()
,
db_cluster_pin()
,
db_cluster_resize()
,
db_cluster_restart()
,
db_cluster_runtime_versions()
,
db_cluster_start()
,
db_cluster_terminate()
,
db_cluster_unpin()
,
get_and_start_cluster()
,
get_latest_dbr()
List Available Cluster Node Types
Description
List Available Cluster Node Types
Usage
db_cluster_list_node_types(
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Return a list of supported Spark node types. These node types can be used to launch a cluster.
See Also
Other Clusters API:
db_cluster_create()
,
db_cluster_edit()
,
db_cluster_events()
,
db_cluster_get()
,
db_cluster_list()
,
db_cluster_list_zones()
,
db_cluster_perm_delete()
,
db_cluster_pin()
,
db_cluster_resize()
,
db_cluster_restart()
,
db_cluster_runtime_versions()
,
db_cluster_start()
,
db_cluster_terminate()
,
db_cluster_unpin()
,
get_and_start_cluster()
,
get_latest_dbr()
List Availability Zones (AWS Only)
Description
List Availability Zones (AWS Only)
Usage
db_cluster_list_zones(
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Amazon Web Services (AWS) ONLY! Return a list of availability zones where clusters can be created (e.g. us-west-2a). These zones can be used to launch a cluster.
See Also
Other Clusters API:
db_cluster_create()
,
db_cluster_edit()
,
db_cluster_events()
,
db_cluster_get()
,
db_cluster_list()
,
db_cluster_list_node_types()
,
db_cluster_perm_delete()
,
db_cluster_pin()
,
db_cluster_resize()
,
db_cluster_restart()
,
db_cluster_runtime_versions()
,
db_cluster_start()
,
db_cluster_terminate()
,
db_cluster_unpin()
,
get_and_start_cluster()
,
get_latest_dbr()
Permanently Delete a Cluster
Description
Permanently Delete a Cluster
Usage
db_cluster_perm_delete(
cluster_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
Canonical identifier for the cluster. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
If the cluster is running, it is terminated and its resources are asynchronously removed. If the cluster is terminated, then it is immediately removed.
You cannot perform any action, including retrieving the cluster's permissions, on a permanently deleted cluster. A permanently deleted cluster is also no longer returned in the cluster list.
See Also
Other Clusters API:
db_cluster_create()
,
db_cluster_edit()
,
db_cluster_events()
,
db_cluster_get()
,
db_cluster_list()
,
db_cluster_list_node_types()
,
db_cluster_list_zones()
,
db_cluster_pin()
,
db_cluster_resize()
,
db_cluster_restart()
,
db_cluster_runtime_versions()
,
db_cluster_start()
,
db_cluster_terminate()
,
db_cluster_unpin()
,
get_and_start_cluster()
,
get_latest_dbr()
Pin a Cluster
Description
Pin a Cluster
Usage
db_cluster_pin(
cluster_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
Canonical identifier for the cluster. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Ensure that an all-purpose cluster configuration is retained even after a cluster has been terminated for more than 30 days. Pinning ensures that the cluster is always returned by db_cluster_list(). Pinning a cluster that is already pinned has no effect.
See Also
Other Clusters API:
db_cluster_create()
,
db_cluster_edit()
,
db_cluster_events()
,
db_cluster_get()
,
db_cluster_list()
,
db_cluster_list_node_types()
,
db_cluster_list_zones()
,
db_cluster_perm_delete()
,
db_cluster_resize()
,
db_cluster_restart()
,
db_cluster_runtime_versions()
,
db_cluster_start()
,
db_cluster_terminate()
,
db_cluster_unpin()
,
get_and_start_cluster()
,
get_latest_dbr()
Resize a Cluster
Description
Resize a Cluster
Usage
db_cluster_resize(
cluster_id,
num_workers = NULL,
autoscale = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
Canonical identifier for the cluster. |
num_workers |
Number of worker nodes that this cluster should have. A
cluster has one Spark driver and |
autoscale |
Instance of |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
The cluster must be in the RUNNING
state.
See Also
Other Clusters API:
db_cluster_create()
,
db_cluster_edit()
,
db_cluster_events()
,
db_cluster_get()
,
db_cluster_list()
,
db_cluster_list_node_types()
,
db_cluster_list_zones()
,
db_cluster_perm_delete()
,
db_cluster_pin()
,
db_cluster_restart()
,
db_cluster_runtime_versions()
,
db_cluster_start()
,
db_cluster_terminate()
,
db_cluster_unpin()
,
get_and_start_cluster()
,
get_latest_dbr()
Restart a Cluster
Description
Restart a Cluster
Usage
db_cluster_restart(
cluster_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
Canonical identifier for the cluster. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
The cluster must be in the RUNNING
state.
See Also
Other Clusters API:
db_cluster_create()
,
db_cluster_edit()
,
db_cluster_events()
,
db_cluster_get()
,
db_cluster_list()
,
db_cluster_list_node_types()
,
db_cluster_list_zones()
,
db_cluster_perm_delete()
,
db_cluster_pin()
,
db_cluster_resize()
,
db_cluster_runtime_versions()
,
db_cluster_start()
,
db_cluster_terminate()
,
db_cluster_unpin()
,
get_and_start_cluster()
,
get_latest_dbr()
List Available Databricks Runtime Versions
Description
List Available Databricks Runtime Versions
Usage
db_cluster_runtime_versions(
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Return the list of available runtime versions. These versions can be used to launch a cluster.
See Also
Other Clusters API:
db_cluster_create()
,
db_cluster_edit()
,
db_cluster_events()
,
db_cluster_get()
,
db_cluster_list()
,
db_cluster_list_node_types()
,
db_cluster_list_zones()
,
db_cluster_perm_delete()
,
db_cluster_pin()
,
db_cluster_resize()
,
db_cluster_restart()
,
db_cluster_start()
,
db_cluster_terminate()
,
db_cluster_unpin()
,
get_and_start_cluster()
,
get_latest_dbr()
Start a Cluster
Description
Start a Cluster
Usage
db_cluster_start(
cluster_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
Canonical identifier for the cluster. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Start a terminated cluster given its ID.
This is similar to db_cluster_create(), except:
- The terminated cluster ID and attributes are preserved.
- The cluster starts with the last specified cluster size. If the terminated cluster is an autoscaling cluster, the cluster starts with the minimum number of nodes.
- If the cluster is in the RESTARTING state, a 400 error is returned.
- You cannot start a cluster launched to run a job.
See Also
Other Clusters API:
db_cluster_create()
,
db_cluster_edit()
,
db_cluster_events()
,
db_cluster_get()
,
db_cluster_list()
,
db_cluster_list_node_types()
,
db_cluster_list_zones()
,
db_cluster_perm_delete()
,
db_cluster_pin()
,
db_cluster_resize()
,
db_cluster_restart()
,
db_cluster_runtime_versions()
,
db_cluster_terminate()
,
db_cluster_unpin()
,
get_and_start_cluster()
,
get_latest_dbr()
Delete/Terminate a Cluster
Description
Delete/Terminate a Cluster
Usage
db_cluster_terminate(
cluster_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
Canonical identifier for the cluster. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
The cluster is removed asynchronously. Once the termination has completed,
the cluster will be in the TERMINATED
state. If the cluster is already in a
TERMINATING
or TERMINATED
state, nothing will happen.
Unless a cluster is pinned, 30 days after the cluster is terminated, it is permanently deleted.
See Also
Other Clusters API:
db_cluster_create()
,
db_cluster_edit()
,
db_cluster_events()
,
db_cluster_get()
,
db_cluster_list()
,
db_cluster_list_node_types()
,
db_cluster_list_zones()
,
db_cluster_perm_delete()
,
db_cluster_pin()
,
db_cluster_resize()
,
db_cluster_restart()
,
db_cluster_runtime_versions()
,
db_cluster_start()
,
db_cluster_unpin()
,
get_and_start_cluster()
,
get_latest_dbr()
Unpin a Cluster
Description
Unpin a Cluster
Usage
db_cluster_unpin(
cluster_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
Canonical identifier for the cluster. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Allows the cluster to eventually be removed from the list returned by db_cluster_list(). Unpinning a cluster that is not pinned has no effect.
See Also
Other Clusters API:
db_cluster_create()
,
db_cluster_edit()
,
db_cluster_events()
,
db_cluster_get()
,
db_cluster_list()
,
db_cluster_list_node_types()
,
db_cluster_list_zones()
,
db_cluster_perm_delete()
,
db_cluster_pin()
,
db_cluster_resize()
,
db_cluster_restart()
,
db_cluster_runtime_versions()
,
db_cluster_start()
,
db_cluster_terminate()
,
get_and_start_cluster()
,
get_latest_dbr()
Cancel a Command
Description
Cancel a Command
Usage
db_context_command_cancel(
cluster_id,
context_id,
command_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
The ID of the cluster to create the context for. |
context_id |
The ID of the execution context. |
command_id |
The ID of the command to get information about. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Execution Context API:
db_context_command_parse()
,
db_context_command_run()
,
db_context_command_run_and_wait()
,
db_context_command_status()
,
db_context_create()
,
db_context_destroy()
,
db_context_status()
Parse Command Results
Description
Parse Command Results
Usage
db_context_command_parse(x, language = c("r", "py", "scala", "sql"))
Arguments
x |
command output from |
language |
Value
command results
See Also
Other Execution Context API:
db_context_command_cancel()
,
db_context_command_run()
,
db_context_command_run_and_wait()
,
db_context_command_status()
,
db_context_create()
,
db_context_destroy()
,
db_context_status()
Run a Command
Description
Run a Command
Usage
db_context_command_run(
cluster_id,
context_id,
language = c("python", "sql", "scala", "r"),
command = NULL,
command_file = NULL,
options = list(),
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
The ID of the cluster to create the context for. |
context_id |
The ID of the execution context. |
language |
The language for the context. One of python, sql, scala, or r.
command |
The command string to run. |
command_file |
The path to a file containing the command to run. |
options |
Named list of values used downstream. For example, a 'displayRowLimit' override (used in testing). |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Execution Context API:
db_context_command_cancel()
,
db_context_command_parse()
,
db_context_command_run_and_wait()
,
db_context_command_status()
,
db_context_create()
,
db_context_destroy()
,
db_context_status()
Run a Command and Wait For Results
Description
Run a Command and Wait For Results
Usage
db_context_command_run_and_wait(
cluster_id,
context_id,
language = c("python", "sql", "scala", "r"),
command = NULL,
command_file = NULL,
options = list(),
parse_result = TRUE,
host = db_host(),
token = db_token()
)
Arguments
cluster_id |
The ID of the cluster to create the context for. |
context_id |
The ID of the execution context. |
language |
The language for the context. One of python, sql, scala, or r.
command |
The command string to run. |
command_file |
The path to a file containing the command to run. |
options |
Named list of values used downstream. For example, a 'displayRowLimit' override (used in testing). |
parse_result |
Boolean, determines if results are parsed automatically. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
See Also
Other Execution Context API:
db_context_command_cancel()
,
db_context_command_parse()
,
db_context_command_run()
,
db_context_command_status()
,
db_context_create()
,
db_context_destroy()
,
db_context_status()
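Examples
A sketch of the full execution context flow: create a context, run a command and wait for the parsed result, then clean up. The cluster ID is illustrative and the context ID is assumed to be returned in the id element:
## Not run:
context <- db_context_create(
  cluster_id = "0123-456789-abcdefgh",
  language = "r"
)
result <- db_context_command_run_and_wait(
  cluster_id = "0123-456789-abcdefgh",
  context_id = context$id,
  language = "r",
  command = "Sys.time()"
)
db_context_destroy(
  cluster_id = "0123-456789-abcdefgh",
  context_id = context$id
)
## End(Not run)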
Get Information About a Command
Description
Get Information About a Command
Usage
db_context_command_status(
cluster_id,
context_id,
command_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
The ID of the cluster to create the context for. |
context_id |
The ID of the execution context. |
command_id |
The ID of the command to get information about. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Execution Context API:
db_context_command_cancel()
,
db_context_command_parse()
,
db_context_command_run()
,
db_context_command_run_and_wait()
,
db_context_create()
,
db_context_destroy()
,
db_context_status()
Create an Execution Context
Description
Create an Execution Context
Usage
db_context_create(
cluster_id,
language = c("python", "sql", "scala", "r"),
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
The ID of the cluster to create the context for. |
language |
The language for the context. One of python, sql, scala, or r.
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Execution Context API:
db_context_command_cancel()
,
db_context_command_parse()
,
db_context_command_run()
,
db_context_command_run_and_wait()
,
db_context_command_status()
,
db_context_destroy()
,
db_context_status()
Delete an Execution Context
Description
Delete an Execution Context
Usage
db_context_destroy(
cluster_id,
context_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
The ID of the cluster to create the context for. |
context_id |
The ID of the execution context. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Execution Context API:
db_context_command_cancel()
,
db_context_command_parse()
,
db_context_command_run()
,
db_context_command_run_and_wait()
,
db_context_command_status()
,
db_context_create()
,
db_context_status()
Databricks Execution Context Manager (R6 Class)
Description
Databricks Execution Context Manager (R6 Class)
Databricks Execution Context Manager (R6 Class)
Details
db_context_manager() provides a simple interface to send commands to a Databricks cluster and return the results.
Methods
Public methods
Method new()
Create a new context manager object.
Usage
db_context_manager$new(
  cluster_id,
  language = c("r", "py", "scala", "sql", "sh"),
  host = db_host(),
  token = db_token()
)
Arguments
cluster_id
The ID of the cluster to execute command on.
language
One of r, py, scala, sql, or sh.
host
Databricks workspace URL, defaults to calling db_host().
token
Databricks workspace token, defaults to calling db_token().
Returns
A new databricks_context_manager
object.
Method close()
Destroy the execution context
Usage
db_context_manager$close()
Method cmd_run()
Execute a command against a Databricks cluster
Usage
db_context_manager$cmd_run(cmd, language = c("r", "py", "scala", "sql", "sh"))
Arguments
cmd
Code to execute against the Databricks cluster.
language
One of r, py, scala, sql, or sh.
Returns
Command results
Method clone()
The objects of this class are cloneable with this method.
Usage
db_context_manager$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
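Examples
A sketch of using the manager to run a command and tidy up afterwards; the cluster ID is illustrative:
## Not run:
manager <- db_context_manager$new(
  cluster_id = "0123-456789-abcdefgh",
  language = "r"
)
manager$cmd_run("nrow(mtcars)", language = "r")
manager$close()
## End(Not run)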
Get Information About an Execution Context
Description
Get Information About an Execution Context
Usage
db_context_status(
cluster_id,
context_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
The ID of the cluster to create the context for. |
context_id |
The ID of the execution context. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Execution Context API:
db_context_command_cancel()
,
db_context_command_parse()
,
db_context_command_run()
,
db_context_command_run_and_wait()
,
db_context_command_status()
,
db_context_create()
,
db_context_destroy()
Detect Current Workspace's Cloud
Description
Detect Current Workspace's Cloud
Usage
db_current_cloud(host = db_host(), token = db_token(), perform_request = TRUE)
Arguments
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
String
Get Current User Info
Description
Get Current User Info
Usage
db_current_user(host = db_host(), token = db_token(), perform_request = TRUE)
Arguments
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
list of user metadata
Detect Current Workspace ID
Description
Detect Current Workspace ID
Usage
db_current_workspace_id(
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
String
DBFS Add Block
Description
Append a block of data to the stream specified by the input handle.
Usage
db_dbfs_add_block(
handle,
data,
convert_to_raw = FALSE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
handle |
Handle on an open stream. |
data |
Either a path for file on local system or a character/raw vector that will be base64-encoded. This has a limit of 1 MB. |
convert_to_raw |
Boolean (Default: |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
If the handle does not exist, this call will throw an exception with
RESOURCE_DOES_NOT_EXIST.
If the block of data exceeds 1 MB, this call will throw an exception with
MAX_BLOCK_SIZE_EXCEEDED.
Typical File Upload Flow
1. Call create and get a handle via db_dbfs_create()
2. Make one or more db_dbfs_add_block() calls with the handle you have
3. Call db_dbfs_close() with the handle you have
See Also
Other DBFS API:
db_dbfs_close()
,
db_dbfs_create()
,
db_dbfs_delete()
,
db_dbfs_get_status()
,
db_dbfs_list()
,
db_dbfs_mkdirs()
,
db_dbfs_move()
,
db_dbfs_put()
,
db_dbfs_read()
DBFS Close
Description
Close the stream specified by the input handle.
Usage
db_dbfs_close(
handle,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
handle |
The handle on an open stream. This field is required. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
If the handle does not exist, this call throws an exception with
RESOURCE_DOES_NOT_EXIST.
Value
HTTP Response
Typical File Upload Flow
1. Call create and get a handle via db_dbfs_create()
2. Make one or more db_dbfs_add_block() calls with the handle you have
3. Call db_dbfs_close() with the handle you have
See Also
Other DBFS API:
db_dbfs_add_block()
,
db_dbfs_create()
,
db_dbfs_delete()
,
db_dbfs_get_status()
,
db_dbfs_list()
,
db_dbfs_mkdirs()
,
db_dbfs_move()
,
db_dbfs_put()
,
db_dbfs_read()
DBFS Create
Description
Open a stream to write to a file and returns a handle to this stream.
Usage
db_dbfs_create(
path,
overwrite = FALSE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
The path of the new file. The path should be the absolute DBFS
path (for example |
overwrite |
Boolean, specifies whether to overwrite existing file or files. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
There is a 10 minute idle timeout on this handle. If a file or directory
already exists on the given path and overwrite is set to FALSE
, this call
throws an exception with RESOURCE_ALREADY_EXISTS.
Value
Handle which should subsequently be passed into db_dbfs_add_block() and db_dbfs_close() when writing to a file through a stream.
Typical File Upload Flow
1. Call create and get a handle via db_dbfs_create()
2. Make one or more db_dbfs_add_block() calls with the handle you have
3. Call db_dbfs_close() with the handle you have
See Also
Other DBFS API:
db_dbfs_add_block()
,
db_dbfs_close()
,
db_dbfs_delete()
,
db_dbfs_get_status()
,
db_dbfs_list()
,
db_dbfs_mkdirs()
,
db_dbfs_move()
,
db_dbfs_put()
,
db_dbfs_read()
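Examples
A sketch of the typical file upload flow described above; the DBFS and local paths are illustrative:
## Not run:
handle <- db_dbfs_create(path = "/tmp/example.csv", overwrite = TRUE)
db_dbfs_add_block(handle = handle, data = "path/to/local/example.csv")
db_dbfs_close(handle = handle)
## End(Not run)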
DBFS Delete
Description
DBFS Delete
Usage
db_dbfs_delete(
path,
recursive = FALSE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
The path of the new file. The path should be the absolute DBFS
path (for example |
recursive |
Whether or not to recursively delete the directory’s contents. Deleting empty directories can be done without providing the recursive flag. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other DBFS API:
db_dbfs_add_block()
,
db_dbfs_close()
,
db_dbfs_create()
,
db_dbfs_get_status()
,
db_dbfs_list()
,
db_dbfs_mkdirs()
,
db_dbfs_move()
,
db_dbfs_put()
,
db_dbfs_read()
DBFS Get Status
Description
Get the file information of a file or directory.
Usage
db_dbfs_get_status(
path,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
The path of the new file. The path should be the absolute DBFS
path (for example |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
If the file or directory does not exist, this call throws an exception with
RESOURCE_DOES_NOT_EXIST.
See Also
Other DBFS API:
db_dbfs_add_block()
,
db_dbfs_close()
,
db_dbfs_create()
,
db_dbfs_delete()
,
db_dbfs_list()
,
db_dbfs_mkdirs()
,
db_dbfs_move()
,
db_dbfs_put()
,
db_dbfs_read()
DBFS List
Description
List the contents of a directory, or details of the file.
Usage
db_dbfs_list(
path,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
The path of the new file. The path should be the absolute DBFS
path (for example |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
When calling list on a large directory, the list operation will time out after approximately 60 seconds.
We strongly recommend using list only on
directories containing less than 10K files and discourage using the DBFS REST
API for operations that list more than 10K files. Instead, we recommend that
you perform such operations in the context of a cluster, using the File
system utility (dbutils.fs
), which provides the same functionality without
timing out.
If the file or directory does not exist, this call throws an exception with
RESOURCE_DOES_NOT_EXIST.
Value
data.frame
See Also
Other DBFS API:
db_dbfs_add_block()
,
db_dbfs_close()
,
db_dbfs_create()
,
db_dbfs_delete()
,
db_dbfs_get_status()
,
db_dbfs_mkdirs()
,
db_dbfs_move()
,
db_dbfs_put()
,
db_dbfs_read()
DBFS mkdirs
Description
Create the given directory and necessary parent directories if they do not exist.
Usage
db_dbfs_mkdirs(
path,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
The path of the new file. The path should be the absolute DBFS
path (for example |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
If there exists a file (not a directory) at any prefix of the input path, this call throws an exception with
RESOURCE_ALREADY_EXISTS.
If this operation fails it may have succeeded in creating some of the necessary parent directories.
See Also
Other DBFS API:
db_dbfs_add_block()
,
db_dbfs_close()
,
db_dbfs_create()
,
db_dbfs_delete()
,
db_dbfs_get_status()
,
db_dbfs_list()
,
db_dbfs_move()
,
db_dbfs_put()
,
db_dbfs_read()
DBFS Move
Description
Move a file from one location to another location within DBFS.
Usage
db_dbfs_move(
source_path,
destination_path,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
source_path |
The source path of the file or directory. The path
should be the absolute DBFS path (for example, |
destination_path |
The destination path of the file or directory. The
path should be the absolute DBFS path (for example,
|
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
If the given source path is a directory, this call always recursively moves all files.
When moving a large number of files, the API call will time out after
approximately 60 seconds, potentially resulting in partially moved data.
Therefore, for operations that move more than 10K files, we strongly
discourage using the DBFS REST API. Instead, we recommend that you perform
such operations in the context of a cluster, using the File system utility
(dbutils.fs
) from a notebook, which provides the same functionality without
timing out.
If the source file does not exist, this call throws an exception with
RESOURCE_DOES_NOT_EXIST.
If there already exists a file in the destination path, this call throws an exception with
RESOURCE_ALREADY_EXISTS.
See Also
Other DBFS API:
db_dbfs_add_block()
,
db_dbfs_close()
,
db_dbfs_create()
,
db_dbfs_delete()
,
db_dbfs_get_status()
,
db_dbfs_list()
,
db_dbfs_mkdirs()
,
db_dbfs_put()
,
db_dbfs_read()
DBFS Put
Description
Upload a file through the use of multipart form post.
Usage
db_dbfs_put(
path,
file = NULL,
contents = NULL,
overwrite = FALSE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
The path of the new file. The path should be the absolute DBFS
path (for example |
file |
Path to a file on the local system; takes precedence over contents if both are specified.
contents |
String that is base64 encoded. |
overwrite |
Flag (Default: FALSE) that specifies whether to overwrite an existing file.
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Either contents or file must be specified. file takes precedence over contents if both are specified.
Mainly used for streaming uploads, but can also be used as a convenient single call for data upload.
The amount of data that can be passed using the contents parameter is limited
to 1 MB if specified as a string (MAX_BLOCK_SIZE_EXCEEDED
is thrown if
exceeded) and 2 GB as a file.
See Also
Other DBFS API:
db_dbfs_add_block()
,
db_dbfs_close()
,
db_dbfs_create()
,
db_dbfs_delete()
,
db_dbfs_get_status()
,
db_dbfs_list()
,
db_dbfs_mkdirs()
,
db_dbfs_move()
,
db_dbfs_read()
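Examples
A sketch of uploading a local file in a single call; the paths are illustrative:
## Not run:
db_dbfs_put(
  path = "/tmp/example.csv",
  file = "path/to/local/example.csv",
  overwrite = TRUE
)
## End(Not run)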
DBFS Read
Description
Return the contents of a file.
Usage
db_dbfs_read(
path,
offset = 0,
length = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
The path of the new file. The path should be the absolute DBFS
path (for example |
offset |
Offset to read from in bytes. |
length |
Number of bytes to read starting from the offset. This has a limit of 1 MB, and a default value of 0.5 MB. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
If offset + length exceeds the number of bytes in a file, reads contents until the end of file.
If the file does not exist, this call throws an exception with
RESOURCE_DOES_NOT_EXIST.
If the path is a directory, the read length is negative, or if the offset is negative, this call throws an exception with
INVALID_PARAMETER_VALUE.
If the read length exceeds 1 MB, this call throws an exception with
MAX_READ_SIZE_EXCEEDED.
See Also
Other DBFS API:
db_dbfs_add_block()
,
db_dbfs_close()
,
db_dbfs_create()
,
db_dbfs_delete()
,
db_dbfs_get_status()
,
db_dbfs_list()
,
db_dbfs_mkdirs()
,
db_dbfs_move()
,
db_dbfs_put()
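Examples
A sketch of reading the first megabyte of a file; the path is illustrative and the response is assumed to contain base64-encoded content in its data element:
## Not run:
chunk <- db_dbfs_read(path = "/tmp/example.csv", offset = 0, length = 1e6)
contents <- rawToChar(base64enc::base64decode(chunk$data))
## End(Not run)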
Generate/Fetch Databricks Host
Description
If both id and prefix are NULL then the function will check for the DATABRICKS_HOST environment variable. .databrickscfg will be searched if db_profile and use_databrickscfg are set, or if Posit Workbench managed OAuth credentials are detected.
When defining id and prefix you do not need to specify the whole URL. E.g. https://<prefix>.<id>.cloud.databricks.com/ is the form to follow.
Usage
db_host(id = NULL, prefix = NULL, profile = default_config_profile())
Arguments
id |
The workspace string |
prefix |
Workspace prefix |
profile |
Profile to use when fetching from environment variable
(e.g. |
Details
The behaviour is subject to change depending on whether the db_profile and use_databrickscfg options are set.
- use_databrickscfg: Boolean (default: FALSE), determines if credentials are fetched from a profile of .databrickscfg or from .Renviron.
- db_profile: String (default: NULL), determines the profile used. .databrickscfg will automatically be used when Posit Workbench managed OAuth credentials are detected.
See vignette on authentication for more details.
Value
workspace URL
See Also
Other Databricks Authentication Helpers:
db_read_netrc()
,
db_token()
,
db_wsid()
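Examples
A sketch of the two ways to resolve a workspace URL; the workspace ID and prefix are illustrative:
## Not run:
# resolved from DATABRICKS_HOST (or a .databrickscfg profile)
host <- db_host()

# built from its components, i.e. https://<prefix>.<id>.cloud.databricks.com/
host <- db_host(id = "1234567890123456", prefix = "dbc-a1b2c3d4")
## End(Not run)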
Create Job
Description
Create Job
Usage
db_jobs_create(
name,
tasks,
schedule = NULL,
job_clusters = NULL,
email_notifications = NULL,
timeout_seconds = NULL,
max_concurrent_runs = 1,
access_control_list = NULL,
git_source = NULL,
queue = TRUE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
name |
Name for the job. |
tasks |
Task specifications to be executed by this job. Use job_tasks() to create.
schedule |
Instance of cron_schedule().
job_clusters |
Named list of job cluster specifications (using new_cluster()) that can be shared and reused by tasks of this job.
email_notifications |
Instance of email_notifications().
timeout_seconds |
An optional timeout applied to each run of this job. The default behavior is to have no timeout. |
max_concurrent_runs |
Maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. This setting affects only new runs. This value cannot exceed 1000. Setting this value to 0 causes all new runs to be skipped. The default behavior is to allow only 1 concurrent run. |
access_control_list |
Instance of access_control_request().
git_source |
Optional specification for a remote repository containing the notebooks used by this job's notebook tasks. Instance of git_source().
queue |
If true, enable queueing for the job. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
See Also
job_tasks()
, job_task()
, email_notifications()
,
cron_schedule()
, access_control_request()
, access_control_req_user()
,
access_control_req_group()
, git_source()
Other Jobs API:
db_jobs_delete()
,
db_jobs_get()
,
db_jobs_list()
,
db_jobs_reset()
,
db_jobs_run_now()
,
db_jobs_runs_cancel()
,
db_jobs_runs_delete()
,
db_jobs_runs_export()
,
db_jobs_runs_get()
,
db_jobs_runs_get_output()
,
db_jobs_runs_list()
,
db_jobs_runs_submit()
,
db_jobs_update()
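Examples
A sketch of creating a scheduled single-task job. It assumes job_task() accepts the task object via a task argument and that notebook_task() takes a notebook_path; the cluster ID, notebook path, and cron expression are illustrative:
## Not run:
job <- db_jobs_create(
  name = "brickster-example-job",
  tasks = job_tasks(
    job_task(
      task_key = "nightly_refresh",
      existing_cluster_id = "0123-456789-abcdefgh",
      task = notebook_task(notebook_path = "/Shared/nightly_refresh")
    )
  ),
  schedule = cron_schedule(quartz_cron_expression = "0 0 5 * * ?")
)
## End(Not run)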
Delete a Job
Description
Delete a Job
Usage
db_jobs_delete(
job_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
job_id |
The canonical identifier of the job. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Jobs API:
db_jobs_create()
,
db_jobs_get()
,
db_jobs_list()
,
db_jobs_reset()
,
db_jobs_run_now()
,
db_jobs_runs_cancel()
,
db_jobs_runs_delete()
,
db_jobs_runs_export()
,
db_jobs_runs_get()
,
db_jobs_runs_get_output()
,
db_jobs_runs_list()
,
db_jobs_runs_submit()
,
db_jobs_update()
Get Job Details
Description
Get Job Details
Usage
db_jobs_get(
job_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
job_id |
The canonical identifier of the job. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Jobs API:
db_jobs_create()
,
db_jobs_delete()
,
db_jobs_list()
,
db_jobs_reset()
,
db_jobs_run_now()
,
db_jobs_runs_cancel()
,
db_jobs_runs_delete()
,
db_jobs_runs_export()
,
db_jobs_runs_get()
,
db_jobs_runs_get_output()
,
db_jobs_runs_list()
,
db_jobs_runs_submit()
,
db_jobs_update()
List Jobs
Description
List Jobs
Usage
db_jobs_list(
limit = 25,
offset = 0,
expand_tasks = FALSE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
limit |
Number of jobs to return. This value must be greater than 0 and less than or equal to 25. The default value is 25. If a request specifies a limit of 0, the service instead uses the maximum limit. |
offset |
The offset of the first job to return, relative to the most recently created job. |
expand_tasks |
Whether to include task and cluster details in the response. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Jobs API:
db_jobs_create()
,
db_jobs_delete()
,
db_jobs_get()
,
db_jobs_reset()
,
db_jobs_run_now()
,
db_jobs_runs_cancel()
,
db_jobs_runs_delete()
,
db_jobs_runs_export()
,
db_jobs_runs_get()
,
db_jobs_runs_get_output()
,
db_jobs_runs_list()
,
db_jobs_runs_submit()
,
db_jobs_update()
Overwrite All Settings For A Job
Description
Overwrite All Settings For A Job
Usage
db_jobs_reset(
job_id,
name,
schedule,
tasks,
job_clusters = NULL,
email_notifications = NULL,
timeout_seconds = NULL,
max_concurrent_runs = 1,
access_control_list = NULL,
git_source = NULL,
queue = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
job_id |
The canonical identifier of the job. |
name |
Name for the job. |
schedule |
Instance of |
tasks |
Task specifications to be executed by this job. Use
|
job_clusters |
Named list of job cluster specifications (using
|
email_notifications |
Instance of |
timeout_seconds |
An optional timeout applied to each run of this job. The default behavior is to have no timeout. |
max_concurrent_runs |
Maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. This setting affects only new runs. This value cannot exceed 1000. Setting this value to 0 causes all new runs to be skipped. The default behavior is to allow only 1 concurrent run. |
access_control_list |
Instance of |
git_source |
Optional specification for a remote repository containing
the notebooks used by this job's notebook tasks. Instance of |
queue |
If true, enable queueing for the job. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Jobs API:
db_jobs_create()
,
db_jobs_delete()
,
db_jobs_get()
,
db_jobs_list()
,
db_jobs_run_now()
,
db_jobs_runs_cancel()
,
db_jobs_runs_delete()
,
db_jobs_runs_export()
,
db_jobs_runs_get()
,
db_jobs_runs_get_output()
,
db_jobs_runs_list()
,
db_jobs_runs_submit()
,
db_jobs_update()
Trigger A New Job Run
Description
Trigger A New Job Run
Usage
db_jobs_run_now(
job_id,
jar_params = list(),
notebook_params = list(),
python_params = list(),
spark_submit_params = list(),
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
job_id |
The canonical identifier of the job. |
jar_params |
Named list. Parameters are used to invoke the main
function of the main class specified in the Spark JAR task. If not specified
upon run-now, it defaults to an empty list. |
notebook_params |
Named list. Parameters are passed to the notebook and are accessible through the |
python_params |
Named list. Parameters are passed to Python file as command-line parameters. If specified upon run-now, it would overwrite the parameters specified in job setting. |
spark_submit_params |
Named list. Parameters are passed to spark-submit script as command-line parameters. If specified upon run-now, it would overwrite the parameters specified in job setting. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
- *_params parameters cannot exceed 10,000 bytes when serialized to JSON.
- jar_params and notebook_params are mutually exclusive.
See Also
Other Jobs API:
db_jobs_create()
,
db_jobs_delete()
,
db_jobs_get()
,
db_jobs_list()
,
db_jobs_reset()
,
db_jobs_runs_cancel()
,
db_jobs_runs_delete()
,
db_jobs_runs_export()
,
db_jobs_runs_get()
,
db_jobs_runs_get_output()
,
db_jobs_runs_list()
,
db_jobs_runs_submit()
,
db_jobs_update()
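Examples
A sketch of triggering an existing job with notebook parameters; the job id and parameter names are hypothetical.
run <- db_jobs_run_now(
  job_id = 123456,
  notebook_params = list(run_date = "2024-01-01", mode = "full")
)
# The returned run can then be inspected with db_jobs_runs_get()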
Cancel Job Run
Description
Cancels a run.
Usage
db_jobs_runs_cancel(
run_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
run_id |
The canonical identifier of the run. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
The run is canceled asynchronously, so when this request completes, the run
may still be running. The run will be terminated shortly. If the run is already
in a terminal life_cycle_state, this method is a no-op.
See Also
Other Jobs API:
db_jobs_create()
,
db_jobs_delete()
,
db_jobs_get()
,
db_jobs_list()
,
db_jobs_reset()
,
db_jobs_run_now()
,
db_jobs_runs_delete()
,
db_jobs_runs_export()
,
db_jobs_runs_get()
,
db_jobs_runs_get_output()
,
db_jobs_runs_list()
,
db_jobs_runs_submit()
,
db_jobs_update()
Delete Job Run
Description
Delete Job Run
Usage
db_jobs_runs_delete(
run_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
run_id |
The canonical identifier of the run. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Jobs API:
db_jobs_create()
,
db_jobs_delete()
,
db_jobs_get()
,
db_jobs_list()
,
db_jobs_reset()
,
db_jobs_run_now()
,
db_jobs_runs_cancel()
,
db_jobs_runs_export()
,
db_jobs_runs_get()
,
db_jobs_runs_get_output()
,
db_jobs_runs_list()
,
db_jobs_runs_submit()
,
db_jobs_update()
Export Job Run Output
Description
Export and retrieve the job run task.
Usage
db_jobs_runs_export(
run_id,
views_to_export = c("CODE", "DASHBOARDS", "ALL"),
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
run_id |
The canonical identifier of the run. |
views_to_export |
Which views to export. One of |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Jobs API:
db_jobs_create()
,
db_jobs_delete()
,
db_jobs_get()
,
db_jobs_list()
,
db_jobs_reset()
,
db_jobs_run_now()
,
db_jobs_runs_cancel()
,
db_jobs_runs_delete()
,
db_jobs_runs_get()
,
db_jobs_runs_get_output()
,
db_jobs_runs_list()
,
db_jobs_runs_submit()
,
db_jobs_update()
Get Job Run Details
Description
Retrieve the metadata of a run.
Usage
db_jobs_runs_get(
run_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
run_id |
The canonical identifier of the run. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Jobs API:
db_jobs_create()
,
db_jobs_delete()
,
db_jobs_get()
,
db_jobs_list()
,
db_jobs_reset()
,
db_jobs_run_now()
,
db_jobs_runs_cancel()
,
db_jobs_runs_delete()
,
db_jobs_runs_export()
,
db_jobs_runs_get_output()
,
db_jobs_runs_list()
,
db_jobs_runs_submit()
,
db_jobs_update()
Get Job Run Output
Description
Get Job Run Output
Usage
db_jobs_runs_get_output(
run_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
run_id |
The canonical identifier of the run. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Jobs API:
db_jobs_create()
,
db_jobs_delete()
,
db_jobs_get()
,
db_jobs_list()
,
db_jobs_reset()
,
db_jobs_run_now()
,
db_jobs_runs_cancel()
,
db_jobs_runs_delete()
,
db_jobs_runs_export()
,
db_jobs_runs_get()
,
db_jobs_runs_list()
,
db_jobs_runs_submit()
,
db_jobs_update()
List Job Runs
Description
List runs in descending order by start time.
Usage
db_jobs_runs_list(
job_id,
active_only = FALSE,
completed_only = FALSE,
offset = 0,
limit = 25,
run_type = c("JOB_RUN", "WORKFLOW_RUN", "SUBMIT_RUN"),
expand_tasks = FALSE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
job_id |
The canonical identifier of the job. |
active_only |
Boolean (Default: |
completed_only |
Boolean (Default: |
offset |
The offset of the first job to return, relative to the most recently created job. |
limit |
Number of jobs to return. This value must be greater than 0 and less than or equal to 25. The default value is 25. If a request specifies a limit of 0, the service instead uses the maximum limit. |
run_type |
The type of runs to return. One of |
expand_tasks |
Whether to include task and cluster details in the response. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Jobs API:
db_jobs_create()
,
db_jobs_delete()
,
db_jobs_get()
,
db_jobs_list()
,
db_jobs_reset()
,
db_jobs_run_now()
,
db_jobs_runs_cancel()
,
db_jobs_runs_delete()
,
db_jobs_runs_export()
,
db_jobs_runs_get()
,
db_jobs_runs_get_output()
,
db_jobs_runs_submit()
,
db_jobs_update()
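Examples
A sketch of listing active runs for a job; the job id is hypothetical.
runs <- db_jobs_runs_list(
  job_id = 123456,
  active_only = TRUE,
  limit = 25,
  expand_tasks = FALSE
)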
Create And Trigger A One-Time Run
Description
Create And Trigger A One-Time Run
Usage
db_jobs_runs_submit(
tasks,
run_name,
timeout_seconds = NULL,
idempotency_token = NULL,
access_control_list = NULL,
git_source = NULL,
job_clusters = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
tasks |
Task specifications to be executed by this job. Use
|
run_name |
Name for the run. |
timeout_seconds |
An optional timeout applied to each run of this job. The default behavior is to have no timeout. |
idempotency_token |
An optional token that can be used to guarantee the idempotency of job run requests. If an active run with the provided token already exists, the request does not create a new run, but returns the ID of the existing run instead. If you specify the idempotency token, upon failure you can retry until the request succeeds. Databricks guarantees that exactly one run is launched with that idempotency token. This token must have at most 64 characters. |
access_control_list |
Instance of |
git_source |
Optional specification for a remote repository containing
the notebooks used by this job's notebook tasks. Instance of |
job_clusters |
Named list of job cluster specifications (using
|
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Jobs API:
db_jobs_create()
,
db_jobs_delete()
,
db_jobs_get()
,
db_jobs_list()
,
db_jobs_reset()
,
db_jobs_run_now()
,
db_jobs_runs_cancel()
,
db_jobs_runs_delete()
,
db_jobs_runs_export()
,
db_jobs_runs_get()
,
db_jobs_runs_get_output()
,
db_jobs_runs_list()
,
db_jobs_update()
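Examples
A sketch of a one-time run. The helpers job_tasks(), job_task(), and notebook_task() are documented elsewhere in this manual; the cluster id, notebook path, and idempotency token are hypothetical and the exact helper arguments may differ.
run <- db_jobs_runs_submit(
  tasks = job_tasks(
    job_task(
      task_key = "adhoc_task",
      existing_cluster_id = "1234-567890-abcde123",   # hypothetical cluster id
      task = notebook_task(notebook_path = "/Users/user@example.com/my-notebook")
    )
  ),
  run_name = "brickster one-time run",
  idempotency_token = "example-token-001"
)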
Partially Update A Job
Description
Partially Update A Job
Usage
db_jobs_update(
job_id,
fields_to_remove = list(),
name = NULL,
schedule = NULL,
tasks = NULL,
job_clusters = NULL,
email_notifications = NULL,
timeout_seconds = NULL,
max_concurrent_runs = NULL,
access_control_list = NULL,
git_source = NULL,
queue = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
job_id |
The canonical identifier of the job. |
fields_to_remove |
Remove top-level fields in the job settings. Removing
nested fields is not supported. This field is optional. Must be a |
name |
Name for the job. |
schedule |
Instance of |
tasks |
Task specifications to be executed by this job. Use
|
job_clusters |
Named list of job cluster specifications (using
|
email_notifications |
Instance of |
timeout_seconds |
An optional timeout applied to each run of this job. The default behavior is to have no timeout. |
max_concurrent_runs |
Maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. This setting affects only new runs. This value cannot exceed 1000. Setting this value to 0 causes all new runs to be skipped. The default behavior is to allow only 1 concurrent run. |
access_control_list |
Instance of |
git_source |
Optional specification for a remote repository containing
the notebooks used by this job's notebook tasks. Instance of |
queue |
If true, enable queueing for the job. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Parameters which are shared with db_jobs_create()
are optional, only
specify those that are changing.
See Also
Other Jobs API:
db_jobs_create()
,
db_jobs_delete()
,
db_jobs_get()
,
db_jobs_list()
,
db_jobs_reset()
,
db_jobs_run_now()
,
db_jobs_runs_cancel()
,
db_jobs_runs_delete()
,
db_jobs_runs_export()
,
db_jobs_runs_get()
,
db_jobs_runs_get_output()
,
db_jobs_runs_list()
,
db_jobs_runs_submit()
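Examples
A sketch of a partial update that renames a job and changes its timeout; the job id is hypothetical.
db_jobs_update(
  job_id = 123456,
  name = "renamed example job",
  timeout_seconds = 7200
)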
Get Status of All Libraries on All Clusters
Description
Get Status of All Libraries on All Clusters
Usage
db_libs_all_cluster_statuses(
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
A status will be available for all libraries installed on clusters via the API or the libraries UI as well as libraries set to be installed on all clusters via the libraries UI.
If a library has been set to be installed on all clusters,
is_library_for_all_clusters
will be true, even if the library was
also installed on this specific cluster.
See Also
Other Libraries API:
db_libs_cluster_status()
,
db_libs_install()
,
db_libs_uninstall()
Get Status of Libraries on Cluster
Description
Get Status of Libraries on Cluster
Usage
db_libs_cluster_status(
cluster_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
Unique identifier of a Databricks cluster. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Libraries API:
db_libs_all_cluster_statuses()
,
db_libs_install()
,
db_libs_uninstall()
Install Library on Cluster
Description
Install Library on Cluster
Usage
db_libs_install(
cluster_id,
libraries,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
Unique identifier of a Databricks cluster. |
libraries |
An object created by |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Installation is asynchronous - it completes in the background after the request.
This call will fail if the cluster is terminated.
Installing a wheel library on a cluster is like running the pip command against the wheel file directly on the driver and executors. All the dependencies specified in the library setup.py file are installed, and this requires the library name to satisfy the wheel file name convention.
The installation on the executors happens only when a new task is launched. With Databricks Runtime 7.1 and below, the installation order of libraries is nondeterministic. For wheel libraries, you can ensure a deterministic installation order by creating a zip file with suffix .wheelhouse.zip that includes all the wheel files.
See Also
lib_egg()
, lib_cran()
, lib_jar()
, lib_maven()
, lib_pypi()
,
lib_whl()
Other Libraries API:
db_libs_all_cluster_statuses()
,
db_libs_cluster_status()
,
db_libs_uninstall()
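Examples
A sketch of installing a CRAN package on a cluster. The helpers libraries() and lib_cran() are documented elsewhere in this manual; the cluster id is hypothetical.
db_libs_install(
  cluster_id = "1234-567890-abcde123",
  libraries = libraries(lib_cran(package = "glue"))
)
# Installation is asynchronous; poll db_libs_cluster_status() to confirm completion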
Uninstall Library on Cluster
Description
Uninstall Library on Cluster
Usage
db_libs_uninstall(
cluster_id,
libraries,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
cluster_id |
Unique identifier of a Databricks cluster. |
libraries |
An object created by |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
The libraries aren’t uninstalled until the cluster is restarted.
Uninstalling libraries that are not installed on the cluster has no impact but is not an error.
See Also
Other Libraries API:
db_libs_all_cluster_statuses()
,
db_libs_cluster_status()
,
db_libs_install()
Approve Model Version Stage Transition Request
Description
Approve Model Version Stage Transition Request
Usage
db_mlflow_model_approve_transition_req(
name,
version,
stage = c("None", "Staging", "Production", "Archived"),
archive_existing_versions = TRUE,
comment = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
name |
Name of the model. |
version |
Version of the model. |
stage |
Target stage of the transition. Valid values are: |
archive_existing_versions |
Boolean (Default: |
comment |
User-provided comment on the action. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Model Registry API:
db_mlflow_model_delete_transition_req()
,
db_mlflow_model_open_transition_reqs()
,
db_mlflow_model_reject_transition_req()
,
db_mlflow_model_transition_req()
,
db_mlflow_model_transition_stage()
,
db_mlflow_model_version_comment()
,
db_mlflow_model_version_comment_delete()
,
db_mlflow_model_version_comment_edit()
,
db_mlflow_registered_model_details()
Delete a Model Version Stage Transition Request
Description
Delete a Model Version Stage Transition Request
Usage
db_mlflow_model_delete_transition_req(
name,
version,
stage = c("None", "Staging", "Production", "Archived"),
creator,
comment = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
name |
Name of the model. |
version |
Version of the model. |
stage |
Target stage of the transition. Valid values are: |
creator |
Username of the user who created this request. Of the transition requests matching the specified details, only the one transition created by this user will be deleted. |
comment |
User-provided comment on the action. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Model Registry API:
db_mlflow_model_approve_transition_req()
,
db_mlflow_model_open_transition_reqs()
,
db_mlflow_model_reject_transition_req()
,
db_mlflow_model_transition_req()
,
db_mlflow_model_transition_stage()
,
db_mlflow_model_version_comment()
,
db_mlflow_model_version_comment_delete()
,
db_mlflow_model_version_comment_edit()
,
db_mlflow_registered_model_details()
Get All Open Stage Transition Requests for the Model Version
Description
Get All Open Stage Transition Requests for the Model Version
Usage
db_mlflow_model_open_transition_reqs(
name,
version,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
name |
Name of the model. |
version |
Version of the model. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Model Registry API:
db_mlflow_model_approve_transition_req()
,
db_mlflow_model_delete_transition_req()
,
db_mlflow_model_reject_transition_req()
,
db_mlflow_model_transition_req()
,
db_mlflow_model_transition_stage()
,
db_mlflow_model_version_comment()
,
db_mlflow_model_version_comment_delete()
,
db_mlflow_model_version_comment_edit()
,
db_mlflow_registered_model_details()
Reject Model Version Stage Transition Request
Description
Reject Model Version Stage Transition Request
Usage
db_mlflow_model_reject_transition_req(
name,
version,
stage = c("None", "Staging", "Production", "Archived"),
comment = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
name |
Name of the model. |
version |
Version of the model. |
stage |
Target stage of the transition. Valid values are: |
comment |
User-provided comment on the action. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Model Registry API:
db_mlflow_model_approve_transition_req()
,
db_mlflow_model_delete_transition_req()
,
db_mlflow_model_open_transition_reqs()
,
db_mlflow_model_transition_req()
,
db_mlflow_model_transition_stage()
,
db_mlflow_model_version_comment()
,
db_mlflow_model_version_comment_delete()
,
db_mlflow_model_version_comment_edit()
,
db_mlflow_registered_model_details()
Make a Model Version Stage Transition Request
Description
Make a Model Version Stage Transition Request
Usage
db_mlflow_model_transition_req(
name,
version,
stage = c("None", "Staging", "Production", "Archived"),
comment = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
name |
Name of the model. |
version |
Version of the model. |
stage |
Target stage of the transition. Valid values are: |
comment |
User-provided comment on the action. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Model Registry API:
db_mlflow_model_approve_transition_req()
,
db_mlflow_model_delete_transition_req()
,
db_mlflow_model_open_transition_reqs()
,
db_mlflow_model_reject_transition_req()
,
db_mlflow_model_transition_stage()
,
db_mlflow_model_version_comment()
,
db_mlflow_model_version_comment_delete()
,
db_mlflow_model_version_comment_edit()
,
db_mlflow_registered_model_details()
Transition a Model Version's Stage
Description
Transition a Model Version's Stage
Usage
db_mlflow_model_transition_stage(
name,
version,
stage = c("None", "Staging", "Production", "Archived"),
archive_existing_versions = TRUE,
comment = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
name |
Name of the model. |
version |
Version of the model. |
stage |
Target stage of the transition. Valid values are: |
archive_existing_versions |
Boolean (Default: |
comment |
User-provided comment on the action. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
This is a Databricks version of the MLflow endpoint that also accepts a comment associated with the transition to be recorded.
See Also
Other Model Registry API:
db_mlflow_model_approve_transition_req()
,
db_mlflow_model_delete_transition_req()
,
db_mlflow_model_open_transition_reqs()
,
db_mlflow_model_reject_transition_req()
,
db_mlflow_model_transition_req()
,
db_mlflow_model_version_comment()
,
db_mlflow_model_version_comment_delete()
,
db_mlflow_model_version_comment_edit()
,
db_mlflow_registered_model_details()
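Examples
A sketch of transitioning a registered model version to Staging; the model name and version are hypothetical.
db_mlflow_model_transition_stage(
  name = "example_model",
  version = 3,
  stage = "Staging",
  archive_existing_versions = FALSE,
  comment = "Promoting after validation checks"
)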
Make a Comment on a Model Version
Description
Make a Comment on a Model Version
Usage
db_mlflow_model_version_comment(
name,
version,
comment,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
name |
Name of the model. |
version |
Version of the model. |
comment |
User-provided comment on the action. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Model Registry API:
db_mlflow_model_approve_transition_req()
,
db_mlflow_model_delete_transition_req()
,
db_mlflow_model_open_transition_reqs()
,
db_mlflow_model_reject_transition_req()
,
db_mlflow_model_transition_req()
,
db_mlflow_model_transition_stage()
,
db_mlflow_model_version_comment_delete()
,
db_mlflow_model_version_comment_edit()
,
db_mlflow_registered_model_details()
Delete a Comment on a Model Version
Description
Delete a Comment on a Model Version
Usage
db_mlflow_model_version_comment_delete(
id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
id |
Unique identifier of an activity. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Model Registry API:
db_mlflow_model_approve_transition_req()
,
db_mlflow_model_delete_transition_req()
,
db_mlflow_model_open_transition_reqs()
,
db_mlflow_model_reject_transition_req()
,
db_mlflow_model_transition_req()
,
db_mlflow_model_transition_stage()
,
db_mlflow_model_version_comment()
,
db_mlflow_model_version_comment_edit()
,
db_mlflow_registered_model_details()
Edit a Comment on a Model Version
Description
Edit a Comment on a Model Version
Usage
db_mlflow_model_version_comment_edit(
id,
comment,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
id |
Unique identifier of an activity. |
comment |
User-provided comment on the action. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Model Registry API:
db_mlflow_model_approve_transition_req()
,
db_mlflow_model_delete_transition_req()
,
db_mlflow_model_open_transition_reqs()
,
db_mlflow_model_reject_transition_req()
,
db_mlflow_model_transition_req()
,
db_mlflow_model_transition_stage()
,
db_mlflow_model_version_comment()
,
db_mlflow_model_version_comment_delete()
,
db_mlflow_registered_model_details()
Get Registered Model Details
Description
Get Registered Model Details
Usage
db_mlflow_registered_model_details(
name,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
name |
Name of the model. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Model Registry API:
db_mlflow_model_approve_transition_req()
,
db_mlflow_model_delete_transition_req()
,
db_mlflow_model_open_transition_reqs()
,
db_mlflow_model_reject_transition_req()
,
db_mlflow_model_transition_req()
,
db_mlflow_model_transition_stage()
,
db_mlflow_model_version_comment()
,
db_mlflow_model_version_comment_delete()
,
db_mlflow_model_version_comment_edit()
Create OAuth 2.0 Client
Description
Create OAuth 2.0 Client
Usage
db_oauth_client(host = db_host())
Arguments
host |
Databricks workspace URL, defaults to calling |
Details
Creates an OAuth 2.0 client; currently supports U2M flows only. May later be extended to account-level U2M and all M2M flows.
Value
List containing an httr2_oauth_client and the relevant auth URL.
Perform Databricks API Request
Description
Perform Databricks API Request
Usage
db_perform_request(req, ...)
Arguments
req |
|
... |
Parameters passed to |
See Also
Other Request Helpers:
db_req_error_body()
,
db_request()
,
db_request_json()
Create a SQL Query
Description
Create a SQL Query
Usage
db_query_create(
warehouse_id,
query_text,
display_name,
description = NULL,
catalog = NULL,
schema = NULL,
parent_path = NULL,
run_as_mode = c("OWNER", "VIEWER"),
apply_auto_limit = FALSE,
auto_resolve_display_name = TRUE,
tags = list(),
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
warehouse_id |
ID of the SQL warehouse against which the query will run. |
query_text |
Text of the query to be run. |
display_name |
Display name of the query that appears in list views, widget headings, and on the query page. |
description |
General description that conveys additional information about this query such as usage notes. |
catalog |
Name of the catalog where this query will be executed. |
schema |
Name of the schema where this query will be executed. |
parent_path |
Workspace path of the workspace folder containing the object. |
run_as_mode |
Sets the "Run as" role for the object. |
apply_auto_limit |
Whether to apply a 1000 row limit to the query result. |
auto_resolve_display_name |
Automatically resolve query display name conflicts. Otherwise, fail the request if the query's display name conflicts with an existing query's display name. |
tags |
Named list that describes the warehouse. Databricks tags all warehouse resources with these tags. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other SQL Queries API:
db_query_delete()
,
db_query_get()
,
db_query_list()
,
db_query_update()
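Examples
A sketch of creating a saved SQL query; the warehouse id and query text are hypothetical.
qry <- db_query_create(
  warehouse_id = "abcdef1234567890",
  query_text = "SELECT * FROM samples.nyctaxi.trips LIMIT 100",
  display_name = "brickster example query",
  description = "Example query created via the SQL Queries API"
)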
Delete a SQL Query
Description
Delete a SQL Query
Usage
db_query_delete(
id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
id |
String, ID for the query. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Moves a query to the trash. Trashed queries immediately disappear from searches and list views, and cannot be used for alerts. You can restore a trashed query through the UI. A trashed query is permanently deleted after 30 days.
See Also
Other SQL Queries API:
db_query_create()
,
db_query_get()
,
db_query_list()
,
db_query_update()
Get a SQL Query
Description
Returns the query with the given query ID.
Usage
db_query_get(id, host = db_host(), token = db_token(), perform_request = TRUE)
Arguments
id |
String, ID for the query. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other SQL Queries API:
db_query_create()
,
db_query_delete()
,
db_query_list()
,
db_query_update()
List SQL Queries
Description
List SQL Queries
Usage
db_query_list(
page_size = 20,
page_token = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
page_size |
Integer, number of results to return for each request. |
page_token |
Token used to get the next page of results. If not specified, returns the first page of results as well as a next page token if there are more results. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Gets a list of queries accessible to the user, ordered by creation time. Warning: Calling this API concurrently 10 or more times could result in throttling, service degradation, or a temporary ban.
See Also
Other SQL Queries API:
db_query_create()
,
db_query_delete()
,
db_query_get()
,
db_query_update()
Update a SQL Query
Description
Update a SQL Query
Usage
db_query_update(
id,
warehouse_id = NULL,
query_text = NULL,
display_name = NULL,
description = NULL,
catalog = NULL,
schema = NULL,
parent_path = NULL,
run_as_mode = NULL,
apply_auto_limit = NULL,
auto_resolve_display_name = NULL,
tags = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
id |
Query id |
warehouse_id |
ID of the SQL warehouse against which the query will run. |
query_text |
Text of the query to be run. |
display_name |
Display name of the query that appears in list views, widget headings, and on the query page. |
description |
General description that conveys additional information about this query such as usage notes. |
catalog |
Name of the catalog where this query will be executed. |
schema |
Name of the schema where this query will be executed. |
parent_path |
Workspace path of the workspace folder containing the object. |
run_as_mode |
Sets the "Run as" role for the object. |
apply_auto_limit |
Whether to apply a 1000 row limit to the query result. |
auto_resolve_display_name |
Automatically resolve query display name conflicts. Otherwise, fail the request if the query's display name conflicts with an existing query's display name. |
tags |
Named list that describes the warehouse. Databricks tags all warehouse resources with these tags. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other SQL Queries API:
db_query_create()
,
db_query_delete()
,
db_query_get()
,
db_query_list()
Read .netrc File
Description
Read .netrc File
Usage
db_read_netrc(path = "~/.netrc")
Arguments
path |
path of |
Value
named list of .netrc
entries
See Also
Other Databricks Authentication Helpers:
db_host()
,
db_token()
,
db_wsid()
Remote REPL to Databricks Cluster
Description
Remote REPL to Databricks Cluster
Usage
db_repl(
cluster_id,
language = c("r", "py", "scala", "sql", "sh"),
host = db_host(),
token = db_token()
)
Arguments
cluster_id |
Cluster Id to create REPL context against. |
language |
Language for the REPL. One of 'r', 'py', 'scala', 'sql', 'sh'. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
Details
db_repl()
will take over the existing console and allow execution of
commands against a Databricks cluster. For RStudio users there are Addins
which can be bound to keyboard shortcuts to improve usability.
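Examples
A sketch of opening an interactive R REPL against a cluster; the cluster id is hypothetical.
db_repl(cluster_id = "1234-567890-abcde123", language = "r")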
Create Repo
Description
Creates a repo in the workspace and links it to the remote Git repo specified.
Usage
db_repo_create(
url,
provider,
path,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
url |
URL of the Git repository to be linked. |
provider |
Git provider. This field is case-insensitive. The available
Git providers are |
path |
Desired path for the repo in the workspace. Must be in the format
|
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Repos API:
db_repo_delete()
,
db_repo_get()
,
db_repo_get_all()
,
db_repo_update()
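Examples
A sketch of linking a GitHub repository into the workspace; the workspace path is hypothetical.
db_repo_create(
  url = "https://github.com/databrickslabs/brickster",
  provider = "gitHub",
  path = "/Repos/user@example.com/brickster"
)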
Delete Repo
Description
Deletes the specified repo
Usage
db_repo_delete(
repo_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
repo_id |
The ID for the corresponding repo to access. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Repos API:
db_repo_create()
,
db_repo_get()
,
db_repo_get_all()
,
db_repo_update()
Get Repo
Description
Returns the repo with the given repo ID.
Usage
db_repo_get(
repo_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
repo_id |
The ID for the corresponding repo to access. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Repos API:
db_repo_create()
,
db_repo_delete()
,
db_repo_get_all()
,
db_repo_update()
Get All Repos
Description
Get All Repos
Usage
db_repo_get_all(
path_prefix,
next_page_token = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path_prefix |
Filters repos that have paths starting with the given path prefix. |
next_page_token |
Token used to get the next page of results. If not specified, returns the first page of results as well as a next page token if there are more results. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Returns repos that the calling user has Manage permissions on. Results are paginated with each page containing twenty repos.
See Also
Other Repos API:
db_repo_create()
,
db_repo_delete()
,
db_repo_get()
,
db_repo_update()
Update Repo
Description
Updates the repo to the given branch or tag.
Usage
db_repo_update(
repo_id,
branch = NULL,
tag = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
repo_id |
The ID for the corresponding repo to access. |
branch |
Branch that the local version of the repo is checked out to. |
tag |
Tag that the local version of the repo is checked out to. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Specify either branch
or tag
, not both.
Updating the repo to a tag puts the repo in a detached HEAD state. Before committing new changes, you must update the repo to a branch instead of the detached HEAD.
See Also
Other Repos API:
db_repo_create()
,
db_repo_delete()
,
db_repo_get()
,
db_repo_get_all()
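Examples
A sketch of checking out a branch for an existing repo; the repo id is hypothetical.
db_repo_update(repo_id = 123456789, branch = "main")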
Propagate Databricks API Errors
Description
Propagate Databricks API Errors
Usage
db_req_error_body(resp)
Arguments
resp |
Object with class |
See Also
Other Request Helpers:
db_perform_request()
,
db_request()
,
db_request_json()
Databricks Request Helper
Description
Databricks Request Helper
Usage
db_request(endpoint, method, version = NULL, body = NULL, host, token, ...)
Arguments
endpoint |
Databricks REST API Endpoint |
method |
Passed to |
version |
String, API version of endpoint. E.g. |
body |
Named list, passed to |
host |
Databricks host, defaults to |
token |
Databricks token, defaults to |
... |
Parameters passed on to |
Value
request
See Also
Other Request Helpers:
db_perform_request()
,
db_req_error_body()
,
db_request_json()
Generate Request JSON
Description
Generate Request JSON
Usage
db_request_json(req)
Arguments
req |
a httr2 request, ideally from |
Value
JSON string
See Also
Other Request Helpers:
db_perform_request()
,
db_req_error_body()
,
db_request()
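Examples
A sketch of building a request and inspecting its JSON body without sending it; the endpoint, API version, and cluster id shown are illustrative assumptions.
req <- db_request(
  endpoint = "clusters/get",
  method = "POST",
  version = "2.0",
  body = list(cluster_id = "1234-567890-abcde123"),
  host = db_host(),
  token = db_token()
)
db_request_json(req)       # inspect the JSON payload
# db_perform_request(req)  # send the request when ready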
Delete Secret in Secret Scope
Description
Delete Secret in Secret Scope
Usage
db_secrets_delete(
scope,
key,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
scope |
Name of the scope that contains the secret to delete. |
key |
Name of the secret to delete. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
You must have WRITE
or MANAGE
permission on the secret scope.
Throws RESOURCE_DOES_NOT_EXIST if no such secret scope or secret exists.
Throws PERMISSION_DENIED if you do not have permission to make this API call.
See Also
Other Secrets API:
db_secrets_list()
,
db_secrets_put()
,
db_secrets_scope_acl_delete()
,
db_secrets_scope_acl_get()
,
db_secrets_scope_acl_list()
,
db_secrets_scope_acl_put()
,
db_secrets_scope_create()
,
db_secrets_scope_delete()
,
db_secrets_scope_list_all()
List Secrets in Secret Scope
Description
List Secrets in Secret Scope
Usage
db_secrets_list(
scope,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
scope |
Name of the scope whose secrets you want to list |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
This is a metadata-only operation; you cannot retrieve secret data using this
API. You must have READ
permission to make this call.
The last_updated_timestamp
returned is in milliseconds since epoch.
Throws RESOURCE_DOES_NOT_EXIST if no such secret scope exists.
Throws PERMISSION_DENIED if you do not have permission to make this API call.
See Also
Other Secrets API:
db_secrets_delete()
,
db_secrets_put()
,
db_secrets_scope_acl_delete()
,
db_secrets_scope_acl_get()
,
db_secrets_scope_acl_list()
,
db_secrets_scope_acl_put()
,
db_secrets_scope_create()
,
db_secrets_scope_delete()
,
db_secrets_scope_list_all()
Put Secret in Secret Scope
Description
Insert a secret under the provided scope with the given name.
Usage
db_secrets_put(
scope,
key,
value,
as_bytes = FALSE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
scope |
Name of the scope to which the secret will be associated with |
key |
Unique name to identify the secret. |
value |
Contents of the secret to store, must be a string. |
as_bytes |
Boolean (default: |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
If a secret already exists with the same name, this command overwrites the existing secret’s value.
The server encrypts the secret using the secret scope’s encryption settings
before storing it. You must have WRITE
or MANAGE
permission on the secret
scope.
The secret key must consist of alphanumeric characters, dashes, underscores, and periods, and cannot exceed 128 characters. The maximum allowed secret value size is 128 KB. The maximum number of secrets in a given scope is 1000.
You can read a secret value only from within a command on a cluster
(for example, through a notebook); there is no API to read a secret value
outside of a cluster. The permission applied is based on who is invoking the
command and you must have at least READ
permission.
The input fields string_value
or bytes_value
specify the type of the
secret, which will determine the value returned when the secret value is
requested. Exactly one must be specified; this function interfaces these
parameters via as_bytes, which defaults to FALSE.
Throws RESOURCE_DOES_NOT_EXIST if no such secret scope exists.
Throws RESOURCE_LIMIT_EXCEEDED if maximum number of secrets in scope is exceeded.
Throws INVALID_PARAMETER_VALUE if the key name or value length is invalid.
Throws PERMISSION_DENIED if the user does not have permission to make this API call.
See Also
Other Secrets API:
db_secrets_delete()
,
db_secrets_list()
,
db_secrets_scope_acl_delete()
,
db_secrets_scope_acl_get()
,
db_secrets_scope_acl_list()
,
db_secrets_scope_acl_put()
,
db_secrets_scope_create()
,
db_secrets_scope_delete()
,
db_secrets_scope_list_all()
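Examples
A sketch of creating a scope and storing a secret in it; the scope name, key, and value are hypothetical.
db_secrets_scope_create(scope = "example-scope")
db_secrets_put(scope = "example-scope", key = "api-token", value = "super-secret-value")
db_secrets_list(scope = "example-scope")   # metadata only, values are not returned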
Delete Secret Scope ACL
Description
Delete the given ACL on the given scope.
Usage
db_secrets_scope_acl_delete(
scope,
principal,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
scope |
Name of the scope to remove permissions. |
principal |
Principal to remove an existing ACL. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
You must have the MANAGE
permission to invoke this API.
Throws RESOURCE_DOES_NOT_EXIST if no such secret scope, principal, or ACL exists.
Throws PERMISSION_DENIED if you do not have permission to make this API call.
See Also
Other Secrets API:
db_secrets_delete()
,
db_secrets_list()
,
db_secrets_put()
,
db_secrets_scope_acl_get()
,
db_secrets_scope_acl_list()
,
db_secrets_scope_acl_put()
,
db_secrets_scope_create()
,
db_secrets_scope_delete()
,
db_secrets_scope_list_all()
Get Secret Scope ACL
Description
Get Secret Scope ACL
Usage
db_secrets_scope_acl_get(
scope,
principal,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
scope |
Name of the scope to fetch ACL information from. |
principal |
Principal to fetch ACL information from. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
You must have the MANAGE permission to invoke this API.
Throws RESOURCE_DOES_NOT_EXIST if no such secret scope exists.
Throws PERMISSION_DENIED if you do not have permission to make this API call.
See Also
Other Secrets API:
db_secrets_delete()
,
db_secrets_list()
,
db_secrets_put()
,
db_secrets_scope_acl_delete()
,
db_secrets_scope_acl_list()
,
db_secrets_scope_acl_put()
,
db_secrets_scope_create()
,
db_secrets_scope_delete()
,
db_secrets_scope_list_all()
List Secret Scope ACL's
Description
List Secret Scope ACL's
Usage
db_secrets_scope_acl_list(
scope,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
scope |
Name of the scope to fetch ACL information from. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
You must have the MANAGE
permission to invoke this API.
Throws RESOURCE_DOES_NOT_EXIST if no such secret scope exists.
Throws PERMISSION_DENIED if you do not have permission to make this API call.
See Also
Other Secrets API:
db_secrets_delete()
,
db_secrets_list()
,
db_secrets_put()
,
db_secrets_scope_acl_delete()
,
db_secrets_scope_acl_get()
,
db_secrets_scope_acl_put()
,
db_secrets_scope_create()
,
db_secrets_scope_delete()
,
db_secrets_scope_list_all()
Put ACL on Secret Scope
Description
Put ACL on Secret Scope
Usage
db_secrets_scope_acl_put(
scope,
principal,
permission = c("READ", "WRITE", "MANAGE"),
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
scope |
Name of the scope to apply permissions. |
principal |
Principal to which the permission is applied |
permission |
Permission level applied to the principal. One of |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Create or overwrite the ACL associated with the given principal (user or group) on the specified scope point. In general, a user or group will use the most powerful permission available to them, and permissions are ordered as follows:
- MANAGE - Allowed to change ACLs, and read and write to this secret scope.
- WRITE - Allowed to read and write to this secret scope.
- READ - Allowed to read this secret scope and list what secrets are available.
You must have the MANAGE
permission to invoke this API.
The principal is a user or group name corresponding to an existing Databricks principal to be granted or revoked access.
Throws RESOURCE_DOES_NOT_EXIST if no such secret scope exists.
Throws RESOURCE_ALREADY_EXISTS if a permission for the principal already exists.
Throws INVALID_PARAMETER_VALUE if the permission is invalid.
Throws PERMISSION_DENIED if you do not have permission to make this API call.
See Also
Other Secrets API:
db_secrets_delete()
,
db_secrets_list()
,
db_secrets_put()
,
db_secrets_scope_acl_delete()
,
db_secrets_scope_acl_get()
,
db_secrets_scope_acl_list()
,
db_secrets_scope_create()
,
db_secrets_scope_delete()
,
db_secrets_scope_list_all()
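Examples
A sketch of granting the built-in users group read access to a scope; the scope name is hypothetical.
db_secrets_scope_acl_put(
  scope = "example-scope",
  principal = "users",
  permission = "READ"
)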
Create Secret Scope
Description
Create Secret Scope
Usage
db_secrets_scope_create(
scope,
initial_manage_principal = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
scope |
Scope name requested by the user. Scope names are unique. |
initial_manage_principal |
The principal that is initially granted
|
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Create a Databricks-backed secret scope in which secrets are stored in Databricks-managed storage and encrypted with a cloud-based specific encryption key.
The scope name:
Must be unique within a workspace.
Must consist of alphanumeric characters, dashes, underscores, and periods, and may not exceed 128 characters.
The names are considered non-sensitive and are readable by all users in the workspace. A workspace is limited to a maximum of 100 secret scopes.
If initial_manage_principal
is specified, the initial ACL applied to the
scope is applied to the supplied principal (user or group) with MANAGE
permissions. The only supported principal for this option is the group users,
which contains all users in the workspace. If initial_manage_principal
is
not specified, the initial ACL with MANAGE
permission applied to the scope
is assigned to the API request issuer’s user identity.
Throws RESOURCE_ALREADY_EXISTS if a scope with the given name already exists.
Throws RESOURCE_LIMIT_EXCEEDED if maximum number of scopes in the workspace is exceeded.
Throws INVALID_PARAMETER_VALUE if the scope name is invalid.
See Also
Other Secrets API:
db_secrets_delete()
,
db_secrets_list()
,
db_secrets_put()
,
db_secrets_scope_acl_delete()
,
db_secrets_scope_acl_get()
,
db_secrets_scope_acl_list()
,
db_secrets_scope_acl_put()
,
db_secrets_scope_delete()
,
db_secrets_scope_list_all()
Delete Secret Scope
Description
Delete Secret Scope
Usage
db_secrets_scope_delete(
scope,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
scope |
Name of the scope to delete. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Throws RESOURCE_DOES_NOT_EXIST if the scope does not exist.
Throws PERMISSION_DENIED if the user does not have permission to make this API call.
See Also
Other Secrets API:
db_secrets_delete()
,
db_secrets_list()
,
db_secrets_put()
,
db_secrets_scope_acl_delete()
,
db_secrets_scope_acl_get()
,
db_secrets_scope_acl_list()
,
db_secrets_scope_acl_put()
,
db_secrets_scope_create()
,
db_secrets_scope_list_all()
List Secret Scopes
Description
List Secret Scopes
Usage
db_secrets_scope_list_all(
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Throws
PERMISSION_DENIED
if you do not have permission to make this API call.
See Also
Other Secrets API:
db_secrets_delete()
,
db_secrets_list()
,
db_secrets_put()
,
db_secrets_scope_acl_delete()
,
db_secrets_scope_acl_get()
,
db_secrets_scope_acl_list()
,
db_secrets_scope_acl_put()
,
db_secrets_scope_create()
,
db_secrets_scope_delete()
Cancel SQL Query
Description
Cancel SQL Query
Usage
db_sql_exec_cancel(
statement_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
statement_id |
String, query execution |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Requests that an executing statement be canceled. Callers must poll for status to see the terminal state.
Read more on Databricks API docs
See Also
Other SQL Execution APIs:
db_sql_exec_query()
,
db_sql_exec_result()
,
db_sql_exec_status()
Poll a Query Until Successful
Description
Poll a Query Until Successful
Usage
db_sql_exec_poll_for_success(statement_id, interval = 1)
Arguments
statement_id |
String, query execution |
interval |
Number of seconds between status checks. |
Execute SQL Query
Description
Execute SQL Query
Usage
db_sql_exec_query(
statement,
warehouse_id,
catalog = NULL,
schema = NULL,
parameters = NULL,
row_limit = NULL,
byte_limit = NULL,
disposition = c("INLINE", "EXTERNAL_LINKS"),
format = c("JSON_ARRAY", "ARROW_STREAM", "CSV"),
wait_timeout = "10s",
on_wait_timeout = c("CONTINUE", "CANCEL"),
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
statement |
String, the SQL statement to execute. The statement can
optionally be parameterized, see |
warehouse_id |
String, ID of warehouse upon which to execute a statement. |
catalog |
String, sets default catalog for statement execution, similar
to |
schema |
String, sets default schema for statement execution, similar
to |
parameters |
List of Named Lists, parameters to pass into a SQL statement containing parameter markers. A parameter consists of a name, a value, and optionally a type.
To represent a See docs for more details. |
row_limit |
Integer, applies the given row limit to the statement's
result set, but unlike the |
byte_limit |
Integer, applies the given byte limit to the statement's
result size. Byte counts are based on internal data representations and
might not match the final size in the requested format. If the result was
truncated due to the byte limit, then |
disposition |
One of |
format |
One of |
wait_timeout |
String, default is When set between If the statement takes longer to execute, |
on_wait_timeout |
One of When set to When set to |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Refer to the web documentation for detailed material on the interaction of the various parameters and general recommendations.
See Also
Other SQL Execution APIs:
db_sql_exec_cancel()
,
db_sql_exec_result()
,
db_sql_exec_status()
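Examples
A sketch of executing a parameterized statement against a warehouse; the warehouse id and table are hypothetical.
resp <- db_sql_exec_query(
  statement = "SELECT * FROM samples.nyctaxi.trips WHERE trip_distance > :min_dist LIMIT 10",
  warehouse_id = "abcdef1234567890",
  parameters = list(
    list(name = "min_dist", value = "5", type = "DOUBLE")
  ),
  wait_timeout = "30s",
  on_wait_timeout = "CONTINUE"
)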
Get SQL Query Results
Description
Get SQL Query Results
Usage
db_sql_exec_result(
statement_id,
chunk_index,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
statement_id |
String, query execution |
chunk_index |
Integer, chunk index to fetch result. Starts from |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
After the statement execution has SUCCEEDED
, this request can be used to
fetch any chunk by index.
Whereas the first chunk with chunk_index = 0 is typically fetched with
db_sql_exec_query() or db_sql_exec_status(), this request can be used
to fetch subsequent chunks.
The response structure is identical to the nested result element described
in the db_sql_exec_result()
request, and similarly includes the
next_chunk_index
and next_chunk_internal_link
fields for simple
iteration through the result set.
Read more on Databricks API docs
See Also
Other SQL Execution APIs:
db_sql_exec_cancel()
,
db_sql_exec_query()
,
db_sql_exec_status()
Get SQL Query Status
Description
Get SQL Query Status
Usage
db_sql_exec_status(
statement_id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
statement_id |
String, query execution |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
This request can be used to poll for the statement's status.
When the status.state
field is SUCCEEDED
it will also return the result
manifest and the first chunk of the result data.
When the statement is in the terminal states CANCELED
, CLOSED
or
FAILED
, it returns HTTP 200
with the state set.
After at least 12 hours in terminal state, the statement is removed from the
warehouse and further calls will receive an HTTP 404
response.
Read more on Databricks API docs
See Also
Other SQL Execution APIs:
db_sql_exec_cancel()
,
db_sql_exec_query()
,
db_sql_exec_result()
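Examples
A sketch of the submit/poll/fetch flow across the SQL execution functions; the warehouse id is hypothetical, and the response is assumed to contain a statement_id field as described in the Databricks API docs.
resp <- db_sql_exec_query(
  statement = "SELECT 1 AS x",
  warehouse_id = "abcdef1234567890",
  wait_timeout = "0s"   # return immediately and poll instead
)
st <- db_sql_exec_status(statement_id = resp$statement_id)
if (identical(st$status$state, "SUCCEEDED")) {
  chunk <- db_sql_exec_result(statement_id = resp$statement_id, chunk_index = 0)
}
# db_sql_exec_poll_for_success() wraps this polling loop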
Get Global Warehouse Config
Description
Get Global Warehouse Config
Usage
db_sql_global_warehouse_get(
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Warehouse API:
db_sql_warehouse_create()
,
db_sql_warehouse_delete()
,
db_sql_warehouse_edit()
,
db_sql_warehouse_get()
,
db_sql_warehouse_list()
,
db_sql_warehouse_start()
,
db_sql_warehouse_stop()
,
get_and_start_warehouse()
Execute query with SQL Warehouse
Description
Execute query with SQL Warehouse
Usage
db_sql_query(
warehouse_id,
statement,
schema = NULL,
catalog = NULL,
parameters = NULL,
row_limit = NULL,
byte_limit = NULL,
return_arrow = FALSE,
max_active_connections = 30,
host = db_host(),
token = db_token()
)
Arguments
warehouse_id |
String, ID of warehouse upon which to execute a statement. |
statement |
String, the SQL statement to execute. The statement can
optionally be parameterized, see |
schema |
String, sets default schema for statement execution, similar
to |
catalog |
String, sets default catalog for statement execution, similar
to |
parameters |
List of Named Lists, parameters to pass into a SQL statement containing parameter markers. A parameter consists of a name, a value, and optionally a type.
To represent a See docs for more details. |
row_limit |
Integer, applies the given row limit to the statement's
result set, but unlike the |
byte_limit |
Integer, applies the given byte limit to the statement's
result size. Byte counts are based on internal data representations and
might not match the final size in the requested format. If the result was
truncated due to the byte limit, then |
return_arrow |
Boolean, determine if result is tibble::tibble or arrow::Table. |
max_active_connections |
Integer to decide on concurrent downloads. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
Value
tibble::tibble or arrow::Table.
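Examples
A sketch of running a query and collecting the result as a tibble; the warehouse id is hypothetical.
tbl <- db_sql_query(
  warehouse_id = "abcdef1234567890",
  statement = "SELECT 1 AS x, 'a' AS y",
  return_arrow = FALSE
)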
List Warehouse Query History
Description
For more details refer to the query history documentation.
This function elevates the sub-components of the filter_by parameter directly
to arguments of the R function.
Usage
db_sql_query_history(
statuses = NULL,
user_ids = NULL,
endpoint_ids = NULL,
start_time_ms = NULL,
end_time_ms = NULL,
max_results = 100,
page_token = NULL,
include_metrics = FALSE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
statuses |
Allows filtering by query status. Possible values are:
|
user_ids |
Allows filtering by user ID's. Multiple permitted. |
endpoint_ids |
Allows filtering by endpoint ID's. Multiple permitted. |
start_time_ms |
Integer, limit results to queries that started after this time. |
end_time_ms |
Integer, limit results to queries that started before this time. |
max_results |
Limit the number of results returned in one page. Default is 100. |
page_token |
Opaque token used to get the next page of results. Optional. |
include_metrics |
Whether to include metrics about query execution. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
By default the filter parameters statuses, user_ids, and endpoint_ids are NULL.
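Examples
A minimal sketch (not run) listing recent failed queries with metrics; the warehouse/endpoint ID and the "FAILED" status value are placeholders assumed to be valid for the query history API.
## Not run:
history <- db_sql_query_history(
  statuses = c("FAILED"),
  endpoint_ids = c("<warehouse-id>"),
  max_results = 50,
  include_metrics = TRUE
)
## End(Not run)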
Create Warehouse
Description
Create Warehouse
Usage
db_sql_warehouse_create(
name,
cluster_size,
min_num_clusters = 1,
max_num_clusters = 1,
auto_stop_mins = 30,
tags = list(),
spot_instance_policy = c("COST_OPTIMIZED", "RELIABILITY_OPTIMIZED"),
enable_photon = TRUE,
warehouse_type = c("CLASSIC", "PRO"),
enable_serverless_compute = NULL,
disable_uc = FALSE,
channel = c("CHANNEL_NAME_CURRENT", "CHANNEL_NAME_PREVIEW"),
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
name |
Name of the SQL warehouse. Must be unique. |
cluster_size |
Size of the clusters allocated to the warehouse. One of
|
min_num_clusters |
Minimum number of clusters available when a SQL warehouse is running. The default is 1. |
max_num_clusters |
Maximum number of clusters available when a SQL warehouse is running. If multi-cluster load balancing is not enabled, this is limited to 1. |
auto_stop_mins |
Time in minutes until an idle SQL warehouse terminates
all clusters and stops. Defaults to 30. For Serverless SQL warehouses
( |
tags |
Named list that describes the warehouse. Databricks tags all warehouse resources with these tags. |
spot_instance_policy |
The spot policy to use for allocating instances to clusters. This field is not used if the SQL warehouse is a Serverless SQL warehouse. |
enable_photon |
Whether queries are executed on a native vectorized
engine that speeds up query execution. The default is |
warehouse_type |
Either "CLASSIC" (default), or "PRO" |
enable_serverless_compute |
Whether this SQL warehouse is a Serverless
warehouse. To use a Serverless SQL warehouse, you must enable Serverless SQL
warehouses for the workspace. If Serverless SQL warehouses are disabled for the
workspace, the default is |
disable_uc |
If |
channel |
Whether to use the current SQL warehouse compute version or the
preview version. Databricks does not recommend using preview versions for
production workloads. The default is |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Warehouse API:
db_sql_global_warehouse_get()
,
db_sql_warehouse_delete()
,
db_sql_warehouse_edit()
,
db_sql_warehouse_get()
,
db_sql_warehouse_list()
,
db_sql_warehouse_start()
,
db_sql_warehouse_stop()
,
get_and_start_warehouse()
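Examples
A minimal sketch (not run), assuming "2X-Small" is an available cluster_size in the target workspace.
## Not run:
db_sql_warehouse_create(
  name = "brickster-warehouse",
  cluster_size = "2X-Small",
  auto_stop_mins = 30,
  warehouse_type = "PRO",
  enable_serverless_compute = TRUE
)
## End(Not run)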
Delete Warehouse
Description
Delete Warehouse
Usage
db_sql_warehouse_delete(
id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
id |
ID of the SQL warehouse. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Warehouse API:
db_sql_global_warehouse_get()
,
db_sql_warehouse_create()
,
db_sql_warehouse_edit()
,
db_sql_warehouse_get()
,
db_sql_warehouse_list()
,
db_sql_warehouse_start()
,
db_sql_warehouse_stop()
,
get_and_start_warehouse()
Edit Warehouse
Description
Edit Warehouse
Usage
db_sql_warehouse_edit(
id,
name = NULL,
cluster_size = NULL,
min_num_clusters = NULL,
max_num_clusters = NULL,
auto_stop_mins = NULL,
tags = NULL,
spot_instance_policy = NULL,
enable_photon = NULL,
warehouse_type = NULL,
enable_serverless_compute = NULL,
channel = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
id |
ID of the SQL warehouse. |
name |
Name of the SQL warehouse. Must be unique. |
cluster_size |
Size of the clusters allocated to the warehouse. One of
|
min_num_clusters |
Minimum number of clusters available when a SQL warehouse is running. The default is 1. |
max_num_clusters |
Maximum number of clusters available when a SQL warehouse is running. If multi-cluster load balancing is not enabled, this is limited to 1. |
auto_stop_mins |
Time in minutes until an idle SQL warehouse terminates
all clusters and stops. Defaults to 30. For Serverless SQL warehouses
( |
tags |
Named list that describes the warehouse. Databricks tags all warehouse resources with these tags. |
spot_instance_policy |
The spot policy to use for allocating instances to clusters. This field is not used if the SQL warehouse is a Serverless SQL warehouse. |
enable_photon |
Whether queries are executed on a native vectorized
engine that speeds up query execution. The default is |
warehouse_type |
Either "CLASSIC" (default), or "PRO" |
enable_serverless_compute |
Whether this SQL warehouse is a Serverless
warehouse. To use a Serverless SQL warehouse, you must enable Serverless SQL
warehouses for the workspace. If Serverless SQL warehouses are disabled for the
workspace, the default is |
channel |
Whether to use the current SQL warehouse compute version or the
preview version. Databricks does not recommend using preview versions for
production workloads. The default is |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Modify a SQL warehouse. All fields are optional. Missing fields default to the current values.
See Also
Other Warehouse API:
db_sql_global_warehouse_get()
,
db_sql_warehouse_create()
,
db_sql_warehouse_delete()
,
db_sql_warehouse_get()
,
db_sql_warehouse_list()
,
db_sql_warehouse_start()
,
db_sql_warehouse_stop()
,
get_and_start_warehouse()
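Examples
A minimal sketch (not run); the warehouse ID is a placeholder.
## Not run:
# only the supplied fields change; all other fields keep their current values
db_sql_warehouse_edit(
  id = "<warehouse-id>",
  auto_stop_mins = 15,
  enable_photon = TRUE
)
## End(Not run)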
Get Warehouse
Description
Get Warehouse
Usage
db_sql_warehouse_get(
id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
id |
ID of the SQL warehouse. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Warehouse API:
db_sql_global_warehouse_get()
,
db_sql_warehouse_create()
,
db_sql_warehouse_delete()
,
db_sql_warehouse_edit()
,
db_sql_warehouse_list()
,
db_sql_warehouse_start()
,
db_sql_warehouse_stop()
,
get_and_start_warehouse()
List Warehouses
Description
List Warehouses
Usage
db_sql_warehouse_list(
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Warehouse API:
db_sql_global_warehouse_get()
,
db_sql_warehouse_create()
,
db_sql_warehouse_delete()
,
db_sql_warehouse_edit()
,
db_sql_warehouse_get()
,
db_sql_warehouse_start()
,
db_sql_warehouse_stop()
,
get_and_start_warehouse()
Start Warehouse
Description
Start Warehouse
Usage
db_sql_warehouse_start(
id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
id |
ID of the SQL warehouse. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Warehouse API:
db_sql_global_warehouse_get()
,
db_sql_warehouse_create()
,
db_sql_warehouse_delete()
,
db_sql_warehouse_edit()
,
db_sql_warehouse_get()
,
db_sql_warehouse_list()
,
db_sql_warehouse_stop()
,
get_and_start_warehouse()
Stop Warehouse
Description
Stop Warehouse
Usage
db_sql_warehouse_stop(
id,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
id |
ID of the SQL warehouse. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Warehouse API:
db_sql_global_warehouse_get()
,
db_sql_warehouse_create()
,
db_sql_warehouse_delete()
,
db_sql_warehouse_edit()
,
db_sql_warehouse_get()
,
db_sql_warehouse_list()
,
db_sql_warehouse_start()
,
get_and_start_warehouse()
Fetch Databricks Token
Description
The function will check for a token in the DATABRICKS_TOKEN
environment variable.
.databrickscfg
will be searched if db_profile
and use_databrickscfg
are set or
if Posit Workbench managed OAuth credentials are detected.
If none of the above are found, it will default to using the OAuth U2M flow.
Refer to api authentication docs
Usage
db_token(profile = default_config_profile())
Arguments
profile |
Profile to use when fetching from environment variable
(e.g. |
Details
The behaviour is subject to change depending if db_profile
and
use_databrickscfg
options are set.
-
use_databrickscfg
: Boolean (default:FALSE
), determines if credentials are fetched from profile of.databrickscfg
or.Renviron
-
db_profile
: String (default:NULL
), determines profile used..databrickscfg
will automatically be used when Posit Workbench managed OAuth credentials are detected.
See vignette on authentication for more details.
Value
databricks token
See Also
Other Databricks Authentication Helpers:
db_host()
,
db_read_netrc()
,
db_wsid()
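Examples
A minimal sketch (not run) of switching the credential source via the options described above; the "dev" profile name is a placeholder.
## Not run:
# fetch credentials from a named profile in .databrickscfg
options(use_databrickscfg = TRUE, db_profile = "dev")
db_token()
# revert to environment variables / OAuth U2M flow
options(use_databrickscfg = FALSE, db_profile = NULL)
db_token()
## End(Not run)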
Get Catalog (Unity Catalog)
Description
Get Catalog (Unity Catalog)
Usage
db_uc_catalogs_get(
catalog,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
catalog |
The name of the catalog. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
List
See Also
Other Unity Catalog Management:
db_uc_catalogs_list()
,
db_uc_schemas_get()
,
db_uc_schemas_list()
List Catalogs (Unity Catalog)
Description
List Catalogs (Unity Catalog)
Usage
db_uc_catalogs_list(
max_results = 1000,
include_browse = TRUE,
page_token = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
max_results |
Maximum number of catalogs to return (default: 1000). |
include_browse |
Whether to include catalogs in the response for which the principal can only access selective metadata. |
page_token |
Opaque token used to get the next page of results. Optional. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
List
See Also
Other Unity Catalog Management:
db_uc_catalogs_get()
,
db_uc_schemas_get()
,
db_uc_schemas_list()
Get Schema (Unity Catalog)
Description
Get Schema (Unity Catalog)
Usage
db_uc_schemas_get(
catalog,
schema,
include_browse = TRUE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
catalog |
Parent catalog for schema of interest. |
schema |
Schema of interest. |
include_browse |
Whether to include schemas in the response for which the principal can only access selective metadata. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
List
See Also
Other Unity Catalog Management:
db_uc_catalogs_get()
,
db_uc_catalogs_list()
,
db_uc_schemas_list()
List Schemas (Unity Catalog)
Description
List Schemas (Unity Catalog)
Usage
db_uc_schemas_list(
catalog,
max_results = 1000,
page_token = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
catalog |
Parent catalog for schemas of interest. |
max_results |
Maximum number of schemas to return (default: 1000). |
page_token |
Opaque token used to get the next page of results. Optional. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
List
See Also
Other Unity Catalog Management:
db_uc_catalogs_get()
,
db_uc_catalogs_list()
,
db_uc_schemas_get()
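Examples
A minimal sketch (not run); the "main" catalog name is a placeholder.
## Not run:
catalogs <- db_uc_catalogs_list()
schemas <- db_uc_schemas_list(catalog = "main")
## End(Not run)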
Delete Table (Unity Catalog)
Description
Delete Table (Unity Catalog)
Usage
db_uc_tables_delete(
catalog,
schema,
table,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
catalog |
Parent catalog of table. |
schema |
Parent schema of table. |
table |
Table name. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
Boolean
See Also
Other Unity Catalog Table Management:
db_uc_tables_exists()
,
db_uc_tables_get()
,
db_uc_tables_list()
,
db_uc_tables_summaries()
Check Table Exists (Unity Catalog)
Description
Check Table Exists (Unity Catalog)
Usage
db_uc_tables_exists(
catalog,
schema,
table,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
catalog |
Parent catalog of table. |
schema |
Parent schema of table. |
table |
Table name. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
List with fields table_exists
and supports_foreign_metadata_update
See Also
Other Unity Catalog Table Management:
db_uc_tables_delete()
,
db_uc_tables_get()
,
db_uc_tables_list()
,
db_uc_tables_summaries()
Get Table (Unity Catalog)
Description
Get Table (Unity Catalog)
Usage
db_uc_tables_get(
catalog,
schema,
table,
omit_columns = TRUE,
omit_properties = TRUE,
omit_username = TRUE,
include_browse = TRUE,
include_delta_metadata = TRUE,
include_manifest_capabilities = FALSE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
catalog |
Parent catalog of table. |
schema |
Parent schema of table. |
table |
Table name. |
omit_columns |
Whether to omit the columns of the table from the response or not. |
omit_properties |
Whether to omit the properties of the table from the response or not. |
omit_username |
Whether to omit the username of the table (e.g. owner, updated_by, created_by) from the response or not. |
include_browse |
Whether to include tables in the response for which the principal can only access selective metadata. |
include_delta_metadata |
Whether delta metadata should be included in the response. |
include_manifest_capabilities |
Whether to include a manifest containing capabilities the table has. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
List
See Also
Other Unity Catalog Table Management:
db_uc_tables_delete()
,
db_uc_tables_exists()
,
db_uc_tables_list()
,
db_uc_tables_summaries()
List Tables (Unity Catalog)
Description
List Tables (Unity Catalog)
Usage
db_uc_tables_list(
catalog,
schema,
max_results = 50,
omit_columns = TRUE,
omit_properties = TRUE,
omit_username = TRUE,
include_browse = TRUE,
include_delta_metadata = FALSE,
include_manifest_capabilities = FALSE,
page_token = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
catalog |
Name of parent catalog for tables of interest. |
schema |
Parent schema of tables. |
max_results |
Maximum number of tables to return (default: 50, max: 50). |
omit_columns |
Whether to omit the columns of the table from the response or not. |
omit_properties |
Whether to omit the properties of the table from the response or not. |
omit_username |
Whether to omit the username of the table (e.g. owner, updated_by, created_by) from the response or not. |
include_browse |
Whether to include tables in the response for which the principal can only access selective metadata. |
include_delta_metadata |
Whether delta metadata should be included in the response. |
include_manifest_capabilities |
Whether to include a manifest containing capabilities the table has. |
page_token |
Opaque token used to get the next page of results. Optional. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
List
See Also
Other Unity Catalog Table Management:
db_uc_tables_delete()
,
db_uc_tables_exists()
,
db_uc_tables_get()
,
db_uc_tables_summaries()
List Table Summaries (Unity Catalog)
Description
List Table Summaries (Unity Catalog)
Usage
db_uc_tables_summaries(
catalog,
schema_name_pattern = NULL,
table_name_pattern = NULL,
max_results = 10000,
include_manifest_capabilities = FALSE,
page_token = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
catalog |
Name of parent catalog for tables of interest. |
schema_name_pattern |
A sql |
table_name_pattern |
A sql |
max_results |
Maximum number of summaries for tables to return (default: 10000, max: 10000). If not set, the page length is set to a server configured value. |
include_manifest_capabilities |
Whether to include a manifest containing capabilities the table has. |
page_token |
Opaque token used to get the next page of results. Optional. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
List
See Also
Other Unity Catalog Table Management:
db_uc_tables_delete()
,
db_uc_tables_exists()
,
db_uc_tables_get()
,
db_uc_tables_list()
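Examples
A minimal sketch (not run), assuming schema_name_pattern accepts a SQL LIKE pattern; the catalog and pattern are placeholders.
## Not run:
# summaries for all tables in schemas whose names start with "sales"
db_uc_tables_summaries(
  catalog = "main",
  schema_name_pattern = "sales%",
  max_results = 100
)
## End(Not run)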
Create Volume (Unity Catalog)
Description
Create Volume (Unity Catalog)
Usage
db_uc_volumes_create(
catalog,
schema,
volume,
volume_type = c("MANAGED", "EXTERNAL"),
storage_location = NULL,
comment = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
catalog |
Parent catalog of volume |
schema |
Parent schema of volume |
volume |
Volume name. |
volume_type |
Either |
storage_location |
The storage location on the cloud, only specified
when |
comment |
The comment attached to the volume. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
List
See Also
Other Unity Catalog Volume Management:
db_uc_volumes_delete()
,
db_uc_volumes_get()
,
db_uc_volumes_list()
,
db_uc_volumes_update()
Delete Volume (Unity Catalog)
Description
Delete Volume (Unity Catalog)
Usage
db_uc_volumes_delete(
catalog,
schema,
volume,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
catalog |
Parent catalog of volume |
schema |
Parent schema of volume |
volume |
Volume name. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
Boolean
See Also
Other Unity Catalog Volume Management:
db_uc_volumes_create()
,
db_uc_volumes_get()
,
db_uc_volumes_list()
,
db_uc_volumes_update()
Get Volume (Unity Catalog)
Description
Get Volume (Unity Catalog)
Usage
db_uc_volumes_get(
catalog,
schema,
volume,
include_browse = TRUE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
catalog |
Parent catalog of volume |
schema |
Parent schema of volume |
volume |
Volume name. |
include_browse |
Whether to include volumes in the response for which the principal can only access selective metadata. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
List
See Also
Other Unity Catalog Volume Management:
db_uc_volumes_create()
,
db_uc_volumes_delete()
,
db_uc_volumes_list()
,
db_uc_volumes_update()
List Volumes (Unity Catalog)
Description
List Volumes (Unity Catalog)
Usage
db_uc_volumes_list(
catalog,
schema,
max_results = 10000,
include_browse = TRUE,
page_token = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
catalog |
Parent catalog of volume |
schema |
Parent schema of volume |
max_results |
Maximum number of volumes to return (default: 10000). |
include_browse |
Whether to include volumes in the response for which the principal can only access selective metadata. |
page_token |
Opaque token used to get the next page of results. Optional. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
List
See Also
Other Unity Catalog Volume Management:
db_uc_volumes_create()
,
db_uc_volumes_delete()
,
db_uc_volumes_get()
,
db_uc_volumes_update()
Update Volume (Unity Catalog)
Description
Update Volume (Unity Catalog)
Usage
db_uc_volumes_update(
catalog,
schema,
volume,
owner = NULL,
comment = NULL,
new_name = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
catalog |
Parent catalog of volume |
schema |
Parent schema of volume |
volume |
Volume name. |
owner |
The identifier of the user who owns the volume (Optional). |
comment |
The comment attached to the volume (Optional). |
new_name |
New name for the volume (Optional). |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Value
List
See Also
Other Unity Catalog Volume Management:
db_uc_volumes_create()
,
db_uc_volumes_delete()
,
db_uc_volumes_get()
,
db_uc_volumes_list()
Volume FileSystem Delete
Description
Volume FileSystem Delete
Usage
db_volume_delete(
path,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
Absolute path of the file in the Files API, omitting the initial slash. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Volumes FileSystem API:
db_volume_dir_create()
,
db_volume_dir_delete()
,
db_volume_dir_exists()
,
db_volume_file_exists()
,
db_volume_list()
,
db_volume_read()
,
db_volume_write()
Volume FileSystem Create Directory
Description
Volume FileSystem Create Directory
Usage
db_volume_dir_create(
path,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
Absolute path of the file in the Files API, omitting the initial slash. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Volumes FileSystem API:
db_volume_delete()
,
db_volume_dir_delete()
,
db_volume_dir_exists()
,
db_volume_file_exists()
,
db_volume_list()
,
db_volume_read()
,
db_volume_write()
Volume FileSystem Delete Directory
Description
Volume FileSystem Delete Directory
Usage
db_volume_dir_delete(
path,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
Absolute path of the file in the Files API, omitting the initial slash. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Volumes FileSystem API:
db_volume_delete()
,
db_volume_dir_create()
,
db_volume_dir_exists()
,
db_volume_file_exists()
,
db_volume_list()
,
db_volume_read()
,
db_volume_write()
Volume FileSystem Check Directory Exists
Description
Volume FileSystem Check Directory Exists
Usage
db_volume_dir_exists(
path,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
Absolute path of the file in the Files API, omitting the initial slash. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Volumes FileSystem API:
db_volume_delete()
,
db_volume_dir_create()
,
db_volume_dir_delete()
,
db_volume_file_exists()
,
db_volume_list()
,
db_volume_read()
,
db_volume_write()
Volume FileSystem File Status
Description
Volume FileSystem File Status
Usage
db_volume_file_exists(
path,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
Absolute path of the file in the Files API, omitting the initial slash. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Volumes FileSystem API:
db_volume_delete()
,
db_volume_dir_create()
,
db_volume_dir_delete()
,
db_volume_dir_exists()
,
db_volume_list()
,
db_volume_read()
,
db_volume_write()
Volume FileSystem List Directory Contents
Description
Volume FileSystem List Directory Contents
Usage
db_volume_list(
path,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
Absolute path of the file in the Files API, omitting the initial slash. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Volumes FileSystem API:
db_volume_delete()
,
db_volume_dir_create()
,
db_volume_dir_delete()
,
db_volume_dir_exists()
,
db_volume_file_exists()
,
db_volume_read()
,
db_volume_write()
Volume FileSystem Read
Description
Return the contents of a file within a volume (up to 5 GiB).
Usage
db_volume_read(
path,
destination,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
Absolute path of the file in the Files API, omitting the initial slash. |
destination |
Path to write downloaded file to. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Volumes FileSystem API:
db_volume_delete()
,
db_volume_dir_create()
,
db_volume_dir_delete()
,
db_volume_dir_exists()
,
db_volume_file_exists()
,
db_volume_list()
,
db_volume_write()
Volume FileSystem Write
Description
Upload a file to volume filesystem.
Usage
db_volume_write(
path,
file = NULL,
overwrite = FALSE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
Absolute path of the file in the Files API, omitting the initial slash. |
file |
Path to a file on the local system, takes precedence over |
overwrite |
Flag (Default: |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Uploads a file of up to 5 GiB.
See Also
Other Volumes FileSystem API:
db_volume_delete()
,
db_volume_dir_create()
,
db_volume_dir_delete()
,
db_volume_dir_exists()
,
db_volume_file_exists()
,
db_volume_list()
,
db_volume_read()
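Examples
A minimal sketch (not run) of uploading a local file to a volume and downloading it back; the catalog, schema, and volume names in the path are placeholders.
## Not run:
# upload a local csv to a volume
db_volume_write(
  path = "Volumes/main/default/my_volume/mtcars.csv",
  file = "mtcars.csv"
)
# download it back to the local filesystem
db_volume_read(
  path = "Volumes/main/default/my_volume/mtcars.csv",
  destination = "mtcars_downloaded.csv"
)
## End(Not run)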
Create a Vector Search Endpoint
Description
Create a Vector Search Endpoint
Usage
db_vs_endpoints_create(
name,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
name |
Name of vector search endpoint |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
This function can take a few moments to run.
See Also
Other Vector Search API:
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
Delete a Vector Search Endpoint
Description
Delete a Vector Search Endpoint
Usage
db_vs_endpoints_delete(
endpoint,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
endpoint |
Name of vector search endpoint |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
Get a Vector Search Endpoint
Description
Get a Vector Search Endpoint
Usage
db_vs_endpoints_get(
endpoint,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
endpoint |
Name of vector search endpoint |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
List Vector Search Endpoints
Description
List Vector Search Endpoints
Usage
db_vs_endpoints_list(
page_token = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
page_token |
Token for pagination |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
Create a Vector Search Index
Description
Create a Vector Search Index
Usage
db_vs_indexes_create(
name,
endpoint,
primary_key,
spec,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
name |
Name of vector search index |
endpoint |
Name of vector search endpoint |
primary_key |
Vector search primary key column name |
spec |
Either |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
Delete a Vector Search Index
Description
Delete a Vector Search Index
Usage
db_vs_indexes_delete(
index,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
index |
Name of vector search index |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
Delete Data from a Vector Search Index
Description
Delete Data from a Vector Search Index
Usage
db_vs_indexes_delete_data(
index,
primary_keys,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
index |
Name of vector search index |
primary_keys |
primary keys to be deleted from index |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
Get a Vector Search Index
Description
Get a Vector Search Index
Usage
db_vs_indexes_get(
index,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
index |
Name of vector search index |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
List Vector Search Indexes
Description
List Vector Search Indexes
Usage
db_vs_indexes_list(
endpoint,
page_token = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
endpoint |
Name of vector search endpoint |
page_token |
|
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
Query a Vector Search Index
Description
Query a Vector Search Index
Usage
db_vs_indexes_query(
index,
columns,
filters_json,
query_vector = NULL,
query_text = NULL,
score_threshold = 0,
query_type = c("ANN", "HYBRID"),
num_results = 10,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
index |
Name of vector search index |
columns |
Column names to include in response |
filters_json |
JSON string representing query filters, see details. |
query_vector |
Numeric vector. Required for direct vector access index and delta sync index using self managed vectors. |
query_text |
Required for delta sync index using model endpoint. |
score_threshold |
Numeric score threshold for the approximate nearest neighbour (ANN) search. Defaults to 0.0. |
query_type |
One of |
num_results |
Number of results to return (default: 10). |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
You cannot specify both query_vector
and query_text
at the same time.
filters_json
examples:
-
'{"id <": 5}'
: Filter for id less than 5 -
'{"id >": 5}'
: Filter for id greater than 5 -
'{"id <=": 5}'
: Filter for id less than equal to 5 -
'{"id >=": 5}'
: Filter for id greater than equal to 5 -
'{"id": 5}'
: Filter for id equal to 5 -
'{"id": 5, "age >=": 18}'
: Filter for id equal to 5 and age greater than equal to 18
filters_json will attempt to convert any non-character vectors using jsonlite::toJSON.
Refer to docs for Vector Search.
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
Examples
## Not run:
db_vs_indexes_query(
index = "myindex",
columns = c("id", "text"),
query_vector = c(1, 2, 3)
)
## End(Not run)
Query Vector Search Next Page
Description
Query Vector Search Next Page
Usage
db_vs_indexes_query_next_page(
index,
endpoint,
page_token = NULL,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
index |
Name of vector search index |
endpoint |
Name of vector search endpoint |
page_token |
|
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
Scan a Vector Search Index
Description
Scan a Vector Search Index
Usage
db_vs_indexes_scan(
endpoint,
index,
last_primary_key,
num_results = 10,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
endpoint |
Name of vector search endpoint to scan |
index |
Name of vector search index to scan |
last_primary_key |
Primary key of the last entry returned in previous scan |
num_results |
Number of results to return (default: 10) |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Scan the specified vector index and return the first num_results
entries
after the exclusive primary_key
.
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
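Examples
A minimal sketch (not run); the endpoint and index names are placeholders and the index is assumed to use a numeric primary key.
## Not run:
# fetch the 10 entries that follow primary key 100
db_vs_indexes_scan(
  endpoint = "vs_endpoint",
  index = "main.default.my_index",
  last_primary_key = 100,
  num_results = 10
)
## End(Not run)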
Synchronize a Vector Search Index
Description
Synchronize a Vector Search Index
Usage
db_vs_indexes_sync(
index,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
index |
Name of vector search index |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Triggers a synchronization process for a specified vector index. The index must be a 'Delta Sync' index.
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
Upsert Data into a Vector Search Index
Description
Upsert Data into a Vector Search Index
Usage
db_vs_indexes_upsert_data(
index,
df,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
index |
Name of vector search index |
df |
data.frame containing data to upsert |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
Delete Object/Directory (Workspaces)
Description
Delete Object/Directory (Workspaces)
Usage
db_workspace_delete(
path,
recursive = FALSE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
Absolute path of the notebook or directory. |
recursive |
Flag that specifies whether to delete the object
recursively. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Delete an object or a directory (and optionally recursively deletes all
objects in the directory). If path does not exist, this call returns an error
RESOURCE_DOES_NOT_EXIST
. If path is a non-empty directory and recursive is
set to false, this call returns an error DIRECTORY_NOT_EMPTY.
Object deletion cannot be undone and deleting a directory recursively is not atomic.
See Also
Other Workspace API:
db_workspace_export()
,
db_workspace_get_status()
,
db_workspace_import()
,
db_workspace_list()
,
db_workspace_mkdirs()
Export Notebook or Directory (Workspaces)
Description
Export Notebook or Directory (Workspaces)
Usage
db_workspace_export(
path,
format = c("AUTO", "SOURCE", "HTML", "JUPYTER", "DBC", "R_MARKDOWN"),
host = db_host(),
token = db_token(),
output_path = NULL,
direct_download = FALSE,
perform_request = TRUE
)
Arguments
path |
Absolute path of the notebook or directory. |
format |
One of |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
output_path |
Path to export file to; ensure the correct suffix is included. |
direct_download |
Boolean (default: |
perform_request |
If |
Details
Export a notebook or contents of an entire directory. If path does not exist,
this call returns an error RESOURCE_DOES_NOT_EXIST.
You can export a directory only in DBC
format. If the exported data exceeds
the size limit, this call returns an error MAX_NOTEBOOK_SIZE_EXCEEDED.
This
API does not support exporting a library.
At this time the direct_download parameter is not supported and a
base64 encoded string is returned.
Value
base64 encoded string
See Also
Other Workspace API:
db_workspace_delete()
,
db_workspace_get_status()
,
db_workspace_import()
,
db_workspace_list()
,
db_workspace_mkdirs()
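Examples
A minimal sketch (not run); the workspace path and output filename are placeholders.
## Not run:
# export a notebook as an R Markdown file
db_workspace_export(
  path = "/Users/someone@example.com/my_notebook",
  format = "R_MARKDOWN",
  output_path = "my_notebook.Rmd"
)
## End(Not run)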
Get Object Status (Workspaces)
Description
Gets the status of an object or a directory.
Usage
db_workspace_get_status(
path,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
Absolute path of the notebook or directory. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
If path does not exist, this call returns an error RESOURCE_DOES_NOT_EXIST.
See Also
Other Workspace API:
db_workspace_delete()
,
db_workspace_export()
,
db_workspace_import()
,
db_workspace_list()
,
db_workspace_mkdirs()
Import Notebook/Directory (Workspaces)
Description
Import a notebook or the contents of an entire directory.
Usage
db_workspace_import(
path,
file = NULL,
content = NULL,
format = c("AUTO", "SOURCE", "HTML", "JUPYTER", "DBC", "R_MARKDOWN"),
language = NULL,
overwrite = FALSE,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
Absolute path of the notebook or directory. |
file |
Path of local file to upload. See |
content |
Content to upload, this will be base64-encoded and has a limit of 10MB. |
format |
One of |
language |
One of |
overwrite |
Flag that specifies whether to overwrite existing object.
|
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
file
and content
are mutually exclusive. If both are specified content
will be ignored.
If path already exists and overwrite
is set to FALSE
, this call returns
an error RESOURCE_ALREADY_EXISTS.
You can use only DBC
format to import
a directory.
See Also
Other Workspace API:
db_workspace_delete()
,
db_workspace_export()
,
db_workspace_get_status()
,
db_workspace_list()
,
db_workspace_mkdirs()
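Examples
A minimal sketch (not run); the workspace path and local file are placeholders, and "R" is assumed to be a valid language value for SOURCE imports.
## Not run:
# import a local R script as a SOURCE notebook
db_workspace_import(
  path = "/Users/someone@example.com/imported_notebook",
  file = "analysis.R",
  format = "SOURCE",
  language = "R",
  overwrite = TRUE
)
## End(Not run)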
List Directory Contents (Workspaces)
Description
List Directory Contents (Workspaces)
Usage
db_workspace_list(
path,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
Absolute path of the notebook or directory. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
List the contents of a directory, or the object if it is not a directory.
If the input path does not exist, this call returns an error
RESOURCE_DOES_NOT_EXIST.
See Also
Other Workspace API:
db_workspace_delete()
,
db_workspace_export()
,
db_workspace_get_status()
,
db_workspace_import()
,
db_workspace_mkdirs()
Make a Directory (Workspaces)
Description
Make a Directory (Workspaces)
Usage
db_workspace_mkdirs(
path,
host = db_host(),
token = db_token(),
perform_request = TRUE
)
Arguments
path |
Absolute path of the directory. |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
perform_request |
If |
Details
Create the given directory and necessary parent directories if they do not
exist. If there exists an object (not a directory) at any prefix of the
input path, this call returns an error RESOURCE_ALREADY_EXISTS.
If this
operation fails it may have succeeded in creating some of the necessary
parent directories.
See Also
Other Workspace API:
db_workspace_delete()
,
db_workspace_export()
,
db_workspace_get_status()
,
db_workspace_import()
,
db_workspace_list()
Fetch Databricks Workspace ID
Description
Workspace ID, optionally specified to make the connections pane more powerful.
Specified as an environment variable DATABRICKS_WSID
.
.databrickscfg
will be searched if db_profile
and use_databrickscfg
are set or
if Posit Workbench managed OAuth credentials are detected.
Refer to api authentication docs
Usage
db_wsid(profile = default_config_profile())
Arguments
profile |
Profile to use when fetching from environment variable
(e.g. |
Details
The behaviour is subject to change depending if db_profile
and
use_databrickscfg
options are set.
-
use_databrickscfg
: Boolean (default:FALSE
), determines if credentials are fetched from profile of.databrickscfg
or.Renviron
-
db_profile
: String (default:NULL
), determines profile used..databrickscfg
will automatically be used when Posit Workbench managed OAuth credentials are detected.
See vignette on authentication for more details.
Value
databricks workspace ID
See Also
Other Databricks Authentication Helpers:
db_host()
,
db_read_netrc()
,
db_token()
DBFS Storage Information
Description
DBFS Storage Information
Usage
dbfs_storage_info(destination)
Arguments
destination |
DBFS destination. Example: |
See Also
cluster_log_conf()
, init_script_info()
Other Cluster Log Configuration Objects:
cluster_log_conf()
,
s3_storage_info()
Other Init Script Info Objects:
file_storage_info()
,
s3_storage_info()
Returns the default config profile
Description
Returns the default config profile
Usage
default_config_profile()
Details
Returns the config profile first looking at DATABRICKS_CONFIG_PROFILE
and then the db_profile
option.
Value
profile name
Delta Sync Vector Search Index Specification
Description
Delta Sync Vector Search Index Specification
Usage
delta_sync_index_spec(
source_table,
embedding_writeback_table = NULL,
embedding_source_columns = NULL,
embedding_vector_columns = NULL,
pipeline_type = c("TRIGGERED", "CONTINUOUS")
)
Arguments
source_table |
The name of the source table. |
embedding_writeback_table |
Name of table to sync index contents and computed embeddings back to delta table, see details. |
embedding_source_columns |
The columns that contain the embedding
source, must be one or list of |
embedding_vector_columns |
The columns that contain the embedding, must
be one or list of |
pipeline_type |
Pipeline execution mode, see details. |
Details
pipeline_type
is either:
-
"TRIGGERED"
: If the pipeline uses the triggered execution mode, the system stops processing after successfully refreshing the source table in the pipeline once, ensuring the table is updated based on the data available when the update started. -
"CONTINUOUS"
If the pipeline uses continuous execution, the pipeline processes new data as it arrives in the source table to keep vector index fresh.
The only supported naming convention for embedding_writeback_table
is
"<index_name>_writeback_table"
.
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
direct_access_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
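Examples
A minimal sketch (not run) of building a delta sync spec and using it to create an index; the table, index, endpoint, and model endpoint names are placeholders.
## Not run:
spec <- delta_sync_index_spec(
  source_table = "main.default.documents",
  embedding_source_columns = embedding_source_column(
    name = "text",
    model_endpoint_name = "databricks-bge-large-en"
  ),
  pipeline_type = "TRIGGERED"
)
db_vs_indexes_create(
  name = "main.default.documents_index",
  endpoint = "vs_endpoint",
  primary_key = "id",
  spec = spec
)
## End(Not run)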
Determine brickster virtualenv
Description
Determine brickster virtualenv
Usage
determine_brickster_venv()
Details
Returns NULL
when running within Databricks,
otherwise "r-brickster"
Direct Access Vector Search Index Specification
Description
Direct Access Vector Search Index Specification
Usage
direct_access_index_spec(
embedding_source_columns = NULL,
embedding_vector_columns = NULL,
schema
)
Arguments
embedding_source_columns |
The columns that contain the embedding
source, must be one or list of |
embedding_vector_columns |
The columns that contain the embedding, must
be one or list of |
schema |
Named list, names are column names, values are types. See details. |
Details
The supported types are:
-
"integer"
-
"long"
-
"float"
-
"double"
-
"boolean"
-
"string"
-
"date"
-
"timestamp"
-
"array<float>"
: supported for vector columns -
"array<double>"
: supported for vector columns
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
embedding_source_column()
,
embedding_vector_column()
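Examples
A minimal sketch (not run) of a direct access spec with self-managed embeddings; the column names and dimension are placeholders.
## Not run:
spec <- direct_access_index_spec(
  embedding_vector_columns = embedding_vector_column(
    name = "embedding",
    dimension = 768
  ),
  schema = list(
    id = "integer",
    text = "string",
    embedding = "array<float>"
  )
)
## End(Not run)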
Docker Image
Description
Docker image connection information.
Usage
docker_image(url, username, password)
Arguments
url |
URL for the Docker image. |
username |
User name for the Docker repository. |
password |
Password for the Docker repository. |
Details
Uses basic authentication. It is strongly recommended that credentials are not stored in scripts; use environment variables instead.
See Also
db_cluster_create()
, db_cluster_edit()
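Examples
A minimal sketch (not run); the image URL and environment variable names are placeholders.
## Not run:
# keep registry credentials out of scripts by reading environment variables
image <- docker_image(
  url = "<registry>/<repo>/<image>:<tag>",
  username = Sys.getenv("DOCKER_USERNAME"),
  password = Sys.getenv("DOCKER_PASSWORD")
)
## End(Not run)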
Email Notifications
Description
Email Notifications
Usage
email_notifications(
on_start = NULL,
on_success = NULL,
on_failure = NULL,
no_alert_for_skipped_runs = TRUE
)
Arguments
on_start |
List of email addresses to be notified when a run begins. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. |
on_success |
List of email addresses to be notified when a run
successfully completes. A run is considered to have completed successfully if
it ends with a |
on_failure |
List of email addresses to be notified when a run
unsuccessfully completes. A run is considered to have completed
unsuccessfully if it ends with an |
no_alert_for_skipped_runs |
If |
See Also
Other Task Objects:
condition_task()
,
for_each_task()
,
libraries()
,
new_cluster()
,
notebook_task()
,
pipeline_task()
,
python_wheel_task()
,
run_job_task()
,
spark_jar_task()
,
spark_python_task()
,
spark_submit_task()
,
sql_file_task()
,
sql_query_task()
Embedding Source Column
Description
Embedding Source Column
Usage
embedding_source_column(name, model_endpoint_name)
Arguments
name |
Name of the column |
model_endpoint_name |
Name of the embedding model endpoint |
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_vector_column()
Embedding Vector Column
Description
Embedding Vector Column
Usage
embedding_vector_column(name, dimension)
Arguments
name |
Name of the column |
dimension |
dimension of the embedding vector |
See Also
Other Vector Search API:
db_vs_endpoints_create()
,
db_vs_endpoints_delete()
,
db_vs_endpoints_get()
,
db_vs_endpoints_list()
,
db_vs_indexes_create()
,
db_vs_indexes_delete()
,
db_vs_indexes_delete_data()
,
db_vs_indexes_get()
,
db_vs_indexes_list()
,
db_vs_indexes_query()
,
db_vs_indexes_query_next_page()
,
db_vs_indexes_scan()
,
db_vs_indexes_sync()
,
db_vs_indexes_upsert_data()
,
delta_sync_index_spec()
,
direct_access_index_spec()
,
embedding_source_column()
File Storage Information
Description
File Storage Information
Usage
file_storage_info(destination)
Arguments
destination |
File destination. Example: |
Details
The file storage type is only available for clusters set up using Databricks Container Services.
See Also
Other Init Script Info Objects:
dbfs_storage_info()
,
s3_storage_info()
For Each Task
Description
For Each Task
Usage
for_each_task(inputs, task, concurrency = 1)
Arguments
inputs |
Array for task to iterate on. This can be a JSON string or a reference to an array parameter. |
task |
Must be a |
concurrency |
Maximum allowed number of concurrent runs of the task. |
See Also
Other Task Objects:
condition_task()
,
email_notifications()
,
libraries()
,
new_cluster()
,
notebook_task()
,
pipeline_task()
,
python_wheel_task()
,
run_job_task()
,
spark_jar_task()
,
spark_python_task()
,
spark_submit_task()
,
sql_file_task()
,
sql_query_task()
GCP Attributes
Description
GCP Attributes
Usage
gcp_attributes(use_preemptible_executors = TRUE, google_service_account = NULL)
Arguments
use_preemptible_executors |
Boolean (Default: |
google_service_account |
Google service account email address that the cluster uses to authenticate with Google Identity. This field is used for authentication with the GCS and BigQuery data sources. |
Details
For use with GCS and BigQuery, the Google service account that you use to access the data source must be in the same project as the service account that you specified when setting up your Databricks account.
See Also
db_cluster_create()
, db_cluster_edit()
Other Cloud Attributes:
aws_attributes()
,
azure_attributes()
Get and Start Cluster
Description
Get and Start Cluster
Usage
get_and_start_cluster(
cluster_id,
polling_interval = 5,
host = db_host(),
token = db_token(),
silent = FALSE
)
Arguments
cluster_id |
Canonical identifier for the cluster. |
polling_interval |
Number of seconds to wait between status checks |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
silent |
Boolean (default: |
Details
Get information regarding a Databricks cluster. If the cluster is inactive it will be started, and the function waits until the cluster is active.
Value
db_cluster_get()
See Also
db_cluster_get()
and db_cluster_start()
.
Other Clusters API:
db_cluster_create()
,
db_cluster_edit()
,
db_cluster_events()
,
db_cluster_get()
,
db_cluster_list()
,
db_cluster_list_node_types()
,
db_cluster_list_zones()
,
db_cluster_perm_delete()
,
db_cluster_pin()
,
db_cluster_resize()
,
db_cluster_restart()
,
db_cluster_runtime_versions()
,
db_cluster_start()
,
db_cluster_terminate()
,
db_cluster_unpin()
,
get_latest_dbr()
Other Cluster Helpers:
get_latest_dbr()
Get and Start Warehouse
Description
Get and Start Warehouse
Usage
get_and_start_warehouse(
id,
polling_interval = 5,
host = db_host(),
token = db_token()
)
Arguments
id |
ID of the SQL warehouse. |
polling_interval |
Number of seconds to wait between status checks |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
Details
Get information regarding a Databricks SQL warehouse. If the warehouse is inactive it will be started, and the function waits until the warehouse is active.
Value
db_sql_warehouse_get()
See Also
db_sql_warehouse_get()
and db_sql_warehouse_start()
.
Other Warehouse API:
db_sql_global_warehouse_get()
,
db_sql_warehouse_create()
,
db_sql_warehouse_delete()
,
db_sql_warehouse_edit()
,
db_sql_warehouse_get()
,
db_sql_warehouse_list()
,
db_sql_warehouse_start()
,
db_sql_warehouse_stop()
Get Latest Databricks Runtime (DBR)
Description
Get Latest Databricks Runtime (DBR)
Usage
get_latest_dbr(lts, ml, gpu, photon, host = db_host(), token = db_token())
Arguments
lts |
Boolean, if |
ml |
Boolean, if |
gpu |
Boolean, if |
photon |
Boolean, if |
host |
Databricks workspace URL, defaults to calling |
token |
Databricks workspace token, defaults to calling |
Details
There are runtime combinations that are not possible, such as GPU/ML and photon. This function does not permit invalid combinations.
Value
Named list
See Also
Other Clusters API:
db_cluster_create(), db_cluster_edit(), db_cluster_events(), db_cluster_get(), db_cluster_list(), db_cluster_list_node_types(), db_cluster_list_zones(), db_cluster_perm_delete(), db_cluster_pin(), db_cluster_resize(), db_cluster_restart(), db_cluster_runtime_versions(), db_cluster_start(), db_cluster_terminate(), db_cluster_unpin(), get_and_start_cluster()
Other Cluster Helpers:
get_and_start_cluster()
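For example, to look up the latest LTS runtime without ML, GPU, or Photon support (a sketch; all four flags are supplied because the function declares no defaults for them, and the call queries the workspace so credentials are required):
## Not run:
latest <- get_latest_dbr(lts = TRUE, ml = FALSE, gpu = FALSE, photon = FALSE)
# returns a named list describing the latest matching runtime
## End(Not run)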
Git Source for Job Notebook Tasks
Description
Git Source for Job Notebook Tasks
Usage
git_source(
git_url,
git_provider,
reference,
type = c("branch", "tag", "commit")
)
Arguments
git_url |
URL of the repository to be cloned by this job. The maximum length is 300 characters. |
git_provider |
Unique identifier of the service used to host the Git
repository. Must be one of: |
reference |
Branch, tag, or commit to be checked out and used by this job. |
type |
Type of reference being used, one of: branch, tag, or commit. |
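A sketch of specifying a Git source for a job's notebook tasks; the repository URL is a placeholder and "gitHub" is assumed to be an accepted provider identifier:
# check out the main branch of the repository for the job
job_git_source <- git_source(
  git_url = "https://github.com/my-org/my-jobs-repo",
  git_provider = "gitHub",
  reference = "main",
  type = "branch"
)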
Detect if running within Databricks Notebook
Description
Detect if running within Databricks Notebook
Usage
in_databricks_nb()
Details
R sessions on Databricks can be detected via various environment variables and directories.
Value
Boolean
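For example, behaviour can be branched depending on where the session is running:
if (in_databricks_nb()) {
  message("Running inside a Databricks notebook")
} else {
  message("Running outside Databricks")
}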
Init Script Info
Description
Init Script Info
Usage
init_script_info(...)
Arguments
... |
Accepts multiple instances of s3_storage_info(), file_storage_info(), or dbfs_storage_info(). |
Details
file_storage_info() is only available for clusters set up using Databricks Container Services.
For instructions on using init scripts with Databricks Container Services, see Use an init script.
See Also
db_cluster_create(), db_cluster_edit()
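A sketch of declaring init script locations for a cluster specification; the bucket name and script path are placeholders:
# an init script stored in S3, to be passed to new_cluster() via init_scripts
scripts <- init_script_info(
  s3_storage_info(
    destination = "s3://my-bucket/init-scripts/install.sh",
    region = "us-east-1"
  )
)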
Test if object is of class AccessControlRequestForGroup
Description
Test if object is of class AccessControlRequestForGroup
Usage
is.access_control_req_group(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the AccessControlRequestForGroup
class.
Test if object is of class AccessControlRequestForUser
Description
Test if object is of class AccessControlRequestForUser
Usage
is.access_control_req_user(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the AccessControlRequestForUser
class.
Test if object is of class AccessControlRequest
Description
Test if object is of class AccessControlRequest
Usage
is.access_control_request(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the AccessControlRequest
class.
Test if object is of class AwsAttributes
Description
Test if object is of class AwsAttributes
Usage
is.aws_attributes(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the AwsAttributes
class.
Test if object is of class AzureAttributes
Description
Test if object is of class AzureAttributes
Usage
is.azure_attributes(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the AzureAttributes
class.
Test if object is of class AutoScale
Description
Test if object is of class AutoScale
Usage
is.cluster_autoscale(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the AutoScale
class.
Test if object is of class ClusterLogConf
Description
Test if object is of class ClusterLogConf
Usage
is.cluster_log_conf(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the ClusterLogConf
class.
Test if object is of class ConditionTask
Description
Test if object is of class ConditionTask
Usage
is.condition_task(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the ConditionTask
class.
Test if object is of class CronSchedule
Description
Test if object is of class CronSchedule
Usage
is.cron_schedule(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the CronSchedule
class.
Test if object is of class DbfsStorageInfo
Description
Test if object is of class DbfsStorageInfo
Usage
is.dbfs_storage_info(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the DbfsStorageInfo
class.
Test if object is of class DeltaSyncIndex
Description
Test if object is of class DeltaSyncIndex
Usage
is.delta_sync_index(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the DeltaSyncIndex
class.
Test if object is of class DirectAccessIndex
Description
Test if object is of class DirectAccessIndex
Usage
is.direct_access_index(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the DirectAccessIndex
class.
Test if object is of class DockerImage
Description
Test if object is of class DockerImage
Usage
is.docker_image(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the DockerImage
class.
Test if object is of class JobEmailNotifications
Description
Test if object is of class JobEmailNotifications
Usage
is.email_notifications(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the JobEmailNotifications
class.
Test if object is of class EmbeddingSourceColumn
Description
Test if object is of class EmbeddingSourceColumn
Usage
is.embedding_source_column(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the EmbeddingSourceColumn
class.
Test if object is of class EmbeddingVectorColumn
Description
Test if object is of class EmbeddingVectorColumn
Usage
is.embedding_vector_column(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the EmbeddingVectorColumn
class.
Test if object is of class FileStorageInfo
Description
Test if object is of class FileStorageInfo
Usage
is.file_storage_info(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the FileStorageInfo
class.
Test if object is of class ForEachTask
Description
Test if object is of class ForEachTask
Usage
is.for_each_task(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the ForEachTask
class.
Test if object is of class GcpAttributes
Description
Test if object is of class GcpAttributes
Usage
is.gcp_attributes(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the GcpAttributes
class.
Test if object is of class GitSource
Description
Test if object is of class GitSource
Usage
is.git_source(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the GitSource
class.
Test if object is of class InitScriptInfo
Description
Test if object is of class InitScriptInfo
Usage
is.init_script_info(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the InitScriptInfo
class.
Test if object is of class JobTaskSettings
Description
Test if object is of class JobTaskSettings
Usage
is.job_task(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the JobTaskSettings
class.
Test if object is of class CranLibrary
Description
Test if object is of class CranLibrary
Usage
is.lib_cran(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the CranLibrary
class.
Test if object is of class EggLibrary
Description
Test if object is of class EggLibrary
Usage
is.lib_egg(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the EggLibrary
class.
Test if object is of class JarLibrary
Description
Test if object is of class JarLibrary
Usage
is.lib_jar(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the JarLibrary
class.
Test if object is of class MavenLibrary
Description
Test if object is of class MavenLibrary
Usage
is.lib_maven(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the MavenLibrary
class.
Test if object is of class PyPiLibrary
Description
Test if object is of class PyPiLibrary
Usage
is.lib_pypi(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the PyPiLibrary
class.
Test if object is of class WhlLibrary
Description
Test if object is of class WhlLibrary
Usage
is.lib_whl(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the WhlLibrary
class.
Test if object is of class Libraries
Description
Test if object is of class Libraries
Usage
is.libraries(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the Libraries
class.
Test if object is of class Library
Description
Test if object is of class Library
Usage
is.library(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the Library
class.
Test if object is of class NewCluster
Description
Test if object is of class NewCluster
Usage
is.new_cluster(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the NewCluster
class.
Test if object is of class NotebookTask
Description
Test if object is of class NotebookTask
Usage
is.notebook_task(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the NotebookTask
class.
Test if object is of class PipelineTask
Description
Test if object is of class PipelineTask
Usage
is.pipeline_task(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the PipelineTask
class.
Test if object is of class PythonWheelTask
Description
Test if object is of class PythonWheelTask
Usage
is.python_wheel_task(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the PythonWheelTask
class.
Test if object is of class RunJobTask
Description
Test if object is of class RunJobTask
Usage
is.run_job_task(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the RunJobTask
class.
Test if object is of class S3StorageInfo
Description
Test if object is of class S3StorageInfo
Usage
is.s3_storage_info(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the S3StorageInfo
class.
Test if object is of class SparkJarTask
Description
Test if object is of class SparkJarTask
Usage
is.spark_jar_task(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the SparkJarTask
class.
Test if object is of class SparkPythonTask
Description
Test if object is of class SparkPythonTask
Usage
is.spark_python_task(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the SparkPythonTask
class.
Test if object is of class SparkSubmitTask
Description
Test if object is of class SparkSubmitTask
Usage
is.spark_submit_task(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the SparkSubmitTask
class.
Test if object is of class SqlFileTask
Description
Test if object is of class SqlFileTask
Usage
is.sql_file_task(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the SqlFileTask
class.
Test if object is of class SqlQueryTask
Description
Test if object is of class SqlQueryTask
Usage
is.sql_query_task(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the SqlQueryTask
class.
Test if object is of class JobTask
Description
Test if object is of class JobTask
Usage
is.valid_task_type(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the JobTask
class.
Test if object is of class VectorSearchIndexSpec
Description
Test if object is of class VectorSearchIndexSpec
Usage
is.vector_search_index_spec(x)
Arguments
x |
An object |
Value
TRUE
if the object inherits from the VectorSearchIndexSpec
class.
Job Task
Description
Job Task
Usage
job_task(
task_key,
description = NULL,
depends_on = c(),
existing_cluster_id = NULL,
new_cluster = NULL,
job_cluster_key = NULL,
task,
libraries = NULL,
email_notifications = NULL,
timeout_seconds = NULL,
max_retries = 0,
min_retry_interval_millis = 0,
retry_on_timeout = FALSE,
run_if = c("ALL_SUCCESS", "ALL_DONE", "NONE_FAILED", "AT_LEAST_ONE_SUCCESS",
"ALL_FAILED", "AT_LEAST_ONE_FAILED")
)
Arguments
task_key |
A unique name for the task. This field is used to refer to
this task from other tasks. This field is required and must be unique within
its parent job. On |
description |
An optional description for this task. The maximum length is 4096 bytes. |
depends_on |
Vector of task_key values for the tasks that this task depends on. |
existing_cluster_id |
ID of an existing cluster that is used for all runs of this task. |
new_cluster |
Instance of new_cluster(). |
job_cluster_key |
Task is executed reusing the cluster specified in
|
task |
One of notebook_task(), spark_jar_task(), spark_python_task(), spark_submit_task(), pipeline_task(), python_wheel_task(), run_job_task(), sql_file_task(), sql_query_task(), condition_task(), or for_each_task(). |
libraries |
Instance of libraries(). |
email_notifications |
Instance of email_notifications(). |
timeout_seconds |
An optional timeout applied to each run of this job task. The default behavior is to have no timeout. |
max_retries |
An optional maximum number of times to retry an
unsuccessful run. A run is considered to be unsuccessful if it completes with
the |
min_retry_interval_millis |
Optional minimal interval in milliseconds between the start of the failed run and the subsequent retry run. The default behavior is that unsuccessful runs are immediately retried. |
retry_on_timeout |
Optional policy to specify whether to retry a task when it times out. The default behavior is to not retry on timeout. |
run_if |
The condition determining whether the task is run once its dependencies have been completed. |
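A hedged sketch of a single task definition; the task key, cluster ID, notebook path, and package name are placeholders:
# a notebook task running on an existing cluster, with one CRAN dependency
task <- job_task(
  task_key = "ingest",
  existing_cluster_id = "1234-567890-abcde123",
  task = notebook_task(
    notebook_path = "/Workspace/Users/me@example.com/ingest"
  ),
  libraries = libraries(lib_cran(package = "dplyr"))
)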
Job Tasks
Description
Job Tasks
Usage
job_tasks(...)
Arguments
... |
Multiple instances of job_task(). |
See Also
db_jobs_create(), db_jobs_reset(), db_jobs_update()
Cran Library (R)
Description
Cran Library (R)
Usage
lib_cran(package, repo = NULL)
Arguments
package |
The name of the CRAN package to install. |
repo |
The repository where the package can be found. If not specified, the default CRAN repo is used. |
See Also
Other Library Objects:
lib_egg(), lib_jar(), lib_maven(), lib_pypi(), lib_whl(), libraries()
Egg Library (Python)
Description
Egg Library (Python)
Usage
lib_egg(egg)
Arguments
egg |
URI of the egg to be installed. DBFS and S3 URIs are supported.
For example: |
See Also
Other Library Objects:
lib_cran(), lib_jar(), lib_maven(), lib_pypi(), lib_whl(), libraries()
Jar Library (Scala)
Description
Jar Library (Scala)
Usage
lib_jar(jar)
Arguments
jar |
URI of the JAR to be installed. DBFS and S3 URIs are supported.
For example: |
See Also
Other Library Objects:
lib_cran(), lib_egg(), lib_maven(), lib_pypi(), lib_whl(), libraries()
Maven Library (Scala)
Description
Maven Library (Scala)
Usage
lib_maven(coordinates, repo = NULL, exclusions = NULL)
Arguments
coordinates |
Gradle-style Maven coordinates. For example:
|
repo |
Maven repo to install the Maven package from. If omitted, both Maven Central Repository and Spark Packages are searched. |
exclusions |
List of dependencies to exclude. For example:
|
See Also
Other Library Objects:
lib_cran(), lib_egg(), lib_jar(), lib_pypi(), lib_whl(), libraries()
PyPi Library (Python)
Description
PyPi Library (Python)
Usage
lib_pypi(package, repo = NULL)
Arguments
package |
The name of the PyPI package to install. An optional exact
version specification is also supported. Examples: |
repo |
The repository where the package can be found. If not specified, the default pip index is used. |
See Also
Other Library Objects:
lib_cran(), lib_egg(), lib_jar(), lib_maven(), lib_whl(), libraries()
Wheel Library (Python)
Description
Wheel Library (Python)
Usage
lib_whl(whl)
Arguments
whl |
URI of the wheel or zipped wheels to be installed.
DBFS and S3 URIs are supported. For example: |
See Also
Other Library Objects:
lib_cran(), lib_egg(), lib_jar(), lib_maven(), lib_pypi(), libraries()
Libraries
Description
Libraries
Usage
libraries(...)
Arguments
... |
Accepts multiple instances of lib_jar(), lib_cran(), lib_maven(), lib_pypi(), lib_whl(), or lib_egg(). |
Details
Optional list of libraries to be installed on the cluster that executes the task.
See Also
job_task(), lib_jar(), lib_cran(), lib_maven(), lib_pypi(), lib_whl(), lib_egg()
Other Task Objects:
condition_task(), email_notifications(), for_each_task(), new_cluster(), notebook_task(), pipeline_task(), python_wheel_task(), run_job_task(), spark_jar_task(), spark_python_task(), spark_submit_task(), sql_file_task(), sql_query_task()
Other Library Objects:
lib_cran(), lib_egg(), lib_jar(), lib_maven(), lib_pypi(), lib_whl()
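A sketch combining several library specifications into one object; the package names, Maven coordinates, and wheel URI are placeholders:
libs <- libraries(
  lib_cran(package = "dplyr"),
  lib_pypi(package = "requests"),
  lib_maven(coordinates = "org.jsoup:jsoup:1.7.2"),
  lib_whl(whl = "dbfs:/libraries/my_package-0.1.0-py3-none-any.whl")
)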
New Cluster
Description
New Cluster
Usage
new_cluster(
num_workers,
spark_version,
node_type_id,
driver_node_type_id = NULL,
autoscale = NULL,
cloud_attrs = NULL,
spark_conf = NULL,
spark_env_vars = NULL,
custom_tags = NULL,
ssh_public_keys = NULL,
log_conf = NULL,
init_scripts = NULL,
enable_elastic_disk = TRUE,
driver_instance_pool_id = NULL,
instance_pool_id = NULL,
kind = c("CLASSIC_PREVIEW"),
data_security_mode = c("NONE", "SINGLE_USER", "USER_ISOLATION", "LEGACY_TABLE_ACL",
"LEGACY_PASSTHROUGH", "LEGACY_SINGLE_USER", "LEGACY_SINGLE_USER_STANDARD",
"DATA_SECURITY_MODE_STANDARD", "DATA_SECURITY_MODE_DEDICATED",
"DATA_SECURITY_MODE_AUTO")
)
Arguments
num_workers |
Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes. |
spark_version |
The runtime version of the cluster. You can retrieve a list of available runtime versions by using db_cluster_runtime_versions(). |
node_type_id |
The node type for the worker nodes.
|
driver_node_type_id |
The node type of the Spark driver. This field is
optional; if unset, the driver node type will be set to the same value as node_type_id. |
autoscale |
Instance of cluster_autoscale(). |
cloud_attrs |
Attributes related to clusters running on a specific cloud provider; one of aws_attributes(), azure_attributes(), or gcp_attributes(). |
spark_conf |
Named list. An object containing a set of optional,
user-specified Spark configuration key-value pairs. You can also pass in a
string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively. |
spark_env_vars |
Named list. User-specified environment variable
key-value pairs. In order to specify an additional set of
|
custom_tags |
Named list. An object containing a set of tags for cluster
resources. Databricks tags all cluster resources with these tags in addition
to |
ssh_public_keys |
List. SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. Up to 10 keys can be specified. |
log_conf |
Instance of cluster_log_conf(). |
init_scripts |
Instance of init_script_info(). |
enable_elastic_disk |
When enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. |
driver_instance_pool_id |
ID of the instance pool to use for the
driver node. You must also specify instance_pool_id. |
instance_pool_id |
ID of the instance pool to use for cluster nodes. If driver_instance_pool_id is present, instance_pool_id is used for worker nodes only. |
kind |
The kind of compute described by this compute specification. |
data_security_mode |
Data security mode decides what data governance model to use when accessing data from a cluster. |
See Also
Other Task Objects:
condition_task(), email_notifications(), for_each_task(), libraries(), notebook_task(), pipeline_task(), python_wheel_task(), run_job_task(), spark_jar_task(), spark_python_task(), spark_submit_task(), sql_file_task(), sql_query_task()
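A minimal sketch of a cluster specification for an AWS workspace; the runtime label and node type are placeholders and aws_attributes() is used with its defaults:
spec <- new_cluster(
  num_workers = 2,
  spark_version = "13.3.x-scala2.12",
  node_type_id = "m5.xlarge",
  cloud_attrs = aws_attributes()
)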
Notebook Task
Description
Notebook Task
Usage
notebook_task(notebook_path, base_parameters = NULL)
Arguments
notebook_path |
The absolute path of the notebook to be run in the Databricks workspace. This path must begin with a slash. |
base_parameters |
Named list of base parameters to be used for each run of this job. |
Details
If the run is initiated by a call to db_jobs_run_now() with parameters specified, the two parameter maps are merged. If the same key is specified in base_parameters and in run-now, the value from run-now is used.
Use Task parameter variables to set parameters containing information about job runs.
If the notebook takes a parameter that is not specified in the job's base_parameters or the run-now override parameters, the default value from the notebook is used.
Retrieve these parameters in a notebook using dbutils.widgets.get.
See Also
Other Task Objects:
condition_task(), email_notifications(), for_each_task(), libraries(), new_cluster(), pipeline_task(), python_wheel_task(), run_job_task(), spark_jar_task(), spark_python_task(), spark_submit_task(), sql_file_task(), sql_query_task()
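For instance (the notebook path and parameter value are placeholders):
notebook_task(
  notebook_path = "/Workspace/Users/me@example.com/daily-report",
  base_parameters = list(run_date = "2025-01-01")
)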
Connect to Databricks Workspace
Description
Connect to Databricks Workspace
Usage
open_workspace(host = db_host(), token = db_token(), name = NULL)
Arguments
host |
Databricks workspace URL, defaults to calling db_host(). |
token |
Databricks workspace token, defaults to calling db_token(). |
name |
Desired name to assign the connection |
Examples
## Not run:
open_workspace(host = db_host(), token = db_token(), name = "MyWorkspace")
## End(Not run)
Pipeline Task
Description
Pipeline Task
Usage
pipeline_task(pipeline_id)
Arguments
pipeline_id |
The full name of the pipeline task to execute. |
See Also
Other Task Objects:
condition_task(), email_notifications(), for_each_task(), libraries(), new_cluster(), notebook_task(), python_wheel_task(), run_job_task(), spark_jar_task(), spark_python_task(), spark_submit_task(), sql_file_task(), sql_query_task()
Python Wheel Task
Description
Python Wheel Task
Usage
python_wheel_task(package_name, entry_point = NULL, parameters = list())
Arguments
package_name |
Name of the package to execute. |
entry_point |
Named entry point to use, if it does not exist in the
metadata of the package it executes the function from the package directly
using |
parameters |
Command-line parameters passed to python wheel task. |
See Also
Other Task Objects:
condition_task(), email_notifications(), for_each_task(), libraries(), new_cluster(), notebook_task(), pipeline_task(), run_job_task(), spark_jar_task(), spark_python_task(), spark_submit_task(), sql_file_task(), sql_query_task()
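A sketch of a wheel task definition; the package name, entry point, and command-line parameters are placeholders:
python_wheel_task(
  package_name = "my_wheel_pkg",
  entry_point = "main",
  parameters = list("--date", "2025-01-01")
)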
Reads Databricks CLI Config
Description
Reads Databricks CLI Config
Usage
read_databrickscfg(key = c("token", "host", "wsid"), profile = NULL)
Arguments
key |
The value to fetch from the profile. One of token, host, or wsid. |
profile |
Character, the name of the profile to retrieve values from. |
Details
Reads .databrickscfg
file and retrieves the values associated to
a given profile. Brickster searches for the config file in the user's home directory by default.
To see where this is you can run Sys.getenv("HOME") on unix-like operating systems,
or, Sys.getenv("USERPROFILE") on windows.
An alternate location will be used if the environment variable DATABRICKS_CONFIG_FILE
is set.
Value
named list of values associated with profile
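A usage sketch, assuming a profile named DEFAULT exists in the .databrickscfg file:
## Not run:
creds <- read_databrickscfg(key = "token", profile = "DEFAULT")
## End(Not run)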
Reads Environment Variables
Description
Reads Environment Variables
Usage
read_env_var(key = c("token", "host", "wsid"), profile = NULL, error = TRUE)
Arguments
key |
The value to fetch from the profile. One of token, host, or wsid. |
profile |
Character, the name of the profile to retrieve values from. |
error |
Boolean, whether an error should be raised when the key isn't found. |
Details
Fetches relevant environment variables based on profile
Value
named list of values associated with profile
Remove Library Path
Description
Remove Library Path
Usage
remove_lib_path(path, version = FALSE)
Arguments
path |
Directory to remove from .libPaths(). |
version |
If |
See Also
base::.libPaths(), add_lib_path()
Run Job Task
Description
Run Job Task
Usage
run_job_task(job_id, job_parameters, full_refresh = FALSE)
Arguments
job_id |
ID of the job to trigger. |
job_parameters |
Named list, job-level parameters used to trigger job. |
full_refresh |
If the pipeline should perform a full refresh. |
See Also
Other Task Objects:
condition_task()
,
email_notifications()
,
for_each_task()
,
libraries()
,
new_cluster()
,
notebook_task()
,
pipeline_task()
,
python_wheel_task()
,
spark_jar_task()
,
spark_python_task()
,
spark_submit_task()
,
sql_file_task()
,
sql_query_task()
S3 Storage Info
Description
S3 Storage Info
Usage
s3_storage_info(
destination,
region = NULL,
endpoint = NULL,
enable_encryption = FALSE,
encryption_type = c("sse-s3", "sse-kms"),
kms_key = NULL,
canned_acl = NULL
)
Arguments
destination |
S3 destination. For example: |
region |
S3 region. For example: |
endpoint |
S3 endpoint. For example:
|
enable_encryption |
Boolean (Default: FALSE). |
encryption_type |
Encryption type, it could be sse-s3 or sse-kms. |
kms_key |
KMS key used if encryption is enabled and encryption type is
set to sse-kms. |
canned_acl |
Set canned access control list. For example:
|
See Also
cluster_log_conf(), init_script_info()
Other Cluster Log Configuration Objects:
cluster_log_conf(), dbfs_storage_info()
Other Init Script Info Objects:
dbfs_storage_info(), file_storage_info()
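A sketch of an S3 location, e.g. for cluster log delivery via cluster_log_conf(); the bucket name and region are placeholders:
s3_logs <- s3_storage_info(
  destination = "s3://my-bucket/cluster-logs",
  region = "us-east-1",
  enable_encryption = TRUE,
  encryption_type = "sse-s3"
)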
Spark Jar Task
Description
Spark Jar Task
Usage
spark_jar_task(main_class_name, parameters = list())
Arguments
main_class_name |
The full name of the class containing the main method
to be executed. This class must be contained in a JAR provided as a library.
The code must use |
parameters |
Named list. Parameters passed to the main method. Use Task parameter variables to set parameters containing information about job runs. |
See Also
Other Task Objects:
condition_task(), email_notifications(), for_each_task(), libraries(), new_cluster(), notebook_task(), pipeline_task(), python_wheel_task(), run_job_task(), spark_python_task(), spark_submit_task(), sql_file_task(), sql_query_task()
Spark Python Task
Description
Spark Python Task
Usage
spark_python_task(python_file, parameters = list())
Arguments
python_file |
The URI of the Python file to be executed. DBFS and S3 paths are supported. |
parameters |
List. Command line parameters passed to the Python file. Use Task parameter variables to set parameters containing information about job runs. |
See Also
Other Task Objects:
condition_task(), email_notifications(), for_each_task(), libraries(), new_cluster(), notebook_task(), pipeline_task(), python_wheel_task(), run_job_task(), spark_jar_task(), spark_submit_task(), sql_file_task(), sql_query_task()
Spark Submit Task
Description
Spark Submit Task
Usage
spark_submit_task(parameters)
Arguments
parameters |
List. Command-line parameters passed to spark submit. Use Task parameter variables to set parameters containing information about job runs. |
See Also
Other Task Objects:
condition_task(), email_notifications(), for_each_task(), libraries(), new_cluster(), notebook_task(), pipeline_task(), python_wheel_task(), run_job_task(), spark_jar_task(), spark_python_task(), sql_file_task(), sql_query_task()
SQL File Task
Description
SQL File Task
Usage
sql_file_task(path, warehouse_id, source = NULL, parameters = NULL)
Arguments
path |
Path of the SQL file. Must be relative if the source is a remote Git repository and absolute for workspace paths. |
warehouse_id |
The canonical identifier of the SQL warehouse. |
source |
Optional location type of the SQL file. When set to |
parameters |
Named list of parameters to be used for each run of this job. |
See Also
Other Task Objects:
condition_task(), email_notifications(), for_each_task(), libraries(), new_cluster(), notebook_task(), pipeline_task(), python_wheel_task(), run_job_task(), spark_jar_task(), spark_python_task(), spark_submit_task(), sql_query_task()
SQL Query Task
Description
SQL Query Task
Usage
sql_query_task(query_id, warehouse_id, parameters = NULL)
Arguments
query_id |
The canonical identifier of the SQL query. |
warehouse_id |
The canonical identifier of the SQL warehouse. |
parameters |
Named list of parameters to be used for each run of this job. |
See Also
Other Task Objects:
condition_task(), email_notifications(), for_each_task(), libraries(), new_cluster(), notebook_task(), pipeline_task(), python_wheel_task(), run_job_task(), spark_jar_task(), spark_python_task(), spark_submit_task(), sql_file_task()
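For example (the query and warehouse identifiers are placeholders):
sql_query_task(
  query_id = "12345678-1234-1234-1234-123456789012",
  warehouse_id = "1234567890abcdef"
)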
Returns whether or not to use a .databrickscfg file
Description
Returns whether or not to use a .databrickscfg file
Usage
use_databricks_cfg()
Details
Indicates .databrickscfg
should be used instead of environment variables when
either the use_databrickscfg
option is set or Posit Workbench managed OAuth credentials are detected.
Value
boolean
Wait for Libraries to Install on Databricks Cluster
Description
Wait for Libraries to Install on Databricks Cluster
Usage
wait_for_lib_installs(
cluster_id,
polling_interval = 5,
allow_failures = FALSE,
host = db_host(),
token = db_token()
)
Arguments
cluster_id |
Unique identifier of a Databricks cluster. |
polling_interval |
Number of seconds to wait between status checks |
allow_failures |
If TRUE, libraries that fail to install are tolerated; otherwise an error is raised when a library fails to install. |
host |
Databricks workspace URL, defaults to calling db_host(). |
token |
Databricks workspace token, defaults to calling db_token(). |
Details
Library installs on Databricks clusters are asynchronous; this function repeatedly checks the installation status of each library.
It can be used to block a script until all required dependencies are installed.
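A hedged usage sketch; the cluster ID below is a placeholder:
## Not run:
# block until every library on the cluster has finished installing
wait_for_lib_installs(
  cluster_id = "1234-567890-abcde123",
  polling_interval = 10
)
## End(Not run)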