Type: Package
Title: Descriptive Statistics 'OpenBudgets.eu'
Version: 1.3.2
Date: 2020-05-04
Description: Estimate and return the needed parameters for visualizations designed for 'OpenBudgets.eu' http://openbudgets.eu/ datasets. Calculate descriptive statistical measures in budget data of municipalities across Europe, according to the 'OpenBudgets.eu' data model. There are functions for measuring central tendency and dispersion of amount variables along with their distributions and correlations and the frequencies of categorical variables for a given dataset. The package can also be applied to other datasets, to extract visualization parameters, convert them to 'JSON' format and use them as input in a different graphical interface.
Maintainer: Kleanthis Koupidis <koupidis@okfn.gr>
URL: https://github.com/okgreece/DescriptiveStats.OBeu
BugReports: https://github.com/okgreece/DescriptiveStats.OBeu/issues
License: GPL-2 | file LICENSE
Encoding: UTF-8
LazyData: true
Imports: dplyr, graphics, grDevices, jsonlite, magrittr, RCurl, reshape, stats
RoxygenNote: 7.1.0
Suggests: curl, knitr, rmarkdown
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2020-05-03 21:46:01 UTC; kleanthis-okfngr
Author: Kleanthis Koupidis [aut, cre], Aikaterini Chatzopoulou [aut], Charalampos Bratsas [aut]
Repository: CRAN
Date/Publication: 2020-05-04 04:10:02 UTC
Coefficient of variation
Description
Calculate the coefficient of variation of the numeric values in the input vector, matrix or data frame.
Usage
CV(x)
Arguments
x |
A numeric vector, matrix or data frame |
Value
This function returns a vector with the coefficient of variation of the input vector, matrix or data frame.
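As a rough illustration of the measure itself, the coefficient of variation is the standard deviation divided by the mean. The helper below, `cv_sketch`, is a hypothetical base R stand-in and not the package's `CV` implementation; skipping non-numeric columns and dropping missing values are assumptions.

```r
# Hypothetical helper (not the package's CV()): sample standard deviation
# divided by the mean, computed per numeric column.
cv_sketch <- function(x) {
  x <- as.data.frame(x)
  # Assumption: non-numeric columns are skipped
  num <- x[vapply(x, is.numeric, logical(1))]
  vapply(
    num,
    function(col) stats::sd(col, na.rm = TRUE) / mean(col, na.rm = TRUE),
    numeric(1)
  )
}

cv_iris <- cv_sketch(iris)
cv_iris
```

The result is a named numeric vector with one entry per numeric column.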
Author(s)
Kleanthis Koupidis
Wuppertal Fiscal Data extracted from Open Spending API
Description
This dataset contains the budget of Wuppertal for 2009 to 2020
The product ID
The account type
The kind
The year these amounts were measured
The amount
The product area ID
The product group ID
The product
The product area
The product group
Format
A data frame with the previous characteristics as columns
Source
http://next.openspending.org/api/3/cubes/4b6d969e07ef7a86aa54e539fc127a14:wuppertalhaushalt/facts
Wuppertal Fiscal Data extracted from Open Spending API
Description
This dataset contains the budget of Wuppertal for 2009 to 2020
The product ID
The account type
The kind
The year these amounts were measured
The amount
The product area ID
The product group ID
The product
The product area
The product group
Format
A link with the json format data
Source
http://next.openspending.org/api/3/cubes/4b6d969e07ef7a86aa54e539fc127a14:wuppertalhaushalt/facts
Group and compare summary statistics of a data frame
Description
Group the input data by the selected variables and apply the selected summary functions to the selected values.
Usage
compare.stats(df, group_var, values, m_functions)
Arguments
df |
numeric vector or matrix or dataframe |
group_var |
character vector of variables to group the data |
values |
numeric or integer variables |
m_functions |
functions to apply to values |
Value
This function returns a data frame with the selected group_var and the result of m_functions applied to the selected values.
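The grouping idea can be sketched in base R with `aggregate`. The function `compare_sketch` is hypothetical and is not how `compare.stats` is implemented; in particular, it assumes that `m_functions` is a character vector of function names and that result columns are named `<value>_<function>`.

```r
# Hypothetical stand-in for compare.stats(), using base R only.
# Assumption: m_functions is a character vector of function names.
compare_sketch <- function(df, group_var, values, m_functions) {
  per_function <- lapply(m_functions, function(f_name) {
    res <- stats::aggregate(df[values], by = df[group_var],
                            FUN = match.fun(f_name))
    # Tag each value column with the function that produced it
    names(res)[names(res) %in% values] <- paste(values, f_name, sep = "_")
    res
  })
  # Join the per-function summaries on the grouping variables
  Reduce(function(a, b) merge(a, b, by = group_var), per_function)
}

res <- compare_sketch(iris, "Species",
                      c("Sepal.Length", "Petal.Length"),
                      c("mean", "median"))
res
```

Each row of the result corresponds to one group, with one column per combination of value and function.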
Author(s)
Kleanthis Koupidis
Calculation of some Descriptive Tasks
Description
The function calculates the basic descriptive measures, the correlation and the boxplot parameters of all the numerical variables and the frequencies of all the nominal variables.
Usage
ds.analysis(data, c.out = 1.5, box.width = 0.15, outliers = TRUE, hist.class = "Sturges",
corr.method = "pearson", fr.select = NULL, tojson = FALSE)
Arguments
data |
The input data |
c.out |
Determines the length of the boxplot whiskers, as a multiple of the Interquartile Range. If it is equal to zero, no outliers will be returned. |
box.width |
The width level of the box. The box width is this value (default 0.15) times the square root of the size of the input data. |
outliers |
If TRUE the outliers will be computed at the selected "c.out" level (default is 1.5 times the Interquartile Range). |
hist.class |
The method or the number of classes for the histogram. |
corr.method |
The correlation coefficient method to compute: "pearson" (default), "kendall" or "spearman". |
fr.select |
One or more nominal variables to calculate their corresponding frequencies. |
tojson |
If TRUE the results are returned in json format |
Details
This function returns a list with the basic statistics and the parameters needed to visualize a boxplot and a histogram. It also provides the frequencies of the non-numerical data of the input dataset and the correlation coefficient. The input of this function can be a matrix or data frame.
Value
A list or json file with the following components:
descriptives The descriptive measures
boxplot The statistics of the boxplot
histogram The histogram parameters
frequencies The frequencies and the relative frequencies of factors/characters of the input dataset
correlation The correlation coefficient
Author(s)
Kleanthis Koupidis, Charalampos Bratsas
Examples
# iris data frame as input with the default parameters
ds.analysis(iris)
# using iris data frame with different parameters
ds.analysis(iris, c.out = 1, box.width = 0.20, outliers = TRUE, tojson = TRUE)
# using iris data frame with different parameters
# fr.select parameter specified as Species
ds.analysis(iris, c.out = 1, outliers = FALSE, fr.select = "Species", tojson = TRUE)
# OpenBudgets.eu Dataset Example:
ds.analysis(Wuppertal_df, c.out = 2, box.width = 0.15,
outliers = FALSE, tojson = FALSE)
Boxplot Parameters of a numeric vector
Description
This function calculates the statistical measures needed to visualize the boxplot of a numeric vector.
Usage
ds.box(x, c = 1.5, c.width = 0.15 , out = TRUE, tojson = FALSE)
Arguments
x |
The input numeric vector |
c |
Determines the length of the boxplot whiskers, as a multiple of the Interquartile Range. If it is equal to zero or out = FALSE, no outliers will be returned. |
c.width |
The width level of the box. The box width is this value (default 0.15) times the square root of the size of the input vector. |
out |
If TRUE the outliers will be computed at the selected "c" level (default is 1.5 times the Interquartile Range). |
tojson |
If TRUE the results are returned in json format |
Details
This function returns a list with the parameters needed to visualize a boxplot.
Value
Returns a list or a json file with the following components:
lo.whisker The extreme of the lower whisker
lo.hinge The lower "hinge"
median The median
up.hinge The upper "hinge"
up.whisker The extreme of the upper whisker
box.width The width of the box (default is 0.15 times the square root of the size of the vector)
lo.out The values of any data points which lie below the extreme of the lower whisker
up.out The values of any data points which lie above the extreme of the upper whisker
n The non-NA observations of the vector
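The components listed above map closely onto what base R's `grDevices::boxplot.stats` returns. The sketch below assembles a list with the same names; whether `ds.box` uses `boxplot.stats` internally is an assumption, and `box_sketch` is a hypothetical name.

```r
# Sketch only: builds a list with the component names documented above
# from grDevices::boxplot.stats(). Not the package's implementation.
x <- iris$Sepal.Width
bs <- grDevices::boxplot.stats(x, coef = 1.5)

box_sketch <- list(
  lo.whisker = bs$stats[1],
  lo.hinge   = bs$stats[2],
  median     = bs$stats[3],
  up.hinge   = bs$stats[4],
  up.whisker = bs$stats[5],
  box.width  = 0.15 * sqrt(bs$n),             # 0.15 times sqrt of the size
  lo.out     = bs$out[bs$out < bs$stats[1]],  # points below the lower whisker
  up.out     = bs$out[bs$out > bs$stats[5]],  # points above the upper whisker
  n          = bs$n                           # number of non-NA observations
)
str(box_sketch)
```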
Author(s)
Kleanthis Koupidis, Charalampos Bratsas
Examples
# with vector as an input and the default parameters
vec <- as.vector(iris$Sepal.Width)
ds.box(vec)
# with vector as an input and the different parameters
vec <- as.vector(iris$Sepal.Width)
ds.box(vec, c = 3, c.width = 0.20 , out = FALSE, tojson = FALSE)
# OpenBudgets.eu Dataset Example:
amounts <- as.vector(Wuppertal_df$Amount)
ds.box(amounts, c = 1.5, c.width = 0.20, out = TRUE)
Boxplot Parameters of a matrix or data frame
Description
This function calculates the statistics of the boxplot for the input matrix or data frame.
Usage
ds.boxplot(data, out.level = 1.5, width = 0.15 , outl = TRUE, tojson = FALSE)
Arguments
data |
The input numeric matrix or data frame. |
out.level |
Determines the length of the boxplot whiskers, as a multiple of the Interquartile Range. If it is equal to zero or "outl" is set to FALSE, no outliers will be returned. |
width |
The width level of the box. The box width is this value (default 0.15) times the square root of the size of the input data. |
outl |
If TRUE the outliers will be computed at the selected "out.level" level (default is 1.5 times the Interquartile Range). |
tojson |
If TRUE the results are returned in json format |
Details
This function returns, as a list object, the statistical parameters needed to visualize a boxplot.
Value
Returns a list with the extracted components of ds.box
for each variable/column of the input data.
Author(s)
Aikaterini Chatzopoulou, Kleanthis Koupidis
See Also
ds.box, ds.analysis, open_spending.ds
Examples
# with matrix as an input and the default parameters
Matrix <- cbind(Uni05 = (1:200)/21, Norm = rnorm(200),
`5T` = rt(200, df = 5), Gam2 = rgamma(200, shape = 2))
ds.boxplot(Matrix, out.level = 1.5, width = 0.15 , outl = TRUE, tojson = FALSE)
# iris data frame as an input, different parameters and json output
ds.boxplot(iris, out.level = 2, width = 0.25 , outl = FALSE, tojson = TRUE)
# OpenBudgets.eu Dataset Example:
ds.boxplot(Wuppertal_df$Amount, out.level = 2.5, width = 0.15,
outl = TRUE, tojson = FALSE)
Correlation Coefficient of a dataframe
Description
This function calculates the correlation coefficient of the input vectors, matrix or data frame. By default, Pearson's correlation coefficient is computed.
Usage
ds.correlation(x, y = NULL, cor.method = "pearson", tojson = FALSE)
Arguments
x |
A numeric vector, matrix or data frame |
y |
A vector, matrix or data frame with the same dimensions as x. The default is NULL. |
cor.method |
The correlation coefficient method to compute: "pearson" (default), "kendall" or "spearman". |
tojson |
If TRUE the results are returned in json format, default returns a data frame |
Details
This function returns an upper triangular matrix with the correlation coefficients of the input data. Pearson's correlation coefficient is computed by default. Other options are "kendall" or "spearman".
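A minimal base R sketch of an upper triangular correlation matrix follows. It is not the package's code; dropping non-numeric columns and blanking the lower triangle with `NA` are assumptions about how such a result could be shaped.

```r
# Sketch only: upper triangular correlation matrix with stats::cor().
# Assumption: non-numeric columns (here Species) are dropped first.
num <- iris[vapply(iris, is.numeric, logical(1))]
cors <- stats::cor(num, method = "pearson")

# Keep the diagonal and upper triangle; blank the redundant lower triangle
cors[lower.tri(cors)] <- NA
cors
```

Passing `method = "kendall"` or `method = "spearman"` to `stats::cor` switches the coefficient.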
Author(s)
Aikaterini Chatzopoulou, Kleanthis Koupidis, Charalampos Bratsas
Examples
# iris data frame as an input and the default parameters
ds.correlation(iris, cor.method = "pearson", tojson = FALSE)
# with matrix as an input , different parameters and json output
Matrix <- cbind(Uni05 = (1:200)/21, Norm = rnorm(200),
`5T` = rt(200, df = 5), Gam2 = rgamma(200, shape = 2))
ds.correlation(Matrix, cor.method = "kendall", tojson = TRUE)
Barplot parameters
Description
This function calculates the frequencies and the relative frequencies of factors/characters of the input dataset.
Usage
ds.frequency(data, select = NULL, tojson = FALSE)
Arguments
data |
A vector, matrix or data frame which includes at least one factor/character. |
select |
Select one or more specific nominal variables to calculate their corresponding frequencies. If it is not specified, the result corresponds to the frequencies of every factor variable in the data. |
tojson |
If TRUE the results are returned in json format, default returns a list |
Details
This function returns a list with the frequencies and relative frequencies of factors/characters of the input dataset.
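Frequencies and relative frequencies of a single nominal variable can be sketched with base R's `table` and `prop.table`. The helper `freq_sketch` and its output column names are hypothetical and need not match what `ds.frequency` returns.

```r
# Hypothetical helper: counts and shares for one factor/character vector.
freq_sketch <- function(v) {
  counts <- table(v)
  data.frame(
    level     = names(counts),
    frequency = as.vector(counts),
    relative  = as.vector(prop.table(counts)),  # shares that sum to 1
    stringsAsFactors = FALSE
  )
}

fr <- freq_sketch(iris$Species)
fr
```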
Author(s)
Kleanthis Koupidis, Charalampos Bratsas
Examples
# iris data frame as an input and a selected column to calculate its frequencies
ds.frequency(iris, select = "Species", tojson = FALSE)
# iris data frame as an input without a selected column and json output
ds.frequency(iris, tojson = TRUE)
# OpenBudgets.eu Dataset Example:
ds.frequency(Wuppertal_df, select = "Produkt", tojson = FALSE)
Histogram breaks and frequencies
Description
This function computes the histogram parameters of the numeric input vector. The default for breaks is the value resulting from Sturges' algorithm.
Usage
ds.hist(x, breaks = "Sturges", tojson = FALSE)
Arguments
x |
The input numeric vector, matrix or data frame |
breaks |
The method or the number of classes for the histogram |
tojson |
If TRUE the results are returned in json format, default returns a list |
Details
The possible values for breaks are Sturges (see nclass.Sturges), Scott (see nclass.scott) and FD or Freedman Diaconis (see nclass.FD), which are in package grDevices.
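The three class-count rules can be compared directly, and `graphics::hist` with `plot = FALSE` yields class boundaries and densities without drawing anything. This is a sketch of the underlying base R calls, not of `ds.hist` itself; `hist_sketch` is a hypothetical name and covers only some of the documented components.

```r
# Sketch only: number of classes suggested by each rule in grDevices
x <- iris$Sepal.Width
n_classes <- c(
  Sturges = grDevices::nclass.Sturges(x),
  Scott   = grDevices::nclass.scott(x),
  FD      = grDevices::nclass.FD(x)
)
n_classes

# Histogram parameters without plotting
h <- graphics::hist(x, breaks = "Sturges", plot = FALSE)
hist_sketch <- list(
  cuts    = h$breaks,    # boundaries of the histogram classes
  density = h$density,   # density of each class
  mean    = mean(x),
  median  = stats::median(x)
)
str(hist_sketch)
```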
Value
A list or json file with the following components:
cuts The boundaries of the histogram classes
density The density of each histogram class
normal.curve.x Abscissa of the normal curve
normal.curve.y Ordinate of the normal curve
fit.line.x Abscissa of the data density curve
fit.line.y Ordinate of the data density curve
mean The average value of the input vector
median The median value of the input data
Author(s)
Kleanthis Koupidis, Charalampos Bratsas
Examples
# with a vector as an input and the default parameters
vec <- as.vector(iris$Sepal.Width)
ds.hist(vec)
# OpenBudgets.eu Dataset Example:
ds.hist(Wuppertal_df$Amount, tojson = TRUE)
Calculation of Kurtosis
Description
This function calculates kurtosis of the input vector, matrix or data frame.
Usage
ds.kurtosis(x, tojson = FALSE)
Arguments
x |
A numeric vector, matrix or data frame. |
tojson |
If TRUE the results are returned in json format |
Details
This function returns the kurtosis of the values of the input data, based on a scaled version of the fourth moment.
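A common moment-based definition divides the fourth central moment by the squared second central moment. The sketch below follows that definition. The source does not say whether 3 is subtracted (excess kurtosis) or whether a sample correction is applied, so the `excess` argument and the helper name `kurtosis_sketch` are hypothetical.

```r
# Hypothetical helper: kurtosis as the fourth central moment scaled by
# the squared second central moment. Not the package's ds.kurtosis().
kurtosis_sketch <- function(x, excess = FALSE) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  k <- mean(d^4) / mean(d^2)^2
  # Assumption: optionally report excess kurtosis (normal distribution = 0)
  if (excess) k - 3 else k
}

kurtosis_sketch(iris$Sepal.Width)
```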
Author(s)
Aikaterini Chatzopoulou, Charalampos Bratsas
See Also
ds.skewness, ds.statistics, ds.analysis, open_spending.ds
Examples
# with a matrix as an input
Matrix <- cbind(Uni05 = (1:200)/21, Norm = rnorm(200),
`5T` = rt(200, df = 5), Gam2 = rgamma(200, shape = 2))
ds.kurtosis(Matrix, tojson = FALSE)
# with iris data frame as an input
ds.kurtosis(iris, tojson = FALSE)
# with a vector as an input and json output
vec <- as.vector(iris$Sepal.Width)
ds.kurtosis(vec, tojson = TRUE)
# OpenBudgets.eu Dataset Example:
ds.kurtosis(Wuppertal_df, tojson = FALSE)
Calculation of Skewness
Description
This function calculates skewness of the input vector, matrix or data frame.
Usage
ds.skewness(x, tojson = FALSE)
Arguments
x |
A numeric vector, matrix or data frame. |
tojson |
If TRUE the results are returned in json format |
Details
This function returns the skewness, also known as Pearson's moment coefficient of skewness, of the values of the input data.
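Pearson's moment coefficient of skewness is the third central moment divided by the second central moment raised to the power 3/2. The helper `skewness_sketch` below is a hypothetical illustration of that formula; whether the package applies a sample-size correction is not stated in the source.

```r
# Hypothetical helper: third central moment scaled by the second central
# moment to the power 3/2. Not the package's ds.skewness().
skewness_sketch <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skewness_sketch(iris$Sepal.Width)
```

Symmetric data give a value of zero, and a long right tail gives a positive value.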
Author(s)
Aikaterini Chatzopoulou
See Also
ds.kurtosis, ds.statistics, ds.analysis, open_spending.ds
Examples
# with a matrix as an input
Matrix <- cbind(Uni05 = (1:200)/21, Norm = rnorm(200),
`5T` = rt(200, df = 5), Gam2 = rgamma(200, shape = 2))
ds.skewness(Matrix, tojson = FALSE)
# with iris data frame as an input
ds.skewness(iris, tojson = FALSE)
# with a vector as an input and json output
vec <- as.vector(iris$Sepal.Width)
ds.skewness(vec, tojson = TRUE)
# OpenBudgets.eu Dataset Example:
ds.skewness(Wuppertal_df, tojson = FALSE)
Calculation of the Statistical Measures
Description
This function calculates the basic descriptive measures of the input dataset.
Usage
ds.statistics(data, tojson = FALSE)
Arguments
data |
A numeric vector, matrix or data frame |
tojson |
If TRUE the results are returned in json format, default returns a list |
Details
This function returns the following values of the input data: minimum, maximum, range, mean, median, first and third quartiles, variance, standard deviation, skewness and kurtosis.
Value
A list or json file with the following components:
Min The minimum observed value of the input data
Max The maximum observed value of the input data
Range The range, defined as the difference of the maximum and the minimum value.
Mean The average value of the input data
Median The median value of the input data
Quantiles The 25th and 75th percentiles
Variance The variance of the input data
Standard Deviation The standard deviation of the input data
Skewness The Skewness of the input data
Kurtosis The Kurtosis of the input data
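Most of these components correspond to single base R calls. The sketch below collects them for one numeric vector. The helper `stats_sketch` is hypothetical, omits skewness and kurtosis, and uses `StandardDeviation` as an assumed element name.

```r
# Hypothetical helper: basic descriptive measures of one numeric vector,
# using base R and stats only. Not the package's ds.statistics().
stats_sketch <- function(x) {
  x <- x[!is.na(x)]
  list(
    Min       = min(x),
    Max       = max(x),
    Range     = max(x) - min(x),
    Mean      = mean(x),
    Median    = stats::median(x),
    Quantiles = stats::quantile(x, probs = c(0.25, 0.75)),
    Variance  = stats::var(x),
    StandardDeviation = stats::sd(x)
  )
}

s <- stats_sketch(iris$Sepal.Width)
str(s)
```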
Author(s)
Aikaterini Chatzopoulou, Kleanthis Koupidis, Charalampos Bratsas
Examples
# with matrix as an input and json output
Matrix <- cbind(Uni05 = (1:200)/21, Norm = rnorm(200),
`5T` = rt(200, df = 5), Gam2 = rgamma(200, shape = 2))
ds.statistics(Matrix, tojson = TRUE)
# with vector as an input
vec <- as.vector(iris$Sepal.Width)
ds.statistics(vec, tojson = FALSE)
# with iris data frame as an input
ds.statistics(iris, tojson = FALSE)
# OpenBudgets.eu Dataset Example:
ds.statistics(Wuppertal_df$Amount, tojson = TRUE)
Multiple replacement
Description
Replace multiple patterns in a character vector with their corresponding replacements.
Usage
multisub(pattern, replacement, x, ...)
Arguments
pattern |
Character string vector containing the regular expressions to be matched in the given character vector |
replacement |
A character vector of replacements, of equal length with pattern. |
x |
A character vector or an object in which the matches are sought |
... |
other parameters to pass |
Value
This function returns a character vector with the replacements.
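One way to read this signature is as a sequence of `gsub` calls, one per pattern and replacement pair. The sketch below takes that reading. `multisub_sketch` is hypothetical, and applying the pairs in order (so later patterns see the output of earlier ones) is an assumption, not a documented property of `multisub`.

```r
# Hypothetical helper: apply gsub() once per pattern/replacement pair.
# Assumption: pairs are applied sequentially, in the order given.
multisub_sketch <- function(pattern, replacement, x, ...) {
  stopifnot(length(pattern) == length(replacement))
  for (i in seq_along(pattern)) {
    x <- gsub(pattern[i], replacement[i], x, ...)
  }
  x
}

out <- multisub_sketch(c("a", "b"), c("1", "2"), c("abc", "bab"))
out
```

Extra arguments such as `fixed = TRUE` are passed through to `gsub`.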
Author(s)
Kleanthis Koupidis
Select the numeric columns of a given dataset
Description
Extract and return a data frame with the columns that include only numeric values
Usage
nums(data)
Arguments
data |
A numeric vector, matrix or data frame. |
Value
This function returns a data frame with the numeric columns of the input dataset.
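Selecting the numeric columns can be sketched in base R by testing each column with `is.numeric`. The helper `nums_sketch` is hypothetical and is not the package's `nums` code.

```r
# Hypothetical helper: keep only the numeric columns of the input.
nums_sketch <- function(data) {
  data <- as.data.frame(data)
  data[vapply(data, is.numeric, logical(1))]
}

num_iris <- nums_sketch(iris)
head(num_iris)
```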
Author(s)
Kleanthis Koupidis
Examples
# with data frame as input
nums(iris)
# with vector as input
vec <- as.vector(iris$Sepal.Width)
nums(vec)
# with matrix as input
Matrix <- cbind(Uni05 = (1:200)/21, Norm = rnorm(200),
`5T` = rt(200, df = 5), Gam2 = rgamma(200, shape = 2))
nums(Matrix)
# OpenBudgets.eu Dataset Example:
head(nums(Wuppertal_df))
Read and Calculate the Basic Information for Basic Descriptive Tasks from Open Spending and Rudolf APIs.
Description
Extract and analyze the input data provided from Open Spending API of OpenBudgets.eu, using the ds.analysis function.
Usage
open_spending.ds(json_data, dimensions = NULL, amounts = NULL,
measured.dimensions = NULL, coef.outl = 1.5, box.outliers = TRUE,
box.wdth = 0.15, cor.method = "pearson", freq.select = NULL)
Arguments
json_data |
The json string, URL or file from Open Spending API |
dimensions |
The dimensions of the input data |
amounts |
The measures of the input data |
measured.dimensions |
The dimensions to which correspond amount/numeric variables |
coef.outl |
Determines the length of the boxplot whiskers, as a multiple of the Interquartile Range. If it is equal to zero, no outliers will be returned. |
box.outliers |
If TRUE the outliers will be computed at the selected "coef.outl" level (default is 1.5 times the Interquartile Range). |
box.wdth |
The width level of the box. The box width is this value (default 0.15) times the square root of the size of the input data. |
cor.method |
The correlation coefficient method to compute: "pearson" (default), "kendall" or "spearman". |
freq.select |
One or more nominal variables to calculate their corresponding frequencies. |
Details
This function is used to read data in json format from the Open Spending and Rudolf APIs, in order to perform some basic descriptive tasks through the ds.analysis function.
Value
A json string with the resulting parameters of the ds.analysis function.
Author(s)
Kleanthis Koupidis
Examples
# OpenBudgets.eu Dataset Example:
# open_spending.ds(json_data = Wuppertal_openspending,
# dimensions ="functional_classification_3.Produktgruppe|date_2.Year",
# amounts = "Amount")
Sample data from Open Spending
Description
Sample data of Revised Budget phase amounts
The year (2016) of the recorded approved budget phase amounts
The revised budget phase amounts of 2016
The original amounts of this year
The functional classification description
The functional classification code
Format
A link with the json format data
Source
http://next.openspending.org/