Title: | DDI with R |
Version: | 0.19 |
URL: | https://github.com/dusadrian/DDIwR |
BugReports: | https://github.com/dusadrian/DDIwR/issues |
Description: | Useful functions for various DDI (Data Documentation Initiative) related inputs and outputs. Converts data files to and from DDI, SPSS, Stata, SAS, R and Excel, including user declared missing values. |
License: | GPL (≥ 3) |
Depends: | R (≥ 3.5.0) |
Imports: | admisc (> 0.36), base64enc, declared (> 0.24), digest, tools, xml2, haven, readxl, writexl |
Language: | en-US |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3.9000 |
NeedsCompilation: | no |
Packaged: | 2024-12-10 00:00:05 UTC; dusadrian |
Author: | Adrian Dusa |
Maintainer: | Adrian Dusa <dusa.adrian@unibuc.ro> |
Repository: | CRAN |
Date/Publication: | 2024-12-10 07:20:07 UTC |
Useful functions for various DDI (Data Documentation Initiative) related outputs.
Description
This package provides various functions to read DDI based metadata documentation and write dedicated setup files for R, SPSS, Stata and SAS to read an associated .csv file containing the raw data, apply labels for variables and values and also deal with the treatment of missing values.
It can also generate a DDI metadata file out of an R information object, which can be used to export directly to the files of the standard statistical packages (such as SPSS, Stata and SAS, or even Excel), using the versatile package haven. For R, the default object to store data and metadata is a data.frame, and labelled data are automatically coerced to class declared.
The research leading to the initial functions in this package has received funding from the European Union's Seventh Framework Program (FP7/2007-2013) under grant agreement no. 262608 (DwB - Data without Boundaries)
Details
Package: | DDIwR |
Type: | Package |
Version: | 0.19 |
Date: | 2024-12-09 |
License: | GPL-v3 |
Author(s)
Adrian Dusa
Department of Sociology
University of Bucharest
dusa.adrian@unibuc.ro
Add/remove/change one or more children or attributes of a DDI Codebook element.
Description
addChildren() adds one or more children to a standard DDI Codebook element (see makeElement), anyChildren() checks if an element has any children at all, hasChildren() checks if the element has specific children, indexChildren() returns the positions of the children among all containing children, and getChildren() extracts them. For attributes and content, there are dedicated functions to add*(), remove*() and change*().
Usage
addChildren(children, to, overwrite = TRUE, ...)
anyChildren(element)
getChildren(xpath, from, ...)
hasChildren(element, name)
indexChildren(element, name)
removeChildren(name, from, overwrite = TRUE, ...)
addContent(content, to, overwrite = TRUE)
changeContent(content, to, overwrite = TRUE)
removeContent(from, overwrite = TRUE)
addAttributes(attrs, to, overwrite = TRUE)
anyAttributes(element)
changeAttributes(attrs, from, overwrite = TRUE)
hasAttributes(element, name)
removeAttributes(name, from, overwrite = TRUE)
Arguments
children |
A standard element of class |
to |
A standard element of class |
overwrite |
Logical, overwrite the original object in the parent frame. |
... |
Other arguments, mainly for internal use. |
element |
A standard element of class |
xpath |
Character, a path to a DDI Codebook element. |
from |
A standard element of class |
name |
Character, name(s) of specific child element / attribute. |
content |
Character, the text content of a DDI element. |
attrs |
A list of specific attribute names and values. |
Details
Although an XML list generally allows for multiple contents, sometimes spread between the children elements, it is preferable to maintain a single content (possibly separated with carriage return characters for separate lines).
Attributes are unique, and can be changed by simply referring to their names. Elements, however, can be repeated. One example is the element var, which describes variables within the dataDscr (data description) sub-element of the codeBook. There are as many such var elements as the number of variables in the dataset, in which case it is not possible to change a specific var element by referring to its name. For this purpose, it is useful to extract the positions of all var elements to iterate through, which is the purpose of the function indexChildren().
Future versions will allow deep manipulations of child elements using the xpath argument.
If there is more than one child, they should be grouped into a list.
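The iteration over repeated elements described above can be sketched as follows. This is a minimal illustration, assuming the DDIwR package is installed, that a var element accepts a name attribute, and that the positions returned by indexChildren() can be used to index the element list directly. The variable names "age" and "sex" are invented for the example.

```r
library(DDIwR)

# Two var elements; the attribute values are illustrative only
vars <- list(
    makeElement("var", attributes = c(name = "age")),
    makeElement("var", attributes = c(name = "sex"))
)

# A dataDscr element that receives the two children.
# overwrite = FALSE leaves the original object untouched, and the
# modified element is captured from the (invisible) return value
dataDscr <- makeElement("dataDscr")
dataDscr <- addChildren(vars, to = dataDscr, overwrite = FALSE)

# Positions of all var children, since they cannot be told apart by name
positions <- indexChildren(dataDscr, "var")

for (i in positions) {
    # assumption: each position indexes a component of the element list
    str(dataDscr[[i]], max.level = 1)
}
```

Capturing the return value with overwrite = FALSE keeps the example free of side effects in the calling environment.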
Value
An invisible standard DDI element. Functions any*() and has*() return a logical (vector).
Author(s)
Adrian Dusa
DDIwR internal functions
Description
Functions to be used internally, only by developers and contributors.
ascii: Convert accented unicode characters to pure ascii
changeXMLang: Remove the xmlang attribute from all elements.
checkArgument: Check function arguments
checkDots: Check the three dots ... argument. If the argument is not supplied via dots, returns a default
checkElement: Check if an element is a DDI Codebook element
checkExisting: Check if a certain (sub)element exists in a DDI Codebook.
checkType: Determine the variable type: categorical, numerical or mixed
checkXMList: Determine if an XML list is a DDI Codebook
cleanup: Rectify texts read from a metadata object
coerceDDI: Recursively coerce an element to a DDI class
collectMetadata: Collect metadata from a file or a dataframe object
collectRMetadata: Collect metadata from a dataframe object
extractData: Extract data from a DDI Codebook XML document or list.
formatExample: Format an example from the DDI Codebook specification
generateID: Generate simple, custom unique ID codes
getDateTime: Current date and time
getDelimiter: Guess the column delimiter from a text file
getDNS: Extracts the Default Name Space from an XML object
getEnter: Get the carriage return code from the current Operating System
getFiles: Get information about the files in a given directory
getFormat: Determine the SPSS / Stata variable format
getHashes: Compute hashes of XML nodes
getValues: Extract values, labels and missing values from a var element
getXML: Read the DDI XML file, testing if it can be loaded.
hasLabels: Check if a dataset has (declared) labels
hasMissingLabels: Check if variables have missing labels
makeLabelled: Coerce variables to labelled objects
missingValuesSPSS: Prepares the missing values for the SPSS export syntax.
prespace: Prepend a text with a certain number of space characters.
removeExtra: Removes extra information from a DDI Codebook element
repeatSpace: Allows indentation of XML or HTML files
replaceChars: Replace certain characters, in preparation for XML export
replaceTicks: Recode all tick characters with a single quote.
splitrows: Split the written rows in the setup file.
treatPath: Determine which specific type of files are present in a certain directory.
writeMetadata: Utility function to write the metadata part in the setup file.
XMLtoRmetadata: Extract metadata from a single XML variable
writeRlist: Write an .R file containing a metadata specific list.
Usage
ascii(x)
changeXMLang(x, remove = FALSE)
checkArgument(argument, default, length = 1, ...)
checkDots(dotsvalue, default, length = 1)
checkElement(x)
checkExisting(xpath, inside, attribute = NULL)
checkType(x, labels = NULL, na_values = NULL, na_range = NULL)
checkXMList(xmlist)
cleanup(x, cdata = TRUE)
coerceDDI(element, name = NULL)
collectMetadata(from, ...)
collectRMetadata(from, ...)
extractData(xml)
formatExample(xml_node, level = 0, indent = 2, output = NULL, ...)
generateID(x, nchars = 16)
getDateTime()
getDelimiter(x)
getDNS(xml)
getEnter(OS)
getFiles(path = ".", type = "*", currdir)
getFormat(x, type = c("SPSS", "Stata"), ...)
getHashes(nodes)
getValues(variable)
getXML(path, encoding = "UTF-8")
hasLabels(x)
hasMissingLabels(variables)
makeLabelled(x, variables, declared = TRUE)
missingValuesSPSS(variables, range = FALSE, numvars = TRUE)
prespace(text, indent = NULL)
removeExtra(element)
repeatSpace(times, indent)
replaceChars(x)
replaceTicks(x)
splitrows(x, enter, y = 80, spacerep = "")
treatPath(path, type = "*", single = FALSE, check = TRUE)
writeMetadata(variables, OS = "", indent = 4)
XMLtoRmetadata(xmlvar, dns)
makeNotes(data)
getMetadata(...)
exportDDI(...)
writeRlist(variables, OS = "windows", indent = 4, dirpath = "", filename = "")
Arguments
x |
Number of ID values to return |
nchars |
Number of characters for each ID |
Value
changeXMLang: A modified codeBook element.
checkElement: Boolean.
checkExisting: Boolean.
checkType: A character scalar
cleanup: A character vector
coerceDDI: A standard element of class DDI
collectMetadata: A standard DDI Codebook element dataDscr, containing variable level metadata information
collectRMetadata: An R list containing variable level metadata information
extractData: An R data frame, if existing, or NULL
generateID: Character vector
getDateTime: Character vector
getDelimiter: Character scalar
getDNS: Character scalar
getEnter: Character scalar
getFiles: A list with four components: the complete path, the files, the file names and the file extensions
getFormat: Character scalar
getHashes: Character vector
getValues: A list with two components: labels and na_values
getXML: An XML document
hasLabels: Boolean
hasMissingLabels: Boolean vector
makeLabelled: A modified data frame.
missingValuesSPSS: A vector of missing values representation.
prespace: A modified text.
removeExtra: A modified element.
repeatSpace: Character spaces
replaceChars: Character vector
replaceTicks: A recoded string.
splitrows: A character vector.
treatPath: A list with four components: the complete path, the files, the file names and the file extensions
XMLtoRmetadata: An R list containing metadata
Converts a dataset from one statistical software to another
Description
This function converts (or transfers) between R, Stata, SPSS, SAS, Excel and DDI XML files. Unlike the regular import / export functions from packages haven or rio, this function uses the DDI standard as an exchange platform and facilitates a consistent conversion of the missing values.
Usage
convert(
from,
to = NULL,
declared = TRUE,
chartonum = FALSE,
recode = TRUE,
encoding = "UTF-8",
csv = NULL,
...
)
Arguments
from |
A path to a file, or a data.frame object |
to |
Character, the name of a software package or a path to a specific file |
declared |
Logical, return the resulting dataset as a declared object |
chartonum |
Logical, recode character categorical variables to numerical categorical variables |
recode |
Logical, recode missing values |
encoding |
The character encoding used to read a file |
csv |
Complex argument, see the Details section |
... |
Additional parameters passed to other functions, see the Details section |
Details
When the argument to specifies a certain statistical package ("R", "Stata", "SPSS", "SAS", "XPT") or "Excel", the name of the destination file will be identical to the one in the argument from, with an automatically added software specific extension.
SPSS portable files (with the extension ".por") can only be read, but not written.
The argument to can also be specified as a path to a specific file, in which case the software package is determined from its file extension. The following extensions are currently recognized: .xml for DDI, .rds for R, .dta for Stata, .sav for SPSS, .xpt for SAS, and .xlsx for Excel.
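The extension lookup described above can be pictured with a small base R sketch. The helper software_from_path() is hypothetical and not part of DDIwR; it only mirrors the list of recognized extensions, and its case insensitive matching is an assumption of the sketch.

```r
# Hypothetical helper, not part of DDIwR: map a file extension
# to the software name it stands for in the list above
software_from_path <- function(path) {
    known <- c(
        xml = "DDI", rds = "R", dta = "Stata",
        sav = "SPSS", xpt = "SAS", xlsx = "Excel"
    )

    # assumption: extensions are matched regardless of case
    ext <- tolower(tools::file_ext(path))

    if (!ext %in% names(known)) {
        stop("Unrecognized file extension: ", ext)
    }

    unname(known[ext])
}

software_from_path("survey/wave1.dta")
software_from_path("codebook.XML")
```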
Additional parameters can be specified via the three dots argument ..., and they are passed to the respective functions from packages haven and readxl. For instance the function write_dta() has an additional argument called version when writing a Stata file.
The most important argument to consider is called user_na, part of the function read_sav(). It defaults to FALSE in package haven, but package DDIwR uses it with the value TRUE, and it can be deactivated by explicitly specifying user_na = FALSE in function convert().
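A short sketch of passing such additional parameters through the three dots argument is given below. It assumes the DDIwR package is installed; "test.sav" is a placeholder file name, and nothing is executed if that file is absent. The Stata version number 13 is only an example value.

```r
library(DDIwR)

# "test.sav" is a placeholder, the block is skipped if it does not exist
if (file.exists("test.sav")) {
    # fall back to the haven default, reading user defined
    # missing values as regular NA values
    test <- convert("test.sav", user_na = FALSE)

    # the version argument is handed over to haven::write_dta()
    convert("test.sav", to = "Stata", version = 13)
}
```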
The same three dots argument is used to pass additional parameters to other functions in this package, for instance exportCodebook() when writing to a DDI file. One of its arguments, embed (activated by default), can be used to control embedding the data in the XML file. Deactivating it will create a CSV file in the same directory, using the same file name as the XML file.
When converting from DDI, if the dataset is not embedded in the XML file, the CSV file is expected to be found in the same directory as the DDI Codebook, and it should have the same file name as the XML file. The path to the CSV file can be provided via the csv argument. Additional formal parameters of the function read.csv() can be passed via the same three dots ... argument. Alternatively, the csv argument can also be an R data frame.
When converting to DDI, if the argument embed is set to FALSE, users have the option to save the data in a separate CSV file (the default) or not to save the data at all, by setting csv to FALSE.
The DDI .xml file generates unique IDs for all variables, if not already present in the attributes. These IDs are useful for newer versions of the DDI Codebook, for referencing purposes.
The argument chartonum signals recoding character categorical variables, and employs the function recodeCharcat(). This only makes sense when recoding to Stata, which does not allow allocating labels for anything but integer variables.
If the argument to is left as NULL, the data is (invisibly) returned to the R environment. Conversion to R, either in the working space or as a data file, will result (by default) in a data frame containing declared labelled variables, as defined in package declared.
The current version reads and creates DDI Codebook version 2.6, with future versions to extend the functionality for DDI Lifecycle versions 3.x and link to the future package DDI4R for the UML model based version 4. It extends the standard DDI Codebook by offering the possibility to embed a serialized version of the R dataset into the XML file containing the Codebook, within a notes child of the fileDscr component. This type of generated codebook is unique to this package and automatically detected when converting to another statistical software. This will likely be replaced with a time insensitive text version.
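The round trip through a DDI Codebook with embedded data might look like the sketch below. It assumes that packages DDIwR and declared are installed and that convert() accepts a data frame together with a destination path, as the argument descriptions suggest. The file name example.xml and the temporary directory are choices made for this illustration only.

```r
library(DDIwR)
library(declared)

# A small data frame with one declared variable
x <- data.frame(
    A = declared(
        c(1:5, -92),
        labels = c(Good = 1, Bad = 5, NR = -92),
        na_values = -92
    )
)

# Write a DDI Codebook; the data is embedded by default (embed = TRUE)
xml_path <- file.path(tempdir(), "example.xml")
convert(x, to = xml_path)

# With the argument to left as NULL, the data comes back (invisibly)
# as a data frame, read from the data embedded in the XML file
back <- convert(xml_path)
class(back$A)
```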
Converting to SAS is experimental, and it relies on the same package haven that uses the ReadStat C library. The safest way to convert, which at the same time consistently converts the missing values, is to export the data to a CSV file, create a setup file with the function setupfile(), and run the commands manually.
Converting data from SAS is possible, however reading the metadata is also experimental (the current version of haven only partially imports the metadata). Either specify the path to the catalog file using the argument catalog_file from the function read_sas(), or have the catalog file in the same directory as the data set, with the same file name and the extension .sas7bcat.
The argument recode controls how missing values are treated. If the input file has SPSS like numeric codes, they will be recoded to extended (a-z) missing types when converting to Stata or SAS. If the input has Stata like extended codes, they will be recoded to SPSS like numeric codes.
The character encoding is usually passed to the corresponding functions from package haven. It can be set to NULL to reset to the default in that package.
Converting to SPSS works with numerical and character labelled vectors, with or without labels. Date/Time variables are partially supported by package haven: either having such a variable with no labels and missing values, or if labels and missing values are declared the variable is automatically coerced to numeric, and users may have to make the proper settings in SPSS.
Value
An invisible R data frame, when the argument to is NULL.
Author(s)
Adrian Dusa
References
DDI - Data Documentation Initiative, see the DDI Alliance website.
See Also
setupfile, getCodebook, declared
Examples
## Not run:
# Assuming an SPSS file called test.sav is located in the working directory
# The following command imports the file into the R environment:
test <- convert("test.sav")
# The following command will extract the metadata in a DDI Codebook and
# produce a test.xml file in the same directory
convert("test.sav", to = "DDI")
# The data may be saved separately from the DDI file, using:
convert("test.sav", to = "DDI", embed = FALSE)
# To produce a Stata file:
convert("test.sav", to = "Stata")
# To produce an R file:
convert("test.sav", to = "R")
# To produce an Excel file:
convert("test.sav", to = "Excel")
## End(Not run)
Export a DDI Codebook to an XML file.
Description
Create a DDI Codebook version 2.6, XML file structure.
Usage
exportCodebook(codeBook, to = "", OS = "", indent = 2, ...)
Arguments
codeBook |
A standard element of class |
to |
either a character string naming a file or a connection open for writing ("" indicates output to the console) |
OS |
The target operating system, for the eol - end of line character(s) |
indent |
Indent width, in number of spaces |
... |
Other arguments, mainly for internal use |
Details
The information object is a codeBook DDI element having at least two main children:
- fileDscr, with the data provided as a sub-component named datafile
- dataDscr, having as many components as the number of variables in the (meta)data.
For the moment, only DDI codebook version 2.6 is exported, and DDI Lifecycle is planned for future releases.
A small number of required DDI specific elements and attributes have generic default values, if not otherwise specified in the codeBook list object. For the current version, these are: monolang, xmlang, IDNo, titl, agency, URI (for the holdings element), distrbtr, abstract and level (for the otherMat element).
The codeBook object is exported as provided, and it is the user's responsibility to test its validity against the XML schema. Most of these arguments help create the mandatory element stdyDscr, which cannot be harvested from the dataset. If this element is not already present, providing any of these arguments via the three dots ... gate signals an automatic creation and inclusion with the values provided.
Argument xmlang expects a two letter ISO language code, for instance "en" to indicate English, or "ro" to indicate Romanian etc. The original DDI Codebook attribute is called xml:lang, which had to be renamed in this R function because a colon is not allowed in a syntactic R argument name.
A logical argument monolang signals if the document is monolingual, in which case the attribute xmlang is placed a single time for the entire document in the codeBook element. For multilingual documents, xmlang should be placed in the attributes of various other (child) elements, for instance abstract, or the study title, name of the distributing institution, variable labels etc.
The argument OS can be either:
"windows" (default), or "Windows", "Win", "win",
"MacOS", "Darwin", "Apple", "Mac", "mac",
"Linux", "linux".
The end of line separator changes only when the target OS is different from the running OS.
The argument indent controls how many spaces will be used in the XML file, to indent the different sub-elements.
Value
An XML file containing a DDI version 2.6 metadata.
Author(s)
Adrian Dusa
See Also
https://ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/field_level_documentation.html
Examples
## Not run:
exportCodebook(codeBook, to = "codebook.xml")
# using a namespace
exportCodebook(codeBook, to = "codebook.xml", xmlns = "ddi")
## End(Not run)
Extract metadata information
Description
Extract a list containing the variable labels, value labels and any available information about missing values.
Usage
getCodebook(from = NULL, encoding = "UTF-8", ignore = NULL, ...)
Arguments
from |
A path to a file, or a data frame object |
encoding |
The character encoding used to read a file |
ignore |
Character, ignore DDI elements when reading from an XML file |
... |
Additional arguments for this function (internal use only) |
Details
This function extracts the metadata from an R dataset, or alternatively it can read an XML file containing a DDI codebook version 2.6, or an SPSS or Stata file and returns a list containing the variable labels, value labels and information about the missing values.
If the input is a dataset, it will extract the variable level metadata (labels, missing values etc.). From a DDI XML file, it will import all metadata elements, the most expensive being the data description.
For the moment, only DDI Codebook is supported, but DDI Lifecycle is planned to be implemented.
Value
An R list roughly equivalent to a DDI Codebook, containing all variables, their corresponding variable labels and value labels, and (if applicable) missing values if imported and found.
Author(s)
Adrian Dusa
Examples
x <- data.frame(
A = declared(
c(1:5, -92),
labels = c(Good = 1, Bad = 5, NR = -92),
na_values = -92
),
C = declared(
c(1, -91, 3:5, -92),
labels = c(DK = -91, NR = -92),
na_values = c(-91, -92)
)
)
getCodebook(from = x)
Create the catgry elements for a particular variable
Description
Utility function to create the catgry elements, as well as all necessary sub-elements (e.g. catValu, labl, varFormat) along with their associated XML attributes.
Usage
makeCategories(metadata)
Arguments
metadata |
A list of two or three components: |
Value
A list of standard catgry DDI elements.
Author(s)
Adrian Dusa
Create a notes element for the dataset.
Description
Create the notes element to embed a serialized, gzip-ed version of the data in the fileDscr section of the codeBook.
Usage
makeDataNotes(data)
Arguments
data |
An R dataframe. |
Value
A standard notes DDI element.
Author(s)
Adrian Dusa
Make a DDI Codebook element
Description
Creates a standard DDI element.
Usage
makeElement(
name,
children = NULL,
attributes = NULL,
content = NULL,
fill = FALSE,
...
)
Arguments
name |
Character, a DDI Codebook element name. |
children |
A list of standard DDI codebook elements. |
attributes |
A vector of named values. |
content |
Character scalar. |
fill |
Logical, fill the element with arbitrary values for its mandatory children and attributes |
... |
Other arguments, see Details. |
Details
The structure of a DDI element in R follows the usual structure of an XML node, as returned by the function as_list() from package xml2, with one additional (first) component named ".extra" to accommodate any other information that is not part of the DDI element.
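To see this structure, a single element can be created and inspected. This is a minimal sketch, assuming the DDIwR package is installed; the text content is invented for the example, and no particular printed output is implied.

```r
library(DDIwR)

# A single element with some text content (illustrative value)
abstract <- makeElement("abstract", content = "A short study abstract.")

# The class and the component names of the resulting list
class(abstract)
names(abstract)

# A compact look at the first level of the list
str(abstract, max.level = 1)
```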
In the DDI Codebook, most elements and their attributes are optional, but some are mandatory. In case of attributes, some become mandatory only if the element itself is present. The mandatory elements need to be present in the final version of the Codebook, to pass the validation against the XML schema.
By activating the argument fill, this function creates DDI elements containing all mandatory (sub)elements and (their) attributes, filled with arbitrary values that can be changed later on. Some recommended elements are also filled, as expected by the CESSDA Data Catalogue profile for DDI Codebook.
By default, the Codebook is assumed to have a single language for all elements. The argument monolang can be deactivated through the "..." gate, in which situation the appropriate elements will receive a default argument xmlang = "en". For other languages, that argument can also be provided through the "..." gate.
One such DDI Codebook element is the stdyDscr
(Study Description), with the
associated mandatory children, for instance title, ID number, distributor,
citation, abstract etc.
The complete list of elements for which default values are added is: "IDNo", "titl", "titlStmt", "distrbtr", "distStmt", "holdings", "citation", "abstract", "stdyInfo", "stdyDscr", "prodDate", "software", "prodStmt", "docDscr" and "otherMat".
Value
A standard list element of class "DDI" with reserved component names.
Author(s)
Adrian Dusa
See Also
addChildren
getChildren
showDetails
Examples
stdyDscr <- makeElement("stdyDscr", fill = TRUE)
# easier to extract with:
getChildren("citation/titlStmt/titl", from = stdyDscr)
Recode character categorical variables
Description
Recodes a character categorical variable to a numerical categorical variable.
Usage
recodeCharcat(x, ...)
Arguments
x |
A character categorical variable |
... |
Other internal arguments |
Details
For this function, a categorical variable is something other than a base factor. It should be an object of class "declared", or an object of class "haven_labelled_spss", with a specific attribute called "labels" that stores the value labels.
Value
A numeric categorical variable of the same class as the input.
Author(s)
Adrian Dusa
Examples
x <- declared(
c(letters[1:5], -91),
labels = c(Good = "a", Bad = "e", NR = -91),
na_values = -91
)
recodeCharcat(x)
Consistent recoding of (extended) missing values
Description
A function to recode all missing values to either SPSS or Stata types, uniformly (re)using the same codes across all variables.
Usage
recodeMissings(
dataset,
to = c("SPSS", "Stata", "SAS"),
dictionary = NULL,
start = -91,
...
)
Arguments
dataset |
A data frame |
to |
Software to recode missing values for |
dictionary |
A named vector, with corresponding Stata missing codes to SPSS missing values |
start |
Numeric, the first code to use when SPSS style missing values are assigned automatically (default -91) |
... |
Other internal arguments |
Details
When a dictionary is not provided, it is automatically constructed from the available data and metadata, using negative numbers starting from -91 and up to 27 letters starting with "a".
If the dataset contains mixed variables with SPSS and Stata style missing values, unless otherwise specified in a dictionary it uses other codes than the existing ones.
For the SPSS type of missing values, the resulting variables are coerced to a declared labelled format.
Unlike SPSS, Stata does not allow labels for character values. Labels and original character values cannot both be transported from SPSS to Stata; it is either one or the other. If labels are more important to preserve than original values (especially the information about the missing values), the argument chartonum replaces all character values with suitable, non-overlapping numbers and adjusts the labels accordingly.
If no labels are found in the metadata, the original values are preserved.
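The idea of a single dictionary shared by all variables can be illustrated in base R. The sketch below is not the package's internal algorithm: the order in which letters are assigned is an assumption, and the codes are invented. It only shows a data frame with the columns old and new, the layout used for the dictionary in the Examples of this help page.

```r
# SPSS style missing codes collected from several variables
# (illustrative values, with one code repeated)
spss_codes <- c(-92, -91, -92, -99)

# Unique codes, ordered from -91 downwards
codes <- sort(unique(spss_codes), decreasing = TRUE)

# Base R provides 26 lower case letters to draw from
stopifnot(length(codes) <= length(letters))

# One letter per numeric code, reused across all variables
dictionary <- data.frame(
    old = codes,
    new = letters[seq_along(codes)]
)

dictionary
```

Because every code appears once in the dictionary, the same SPSS code is always mapped to the same letter, whichever variable it occurs in.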
Value
A data frame with all missing values recoded consistently.
Author(s)
Adrian Dusa
Examples
x <- data.frame(
A = declared(
c(1:5, -92),
labels = c(Good = 1, Bad = 5, NR = -92),
na_values = -92
),
B = haven::labelled(
c(1:5, haven::tagged_na('a')),
labels = c(DK = haven::tagged_na('a'))
),
C = declared(
c(1, -91, 3:5, -92),
labels = c(DK = -91, NR = -92),
na_values = c(-91, -92)
)
)
xrec <- recodeMissings(x, to = "Stata")
attr(xrec, "dictionary")
dictionary <- data.frame(
old = c(-91, -92, "a"),
new = c("c", "d", "c")
)
recodeMissings(x, to = "Stata", dictionary = dictionary)
recodeMissings(x, to = "SPSS")
dictionary$new <- c(-97, -98, -97)
recodeMissings(x, to = "SPSS", dictionary = dictionary)
recodeMissings(x, to = "SPSS", start = 991)
recodeMissings(x, to = "SPSS", start = -8)
Search for key words
Description
Search function to return elements that contain a certain word or regular expression pattern.
Usage
searchFor(
x,
where = c("everywhere", "title", "description", "attributes", "examples"),
...
)
Arguments
x |
Character, either word(s) or a regular expression. |
where |
Character, in which section(s) to search for. |
... |
Other arguments to be passed to the grepl() function. |
Value
Character vector of DDI element names.
Author(s)
Adrian Dusa
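A possible usage is sketched below, assuming the DDIwR package is installed. The search terms are arbitrary, and the elements returned depend on the DDI Codebook specification shipped with the package, so no output is shown.

```r
library(DDIwR)

# Elements whose description mentions the word "missing"
hits <- searchFor("missing", where = "description")
head(hits)

# A regular expression, restricted to the element titles
searchFor("^Variable", where = "title")
```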
Create setup files for SPSS, Stata, SAS and R
Description
Creates a setup file, based on a list of variable and value labels.
Usage
setupfile(
obj,
file = "",
type = "all",
csv = NULL,
recode = TRUE,
OS = "",
stringnum = TRUE,
...
)
Arguments
obj |
A data frame, or a list object containing the metadata, or a path to a data file or to a directory where such objects are located, for batch processing |
file |
Character, the (path to the) setup file to be created |
type |
The type of setup file, can be: "SPSS", "Stata", "SAS", "R", or "all" (default) |
csv |
The original dataset, used to create the setup file commands, or a path to the directory where the .csv files are located, for batch processing |
recode |
Logical, recode missing values to extended .a-.z range |
OS |
The target operating system, for the eol - end of line character(s) |
stringnum |
Logical, recode string variables to numeric |
... |
Other arguments, see Details below |
Details
When a path to a metadata directory is specified for the argument obj, then the next argument file is silently ignored and all created setup files are saved in a directory called "Setup Files" that (if not already found) is created in the working directory.
The argument file expects the name of the final setup file being saved on the disk. If not specified, the name of the object provided for the obj argument will be used as a filename.
If file is specified, the argument type is automatically determined from the file's extension, otherwise when type = "all", the function produces one setup file for each supported type.
If batch processing multiple files, the function will inspect all files in the provided directory, and retain only those with the extension .R or .r or DDI versions with the extension .xml or .XML (it will subsequently generate an error if the .R files do not contain an object list, or if the .xml files do not contain a DDI structured metadata file).
If the metadata directory contains a subdirectory called "data" or "Data", it will match the name of the metadata file with the name of the .csv file (their names have to be exactly the same, regardless of their extension).
The csv argument can provide a data frame object produced by reading the .csv file, or a path to the directory where the .csv files are located. If the user doesn't provide something for this argument, the function will check the existence of a subdirectory called data in the directory where the metadata files are located.
In batch mode, the code starts with the argument delim = ",", but if the .csv file is delimited differently it will also try hard to find other delimiters that will match the variable names in the metadata file. At the initial version 0.1-0, the automatically detected delimiters include ";" and "\t".
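The matching of delimiters against variable names can be sketched in base R. The function guess_delimiter() below is hypothetical and is not the package's own code; it simply tries each candidate delimiter on the header line and keeps the first one that recovers all variable names from the metadata. Quoted header fields are not handled in this sketch.

```r
# Hypothetical helper, not part of DDIwR
guess_delimiter <- function(header_line, varnames,
                            candidates = c(",", ";", "\t")) {
    for (delim in candidates) {
        # split the header line literally on the candidate delimiter
        fields <- strsplit(header_line, delim, fixed = TRUE)[[1]]

        # accept the delimiter if all metadata names are recovered
        if (all(varnames %in% fields)) {
            return(delim)
        }
    }

    # no candidate matched the variable names
    NA_character_
}

guess_delimiter("id;age;sex", varnames = c("id", "age", "sex"))
```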
The argument OS (case insensitive) can be either:
"Windows" (default), or "Win",
"MacOS", "Darwin", "Apple", "Mac",
"Linux".
The end of line character(s) changes only when the target OS is different from the running OS.
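Writing a text file with an explicit end of line sequence can be done in base R as sketched below. The helper write_with_eol() is hypothetical and does not reproduce how the package selects the separator; it only shows that a binary connection keeps the chosen characters unchanged on any running OS. The two lines of text and the .sps extension are placeholders.

```r
# Hypothetical helper, not part of DDIwR
write_with_eol <- function(lines, path, eol = "\n") {
    # binary mode prevents any newline translation by the running OS
    con <- file(path, open = "wb")
    on.exit(close(con))

    writeLines(lines, con, sep = eol)
    invisible(path)
}

# Windows style line endings, whatever the running OS
path <- write_with_eol(
    c("line one", "line two"),
    tempfile(fileext = ".sps"),
    eol = "\r\n"
)

# Read the raw bytes back to inspect the separators
written <- rawToChar(readBin(path, what = "raw", n = file.size(path)))
written
```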
Value
A setup file to complement the imported raw dataset.
Author(s)
Adrian Dusa
Examples
## Not run:
# IMPORTANT:
# make sure to set the working directory to a directory with
# read/write permissions
# setwd("/path/to/read/write/directory")
setupfile(codeBook)
# if the csv data file is available
setupfile(codeBook, csv="/path/to/csv/file.csv")
# generating a specific type of setup file
setupfile(codeBook, file = "codeBook.do") # type = "Stata" also works
# other types of possible utilizations, using paths to specific files
# an XML file containing a DDI metadata object
setupfile("/path/to/the/metadata/file.xml", csv="/path/to/csv/file.csv")
# or in batch mode, specifying entire directories
setupfile("/path/to/the/metadata/directory", csv="/path/to/csv/directory")
## End(Not run)
Describe what a DDI element is
Description
Describe what a DDI element is
Usage
showDetails(x, ...)
showDescription(x, ...)
showAttributes(x, name = NULL, ...)
globalAttributes()
showExamples(x, ...)
showRelations(x, ...)
showLineages(x, ...)
Arguments
x |
Character, a DDI Codebook element name. |
... |
Other arguments, mainly for internal use. |
name |
Character, print only a specific element (name) |
Details
All attributes having predefined values such as "(Y | N) : N" are mandatory if the element is used.
Author(s)
Adrian Dusa
Examples
showDetails("codeBook")
showAttributes("catgry")
showExamples("abstract")
showLineages("titl")
Validate a DDI element.
Description
Attempts a minimal validation of a DDI Codebook element, by searching for mandatory elements and attributes.
Usage
testValid(element, monolang = TRUE)
Arguments
element |
A standard element of class |
monolang |
Logical, the codebook file is monolingual |
Details
This function currently attempts a minimal check for the absolute most mandatory elements, such as the stdyDscr. An absolute bare version of this element, filled with arbitrary default values, can be produced with the function makeElement(), activating its argument fill.
It also checks for chained expectations, that is element X is mandatory only if the parent element is present.
Future versions will implement more functionality for recommended elements and attributes, with the intention to provide a 1:1 validation as offered by the "CESSDA Metadata Validator".
To ease the validation of the DDI Codebook XML files, the argument monolang is activated by default. This means a single attribute xmlang in the main codeBook element. For multi-language codebooks, an error is flagged if this attribute is missing where appropriate.
Value
A character vector of validation problems found.
Author(s)
Adrian Dusa
Update Codebook.
Description
Update an XML file containing a DDI Codebook.
Usage
updateCodebook(xmlfile, with, ...)
Arguments
xmlfile |
A path to a DDI Codebook XML document. |
with |
An R object containing a root |
... |
Other internal arguments. |
Details
This function replaces entire Codebook sections. Any such section present in the R object will replace the corresponding section from the XML document.
Author(s)
Adrian Dusa