There are several functions in base R to pivot data. These include
the t() and reshape() functions. There are
also well-designed functions in the tidyverse for
transposing data. Nevertheless, some users will prefer the syntax of
proc_transpose(). This function provides control over
output column naming, and an intuitive set of parameters.
To explore proc_transpose, let’s first create some
sample data:
# Create input data
dat <- read.table(header = TRUE, text = '
                  Name    Subject   Semester1  Semester2
                  Samma   Maths     96             94
                  Sandy   English     76             51
                  Devesh    German    76             95
                  Rakesh    Maths       50             63
                  Priya   English     62             80
                  Kranti    Maths       92             92
                  William   German    87             75')
# View data
dat
#      Name Subject Semester1 Semester2
# 1   Samma   Maths        96        94
# 2   Sandy English        76        51
# 3  Devesh  German        76        95
# 4  Rakesh   Maths        50        63
# 5   Priya English        62        80
# 6  Kranti   Maths        92        92
# 7 William  German        87        75The proc_tranpose() function may be executed without any
parameters. The default usage will tranpose all numeric variables and
construct generic column names for the new columns:
# No parameters
res <- proc_transpose(dat)
# View result
res
#        NAME COL1 COL2 COL3 COL4 COL5 COL6 COL7
# 1 Semester1   96   76   76   50   62   92   87
# 2 Semester2   94   51   95   63   80   92   75If you didn’t want all numeric variables transposed, you could
specify which variable(s) to transpose using the var
parameter. Multiple variables can be passed as a vector:
You may control the output column names using the
prefix, suffix, and name
parameters. The prefix and suffix will be used
to construct the new transposed column names. The name
parameter value will be used as the name of the generic “NAME” column,
which identify the var parameter values:
# With prefix
res <- proc_transpose(dat, name = VarName, prefix = Student)
# View result
res
    VarName Student1 Student2 Student3 Student4 Student5 Student6 Student7
1 Semester1       96       76       76       50       62       92       87
2 Semester2       94       51       95       63       80       92       75Here is the same function call with a suffix:
# With suffix
res <- proc_transpose(dat, name = VarName, prefix = S, suffix = Score)
# View result
res
    VarName S1Score S2Score S3Score S4Score S5Score S6Score S7Score
1 Semester1      96      76      76      50      62      92      87
2 Semester2      94      51      95      63      80      92      75Note that since a variable name in R cannot start with a number, some sort of prefix is required.
If your data contains a column with appropriate labels for the
transposed columns, you can assign it to the id parameter.
The id values will then be used for the new column
names.
# Assign column names from data
res <- proc_transpose(dat, name = VarName, id = Name)
# View result
res
    VarName Samma Sandy Devesh Rakesh Priya Kranti William
1 Semester1    96    76     76     50    62     92      87
2 Semester2    94    51     95     63    80     92      75Using two id parameters tells the function that you want
columns that are combinations of the two variables:
# Two id variables
res <- proc_transpose(dat, id = v(Name, Subject))
res
#        NAME Samma.Maths Sandy.English Devesh.German Rakesh.Maths Priya.English Kranti.Maths William.German
# 1 Semester1          96            76            76           50            62           92             87
# 2 Semester2          94            51            95           63            80           92             75The default delimiter shown above is a dot (“.”). The delimiter may
be changed with the delimiter parameter:
The by parameter tells the function to group by the
by variable before transposing. As you can see below, the
by varible is then retained on the output dataset so you
can identify which rows belong to which group.
# By variable
res <- proc_transpose(dat, by = Name, id = Subject, name = Semester)
# View result
res
#       Name  Semester German English Maths
# 1   Devesh Semester1     76      NA    NA
# 2   Devesh Semester2     95      NA    NA
# 3   Kranti Semester1     NA      NA    92
# 4   Kranti Semester2     NA      NA    92
# 5    Priya Semester1     NA      62    NA
# 6    Priya Semester2     NA      80    NA
# 7   Rakesh Semester1     NA      NA    50
# 8   Rakesh Semester2     NA      NA    63
# 9    Samma Semester1     NA      NA    96
# 10   Samma Semester2     NA      NA    94
# 11   Sandy Semester1     NA      76    NA
# 12   Sandy Semester2     NA      51    NA
# 13 William Semester1     87      NA    NA
# 14 William Semester2     75      NA    NA The by parameter is a valuable feature of
proc_transpose(). The by variable can have a significant
effect on the shape of the output data. Let’s see what happens when we
use a different by variable.
# By variable
res <- proc_transpose(dat, by = Subject, id = Name)
#   Subject      NAME Sandy Priya Devesh William Samma Rakesh Kranti
# 1 English Semester1    76    62     NA      NA    NA     NA     NA
# 2 English Semester2    51    80     NA      NA    NA     NA     NA
# 3  German Semester1    NA    NA     76      87    NA     NA     NA
# 4  German Semester2    NA    NA     95      75    NA     NA     NA
# 5   Maths Semester1    NA    NA     NA      NA    96     50     92
# 6   Maths Semester2    NA    NA     NA      NA    94     63     92Now let’s use two by variables:
# Two by variables
res <- proc_transpose(dat, by = v(Name, Subject))
# View results
res
#       Name Subject      NAME COL1
# 1    Priya English Semester1   62
# 2    Priya English Semester2   80
# 3    Sandy English Semester1   76
# 4    Sandy English Semester2   51
# 5   Devesh  German Semester1   76
# 6   Devesh  German Semester2   95
# 7  William  German Semester1   87
# 8  William  German Semester2   75
# 9   Kranti   Maths Semester1   92
# 10  Kranti   Maths Semester2   92
# 11  Rakesh   Maths Semester1   50
# 12  Rakesh   Maths Semester2   63
# 13   Samma   Maths Semester1   96
# 14   Samma   Maths Semester2   94By transposing one more time on the results of the previous example, you can nearly restore the original data frame.
# Restore original data shape
res2 <- proc_transpose(res, by = v(Name, Subject), id = NAME)
# View results
res2[ , c("Name", "Subject", "Semester1", "Semester2")]
#      Name Subject Semester1 Semester2
# 1   Priya English        62        80
# 2   Sandy English        76        51
# 3  Devesh  German        76        95
# 4 William  German        87        75
# 5  Kranti   Maths        92        92
# 6  Rakesh   Maths        50        63
# 7   Samma   Maths        96        94The proc_sort() function in the procs
package attempts to recreate the most basic functionality of PROC SORT.
The function accepts an input data frame and returns the sorted result.
Before examining the function, first let’s create some sample data:
# Create sample data
dat <- read.table(header = TRUE, text = '
                    ID  Name    Score
                    1   David    74
                    2   Sam    45
                    3   Bane     87
                    3   Mary     92
                    4   Dane     23
                    5   Jenny    87
                    6   Simran 63
                    8   Priya    72 
                    1   David    45
                    2   Ram    54
                    3   Bane     87
                    5   Ken    87')
# View data
dat
#    ID   Name Score
# 1   1  David    74
# 2   2    Sam    45
# 3   3   Bane    87
# 4   3   Mary    92
# 5   4   Dane    23
# 6   5  Jenny    87
# 7   6 Simran    63
# 8   8  Priya    72
# 9   1  David    45
# 10  2    Ram    54
# 11  3   Bane    87
# 12  5    Ken    87Like proc_transpose(), the function has a default usage,
which is to sort by all variables:
# Default sort
res <- proc_sort(dat)
# View results
res
#    ID   Name Score
# 9   1  David    45
# 1   1  David    74
# 10  2    Ram    54
# 2   2    Sam    45
# 3   3   Bane    87
# 11  3   Bane    87
# 4   3   Mary    92
# 5   4   Dane    23
# 6   5  Jenny    87
# 12  5    Ken    87
# 7   6 Simran    63
# 8   8  Priya    72To sort by a specific variable, you can use the by
parameter:
# Sort By
res <- proc_sort(dat, by = Score)
# View results
res
#    ID   Name Score
# 5   4   Dane    23
# 2   2    Sam    45
# 9   1  David    45
# 10  2    Ram    54
# 7   6 Simran    63
# 8   8  Priya    72
# 1   1  David    74
# 3   3   Bane    87
# 6   5  Jenny    87
# 11  3   Bane    87
# 12  5    Ken    87
# 4   3   Mary    92You can also sort by two variables:
# Sort By
res <- proc_sort(dat, by = v(Score, Name))
# View results
res
#    ID   Name Score
# 5   4   Dane    23
# 9   1  David    45
# 2   2    Sam    45
# 10  2    Ram    54
# 7   6 Simran    63
# 8   8  Priya    72
# 1   1  David    74
# 3   3   Bane    87
# 11  3   Bane    87
# 6   5  Jenny    87
# 12  5    Ken    87
# 4   3   Mary    92Notice that when the scores are tied, the data is now sorted alphabetically by name.
To put the highest scores on top, you can sort descending using the
order parameter.
# Sort By
res <- proc_sort(dat, by = Score, order = descending)
# View results
res
#    ID   Name Score
# 4   3   Mary    92
# 3   3   Bane    87
# 6   5  Jenny    87
# 11  3   Bane    87
# 12  5    Ken    87
# 1   1  David    74
# 8   8  Priya    72
# 7   6 Simran    63
# 10  2    Ram    54
# 2   2    Sam    45
# 9   1  David    45
# 5   4   Dane    23The order instructions can be abbreviated “a” for
ascending and “d” for descending. These abbreviations are convenient
when there is more than one by variable.
The keep parameter allows you to select only some of the
variables for the output dataset:
Another convenient feature of proc_sort() is the
nodupkey option. This option will eliminate duplicates
based on the unique combination of values of the by
variables. For instance, to get a unique list of students, you could use
keep and options = nodupkey:
# Keep and Nodupkey
res <- proc_sort(dat, by = Name, keep = Name, options = nodupkey)
# View results
res
#      Name
# 3    Bane
# 5    Dane
# 1   David
# 6   Jenny
# 12    Ken
# 4    Mary
# 8   Priya
# 10    Ram
# 2     Sam
# 7  SimranThere is also a dupkey option to retain only records
with duplicate key values. This feature would allow you to easily find
students who took the exam twice, and show the scores for each
attempt.
Most of the time, you will sort alphabetically or numerically. But sometimes you may have character data that you want to sort in a specific order that is not alphabetic. Let’s return to the class data we used above:
# Create input data
dat <- read.table(header = TRUE, text = '
                  Name    Subject   Semester1  Semester2
                  Samma   Maths     96             94
                  Sandy   English     76             51
                  Devesh    German    76             95
                  Rakesh    Maths       50             63
                  Priya   English     62             80
                  Kranti    Maths       92             92
                  William   German    87             75')
# View data
dat
#      Name Subject Semester1 Semester2
# 1   Samma   Maths        96        94
# 2   Sandy English        76        51
# 3  Devesh  German        76        95
# 4  Rakesh   Maths        50        63
# 5   Priya English        62        80
# 6  Kranti   Maths        92        92
# 7 William  German        87        75What if we wanted to sort “Maths” to the top, followed by “English” and “German”? How could that be accomplished?
To perform a sort in a specific, non-alphabetic order, first create a factor on your desired sort column and order the levels by the intended sort. For instance:
# Create factor with ordered levels
dat$Subject <- factor(dat$Subject, levels = c("Maths", "English", "German"))
# View attributes
attributes(dat$Subject)
# $levels
# [1] "Maths"   "English" "German" 
# 
# $class
# [1] "factor"Then use proc_sort() to sort by the factor variable:
# Sort by factor variable
dat2 <- proc_sort(dat, by = Subject)
# View result
dat2
#      Name Subject Semester1 Semester2
# 1   Samma   Maths        96        94
# 4  Rakesh   Maths        50        63
# 6  Kranti   Maths        92        92
# 2   Sandy English        76        51
# 5   Priya English        62        80
# 3  Devesh  German        76        95
# 7 William  German        87        75Note that after the sort operation, the “Subject” variable is still a factor:
If desired, we can turn the variable back into a character using the “as.character” parameter. Following the sort, this parameter will automatically cast any factors on the by parameter to character variables:
# Sort by factor variable
dat2 <- proc_sort(dat, by = Subject, as.character = TRUE)
# Same results
dat2
#      Name Subject Semester1 Semester2
# 1   Samma   Maths        96        94
# 4  Rakesh   Maths        50        63
# 6  Kranti   Maths        92        92
# 2   Sandy English        76        51
# 5   Priya English        62        80
# 3  Devesh  German        76        95
# 7 William  German        87        75
# But now Subject is a character
class(dat2$Subject)
# [1] "character"