Vectors, Matrices/Arrays, and Lists

R Data Structures

BB

Overview

Yes, another one. I doubt there are many resources on R that don't at least touch on this subject, however briefly, as a failure to familiarize oneself with the fundamental structures of R (or of any language, for that matter) will lead to no end of misery down the road. Although this topic is far from exciting, I strongly recommend taking the time to understand native R object classes and how they relate to their respective methods.

As I noted, there is a lot of material on this subject already, the best of which I consider to be Hadley Wickham's Advanced R. Given that this information is freely available online, I plan to avoid rehashing all of it and instead focus on a) demonstrating how you can explore and familiarize yourself with each structure and b) alerting you to common mistakes and errors that I've made in the past so that others may avoid them (although hopefully you aren't quite the moron I am).

Before I begin, it is important to familiarize yourself with several helpful functions:

  • ?str - Display internal object structure. Oftentimes, this can be fairly verbose for data frames or complicated lists, but there are a number of options to control output (note: for anyone with adolescent-style humor, be sure to read through all the options...).
  • ?class - Identifies an object's class (e.g., "numeric", "list", "data.frame", etc.). You will use this regularly.
  • ?attributes - View object attributes including class, names (column or row), dimensions (for tables/matrices), and comments (if any).
  • Also of use are ?typeof and/or ?mode to determine the internal storage representation. (Note that the "?" in front of a function name brings up its help page in R.)
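To see what each of these helpers reports, here is a minimal sketch that runs them on one small object. The name `scores` and its values are made up purely for illustration.

```r
# 'scores' is a hypothetical named vector used only for this example
scores <- c(math = 90.5, art = 72, music = 88)

str(scores)         # compact summary of the internal structure
class(scores)       # "numeric"
typeof(scores)      # "double" (the internal storage type)
mode(scores)        # "numeric"
attributes(scores)  # a list that holds only $names here
```

Note how class and typeof answer slightly different questions about the same object.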

Vectors

The vector is the most basic data structure in R, as there is no such thing as a scalar, only vectors of length one. Vectors are generally created using the ?c function, although there are other methods.

numeric_vector <- c(1,2,3,4,5)
character_vector <- c('a','b','c','d','e')

class(numeric_vector)
## [1] "numeric"
class(character_vector)
## [1] "character"

R also recognizes sequences (e.g., 1:5) and has a few built-in constants such as letters (?`Constants`), both of which can save time.

class(1:5)
## [1] "integer"
class(letters[1:5])
## [1] "character"

See the difference in class for this one? Technically, this is equivalent to class(1L:5L).
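The distinction is easy to miss because the values still compare equal. A small sketch (the variable names are mine, chosen for illustration):

```r
int_seq <- 1:5               # the colon operator yields integers here
dbl_vec <- c(1, 2, 3, 4, 5)  # plain numeric literals are doubles

typeof(int_seq)              # "integer"
typeof(dbl_vec)              # "double"
all(int_seq == dbl_vec)      # TRUE: the values compare equal
identical(int_seq, dbl_vec)  # FALSE: the storage types differ
```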

Test for an object type using functions of the ?`is` family, such as is.vector. See the R documentation for more detail on classes and methods.

is.vector(1:5)
## [1] TRUE
is.vector(letters[1:5])
## [1] TRUE
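One thing that tripped me up: is.vector is stricter than its name suggests, since it returns FALSE once an object carries any attribute other than names. A short sketch (the "units" attribute is an arbitrary one added for illustration):

```r
x <- 1:5
is.vector(x)               # TRUE

names(x) <- letters[1:5]
is.vector(x)               # still TRUE, names are allowed

attr(x, "units") <- "cm"   # arbitrary extra attribute, for illustration only
is.vector(x)               # FALSE, although x is still atomic
is.atomic(x)               # TRUE

is.vector(list(1, "a"))    # TRUE, lists count as vectors too
```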

You can easily add names to vectors...

names(numeric_vector) <- character_vector
numeric_vector
## a b c d e
## 1 2 3 4 5
names(numeric_vector)
## [1] "a" "b" "c" "d" "e"

Names can be set and accessed in a number of different ways.

character_vector <- setNames(character_vector,numeric_vector)
character_vector
##   1   2   3   4   5
## "a" "b" "c" "d" "e"
attributes(character_vector)
## $names
## [1] "1" "2" "3" "4" "5"
# And finally...

structure(5:10, names = LETTERS[5:10])
##  E  F  G  H  I  J
##  5  6  7  8  9 10

Matrices/Arrays

I seldom use standard matrices (of the S3 variety) other than for creating dummy variables or as a quick and dirty way to get something to print in column format (e.g., as.matrix(letters[1:9])), so I won't write much here. Additionally, they are covered elsewhere much better than I could ever hope to do. Nonetheless, see below for examples...

matrix(letters[1:9],3,3)
##      [,1] [,2] [,3]
## [1,] "a"  "d"  "g"
## [2,] "b"  "e"  "h"
## [3,] "c"  "f"  "i"
is.matrix(matrix(letters[1:9],3,3))
## [1] TRUE
is.array(matrix(letters[1:9],3,3))
## [1] TRUE

Arrays can be extended to more than two dimensions.

array(letters[1:9],dim=c(3,3,3))
## , , 1
##
##      [,1] [,2] [,3]
## [1,] "a"  "d"  "g"
## [2,] "b"  "e"  "h"
## [3,] "c"  "f"  "i"
##
## , , 2
##
##      [,1] [,2] [,3]
## [1,] "a"  "d"  "g"
## [2,] "b"  "e"  "h"
## [3,] "c"  "f"  "i"
##
## , , 3
##
##      [,1] [,2] [,3]
## [1,] "a"  "d"  "g"
## [2,] "b"  "e"  "h"
## [3,] "c"  "f"  "i"
is.array(array(letters[1:9],dim=c(3,3,3)))
## [1] TRUE
is.matrix(array(letters[1:9],dim=c(3,3,3)))
## [1] FALSE
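Under the hood, a matrix or array is just a vector with a dim attribute. The following sketch, using a throwaway vector `x`, shows the attribute being added and removed:

```r
x <- 1:6
is.matrix(x)        # FALSE

dim(x) <- c(2, 3)   # adding a dim attribute turns x into a 2 x 3 matrix
is.matrix(x)        # TRUE
attributes(x)       # $dim holds 2 3

dim(x) <- NULL      # dropping dim gives back the plain vector
identical(x, 1:6)   # TRUE
```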

Lists

Lists are R's most flexible structure and also one of its most frequently used.

ls <- list(letters[21:26])
str(ls)
## List of 1
##  $ : chr [1:6] "u" "v" "w" "x" ...
# Lists can be combined with c()...

ls2 <- c(list(letters[1:5]),list(1:5))
str(ls2)
## List of 2
##  $ : chr [1:5] "a" "b" "c" "d" ...
##  $ : int [1:5] 1 2 3 4 5
# List elements can have names...

names(ls2) <- c('letters','numbers')
str(ls2)
## List of 2
##  $ letters: chr [1:5] "a" "b" "c" "d" ...
##  $ numbers: int [1:5] 1 2 3 4 5
# A list of arrays...

str(c(list(array(letters[1:9],dim=c(3,3,3))),
      list(matrix(letters[1:9],3,3))))
## List of 2
##  $ : chr [1:3, 1:3, 1:3] "a" "b" "c" "d" ...
##  $ : chr [1:3, 1:3] "a" "b" "c" "d" ...

Lists can pretty much hold, well, just about anything and often do, as evidenced above. For example, an 'lm' or 'glm' object is a list that holds a) atomic vectors, b) other lists, and even c) all the variables in the call as a data frame. This is why model objects tend to be rather bloated (see below).

object.size(lm(mpg~.,data=mtcars))
## 45768 bytes
object.size(mtcars)
## 6736 bytes
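To see where the extra bytes go, the model object can be explored like any other list. This is only a rough sketch; the name `fit` is mine, and the exact sizes will vary by R version and platform.

```r
fit <- lm(mpg ~ ., data = mtcars)

class(fit)         # "lm"
is.list(fit)       # TRUE
names(fit)         # component names such as "coefficients" and "model"
class(fit$model)   # "data.frame"
dim(fit$model)     # 32 11, the same dimensions as mtcars

# Size of each component in bytes, largest first
sort(sapply(fit, object.size), decreasing = TRUE)
```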

Indexing Vectors, Matrices/Arrays, and Lists

Vectors

Ah ok, now we get to something a little more interesting, but still quite important: indexing. While incredibly powerful, R's different indexing options can be confusing. As you are probably aware, I've been using vector indices since the first code block on this page. Specifically, letters is a character vector of length 26, meaning that letters[1:5] subsets the first five values of the vector. An atomic vector can also be indexed as follows: letters[c(1,3)], letters[c(1:3,5)], or even letters[-1], letters[-c(1:5)], and letters[c(-1,-3)], where negative indices drop elements. However, letters[c(1,-3)] is not valid because positive and negative indices cannot be mixed.

# EXAMPLES

letters[1:5]
## [1] "a" "b" "c" "d" "e"
letters[c(1,3)]
## [1] "a" "c"
letters[c(1:3,5)]
## [1] "a" "b" "c" "e"
letters[-1]
##  [1] "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r"
## [18] "s" "t" "u" "v" "w" "x" "y" "z"
letters[-c(1:5)]
##  [1] "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v"
## [18] "w" "x" "y" "z"
letters[c(-1,-3)]
##  [1] "b" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [18] "t" "u" "v" "w" "x" "y" "z"
## No No

letters[c(1,-3)]
## Error in letters[c(1, -3)]: only 0's may be mixed with negative subscripts

Named vectors open up even more possibilities...

names(numeric_vector) <- letters[1:5]
str(numeric_vector)
##  Named num [1:5] 1 2 3 4 5
##  - attr(*, "names")= chr [1:5] "a" "b" "c" "d" ...
numeric_vector['a']
## a
## 1
numeric_vector[c('a','b','c')]
## a b c
## 1 2 3
# Or even..

numeric_vector[letters[1:5]]
## a b c d e
## 1 2 3 4 5

There is also the option to index via logical vectors.

numeric_vector<3
##     a     b     c     d     e
##  TRUE  TRUE FALSE FALSE FALSE
numeric_vector[numeric_vector<3]
## a b
## 1 2
# Or use 'which' ...

which(numeric_vector<3)
## a b
## 1 2
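Keep in mind that which returns positions, not values; the output above only looks like the values because they happen to coincide. The two approaches also differ when NAs are involved, as this sketch with a made-up vector `x` shows:

```r
x <- c(a = 5, b = NA, c = 1, d = 2)

x < 3             # FALSE NA TRUE TRUE
x[x < 3]          # three elements: the NA comes through as well
which(x < 3)      # positions 3 and 4 (named c and d), the NA is dropped
x[which(x < 3)]   # c = 1, d = 2
```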

And finally, to also provide a glimpse of R's overall flexibility ...

numeric_vector[names(numeric_vector) %in% letters[1:2]]
## a b
## 1 2

Arrays

Multidimensional arrays, while more complex, can be indexed similarly (note that matrices also follow these rules).

sample_array <- array(1:27,dim=c(3,3,3),
  dimnames = list(letters[1:3],letters[4:6],letters[7:9]))

str(sample_array)
##  int [1:3, 1:3, 1:3] 1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, "dimnames")=List of 3
##   ..$ : chr [1:3] "a" "b" "c"
##   ..$ : chr [1:3] "d" "e" "f"
##   ..$ : chr [1:3] "g" "h" "i"
sample_array[,,'g']
##   d e f
## a 1 4 7
## b 2 5 8
## c 3 6 9
sample_array[,'d','g']
## a b c
## 1 2 3
sample_array[,'d','g',drop=FALSE]
## , , g
##
##   d
## a 1
## b 2
## c 3
sample_array['a','d','g']
## [1] 1
sample_array['a','d','g',drop=FALSE]
## , , g
##
##   d
## a 1
# The same holds true for sample_array[,,1], sample_array[,1,1], etc.
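Since an array is a vector with dimensions, a single index also works and counts through the elements in storage (column-major) order. A small sketch that rebuilds the same sample_array so it runs on its own:

```r
sample_array <- array(1:27, dim = c(3, 3, 3),
  dimnames = list(letters[1:3], letters[4:6], letters[7:9]))

sample_array[10]                          # 10, the tenth element in storage order
sample_array['a', 'd', 'h']               # 10, the same element by name
arrayInd(10, dim(sample_array))           # row 1, column 1, slice 2
dim(sample_array[, , 'g'])                # 3 3, the third dimension was dropped
dim(sample_array[, , 'g', drop = FALSE])  # 3 3 1
```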

Lists

Lists follow vector rules, except that they have the additional `$` option and quickly become tricky in recursive instances.

foo <- list(list(letters[1:5]),list(1:5),month.abb[1:5])
str(foo)
## List of 3
##  $ :List of 1
##   ..$ : chr [1:5] "a" "b" "c" "d" ...
##  $ :List of 1
##   ..$ : int [1:5] 1 2 3 4 5
##  $ : chr [1:5] "Jan" "Feb" "Mar" "Apr" ...

Starting over with a flat list, I'll give each atomic vector in 'foo' its own gibberish name, then explicitly make the first vector (letters[1:5]) a list and, in turn, give each element of the list 'bar' its own respective name.

foo <- list(letters[1:5],1:5,month.abb[1:5])
str(foo)
## List of 3
##  $ : chr [1:5] "a" "b" "c" "d" ...
##  $ : int [1:5] 1 2 3 4 5
##  $ : chr [1:5] "Jan" "Feb" "Mar" "Apr" ...

Once the names are assigned, we can index in several different ways.

names(foo) <- c('bar','baz','qux')

foo$bar <- as.list(setNames(foo$bar,paste0('bar',1:5)))

Look, I can't possibly cover every way a nested list could be indexed (it's a lot, trust me), but one important take-away is that the double-bracket (?`[[`) will always return a single element, regardless of what the element is, whereas `[` returns a sub-list (or a sub-vector when applied to a vector, e.g. foo$bar$bar1[1]). To practice, try to access everything in lm_list <- lm(mpg ~ ., data = mtcars).

foo$bar
## $bar1
## [1] "a"
##
## $bar2
## [1] "b"
##
## $bar3
## [1] "c"
##
## $bar4
## [1] "d"
##
## $bar5
## [1] "e"
foo[1]
## $bar
## $bar$bar1
## [1] "a"
##
## $bar$bar2
## [1] "b"
##
## $bar$bar3
## [1] "c"
##
## $bar$bar4
## [1] "d"
##
## $bar$bar5
## [1] "e"
foo[[1]]
## $bar1
## [1] "a"
##
## $bar2
## [1] "b"
##
## $bar3
## [1] "c"
##
## $bar4
## [1] "d"
##
## $bar5
## [1] "e"
foo$bar$bar1
## [1] "a"
foo[[1]][[1]]
## [1] "a"
# etc.
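The difference between the two bracket types is easiest to see by checking the class of what comes back. A minimal sketch, using a separate flat list named `demo_list` (a made-up name) so that 'foo' is left alone:

```r
demo_list <- list(bar = letters[1:5], baz = 1:5, qux = month.abb[1:5])

class(demo_list['qux'])                       # "list": a list of length one
class(demo_list[['qux']])                     # "character": the element itself
length(demo_list[c('bar', 'baz')])            # 2: `[` can return several elements
identical(demo_list[['qux']], demo_list$qux)  # TRUE
```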

Where the match is unambiguous, the `$` operator also uses partial matching (and auto-complete in RStudio).

foo$q
## [1] "Jan" "Feb" "Mar" "Apr" "May"

But this is not always possible (here 'bar' matches bar1 through bar5, so the result is NULL), so be explicit.

foo$bar$bar
## NULL
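For comparison, `[[` matches names exactly unless told otherwise through its exact argument. A short sketch on a made-up list named `demo_list`:

```r
demo_list <- list(bar = list(bar1 = "a", bar2 = "b"), qux = month.abb[1:5])

demo_list$q                      # partial match on 'qux'
demo_list[["q"]]                 # NULL: `[[` matches exactly by default
demo_list[["q", exact = FALSE]]  # partial matching on request
demo_list$bar$bar                # NULL: 'bar' matches both bar1 and bar2
demo_list$bar$bar1               # "a"
```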

Lastly, I recommend being aware of the ?unlist function and its optional argument recursive (TRUE by default). I can't even begin to tell you how long I spent trying to extract and combine a large list of nested data frames when I first started using R on a regular basis and what a headache it caused me (see also ?do.call).
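Here is a rough sketch of both functions on toy data. The names `nested` and `df_list` and their contents are invented for illustration, and do.call with rbind is only one of several ways to combine data frames.

```r
nested <- list(a = 1, b = list(c = 2, d = 3))
unlist(nested)   # named vector: a = 1, b.c = 2, b.d = 3

# A list of data frames with matching columns (made-up data)
df_list <- list(
  data.frame(id = 1:2, grp = c("x", "y")),
  data.frame(id = 3:4, grp = c("z", "w"))
)

# Equivalent to rbind(df_list[[1]], df_list[[2]])
combined <- do.call(rbind, df_list)
nrow(combined)   # 4
```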

Adding and Removing Elements

Elements can be added or removed from both vectors and lists in similar fashion. However, named list elements can also be assigned via `<-` and the `$` sign (see below). See also ?append for another option for adding elements.

vec <- c(1:5,6:10)
vec[11] <- 11

# The following won't work for vectors...

vec[11] <- NULL # No No
## Error in vec[11] <- NULL: replacement has length zero
rm(vec[11]) # No No
## Error in rm(vec[11]): ... must contain names or character strings
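What does work for vectors is subsetting and reassigning. A minimal sketch, using a separate vector `v` (a made-up name) so the example stands on its own:

```r
v <- 1:11
v <- v[-11]                    # drop the 11th element with a negative index
length(v)                      # 10
v <- v[v != 5]                 # drop an element by value
v <- append(v, 5L, after = 4)  # put 5 back after the 4th position
identical(v, 1:10)             # TRUE
```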

Objects in the global environment should be removed with `rm`, not by assigning `NULL`, as the latter will just set the content to NULL.

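# Assumed starting point (not defined in the original): a named list of ten elements
new_list <- setNames(as.list(1:10), letters[1:10])

new_list$k <- 11                  # add element 'k' with `$`
new_list$k <- NULL                # remove it again
new_list$k <- 11
new_list['k'] <- 11               # same assignment using `[`
new_list['k'] <- NULL             # remove by name
new_list[11] <- NULL              # remove by position
new_list[c('k','l')] <- c(11,12)  # add several elements at once
new_list <- new_list[1:10]        # keep only the first ten elements
new_list[11:12] <- NULL           # remove several positions at once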

