2.6 Vectors and data types

A vector is the most common and basic data structure in R, and it is the workhorse of the language. It is an ordered collection of values of a single type, usually numbers or character strings, and you can do math on all of its elements at once. You can assign a vector to a variable, just as you would a single value. You create a vector with the c() function, which stands for combine:

one_to_five <- c(1, 2, 3, 4, 5) # listing every value
one_to_five <- 1:5 # shorthand for the same sequence, stored as integers
one_to_five

A vector can also contain characters:

primary_colors <- c("red", "yellow", "blue")
primary_colors

There are many functions that allow you to inspect the content of a vector. length() tells you how many elements are in a particular vector:

length(one_to_five)
length(primary_colors)

You can also do math with whole vectors. For instance, to multiply every value in our vector by a scalar, we can write:

5 * one_to_five

We can also add two vectors together, element by element:

two_to_ten <- one_to_five + one_to_five
two_to_ten

This is very useful if we have data in different vectors that we want to combine or work with.
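
As a small sketch of working with data held in separate vectors, the example below multiplies two vectors element by element. The names unit_prices, quantities, and line_totals are hypothetical and are not part of the lesson's data.

```r
# Hypothetical data: price per item and number of items bought
unit_prices <- c(2.5, 4, 10)
quantities <- c(4, 1, 2)

# Arithmetic on two vectors of equal length works element by element
line_totals <- unit_prices * quantities
line_totals
# [1] 10  4 20

# sum() collapses a vector into a single value
sum(line_totals)
# [1] 34
```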

There are a few ways to figure out what’s going on in a vector.

class() indicates the class (the type of element) of an object:

class(one_to_five)
class(primary_colors)

You can also use c() to add elements to an existing vector:

new_digits <- c(one_to_five, 90) # adding at the end
new_digits <- c(30, new_digits) # adding at the beginning
new_digits

Here we take the original vector one_to_five and add one item to the end, then add another item to the beginning of the result. We can repeat this as often as we like to build up a vector or a dataset. In a program, this can be useful for accumulating results as we collect or calculate them.
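
A minimal sketch of building a vector step by step might look like the following. The name squares and the choice of squaring the loop counter are assumptions made purely for illustration.

```r
# Start with an empty numeric vector (hypothetical example data)
squares <- numeric(0)

# Append one new result on each pass through the loop
for (i in 1:4) {
  squares <- c(squares, i^2) # adding at the end
}

squares
# [1]  1  4  9 16
length(squares)
# [1] 4
```

Growing a vector this way is convenient for small collections; for very long vectors it is usually faster to create the vector at its final length first and fill it in.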

We just saw two of the data types that R uses: "character" and "integer". The others you will likely encounter during data analysis are:

  • "logical" for TRUE and FALSE (the boolean data type)
  • "numeric" for floating point decimal numbers
  • "factor" for categorical data. Similar to "character" data, but factors have levels
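
To make these types concrete, here is a short sketch that creates one vector of each kind and inspects it with class(). The variable names and values are hypothetical.

```r
# A logical vector holds TRUE and FALSE values
is_valid <- c(TRUE, FALSE, TRUE)
class(is_valid)
# [1] "logical"

# A numeric vector holds floating-point numbers
heights <- c(1.62, 1.75, 1.8)
class(heights)
# [1] "numeric"

# A factor stores categorical data together with its set of levels
sizes <- factor(c("small", "large", "small"))
class(sizes)
# [1] "factor"
levels(sizes)
# [1] "large" "small"
```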

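Importantly, a vector can only contain one data type. If you combine multiple data types in a vector with the c() command, R will not raise an error. Instead, it silently coerces all the values to the most flexible type among them, in the order logical, integer, numeric, character. Mixing numbers and text, for example, gives a character vector.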
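
The following sketch shows what c() does with mixed inputs. The variable name mixed is hypothetical.

```r
# Combining a number, a string, and a logical value
mixed <- c(1, "two", TRUE)
mixed
# [1] "1"    "two"  "TRUE"
class(mixed)
# [1] "character"

# Combining a number and a logical value: TRUE becomes 1
c(1.5, TRUE)
# [1] 1.5 1.0
class(c(1.5, TRUE))
# [1] "numeric"
```

Because the conversion happens without any message, it is worth checking class() after combining values from different sources.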

For example, what data type is our one_to_five vector if we divide it by 2?

divided_integers <- one_to_five/2
divided_integers

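class(divided_integers)

The result is "numeric": dividing produces decimal values such as 0.5, which cannot be stored as integers, so R returns a numeric vector.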

Vectors are ordered and indexed, which means that every value can be referred to by its position in the vector. R indices start at 1. Programming languages like Fortran, MATLAB, and R start counting at 1, because that’s what human beings typically do. Languages in the C family (including C++, Java, Perl, and Python) count from 0 because that’s simpler for computers to do.

We can index a vector in many different ways. We can specify the position of a single value, a range of positions, or a vector of positions. We can even specify which values to leave out by giving their indices with a minus sign.

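one_to_five[3] # the third value
one_to_five[1:3] # the first through third values
one_to_five[c(1, 3, 5)] # the first, third, and fifth values
one_to_five[-2] # everything except the second value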
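
The same indexing rules apply to character vectors. As a brief sketch, the example below reuses the primary_colors vector defined earlier (recreated here so the snippet runs on its own) and also shows what happens when an index is past the end of the vector.

```r
primary_colors <- c("red", "yellow", "blue")

# A single position
primary_colors[2]
# [1] "yellow"

# A vector of positions, returned in the order requested
primary_colors[c(3, 1)]
# [1] "blue" "red"

# Negative indices drop the listed positions
primary_colors[-c(1, 2)]
# [1] "blue"

# Asking for a position that does not exist returns NA rather than an error
primary_colors[4]
# [1] NA
```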