Lesson 7: dplyr

A commonly quoted rule of thumb is that about 80% of data analysis is cleaning and shaping the data until it’s in the state you need. Bracket subsetting is handy, but it can be cumbersome and difficult to read, especially for complicated operations. Enter dplyr!

dplyr is an R package for making data manipulation easier. (It does a lot more too, but this is what we’ll focus on.)

Unlike the subsetting commands we’ve already worked on, dplyr is designed to be highly expressive and highly readable. It’s structured around a set of verbs that together form a grammar of data manipulation. The core functions we’ll talk about are below:

  • select
  • arrange
  • filter
  • group_by
  • mutate
  • summarise/summarize
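
To give a feel for how these verbs read, here is a minimal sketch that applies each one in turn. It assumes the dplyr package is installed and uses R’s built-in mtcars data set purely for illustration (the lesson itself may use different data); the variable names such as picked and by_cyl are made up for this example.

```r
library(dplyr)

# mtcars ships with R (datasets package). It is used here only as
# stand-in data; it is an assumption, not the lesson's own data set.
cars <- mtcars

# select: keep only the named columns
picked <- select(cars, mpg, cyl, wt)

# arrange: sort rows, here from highest to lowest mpg
sorted <- arrange(picked, desc(mpg))

# filter: keep only the rows that satisfy a condition
four_cyl <- filter(picked, cyl == 4)

# mutate: add a new column computed from existing ones
# (wt is recorded in units of 1000 lbs)
with_lb <- mutate(picked, wt_lb = wt * 1000)

# group_by + summarise: collapse each group down to one row
grouped <- group_by(picked, cyl)
by_cyl <- summarise(grouped, mean_mpg = mean(mpg), n_cars = n())

print(by_cyl)
```

Each verb takes a data frame as its first argument and returns a new data frame, so the original cars object is never modified. Compare filter(picked, cyl == 4) with the bracket version picked[picked$cyl == 4, ] to see the difference in readability.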