plyr and dplyr for R

plyr is a set of tools for a common set of problems: you need to split up a big data structure into homogeneous pieces, apply a function to each piece and then combine all the results back together.
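
As a rough illustration of the split, apply and combine pattern, here is a minimal sketch using plyr's ddply. The scores data frame and its column names are made up for this example; only ddply and summarise come from plyr itself.

```r
library(plyr)

# Toy data, invented for illustration only
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# Split the data frame by "group", apply summarise() to each piece,
# then combine the per-group results back into a single data frame
group_means <- ddply(scores, "group", summarise, mean_value = mean(value))

print(group_means)
```

The first "d" in ddply means the input is a data frame and the second means the output is one too; plyr's other functions follow the same naming scheme for lists and arrays.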

dplyr is the next iteration of plyr, focussed on tools for working with data frames (hence the d in the name). It has three main goals:

  • Identify the most important data manipulation tools needed for data analysis
    and make them easy to use from R.
  • Provide blazing fast performance for in-memory data by writing key pieces in
    C++.
  • Use the same interface to work with data no matter where it’s stored, whether
    in a data frame, a data table or database.
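
For comparison, a grouped summary in dplyr might look like the sketch below. The scores data frame is again a made-up example, and this shows only the in-memory data frame case, not the data table or database backends mentioned in the third goal.

```r
library(dplyr)

# Toy data, invented for illustration only
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# Group the rows by "group", then compute one summary row per group
group_means <- scores %>%
  group_by(group) %>%
  summarise(mean_value = mean(value))

print(group_means)
```

Each verb takes a data frame and returns a data frame, which is what lets the steps be chained with the pipe.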

Both packages are among the mainstream data manipulation tools available
outside of base R.
