---
title: 'Chapter 4: apply Functions'
author: "Erin Sovansky Winter"
output:
  html_document:
    theme: cerulean
    highlight: textmate
    fontsize: 8pt
    toc: true
    number_sections: true
    code_download: true
    toc_float:
      collapsed: false
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

#  What are apply functions?
Apply functions are a family of functions in base R which allow you to repetitively perform an
action on multiple chunks of data. An apply function is essentially a loop, but it often requires
less code than an explicit loop and can sometimes run faster.
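
To make the comparison with a loop concrete, here is a minimal sketch that squares each value twice: once with an explicit for loop and once with sapply. The vector demo.values is made up for illustration and is not part of this chapter's data.

```{r}
# Hypothetical data for illustration only
demo.values <- c(2, 4, 6, 8)

# Explicit loop: pre-allocate the result, then fill it one element at a time
loop.result <- numeric(length(demo.values))
for (i in seq_along(demo.values)) {
  loop.result[i] <- demo.values[i]^2
}

# The same computation written with sapply
apply.result <- sapply(demo.values, function(x) x^2)

# Both approaches produce the same vector
identical(loop.result, apply.result)
```

The apply version is shorter because the bookkeeping (creating the result and indexing into it) is handled for you.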

The apply functions that this chapter will address are apply, lapply, sapply, vapply, tapply, and
mapply. There are so many different apply functions because they are meant to operate on different
types of data. 

#  The apply function
First, let's go over the basic apply function. You can use the help section to get a description
of this function.
```{r, eval=FALSE}
?apply
```
The apply function looks like this: apply(X, MARGIN, FUN).

* X is an array or matrix (this is the data that you will be performing the function on)
* MARGIN specifies whether you want to apply the function across rows (1) or columns (2)
* FUN is the function you want to use

## apply examples
my.matrx is a matrix with 1-10 in column 1, 11-20 in column 2, and 21-30 in column 3. 
my.matrx will be used to show some of the basic uses for the apply function.
```{r}
my.matrx <- matrix(c(1:10, 11:20, 21:30), nrow = 10, ncol = 3)
my.matrx
```

### Example 1: Using apply to find row sums
What if I wanted to summarize the data in my.matrx by finding the sum of each row? The arguments
are X = my.matrx, MARGIN = 1 (for rows), and FUN = sum.

```{r}
apply(my.matrx, 1, sum)
```
The apply function returned a vector containing the sums for each row.
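
To see what the MARGIN argument changes, here is a small sketch on a made-up 2 x 3 matrix (demo.m is an assumption for illustration, not part of the chapter's data). The same function is applied once over rows and once over columns.

```{r}
# Hypothetical matrix with 2 rows and 3 columns, filled column by column
demo.m <- matrix(1:6, nrow = 2)
demo.m

# MARGIN = 1: one result per row
row.totals <- apply(demo.m, 1, sum)
row.totals

# MARGIN = 2: one result per column
col.totals <- apply(demo.m, 2, sum)
col.totals
```

The length of the result matches the number of rows when MARGIN = 1 and the number of columns when MARGIN = 2.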

### Example 2: Creating a function in the arguments
What if I wanted to be able to find how many datapoints (n) are in each column of my.matrx? I can
use the length function to do this. Because we are using columns, MARGIN = 2.
```{r}
apply(my.matrx, 2, length)
```
What if instead, I wanted to find n-1 for each column? There isn't a function in R to do this
automatically, so I can create my own function. If the function is simple, you can create it
right inside the arguments for apply. In the arguments I created a function that returns
length - 1.
```{r}
apply(my.matrx, 2, function(x) length(x) - 1)
```
As you can see, the function correctly returned a vector of n-1 for each column.
 
### Example 3: Using a function defined outside of apply
If you don't want to write a function inside of the arguments, you can define the function
outside of apply and then pass it to apply by name. This is useful if you want to reuse the
function later. In this example, a function to find the standard error of the mean is
created, then passed into the apply function.
```{r}
# Standard error of the mean: standard deviation divided by the square root of n
st.err <- function(x) {
  sd(x) / sqrt(length(x))
}
apply(my.matrx, 2, st.err)
```

### Example 4: Transforming data
Now for something a little different. In the previous examples, apply was used to summarize
over a row or column. It can also be used to repeat a function on cells within a matrix. In this
example, the apply function is used to transform the values in each cell. Pay attention to the
MARGIN argument. If you set the MARGIN to 1:2 it will have the function operate on each cell.
```{r}
my.matrx2 <- apply(my.matrx, 1:2, function(x) x + 3)
my.matrx2
```
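
For a simple arithmetic transformation like this one, R's vectorized operators give the same result without apply. The sketch below checks that on a small made-up matrix (demo.cells is an assumption for illustration).

```{r}
# Hypothetical matrix for illustration only
demo.cells <- matrix(1:6, nrow = 2)

# Cell-by-cell transformation with apply
cell.by.cell <- apply(demo.cells, 1:2, function(x) x + 3)

# The same transformation using vectorized arithmetic
vectorized <- demo.cells + 3

all.equal(cell.by.cell, vectorized)
```

Using apply with MARGIN = 1:2 is more helpful when the function you need does not already work on a whole matrix at once.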

### Example 5: Vectors?
The previous examples showed several ways to use the apply function on a matrix. But what if I 
wanted to loop through a vector instead? Will the apply function work?

```{r}
vec <- 1:10
vec
```
```{r, eval=FALSE}
apply(vec, 1, sum)
```
If you run this code, it will return the error: Error in apply(vec, 1, sum) : dim(X) must have a positive length.
As you can see, this didn't work because apply expects data that has dimensions, such as a matrix or array, and a plain vector has none. If your data is a vector you need to use lapply, sapply, or vapply instead.
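
The sketch below shows where the error comes from, using a made-up vector (demo.vec is an assumption for illustration). The call to apply is wrapped in tryCatch so the error message is captured as text instead of stopping the document.

```{r}
# Hypothetical vector for illustration only
demo.vec <- 1:10

# A plain vector has no dim attribute
is.null(dim(demo.vec))

# Capture the error message rather than halting
apply.error <- tryCatch(apply(demo.vec, 1, sum),
                        error = function(e) conditionMessage(e))
apply.error

# sapply works directly on the vector
doubled <- sapply(demo.vec, function(x) x * 2)
doubled
```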

# lapply, sapply, and vapply
lapply, sapply, and vapply are all functions that will loop a function through data in a list or
vector. First, try looking up lapply in the help section to see a description of all three
functions.

```{r, eval=FALSE}
?lapply
```

Here are the arguments for the three functions:

* lapply(X, FUN, ...)
* sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
* vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)

In this case, X is a vector or list, and FUN is the function you want to use. sapply and vapply
have extra arguments, but most of them have default values, so you don't need to worry about
them. However, vapply requires another argument called FUN.VALUE, which we will look at later.
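
The ... in each signature passes extra arguments through to FUN. Here is a minimal sketch, assuming a made-up list with one missing value (demo.scores is not part of the chapter's data), where na.rm = TRUE is handed on to mean.

```{r}
# Hypothetical list for illustration only; element a contains a missing value
demo.scores <- list(a = c(1, 2, NA), b = c(4, 5, 6))

# Without extra arguments, the mean of a is NA
plain.means <- lapply(demo.scores, mean)
plain.means

# na.rm = TRUE is passed through ... to mean()
clean.means <- lapply(demo.scores, mean, na.rm = TRUE)
clean.means
```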

### Example 1: Getting started with lapply
Earlier, we created the vector vec. Let's use that vector to test out the lapply function.
```{r}
lapply(vec, sum)
```
This function didn't add up the values like we may have expected it to. This is because lapply
treats the vector like a list and applies the function to each element of the vector separately.

Let's try using a list instead:
```{r}
A <- 1:9
B <- 1:12
C <- 1:15
my.lst <- list(A, B, C)
lapply(my.lst, sum)
```
This time, the lapply function seemed to work better. The function summed each vector in the list
and returned a list of the 3 sums. 

### Example 2: sapply
sapply works just like lapply, but will simplify the output if possible. This means that instead
of returning a list like lapply, it will return a vector if the data is simplifiable.

```{r}
sapply(vec, sum)
```

```{r}
sapply(my.lst, sum)
```
See how these two examples gave the same answers as lapply, but returned vectors instead of lists?
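
Simplification does not always produce a vector. In the sketch below, which uses a made-up named list (demo.lst is an assumption for illustration), a function that returns one value per element gives a named vector, while a function that returns two values per element gives a matrix.

```{r}
# Hypothetical named list for illustration only
demo.lst <- list(first = 1:9, second = 1:12, third = 1:15)

# One value per element: simplified to a named vector
sum.vec <- sapply(demo.lst, sum)
sum.vec

# Two values per element: simplified to a matrix with one column per element
range.mat <- sapply(demo.lst, range)
range.mat
```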

### Example 3: vapply
vapply is similar to sapply, but it requires you to specify what type of data you are expecting.
The arguments for vapply are vapply(X, FUN, FUN.VALUE).
FUN.VALUE is a template for the value that each call to FUN should return.
I am expecting each item in the list to return a single numeric value, so FUN.VALUE = numeric(1).

```{r}
vapply(vec, sum, numeric(1))
```

```{r}
vapply(my.lst, sum, numeric(1))
```

If your function returns more than one value per element, FUN.VALUE = numeric(1) will cause vapply to stop with an error. This is useful when you expect exactly one result per subject, because an unexpected result fails loudly. The chunk below is not evaluated because it would raise that error.
```{r, eval=FALSE}
vapply(my.lst, function(x) x + 2, numeric(1))
```
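
Here is a sketch of both outcomes on a made-up list (demo.lst2 is an assumption for illustration). The failing call is wrapped in tryCatch, which swaps the error for a short placeholder string so the document can keep running.

```{r}
# Hypothetical list for illustration only
demo.lst2 <- list(1:9, 1:12, 1:15)

# range() returns two values, so a template of length 1 is rejected
vapply.error <- tryCatch(vapply(demo.lst2, range, numeric(1)),
                         error = function(e) "vapply raised an error")
vapply.error

# A template of length 2 matches, and the result is a 2 x 3 matrix
range.check <- vapply(demo.lst2, range, numeric(2))
range.check
```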

### Example 4: Transforming data with sapply
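Like apply, these functions can also be used for transforming data inside the list. Because the three vectors in my.lst have different lengths, sapply cannot simplify the result here and returns a list.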
```{r}
my.lst2 <- sapply(my.lst, function(x) x*2)
my.lst2
```

### Which function should I use, lapply, sapply, or vapply?

If you are trying to decide which of these three functions to use, I would suggest sapply when possible because it is the simplest. If you do not want your results to be simplified to a vector, use lapply. If you want to specify the type of result you are expecting, use vapply.
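
One case where the choice matters is empty input. The sketch below, which assumes an empty list named demo.empty purely for illustration, shows that sapply falls back to an empty list while vapply still returns the type you asked for.

```{r}
# Hypothetical empty input for illustration only
demo.empty <- list()

# sapply has nothing to simplify, so it returns an empty list
sapply.empty <- sapply(demo.empty, length)
class(sapply.empty)

# vapply returns an empty vector of the declared type
vapply.empty <- vapply(demo.empty, length, integer(1))
class(vapply.empty)
```

If later code expects a numeric or integer vector, the vapply result is the more predictable of the two.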


# tapply

Sometimes you may want to perform the apply function on some data, but have it separated by a
factor. In that case, you should use tapply. Let's take a look at the information for tapply.

```{r, eval=FALSE}
?tapply
```
The arguments for tapply are tapply(X, INDEX, FUN). The only new argument is INDEX, which is the 
factor you want to use to separate the data.

### Example 1: Means split by condition
First, let's create data with a factor for indexing. The dataset tdata will be created by adding a grouping column to my.matrx and converting it to a data frame.

```{r}
tdata <- as.data.frame(cbind(c(1,1,1,1,1,2,2,2,2,2), my.matrx))
colnames(tdata)
```
Now let's use column 1 as the index and find the mean of column 2:

```{r}
tapply(tdata$V2, tdata$V1, mean)
```
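
tapply can be thought of as three steps: split the values by group, apply the function to each group, and combine the results. The sketch below spells those steps out with split and sapply on made-up data (demo.group and demo.value are assumptions for illustration) and compares the result with tapply.

```{r}
# Hypothetical grouped data for illustration only
demo.group <- c(1, 1, 1, 2, 2, 2)
demo.value <- c(10, 20, 30, 40, 50, 60)

# One step with tapply
tapply.means <- tapply(demo.value, demo.group, mean)
tapply.means

# The same idea in separate steps: split into groups, then apply mean to each
split.means <- sapply(split(demo.value, demo.group), mean)
split.means
```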

### Example 2: Combining functions
You can use tapply to do some quick summary statistics on a variable split by condition. In this
example, I created a function that returns a vector of both the mean and standard deviation. You
can create a function like this for any apply function, not just tapply.
```{r}
# Stored as group.summary so the name of the base summary() function is not reused
group.summary <- tapply(tdata$V2, tdata$V1, function(x) c(mean(x), sd(x)))
group.summary
```
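
When the function returns more than one value, tapply gives back a list with one element per group. One possible way to turn that into a table is to bind the elements together as rows. This is a sketch on made-up data (demo.g and demo.v are assumptions for illustration), and the names mean and sd are added only to label the columns.

```{r}
# Hypothetical grouped data for illustration only
demo.g <- c("a", "a", "a", "b", "b", "b")
demo.v <- c(1, 2, 3, 10, 20, 30)

# Each group returns a named vector of two statistics
stats.list <- tapply(demo.v, demo.g, function(x) c(mean = mean(x), sd = sd(x)))

# Bind the per-group vectors into a matrix with one row per group
stats.mat <- do.call(rbind, stats.list)
stats.mat
```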

# mapply
The last apply function I will cover is mapply.
```{r, eval=FALSE}
?mapply
```
The arguments for mapply are mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE).
First you list the function, followed by the vectors you are using. The rest of the arguments
have default values, so they don't need to be changed for now.
When you have a function that takes 2 arguments, the first vector goes into the first argument
and the second vector goes into the second argument.
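
Here is a minimal sketch of that pairing, using two made-up vectors (demo.x and demo.y are assumptions for illustration). The second call also uses MoreArgs, which holds arguments that stay the same for every call.

```{r}
# Hypothetical vectors for illustration only
demo.x <- c(1, 2, 3)
demo.y <- c(10, 20, 30)

# Elements are paired up: 1 with 10, 2 with 20, 3 with 30
pair.sums <- mapply(function(x, y) x + y, demo.x, demo.y)
pair.sums

# scale is supplied once through MoreArgs and reused for every pair
scaled.sums <- mapply(function(x, y, scale) (x + y) * scale,
                      demo.x, demo.y, MoreArgs = list(scale = 2))
scaled.sums
```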

### Example 1: Understanding mapply
In this example, 1:9 is specifying the value to repeat, and 9:1 is specifying how many times
to repeat. This order is based on the order of arguments in the rep function itself.
```{r}
mapply(rep, 1:9, 9:1)
```
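
If relying on argument order feels fragile, the vectors can also be passed by name. The sketch below is a smaller version of the same idea, using the x and times arguments of rep. Because the results have different lengths, mapply returns a list.

```{r}
# Name the vectors after the arguments of rep() they should fill
named.reps <- mapply(rep, x = 1:3, times = 3:1)
named.reps
```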

### Example 2: Creating a new variable
Another use for mapply would be to create a new variable. For example, using the dataset tdata, I
could divide one column by another column to create a new value. This would be useful for
creating a ratio of two variables as shown in the example below.

```{r}
tdata$V5 <- mapply(function(x, y) x/y, tdata$V2, tdata$V4)
tdata$V5
```
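
For simple arithmetic like division, R's operators already work element by element, so mapply is not strictly required. The sketch below, on made-up vectors (`numer` and `denom` are hypothetical names), checks that the two approaches agree. mapply tends to be more useful when the function itself is not vectorized.

```{r}
# Hypothetical toy vectors
numer <- c(10, 20, 30)
denom <- c(2, 5, 10)

# Element-by-element division with mapply
ratio.mapply <- mapply(function(x, y) x / y, numer, denom)

# The same result using R's vectorized division
ratio.vector <- numer / denom

all.equal(ratio.mapply, ratio.vector)
```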

### Example 3: Saving data into a premade vector
When using an apply family function to create a new variable, one option is to create a new vector ahead of time with its size pre-allocated. I created a numeric vector of length 10 using the vector function. The arguments for the vector function are vector(mode, length). Inside mapply I created a function to multiply two variables together. The results of the mapply function are then saved into the vector.

```{r}
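# Pre-allocate a numeric vector of length 10 (filled with zeros)
new.vec <- vector(mode = "numeric", length = 10)
# The empty brackets fill the existing vector instead of replacing it
new.vec[] <- mapply(function(x, y) x*y, tdata$V3, tdata$V4)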
new.vec
```

# Using apply functions on real datasets
This last section will be a few examples of using apply functions on real data. This section
loads the MASS package, which includes a collection of publicly available datasets. Please
install MASS if you do not already have it. To do so, you can uncomment the install line in
the code below. The state data used in the examples ships with R's built-in datasets package,
so it is available either way.

```{r}
#install.packages("MASS")
library(MASS)
```

Load the state dataset. It contains information about all 50 states.
```{r}
data(state)
```
Let's look at the data we will be using. We will be using the state.x77 dataset.
```{r}
head(state.x77)
str(state.x77)
```
All the data in the dataset happens to be numeric, which is necessary when the function inside the apply function requires numeric data.

### Example 1: Using apply to get summary data
You can use apply to find measures of central tendency and dispersion for each column.
```{r}
apply(state.x77, 2, mean)
apply(state.x77, 2, median)
apply(state.x77, 2, sd)
```
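
For column means specifically, base R also provides colMeans, which should give the same numbers as the apply call above. The short check below is only a sketch to show the equivalence. apply remains the more general tool because it accepts any function.

```{r}
# Column means computed two ways on the built-in state.x77 matrix
col.means.apply <- apply(state.x77, 2, mean)
col.means.direct <- colMeans(state.x77)

# Check that the two approaches agree
all.equal(col.means.apply, col.means.direct)
```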

### Example 2: Saving the results of apply

In this example, I created one function that gives the mean and SD, and another that gives the min, median, and max. Then I saved the results as objects that could be used later.
```{r}
state.summary <- apply(state.x77, 2, function(x) c(mean(x), sd(x)))
state.summary
state.range <- apply(state.x77, 2, function(x) c(min(x), median(x), max(x)))
state.range
```
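
The rows of these saved results are unlabeled, so it can be hard to remember which row is which. One option, sketched below under a new object name (`state.summary.named` is a hypothetical name), is to name the elements inside the function so the rows carry labels and can be looked up by name later.

```{r}
# Naming the returned elements gives the result matrix labeled rows
state.summary.named <- apply(state.x77, 2,
                             function(x) c(mean = mean(x), sd = sd(x)))
round(state.summary.named, 2)

# Pull out a single statistic by row and column name
state.summary.named["mean", "Income"]
```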

### Example 3: Using mapply to compute a new variable
In this example, I want to find the population density for each state. In order to do this, I 
want to divide population by area. state.area and state.x77 are not from the same object, but 
that is fine as long as the vectors are the same length and the data are in the same order. Both
vectors are ordered alphabetically by state, so mapply can be used.
```{r}
# Select the column by name (this also keeps the state names attached)
population <- state.x77[, "Population"]
area <- state.area
pop.dens <- mapply(function(x, y) x/y, population, area)
pop.dens
```
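
To read the densities more easily, one option is to keep the state names attached and sort the result. The sketch below recomputes the densities under new, hypothetical variable names (`pop.named`, `dens.named`) so it does not overwrite anything above. The population figures appear to be recorded in thousands and state.area is in square miles, so the values would be thousands of people per square mile.

```{r}
# Extract population by column name so the state names are kept
pop.named <- state.x77[, "Population"]

# mapply carries the names of its first vector over to the result
dens.named <- mapply(function(x, y) x / y, pop.named, state.area)

# Show the most densely populated states first
head(sort(dens.named, decreasing = TRUE))
```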

### Example 4: Using tapply to explore population by region
In this example, I want to find out some information about the population of states split by
region. state.region is a factor with four levels: Northeast, South, North Central, and West.
For each region, I want the minimum, median, and maximum populations.

```{r}
region.info <- tapply(population, state.region, function(x) c(min(x), median(x), max(x)))
region.info
```
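
Since each region returns three numbers, the tapply result is a list. If a compact table is preferred, a possible alternative is to split the populations by region and let sapply simplify the pieces into a matrix. This is a sketch using new, hypothetical object names (`pop.by.state`, `region.table`), not a replacement for the tapply call above.

```{r}
# Population column from the built-in state.x77 matrix
pop.by.state <- state.x77[, "Population"]

# split groups the populations by region; sapply binds the
# three named statistics for each region into one column
region.table <- sapply(split(pop.by.state, state.region),
                       function(x) c(min = min(x), median = median(x), max = max(x)))
region.table
```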

# References
Here are some sources I used to help me create this chapter:

DataCamp: Tutorial on apply functions: https://www.datacamp.com/community/tutorials/r-tutorial-apply-family

R-bloggers: Using apply, sapply, and lapply in R: https://www.r-bloggers.com/using-apply-sapply-lapply-in-r/

Stack Overflow: Why is vapply safer than sapply?: http://stackoverflow.com/questions/12339650/why-is-vapply-safer-than-sapply

