benchmarking - Why is apply() method slower than a for loop in R?

Question

Welcome To Ask or Share your Answers For Others

benchmarking - Why is apply() method slower than a for loop in R?

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

benchmarking - Why is apply() method slower than a for loop in R?

As a matter of best practices, I'm trying to determine if it's better to create a function and apply() it across a matrix, or if it's better to simply loop a matrix through the function. I tried it both ways and was surprised to find apply() is slower. The task is to take a vector and evaluate it as either being positive or negative and then return a vector with 1 if it's positive and -1 if it's negative. The mash() function loops and the squish() function is passed to the apply() function.

million  <- as.matrix(rnorm(100000))

mash <- function(x){
  for(i in 1:NROW(x))
    if(x[i] > 0) {
      x[i] <- 1
    } else {
      x[i] <- -1
    }
    return(x)
}

squish <- function(x){
  if(x >0) {
    return(1)
  } else {
    return(-1)
  }
}


ptm <- proc.time()
loop_million <- mash(million)
proc.time() - ptm


ptm <- proc.time()
apply_million <- apply(million,1, squish)
proc.time() - ptm

loop_million results:

user  system elapsed 
0.468   0.008   0.483

apply_million results:

user  system elapsed 
1.401   0.021   1.423

What is the advantage to using apply() over a for loop if performance is degraded? Is there a flaw in my test? I compared the two resulting objects for a clue and found:

> class(apply_million)
[1] "numeric"
> class(loop_million)
[1] "matrix"

Which only deepens the mystery. The apply() function cannot accept a simple numeric vector and that's why I cast it with as.matrix() in the beginning. But then it returns a numeric. The for loop is fine with a simple numeric vector. And it returns an object of same class as that one passed to it.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-16T23:10:11+0000

The point of the apply (and plyr) family of functions is not speed, but expressiveness. They also tend to prevent bugs because they eliminate the book keeping code needed with loops.

Lately, answers on stackoverflow have over-emphasised speed. Your code will get faster on its own as computers get faster and R-core optimises the internals of R. Your code will never get more elegant or easier to understand on its own.

In this case you can have the best of both worlds: an elegant answer using vectorisation that is also very fast, (million > 0) * 2 - 1.

Categories

benchmarking - Why is apply() method slower than a for loop in R?

benchmarking - Why is apply() method slower than a for loop in R?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags