bioCS: Avoiding unnecessary memory allocations in R

Tuesday, March 8, 2016

As a rule, everything I discover in R has already been discussed by Hadley Wickham. In this case, he writes:

"The reason why the C++ function is faster is subtle, and relates to memory management. The R version needs to create an intermediate vector the same length as y (x - ys), and allocating memory is an expensive operation. The C++ function avoids this overhead because it uses an intermediate scalar."

In my case, I want to count the number of items in a vector that are below a certain threshold. With sum(v < threshold), R allocates a new vector the same length as v for the result of the comparison, and then sums over that vector. It's possible to speed that up about ten-fold by counting directly in C++.
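
The idea can be sketched in plain C++ as follows. This is a minimal illustration under stated assumptions, not the code from the post: it uses std::vector<double> and a scalar threshold instead of Rcpp's NumericVector so that it compiles without R, the name count_less and the int return type are taken from the compiler output quoted in the comments, and count_less_alloc is a hypothetical helper added only for contrast.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical contrast case: mimics the two-step shape of sum(v < threshold).
// It first materialises a vector of comparison results, then sums it.
int count_less_alloc(const std::vector<double>& x, double threshold) {
    std::vector<int> below(x.size());  // intermediate allocation, same length as x
    for (std::size_t i = 0; i < x.size(); ++i) {
        below[i] = x[i] < threshold ? 1 : 0;
    }
    int total = 0;
    for (int flag : below) {
        total += flag;
    }
    return total;
}

// Direct version: the only intermediate state is a scalar counter,
// so no vector has to be allocated for the comparison results.
int count_less(const std::vector<double>& x, double threshold) {
    int counter = 0;
    for (double value : x) {
        if (value < threshold) {
            ++counter;
        }
    }
    return counter;
}
```

Both functions return the same count; only the second avoids the temporary vector. One difference from R is worth keeping in mind: a comparison with NaN is false in C++, so this loop silently skips such values, whereas sum(v < threshold) in R returns NA when v contains NA.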

Often this won't be the bottleneck, but it may be useful at some point.

9 comments:

Andre Mikulec said...: This does not compile on Windows 7 64bit. I get this error.

C:/Users/. . ./Rcpp/include/Rcpp/internal/wrap.h:525:11: error: invalid conversion from 'long long unsigned int' to 'SEXP' [-fpermissive]
make: *** [file1ed87232837.o] Error 1; March 8, 2016 at 9:08 PM
Jess Thorvall Aunsbjørn said...: Same problem here, also with win 7 64 bit; March 9, 2016 at 9:35 AM
Michael Kuhn said...: It's working fine on Linux and Mac. Perhaps you can change the size_t to int?; March 9, 2016 at 11:18 AM
Andre Mikulec said...: Replacement . . .
Seems to work now. Thanks.
Windows 7 64bit

In Windows Notepad, press CTRL+H, then search for "size_t" and replace it with "int".

file11d4b795d07.cpp: In function 'int count_less(Rcpp::NumericVector, Rcpp::NumericVector)':
file11d4b795d07.cpp:10:21: warning: suggest parentheses around comparison in operand of '&' [-Wparentheses]
>
> set.seed(42)
>
> N <- 100000000
> v <- runif(N, 0, 10000)
>
> system.time(sum(v < 5000))
user system elapsed
1.82 0.43 2.35

> system.time(v %count<% 5000)
user system elapsed
0.32 0.00 0.34; March 9, 2016 at 7:46 PM
Unknown said...: Hi,
How can I save the compiled function for later use?; March 13, 2016 at 2:49 PM
Michael Kuhn said...: Hi Vlad,

you could probably save the function with saveRDS, but I'm not sure. The better way is to create an R package, see here for a good intro: http://r-pkgs.had.co.nz/ (especially the section "Compiled Code").; March 13, 2016 at 9:47 PM
Unknown said...: Hi,
It saves, but it cannot be read back.
#-------------------------
> saveRDS(count_less, "count_less")
> rm(count_less)
> load("count_less")
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘count_less’ has magic number 'X'
Use of save versions prior to 2 is deprecated
> source("C:/RData/count_less")
Error in source("C:/RData/count_less") :
C:/RData/count_less:3:8: unexpected symbol
2:
3: native symbol
^
#--------------------------------
I'll find a way.
Thank you for your response.
Good luck; March 14, 2016 at 9:15 AM
Michael Kuhn said...: Try "readRDS" for loading the stored function.; March 14, 2016 at 9:16 AM
Unknown said...: Hi Michael,
Unfortunately, this variant does not work either.
I will look into it.
Thanks.
> N <- 100000000
> v <- runif(N, 0, 10000)
> system.time(sum(v < 5000))
user system elapsed
1.87 0.21 2.14
> system.time(v %count<% 5000);system.time(count_less(v,5000))
user system elapsed
0.31 0.00 0.31
user system elapsed
0.29 0.00 0.28
> saveRDS(object = count_less, file = "C:/RData/count_les.rc")
> rm(count_less)
> readRDS("C:/RData/count_les.rc")
function (x, y)
.Primitive(".Call")(, x, y)
> fun1 <- readRDS("C:/RData/count_les.rc")
> fun1(v, 5000)
Error in .Primitive(".Call")(, x, y) :
NULL value passed as symbol address; March 16, 2016 at 10:06 AM