Tuesday, March 8, 2016

Avoiding unnecessary memory allocations in R

As a rule, everything I discover in R has already been discussed by Hadley Wickham. In this case, he writes:
"The reason why the C++ function is faster is subtle, and relates to memory management. The R version needs to create an intermediate vector the same length as y (x - ys), and allocating memory is an expensive operation. The C++ function avoids this overhead because it uses an intermediate scalar."
In my case, I want to count the number of items in a vector that are below a certain threshold. R will allocate a new vector for the result of the comparison, and then sum over that vector. It's possible to speed that up about ten-fold by counting directly in C++.
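The code originally embedded at this point is not reproduced here, so what follows is only a minimal sketch of the idea, not the post's actual implementation. It uses plain standard-library C++ rather than Rcpp so that it stands alone. The name count_less is taken from the compiler output quoted in the comments; count_less_alloc, the scalar threshold parameter, and the int return type are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Allocating approach, analogous to what R does for sum(v < threshold):
// build an intermediate vector of comparison results, then sum it.
// (count_less_alloc is a hypothetical name used only for this comparison.)
int count_less_alloc(const std::vector<double>& v, double threshold) {
    std::vector<int> below(v.size());  // intermediate allocation, same length as v
    for (std::size_t i = 0; i < v.size(); ++i) {
        below[i] = v[i] < threshold ? 1 : 0;
    }
    int total = 0;
    for (int flag : below) {
        total += flag;
    }
    return total;
}

// Direct approach: keep a single scalar counter, no intermediate vector.
// Assumption: the threshold is a plain double and the count fits in an int.
int count_less(const std::vector<double>& v, double threshold) {
    int count = 0;
    for (double x : v) {
        if (x < threshold) {
            ++count;
        }
    }
    return count;
}
```

Both functions return the same count; the second one simply never materialises the comparison vector. To call something like this from R, the usual route would be to take an Rcpp::NumericVector argument, mark the function with // [[Rcpp::export]], and compile it with Rcpp::sourceCpp.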


Often this won't be the bottleneck, but it may be useful at some point.

9 comments:

Andre Mikulec said...

This does not compile on Windows 7 64bit. I get this error.

C:/Users/. . ./Rcpp/include/Rcpp/internal/wrap.h:525:11: error: invalid conversion from 'long long unsigned int' to 'SEXP' [-fpermissive]
make: *** [file1ed87232837.o] Error 1

Jess Thorvall Aunsbjørn said...

Same problem here, also with Windows 7 64-bit.

Michael Kuhn said...

It's working fine on Linux and Mac. Perhaps you can change the size_t to int?

Andre Mikulec said...

Replacement . . .
Seems to work now. Thanks.
Windows 7 64bit

Windows notepad
CTRL+H ... search and replace "size_t" to be "int"

file11d4b795d07.cpp: In function 'int count_less(Rcpp::NumericVector, Rcpp::NumericVector)':
file11d4b795d07.cpp:10:21: warning: suggest parentheses around comparison in operand of '&' [-Wparentheses]
>
> set.seed(42)
>
> N <- 100000000
> v <- runif(N, 0, 10000)
>
> system.time(sum(v < 5000))
   user  system elapsed
   1.82    0.43    2.35

> system.time(v %count<% 5000)
   user  system elapsed
   0.32    0.00    0.34

Unknown said...

Hi,
How can I save the compiled function for later use?

Michael Kuhn said...

Hi Vlad,

you could probably save the function with saveRDS, but I'm not sure. The better way is to create an R package, see here for a good intro: http://r-pkgs.had.co.nz/ (especially the section "Compiled Code").

Unknown said...

Hi,
It saves, but it cannot be read back.
#-------------------------
> saveRDS(count_less, "count_less")
> rm(count_less)
> load("count_less")
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘count_less’ has magic number 'X'
Use of save versions prior to 2 is deprecated
> source("C:/RData/count_less")
Error in source("C:/RData/count_less") :
C:/RData/count_less:3:8: unexpected symbol
2:
3: native symbol
^
#--------------------------------
I'll find a way.
Thank you for your response.
Good luck

Michael Kuhn said...

Try "readRDS" for loading the stored function.

Unknown said...

Hi Michael,
Unfortunately, this variant does not work either.
I will look into it.
Thanks.
> N <- 100000000
> v <- runif(N, 0, 10000)
> system.time(sum(v < 5000))
   user  system elapsed
   1.87    0.21    2.14
> system.time(v %count<% 5000);system.time(count_less(v,5000))
   user  system elapsed
   0.31    0.00    0.31
   user  system elapsed
   0.29    0.00    0.28
> saveRDS(object = count_less, file = "C:/RData/count_les.rc")
> rm(count_less)
> readRDS("C:/RData/count_les.rc")
function (x, y)
.Primitive(".Call")(, x, y)
> fun1 <- readRDS("C:/RData/count_les.rc")
> fun1(v, 5000)
Error in .Primitive(".Call")(, x, y) :
NULL value passed as symbol address